My min support is low, and after the output fills all of my disk space I apply a filter on the results to keep only the item sets that interest me.
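A minimal sketch of pushing that filter ahead of the write, so the uninteresting item sets never reach disk; it assumes the FreqItemset API of the released 1.3.0 (an earlier 1.3.0-SNAPSHOT may still expose plain (itemset, count) pairs), and `interesting` is a hypothetical predicate standing in for whatever filter is applied:

    // `model` is an FPGrowthModel[String]; `interesting` is a hypothetical
    // predicate standing in for the filter described above.
    model.freqItemsets
      .filter(fi => interesting(fi.items))
      .map(fi => fi.items.mkString("[", ",", "]") + " -> " + fi.freq)
      .saveAsTextFile("hdfs:///path/to/filtered-output")  // placeholder path

Filtering before saveAsTextFile means only the item sets that pass the predicate are ever serialized and written, instead of writing everything and filtering afterwards.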
On Wed, 11 Mar 2015 1:58 pm Sean Owen <so...@cloudera.com> wrote:
> Have you looked at how big your output is? For example, if your min
> support is very low, you will output a massive volume of frequent item
> sets. If that's the case, then it may be expected that it's taking
> ages to write terabytes of data.
>
> On Wed, Mar 11, 2015 at 8:34 AM, Sean Barzilay <sesnbarzi...@gmail.com> wrote:
> > The program spends its time when I am writing the output to a text
> > file, and I am using 70 partitions.
> >
> > On Wed, 11 Mar 2015 9:55 am Sean Owen <so...@cloudera.com> wrote:
> >> I don't think there is enough information here. Where is the program
> >> spending its time? Where does it "stop"? How many partitions are
> >> there?
> >>
> >> On Wed, Mar 11, 2015 at 7:10 AM, Akhil Das <ak...@sigmoidanalytics.com> wrote:
> >> > You need to set spark.cores.max to a number, say 16, so that the
> >> > tasks will get distributed evenly across all 4 machines. Another
> >> > thing would be to set spark.default.parallelism, if you haven't
> >> > tried that already.
> >> >
> >> > Thanks
> >> > Best Regards
> >> >
> >> > On Wed, Mar 11, 2015 at 12:27 PM, Sean Barzilay <sesnbarzi...@gmail.com> wrote:
> >> >> I am running on a cluster of 4 workers, each having between 16
> >> >> and 30 cores and 50 GB of RAM.
> >> >>
> >> >> On Wed, 11 Mar 2015 8:55 am Akhil Das <ak...@sigmoidanalytics.com> wrote:
> >> >>> Depending on your cluster setup (cores, memory), you need to
> >> >>> specify the parallelism / repartition the data.
> >> >>>
> >> >>> Thanks
> >> >>> Best Regards
> >> >>>
> >> >>> On Wed, Mar 11, 2015 at 12:18 PM, Sean Barzilay <sesnbarzi...@gmail.com> wrote:
> >> >>>> Hi, I am currently using Spark 1.3.0-SNAPSHOT to run the
> >> >>>> FP-Growth algorithm from the MLlib library. When I try to run
> >> >>>> the algorithm over a large basket (over 1000 items), the
> >> >>>> program seems to never finish. Did anyone find a workaround
> >> >>>> for this problem?
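For anyone landing on this thread, here is a minimal sketch of the kind of FP-Growth job being discussed, in Scala against the MLlib API as released in Spark 1.3.0; the input path and parameter values are placeholders, not the poster's actual settings:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.fpm.FPGrowth
    import org.apache.spark.rdd.RDD

    val sc = new SparkContext(new SparkConf().setAppName("fpgrowth-example"))

    // One basket per line, items separated by spaces (placeholder path).
    val transactions: RDD[Array[String]] =
      sc.textFile("hdfs:///path/to/baskets").map(_.trim.split(' '))

    // A low min support, as described above, and 70 partitions to match
    // the figure mentioned in the thread.
    val model = new FPGrowth()
      .setMinSupport(0.01)
      .setNumPartitions(70)
      .run(transactions)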
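And a sketch of the configuration Akhil suggests; spark.cores.max applies to standalone and Mesos deployments, and the numbers below are illustrative only:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("fpgrowth-example")
      .set("spark.cores.max", "16")            // cap total cores used by the app
      .set("spark.default.parallelism", "64")  // default task count for shuffles
    val sc = new SparkContext(conf)

    // Alternatively, repartition the input RDD explicitly:
    // val transactions = rawTransactions.repartition(64)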