Have you looked at how big your output is? For example, if your min support is very low, you will output a massive volume of frequent itemsets. If that's the case, then it may simply be taking ages to write terabytes of data.
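If so, counting the result before writing it is a quick sanity check. A rough sketch against the 1.3 mllib FPGrowth API (the `transactions` RDD and the threshold values here are only placeholders for your own input and settings):

import org.apache.spark.mllib.fpm.FPGrowth
import org.apache.spark.rdd.RDD

// transactions: RDD[Array[String]], one basket per record (placeholder for your input)
def inspectOutputSize(transactions: RDD[Array[String]]): Unit = {
  val model = new FPGrowth()
    .setMinSupport(0.3)     // raising min support sharply cuts the number of frequent itemsets
    .setNumPartitions(70)
    .run(transactions)

  // Count first instead of saving, to see how much you are about to write out
  println("frequent itemsets: " + model.freqItemsets.count())
}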
On Wed, Mar 11, 2015 at 8:34 AM, Sean Barzilay <sesnbarzi...@gmail.com> wrote:
> The program spends its time when I am writing the output to a text file,
> and I am using 70 partitions.
>
> On Wed, 11 Mar 2015 9:55 am Sean Owen <so...@cloudera.com> wrote:
>>
>> I don't think there is enough information here. Where is the program
>> spending its time? Where does it "stop"? How many partitions are there?
>>
>> On Wed, Mar 11, 2015 at 7:10 AM, Akhil Das <ak...@sigmoidanalytics.com>
>> wrote:
>> > You need to set spark.cores.max to a number, say 16, so that the tasks
>> > will get distributed evenly across all 4 machines. Another thing would
>> > be to set spark.default.parallelism, if you haven't tried that already.
>> >
>> > Thanks
>> > Best Regards
>> >
>> > On Wed, Mar 11, 2015 at 12:27 PM, Sean Barzilay <sesnbarzi...@gmail.com>
>> > wrote:
>> >>
>> >> I am running on a 4-worker cluster, each worker having between 16 and
>> >> 30 cores and 50 GB of RAM.
>> >>
>> >> On Wed, 11 Mar 2015 8:55 am Akhil Das <ak...@sigmoidanalytics.com>
>> >> wrote:
>> >>>
>> >>> Depending on your cluster setup (cores, memory), you need to specify
>> >>> the parallelism/repartition the data.
>> >>>
>> >>> Thanks
>> >>> Best Regards
>> >>>
>> >>> On Wed, Mar 11, 2015 at 12:18 PM, Sean Barzilay
>> >>> <sesnbarzi...@gmail.com> wrote:
>> >>>>
>> >>>> Hi, I am currently using Spark 1.3.0-snapshot to run the FP-growth
>> >>>> algorithm from the MLlib library. When I try to run the algorithm
>> >>>> over a large basket (over 1000 items), the program seems to never
>> >>>> finish. Did anyone find a workaround for this problem?
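For reference, the spark.cores.max / spark.default.parallelism suggestion from the thread above would look roughly like this when building the SparkConf (the app name and values are illustrative, not tuned for this job):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("fpgrowth-baskets")                 // placeholder application name
  .set("spark.cores.max", "16")                   // total cores the app may use, so work spreads across the 4 workers
  .set("spark.default.parallelism", "70")         // default number of partitions used in shuffles
val sc = new SparkContext(conf)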