The program spends most of its time writing the output to a text file, and I am using 70 partitions.
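For reference, a minimal sketch of what such a write step might look like, assuming the Spark 1.3 Scala API (the model value and the output path below are placeholders, not code from this thread):

    // `model` stands in for a fitted FPGrowthModel[String]; repartition to
    // 70 before writing, one frequent itemset per output line.
    model.freqItemsets
      .repartition(70)
      .map(is => is.items.mkString("[", ",", "]") + ", " + is.freq)
      .saveAsTextFile("hdfs:///tmp/fpgrowth-output")  // placeholder path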
On Wed, 11 Mar 2015 9:55 am Sean Owen <so...@cloudera.com> wrote:
> I don't think there is enough information here. Where is the program
> spending its time? Where does it "stop"? How many partitions are there?
>
> On Wed, Mar 11, 2015 at 7:10 AM, Akhil Das <ak...@sigmoidanalytics.com> wrote:
> > You need to set spark.cores.max to a number, say 16, so that the tasks
> > get distributed evenly across all 4 machines. Another thing would be to
> > set spark.default.parallelism, if you haven't tried that already.
> >
> > Thanks
> > Best Regards
> >
> > On Wed, Mar 11, 2015 at 12:27 PM, Sean Barzilay <sesnbarzi...@gmail.com> wrote:
> >> I am running on a 4-worker cluster, each worker having between 16 and
> >> 30 cores and 50 GB of RAM.
> >>
> >> On Wed, 11 Mar 2015 8:55 am Akhil Das <ak...@sigmoidanalytics.com> wrote:
> >>> Depending on your cluster setup (cores, memory), you need to specify
> >>> the parallelism / repartition the data.
> >>>
> >>> Thanks
> >>> Best Regards
> >>>
> >>> On Wed, Mar 11, 2015 at 12:18 PM, Sean Barzilay <sesnbarzi...@gmail.com> wrote:
> >>>> Hi, I am currently using Spark 1.3.0-SNAPSHOT to run the FP-Growth
> >>>> algorithm from the MLlib library. When I try to run the algorithm
> >>>> over a large basket (over 1000 items), the program seems to never
> >>>> finish. Did anyone find a workaround for this problem?
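To make the suggestions above concrete, a minimal sketch of the configuration and the FP-Growth call, assuming the Spark 1.3 Scala API (the core count, parallelism, input path, and minimum support are illustrative values, not taken from the thread):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.fpm.FPGrowth

    // spark.cores.max caps the total cores the application takes across the
    // cluster; spark.default.parallelism sets the default partition count
    // for shuffles. Both values here are examples to tune, not prescriptions.
    val conf = new SparkConf()
      .setAppName("FPGrowthExample")
      .set("spark.cores.max", "16")
      .set("spark.default.parallelism", "70")
    val sc = new SparkContext(conf)

    // One basket per line, items separated by spaces (placeholder path).
    val transactions = sc.textFile("hdfs:///tmp/baskets.txt").map(_.split(" "))

    val model = new FPGrowth()
      .setMinSupport(0.3)     // illustrative support threshold
      .setNumPartitions(70)   // partitions used by parallel FP-Growth itself
      .run(transactions)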