The program spends most of its time writing the output to a text file, and I am using 70 partitions.
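For reference, a minimal sketch of what such a write step might look like, assuming the Spark 1.3 Scala API (the model value and the output path below are placeholders, not code from this thread):

    // `model` stands in for a fitted FPGrowthModel[String]; repartition to
    // 70 before writing, one frequent itemset per output line.
    model.freqItemsets
      .repartition(70)
      .map(is => is.items.mkString("[", ",", "]") + ", " + is.freq)
      .saveAsTextFile("hdfs:///tmp/fpgrowth-output")  // placeholder path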
On Wed, 11 Mar 2015 9:55 am Sean Owen <so...@cloudera.com> wrote:
> I don't think there is enough information here. Where is the program
> spending its time? Where does it "stop"? How many partitions are there?
>
> On Wed, Mar 11, 2015 at 7:10 AM, Akhil Das <ak...@sigmoidanalytics.com> wrote:
> > You need to set spark.cores.max to a number, say 16, so that the tasks
> > get distributed evenly across all 4 machines. Another thing would be to
> > set spark.default.parallelism, if you haven't tried that already.
> >
> > Thanks
> > Best Regards
> >
> > On Wed, Mar 11, 2015 at 12:27 PM, Sean Barzilay <sesnbarzi...@gmail.com> wrote:
> >> I am running on a 4-worker cluster, each worker having between 16 and
> >> 30 cores and 50 GB of RAM.
> >>
> >> On Wed, 11 Mar 2015 8:55 am Akhil Das <ak...@sigmoidanalytics.com> wrote:
> >>> Depending on your cluster setup (cores, memory), you need to specify
> >>> the parallelism / repartition the data.
> >>>
> >>> Thanks
> >>> Best Regards
> >>>
> >>> On Wed, Mar 11, 2015 at 12:18 PM, Sean Barzilay <sesnbarzi...@gmail.com> wrote:
> >>>> Hi, I am currently using Spark 1.3.0-SNAPSHOT to run the FP-Growth
> >>>> algorithm from the MLlib library. When I try to run the algorithm
> >>>> over a large basket (over 1000 items), the program seems to never
> >>>> finish. Did anyone find a workaround for this problem?
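To make the suggestions above concrete, a minimal sketch of the configuration and the FP-Growth call, assuming the Spark 1.3 Scala API (the core count, parallelism, input path, and minimum support are illustrative values, not taken from the thread):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.fpm.FPGrowth

    // spark.cores.max caps the total cores the application takes across the
    // cluster; spark.default.parallelism sets the default partition count
    // for shuffles. Both values here are examples to tune, not prescriptions.
    val conf = new SparkConf()
      .setAppName("FPGrowthExample")
      .set("spark.cores.max", "16")
      .set("spark.default.parallelism", "70")
    val sc = new SparkContext(conf)

    // One basket per line, items separated by spaces (placeholder path).
    val transactions = sc.textFile("hdfs:///tmp/baskets.txt").map(_.split(" "))

    val model = new FPGrowth()
      .setMinSupport(0.3)     // illustrative support threshold
      .setNumPartitions(70)   // partitions used by parallel FP-Growth itself
      .run(transactions)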