Have you looked at how big your output is? For example, if your min support is very low, you will output a massive volume of frequent itemsets. If that's the case, then it may simply be taking ages to write terabytes of data.
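If so, counting the result before writing it is a quick sanity check. A rough sketch against the 1.3 mllib FPGrowth API (the `transactions` RDD and the threshold values here are only placeholders for your own input and settings):

import org.apache.spark.mllib.fpm.FPGrowth
import org.apache.spark.rdd.RDD

// transactions: RDD[Array[String]], one basket per record (placeholder for your input)
def inspectOutputSize(transactions: RDD[Array[String]]): Unit = {
  val model = new FPGrowth()
    .setMinSupport(0.3)     // raising min support sharply cuts the number of frequent itemsets
    .setNumPartitions(70)
    .run(transactions)

  // Count first instead of saving, to see how much you are about to write out
  println("frequent itemsets: " + model.freqItemsets.count())
}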
On Wed, Mar 11, 2015 at 8:34 AM, Sean Barzilay <sesnbarzi...@gmail.com> wrote:
> The program spends its time when I am writing the output to a text file,
> and I am using 70 partitions.
>
> On Wed, 11 Mar 2015 9:55 am Sean Owen <so...@cloudera.com> wrote:
>>
>> I don't think there is enough information here. Where is the program
>> spending its time? Where does it "stop"? How many partitions are there?
>>
>> On Wed, Mar 11, 2015 at 7:10 AM, Akhil Das <ak...@sigmoidanalytics.com>
>> wrote:
>> > You need to set spark.cores.max to a number, say 16, so that the tasks
>> > will get distributed evenly across all 4 machines. Another thing would
>> > be to set spark.default.parallelism, if you haven't tried that already.
>> >
>> > Thanks
>> > Best Regards
>> >
>> > On Wed, Mar 11, 2015 at 12:27 PM, Sean Barzilay <sesnbarzi...@gmail.com>
>> > wrote:
>> >>
>> >> I am running on a 4-worker cluster, each worker having between 16 and
>> >> 30 cores and 50 GB of RAM.
>> >>
>> >> On Wed, 11 Mar 2015 8:55 am Akhil Das <ak...@sigmoidanalytics.com>
>> >> wrote:
>> >>>
>> >>> Depending on your cluster setup (cores, memory), you need to specify
>> >>> the parallelism/repartition the data.
>> >>>
>> >>> Thanks
>> >>> Best Regards
>> >>>
>> >>> On Wed, Mar 11, 2015 at 12:18 PM, Sean Barzilay
>> >>> <sesnbarzi...@gmail.com> wrote:
>> >>>>
>> >>>> Hi, I am currently using Spark 1.3.0-snapshot to run the FP-growth
>> >>>> algorithm from the MLlib library. When I try to run the algorithm
>> >>>> over a large basket (over 1000 items), the program seems to never
>> >>>> finish. Did anyone find a workaround for this problem?
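For reference, the spark.cores.max / spark.default.parallelism suggestion from the thread above would look roughly like this when building the SparkConf (the app name and values are illustrative, not tuned for this job):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("fpgrowth-baskets")                 // placeholder application name
  .set("spark.cores.max", "16")                   // total cores the app may use, so work spreads across the 4 workers
  .set("spark.default.parallelism", "70")         // default number of partitions used in shuffles
val sc = new SparkContext(conf)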