You need to set spark.cores.max to a fixed number, say 16, so that the
tasks get distributed evenly across all 4 machines. Another thing to try,
if you haven't already, is setting spark.default.parallelism.
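
As a rough sketch of how the two settings could be passed on the command
line: the helper below only builds `--conf` flags for spark-submit. The
function name spark_submit_conf and the tasks_per_core default of 2 are
my own assumptions for illustration (the Spark tuning guide suggests
roughly 2-3 tasks per CPU core); only the value 16 comes from the
suggestion above, so adjust the rest to your cluster.

```python
def spark_submit_conf(cores_max, tasks_per_core=2):
    """Build spark-submit --conf flags for the two settings discussed.

    cores_max      -> value for spark.cores.max (total cores for the app)
    tasks_per_core -> assumed multiplier used to derive
                      spark.default.parallelism (hypothetical default)
    """
    settings = {
        "spark.cores.max": cores_max,
        "spark.default.parallelism": cores_max * tasks_per_core,
    }
    flags = []
    for key, value in settings.items():
        # spark-submit accepts arbitrary properties as --conf key=value
        flags += ["--conf", f"{key}={value}"]
    return flags


if __name__ == "__main__":
    # Prints the flags to append to a spark-submit invocation
    print(" ".join(spark_submit_conf(16)))
```

The same two keys can instead be set on a SparkConf in the driver or in
conf/spark-defaults.conf; the flags are just the least invasive way to
experiment.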

Thanks
Best Regards

On Wed, Mar 11, 2015 at 12:27 PM, Sean Barzilay <sesnbarzi...@gmail.com>
wrote:

> I am running on a 4-worker cluster, each worker having between 16 and 30
> cores and 50 GB of RAM
>
> On Wed, 11 Mar 2015 8:55 am Akhil Das <ak...@sigmoidanalytics.com> wrote:
>
>> Depending on your cluster setup (cores, memory), you need to specify the
>> parallelism or repartition the data.
>>
>> Thanks
>> Best Regards
>>
>> On Wed, Mar 11, 2015 at 12:18 PM, Sean Barzilay <sesnbarzi...@gmail.com>
>> wrote:
>>
>>> Hi, I am currently using Spark 1.3.0-SNAPSHOT to run the FP-growth
>>> algorithm from the MLlib library. When I try to run the algorithm over a
>>> large basket (over 1000 items), the program seems to never finish. Has
>>> anyone found a workaround for this problem?
>>>
>>
>>
