Depending on your cluster setup (cores, memory), you need to set the parallelism explicitly or repartition the data.
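Something along these lines might be a starting point. It is only a rough sketch: the input path, the partition count of 64, and the minimum support of 0.3 are placeholder assumptions of mine, not values from your job, so tune them to your cluster and data.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.fpm.FPGrowth

object FPGrowthPartitioned {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("FPGrowthPartitioned")
    val sc = new SparkContext(conf)

    // Assumption: a small multiple of the total executor cores; 64 is a placeholder.
    val numPartitions = 64

    // Hypothetical input: one basket per line, items separated by spaces.
    val transactions = sc.textFile("baskets.txt")
      .map(_.trim.split(' ').distinct) // drop duplicate items within a basket
      .repartition(numPartitions)      // spread the baskets across the cluster
      .cache()

    val model = new FPGrowth()
      .setMinSupport(0.3)               // placeholder threshold
      .setNumPartitions(numPartitions)  // parallelism used by FP-growth itself
      .run(transactions)

    println(s"Frequent itemsets found: ${model.freqItemsets.count()}")
    sc.stop()
  }
}
```

If I remember correctly, setNumPartitions controls how the conditional FP-trees are distributed, while repartition only affects how the input baskets are laid out, so it may be worth trying each one separately first.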
Thanks
Best Regards

On Wed, Mar 11, 2015 at 12:18 PM, Sean Barzilay <sesnbarzi...@gmail.com> wrote:

> Hi I am currently using spark 1.3.0-snapshot to run the fpg algorithm from
> the mllib library. When I am trying to run the algorithm over a large
> basket(over 1000 items) the program seems to never finish. Did anyone find
> a workaround for this problem?