Most likely the partition keys in your case are not evenly distributed, which
is why some of your tasks take much longer to process than others. You will
have to use a custom partitioner
<http://stackoverflow.com/questions/23127329/how-to-define-custom-partitioner-for-spark-rdds-of-equally-sized-partition-where>
and partition your data accordingly; give it a try if you haven't already.
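In case it helps, here is a minimal sketch of what such a partitioner can look
like (the class name, the hot key, and the partition count are just
illustrative assumptions, not taken from your job): reserve one partition for
a known heavy key and hash everything else across the remaining partitions.

import org.apache.spark.Partitioner

// Sketch of a skew-aware partitioner: one partition is reserved for a
// known heavy key, the remaining keys are hash-distributed over the rest.
class SkewAwarePartitioner(numParts: Int, hotKey: String) extends Partitioner {
  require(numParts > 1, "need at least 2 partitions")

  override def numPartitions: Int = numParts

  override def getPartition(key: Any): Int = key match {
    case k: String if k == hotKey => 0   // heavy key gets its own partition
    case k =>
      val buckets = numParts - 1
      1 + (((k.hashCode % buckets) + buckets) % buckets)  // non-negative bucket
  }
}

// Usage on a pair RDD, e.g. rdd: RDD[(String, String)]:
//   val repartitioned = rdd.partitionBy(new SkewAwarePartitioner(64, "heavy-key"))

Anything that implements numPartitions and getPartition can be passed to
partitionBy, so you can balance the data by whatever you know about your key
distribution.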

Thanks
Best Regards

On Wed, Oct 28, 2015 at 5:05 PM, t3l <t...@threelights.de> wrote:

> I have a cluster with 2 nodes (32 CPU cores each). My data is distributed
> evenly, but the processing times for each partition can vary greatly. Now,
> sometimes Spark seems to conclude from the current workload on both nodes
> that it might be better to shift one partition from node1 to node2 (because
> that guy has cores waiting for work). Am I hallucinating, or is that really
> happening? Is there any way I can prevent this?
>
> Greetings,
>
> T3L
>
>
>
