Hi

I am using Spark to distribute computationally intensive tasks across the
cluster. Currently I partition my RDD of tasks randomly. There is a large
variation in how long each task takes to complete, so most partitions finish
quickly while a couple take forever. I can mitigate this to some extent by
increasing the number of partitions.

Ideally I would like to partition tasks by complexity (let's assume I can
get such a value from the task object) so that the sum of complexities of
the elements in each partition is roughly equal. Has anyone created such a
partitioner before?
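Something along these lines is what I have in mind. This is only a rough
sketch, assuming each task is keyed by a unique Long id, that I can collect
an id -> complexity map on the driver, and that a greedy heaviest-first
assignment (LPT heuristic) is good enough; names like ComplexityPartitioner
and the complexity field are made up for illustration:

import org.apache.spark.Partitioner

// Assigns keys to partitions so the total complexity per partition is
// roughly balanced. The assignment is computed once on the driver by
// placing the heaviest remaining task into the currently lightest bin.
class ComplexityPartitioner(numParts: Int, complexity: Map[Long, Double])
    extends Partitioner {

  private val assignment: Map[Long, Int] = {
    val loads = Array.fill(numParts)(0.0)
    complexity.toSeq
      .sortBy { case (_, c) => -c }                 // heaviest tasks first
      .map { case (id, c) =>
        val target = loads.indices.minBy(i => loads(i)) // lightest bin so far
        loads(target) += c
        id -> target
      }
      .toMap
  }

  override def numPartitions: Int = numParts

  override def getPartition(key: Any): Int =
    assignment.getOrElse(key.asInstanceOf[Long], 0)
}

// Usage (hypothetical names): taskRDD is an RDD[(Long, Task)] keyed by id.
// val weights = taskRDD.mapValues(_.complexity).collect().toMap
// val balanced = taskRDD.partitionBy(new ComplexityPartitioner(16, weights))

The obvious drawback is collecting the weights to the driver, which is fine
for my case (the number of tasks is small, each task is just expensive), but
I'd be interested if there is a more scalable or built-in approach.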


Regards
Deenar



