Hi, I am using Spark to distribute computationally intensive tasks across the cluster. Currently I partition my RDD of tasks randomly. There is a large variation in how long each task takes to complete, so most partitions finish quickly while a couple take forever. I can mitigate this to some extent by increasing the number of partitions.
Ideally I would like to partition tasks by complexity (let's assume I can obtain such a value from the task object) so that the sum of complexities of the elements in each partition is roughly equal. Has anyone created such a partitioner before?

Regards
Deenar
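For reference, here is a minimal sketch of one way to do this, assuming the complexity of each task key can be collected up front into a map (e.g. via a hypothetical task.getComplexity). It greedily assigns the heaviest keys first to whichever partition currently has the smallest total load (LPT-style bin packing), then exposes that assignment through a custom Partitioner. Names like WeightedPartitioner and getComplexity are illustrative, not an existing Spark API.

import org.apache.spark.Partitioner

// Sketch of a complexity-weighted partitioner. `complexities` maps each key
// to its estimated cost; keys not present fall back to partition 0.
class WeightedPartitioner[K](
    numParts: Int,
    complexities: Map[K, Double]) extends Partitioner {

  // Greedy assignment: heaviest keys first, each to the lightest partition so far.
  private val assignment: Map[K, Int] = {
    val loads = Array.fill(numParts)(0.0)
    complexities.toSeq
      .sortBy { case (_, c) => -c }          // heaviest first
      .map { case (key, c) =>
        val p = loads.indexOf(loads.min)     // partition with the least total complexity
        loads(p) += c
        key -> p
      }
      .toMap
  }

  override def numPartitions: Int = numParts

  override def getPartition(key: Any): Int =
    assignment.getOrElse(key.asInstanceOf[K], 0)
}

// Hypothetical usage, assuming tasks: RDD[(TaskId, Task)]:
//   val weights  = tasks.mapValues(_.getComplexity).collectAsMap().toMap
//   val balanced = tasks.partitionBy(new WeightedPartitioner(16, weights))

The main caveat is that the complexities have to be known (and collected to the driver) before repartitioning, so this fits best when the number of distinct task keys is modest.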