So the problem is that equally sized partitions take widely varying amounts of
time to complete, depending on their contents?
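
One way to tackle that, assuming you can read a numeric complexity value off
each task and the set of task keys is small enough to collect on the driver,
is a custom Partitioner built from a greedy packing of the weights: take keys
in order of decreasing weight and always assign the next one to the currently
lightest partition. A rough, untested sketch (the complexity field, TaskId and
the partition count are placeholders, not anything from your code):

import org.apache.spark.Partitioner

// Weight-aware partitioner: the key-to-partition assignment is computed
// up front on the driver and captured in the partitioner itself.
class WeightedPartitioner(assignments: Map[Any, Int], val numPartitions: Int)
    extends Partitioner {
  override def getPartition(key: Any): Int = assignments(key)
}

object WeightedPartitioner {
  // Greedy "longest processing time first" packing: sort keys by weight,
  // heaviest first, and drop each one into the lightest bin so far.
  def apply(weights: Seq[(Any, Double)], numPartitions: Int): WeightedPartitioner = {
    val loads = Array.fill(numPartitions)(0.0)
    val assignments = weights.sortBy(-_._2).map { case (key, w) =>
      val bin = loads.indexOf(loads.min)   // index of the lightest bin
      loads(bin) += w
      key -> bin
    }.toMap
    new WeightedPartitioner(assignments, numPartitions)
  }
}

Used on a keyed RDD it would look roughly like:

// tasks: RDD[(TaskId, Task)], with a hypothetical t.complexity weight
val weights = tasks.map { case (id, t) => (id: Any, t.complexity) }.collect().toSeq
val partitioner = WeightedPartitioner(weights, numPartitions = 32)
val balanced = tasks.partitionBy(partitioner)

Note this keeps each task whole, so a single very heavy task can still
dominate one partition; raising the partition count on top of this still
helps in that case.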

On May 1, 2014 8:31 AM, "deenar.toraskar" <deenar.toras...@db.com> wrote:

> Hi
>
> I am using Spark to distribute computationally intensive tasks across the
> cluster. Currently I partition my RDD of tasks randomly. There is a large
> variation in how long each of the jobs takes to complete, leading to most
> partitions being processed quickly while a couple of partitions take forever
> to complete. I can mitigate this problem by increasing the number of
> partitions to some extent.
>
> Ideally I would like to partition tasks by complexity (let's assume I can
> get such a value from the task object) such that the sum of the complexities
> of the elements in each partition is roughly equal. Has anyone created such a
> partitioner before?
>
>
> Regards
> Deenar
