The article is interesting but doesn't really help. It has only one
sentence about data distribution in partitions.
How can I diagnose skewed data distribution?
How could evenly sized blocks in HDFS lead to skewed data anyway?
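The only check I can think of (a rough sketch - assuming the input is
already loaded as an RDD called rdd) is to count the records in each
partition and eyeball the spread:

val counts = rdd
  .mapPartitionsWithIndex { (idx, it) =>
    // emit one (partition index, record count) pair per partition
    Iterator((idx, it.size))
  }
  .collect()

counts.sortBy(_._2).foreach { case (idx, n) =>
  println(s"partition $idx: $n records")
}

Is there a better way?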
On 9 Sep 2015 2:29 pm, "Akhil Das" wrote:
This post here has a bit of information:
http://blog.cloudera.com/blog/2015/05/working-with-apache-spark-or-how-i-learned-to-stop-worrying-and-love-the-shuffle/
Thanks
Best Regards
On Wed, Sep 9, 2015 at 6:44 AM, mark wrote:
As I understand things (maybe naively), my input data are stored in
equal-sized blocks in HDFS, and each block represents a partition within
Spark when read from HDFS, therefore each block should hold roughly the
same number of records.
So something is missing in my understanding - what can cause the records
to be skewed across partitions?
Try using a custom partitioner for the keys so that they get evenly
distributed across tasks, for example:
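A minimal sketch (the RDD name pairRdd and the partition count are just
placeholders - adjust for your job):

import org.apache.spark.Partitioner

// Spread keys by hash instead of relying on the input's block layout.
class EvenPartitioner(override val numPartitions: Int) extends Partitioner {
  def getPartition(key: Any): Int = {
    val h = key.hashCode % numPartitions
    // hashCode can be negative; keep the result in [0, numPartitions)
    if (h < 0) h + numPartitions else h
  }
}

val evenRdd = pairRdd.partitionBy(new EvenPartitioner(200))

This is essentially what HashPartitioner already does; if a few keys are
much hotter than the rest, you would also need to salt those keys before
partitioning.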
Thanks
Best Regards
On Fri, Sep 4, 2015 at 7:19 PM, mark wrote:
I am trying to tune a Spark job and have noticed some strange behavior -
tasks in a stage vary in execution time, ranging from 2 seconds to 20
seconds. I assume tasks should all run in roughly the same amount of time
in a well-tuned job.
So I did some investigation - the fast tasks appear to have far fewer
records to process than the slow ones.