You can use the RDD.coalesce method to adjust the partitioning of a single RDD. Note that you'll need to pass shuffle = true when increasing the number of partitions.
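For instance, a minimal sketch (the RDD names and input path here are made up, not from your job) of keeping many partitions on the base RDD while coalescing the derived RDDs down:

import org.apache.spark.SparkContext

val sc = new SparkContext("local[4]", "coalesce-example")

// Hypothetical base RDD: ask for a large number of partitions up front.
val baseRdd = sc.textFile("hdfs://host/path/input", 428)

// Derived RDD: shrink to far fewer partitions. shuffle = false is the
// default, so decreasing the partition count avoids a shuffle.
val derived = baseRdd.map(_.length).coalesce(16)

// Growing the partition count again requires shuffle = true.
val widened = derived.coalesce(64, shuffle = true)

With shuffle = false, coalesce only merges existing partitions, so it can never produce more partitions than it started with; that's why the shuffle flag is required when going up.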
See: http://spark.incubator.apache.org/docs/latest/api/core/org/apache/spark/rdd/RDD.html#coalesce(Int,Boolean):RDD[T]

On Thu, Jan 23, 2014 at 1:18 PM, Manoj Samel <[email protected]> wrote:
> Hi,
>
> On some RDD actions, I noticed ~500 tasks being executed. In the task
> details, most of the tasks were too small IMO, and the task
> startup/shutdown/coordination overhead may be coming into the picture.
> The task durations are:
>
> Min: 5 ms
> 25th %ile: 9 ms
> Median: 10 ms
> 75th %ile: 13 ms
> Max: 40 ms
>
> The number of partitions is 428 for many RDDs built on top of each
> other. The base RDD could benefit from a large number of partitions,
> but the RDDs derived from it should have far fewer.
>
> How do I control the number of partitions at the RDD level?
