What file system are you using? If you use HDFS, the documentation you cited is pretty clear on how partitions are determined.
bq. file X replicated on 4 machines

I don't think the replication factor plays a role w.r.t. partitions.

On Thu, Feb 19, 2015 at 8:05 AM, Alessandro Lulli <lu...@di.unipi.it> wrote:
> Hi All,
>
> Could you please help me understand how Spark defines the number of
> partitions of an RDD if it is not specified?
>
> I found the following in the documentation for files loaded from HDFS:
> *The textFile method also takes an optional second argument for
> controlling the number of partitions of the file. By default, Spark creates
> one partition for each block of the file (blocks being 64MB by default in
> HDFS), but you can also ask for a higher number of partitions by passing a
> larger value. Note that you cannot have fewer partitions than blocks*
>
> What is the rule for files loaded from other file systems?
> For instance, I have a file X replicated on 4 machines. If I load the file
> X into an RDD, how many partitions are defined, and why?
>
> Thanks for your help on this
> Alessandro
>
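To make the rule concrete: Spark's textFile() creates one partition per Hadoop input split, and Hadoop's FileInputFormat sizes splits per file as max(minSize, min(goalSize, blockSize)), where goalSize is the file size divided by the requested split count. Replication never enters that formula. Below is a small Python sketch of that computation (an approximation for illustration, not Spark or Hadoop source; the 1.1 slop factor mirrors FileInputFormat's SPLIT_SLOP):

```python
def compute_split_size(goal_size, min_size, block_size):
    # Mirrors FileInputFormat.computeSplitSize (old mapred API):
    # max(minSize, min(goalSize, blockSize))
    return max(min_size, min(goal_size, block_size))

def num_splits(file_size, block_size, requested_splits=2, min_size=1):
    # goalSize = totalSize / requested number of splits; textFile's
    # default minPartitions is min(defaultParallelism, 2), hence 2 here.
    goal_size = file_size // max(requested_splits, 1)
    split_size = compute_split_size(goal_size, min_size, block_size)
    # Emit full-size splits while the remainder exceeds 1.1x the split
    # size (SPLIT_SLOP), then one final split for any leftover bytes.
    splits, remaining = 0, file_size
    while remaining / split_size > 1.1:
        splits += 1
        remaining -= split_size
    if remaining > 0:
        splits += 1
    return splits

# A 200 MB file with 64 MB HDFS blocks -> 4 partitions (3 full + tail)
print(num_splits(200 * 1024**2, 64 * 1024**2))
```

For a local (non-HDFS) file the same formula applies, just with that file system's block size, which is why a small local file still ends up with at least 2 partitions by default.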