bq. *blocks being 64MB by default in HDFS*

*In Hadoop 2.1+, the default block size has been increased to 128MB.* See https://issues.apache.org/jira/browse/HDFS-4053

Cheers

On Thu, Feb 19, 2015 at 8:32 AM, Ted Yu <[email protected]> wrote:

> What file system are you using?
>
> If you use HDFS, the documentation you cited is pretty clear on how
> partitions are determined.
>
> bq. file X replicated on 4 machines
>
> I don't think the replication factor plays a role w.r.t. partitions.
>
> On Thu, Feb 19, 2015 at 8:05 AM, Alessandro Lulli <[email protected]>
> wrote:
>
>> Hi All,
>>
>> Could you please help me understand how Spark defines the number of
>> partitions of an RDD if it is not specified?
>>
>> I found the following in the documentation for files loaded from HDFS:
>> *The textFile method also takes an optional second argument for
>> controlling the number of partitions of the file. By default, Spark creates
>> one partition for each block of the file (blocks being 64MB by default in
>> HDFS), but you can also ask for a higher number of partitions by passing a
>> larger value. Note that you cannot have fewer partitions than blocks.*
>>
>> What is the rule for files loaded from other file systems?
>> For instance, I have a file X replicated on 4 machines. If I load
>> file X into an RDD, how many partitions are defined, and why?
>>
>> Thanks for your help on this,
>> Alessandro
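The one-partition-per-block rule the thread quotes can be sketched in plain Python. This is not Spark's actual code: `estimate_partitions` is a hypothetical helper that mirrors, in simplified form, how Hadoop's `FileInputFormat` computes input splits for a single file (it ignores the configurable minimum split size and the 10% slack factor Hadoop applies to the last split). It shows why passing a larger `minPartitions` to `textFile` can only raise the partition count, never lower it below the number of blocks.

```python
import math

def estimate_partitions(file_size, block_size=128 * 1024 * 1024, min_partitions=2):
    """Rough estimate of how many partitions sc.textFile() yields for one file.

    Simplified model of Hadoop FileInputFormat split sizing:
    the target split size is min(block_size, file_size / min_partitions),
    so you get at least one partition per block, and a larger
    min_partitions shrinks the splits (more partitions) but a smaller
    one cannot merge blocks into fewer partitions.
    """
    if file_size == 0:
        return 1  # an empty file still becomes a single (empty) partition
    # Goal size if we honored min_partitions exactly.
    goal_size = max(1, file_size // min_partitions)
    # Never let a split span more than one block.
    split_size = min(block_size, goal_size)
    return math.ceil(file_size / split_size)
```

For example, a 1 GB file with the 128 MB default block size yields 8 partitions whether `min_partitions` is 1 or 2, while a 10 MB file (a single block) is still split in two to satisfy the default `min_partitions=2`.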
