What file system are you using? If you use HDFS, the documentation you cited is pretty clear on how partitions are determined.
bq. file X replicated on 4 machines

I don't think the replication factor plays a role w.r.t. partitions.

On Thu, Feb 19, 2015 at 8:05 AM, Alessandro Lulli <lu...@di.unipi.it> wrote:
> Hi All,
>
> Could you please help me understand how Spark defines the number of
> partitions of an RDD if it is not specified?
>
> I found the following in the documentation for files loaded from HDFS:
> *The textFile method also takes an optional second argument for
> controlling the number of partitions of the file. By default, Spark creates
> one partition for each block of the file (blocks being 64MB by default in
> HDFS), but you can also ask for a higher number of partitions by passing a
> larger value. Note that you cannot have fewer partitions than blocks*
>
> What is the rule for files loaded from other file systems?
> For instance, I have a file X replicated on 4 machines. If I load the file
> X into an RDD, how many partitions are defined, and why?
>
> Thanks for your help on this
> Alessandro
>
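To make the rule concrete: Spark's textFile() creates one partition per Hadoop input split, and Hadoop's FileInputFormat sizes splits per file as max(minSize, min(goalSize, blockSize)), where goalSize is the file size divided by the requested split count. Replication never enters that formula. Below is a small Python sketch of that computation (an approximation for illustration, not Spark or Hadoop source; the 1.1 slop factor mirrors FileInputFormat's SPLIT_SLOP):

```python
def compute_split_size(goal_size, min_size, block_size):
    # Mirrors FileInputFormat.computeSplitSize (old mapred API):
    # max(minSize, min(goalSize, blockSize))
    return max(min_size, min(goal_size, block_size))

def num_splits(file_size, block_size, requested_splits=2, min_size=1):
    # goalSize = totalSize / requested number of splits; textFile's
    # default minPartitions is min(defaultParallelism, 2), hence 2 here.
    goal_size = file_size // max(requested_splits, 1)
    split_size = compute_split_size(goal_size, min_size, block_size)
    # Emit full-size splits while the remainder exceeds 1.1x the split
    # size (SPLIT_SLOP), then one final split for any leftover bytes.
    splits, remaining = 0, file_size
    while remaining / split_size > 1.1:
        splits += 1
        remaining -= split_size
    if remaining > 0:
        splits += 1
    return splits

# A 200 MB file with 64 MB HDFS blocks -> 4 partitions (3 full + tail)
print(num_splits(200 * 1024**2, 64 * 1024**2))
```

For a local (non-HDFS) file the same formula applies, just with that file system's block size, which is why a small local file still ends up with at least 2 partitions by default.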