Hi,

I checked the number of partitions with

    System.out.println("INFO: RDD with " + rdd.partitions().size() + " partitions created.");

Each split is about 100MB. I am currently loading the data from the local file system. Would this explain the observation?

Thank you!

Best,
Wenlei

On Tue, Apr 21, 2015 at 6:28 AM, Archit Thakur <archit279tha...@gmail.com> wrote:

> Hi,
>
> It should generate the same number of partitions as the number of splits.
> How did you check the number of partitions? Also, please paste your file size,
> hdfs-site.xml, and mapred-site.xml here.
>
> Thanks and Regards,
> Archit Thakur.
>
> On Sat, Apr 18, 2015 at 6:20 PM, Wenlei Xie <wenlei....@gmail.com> wrote:
>
>> Hi,
>>
>> I am wondering what mechanism determines the number of partitions
>> created by SparkContext.sequenceFile?
>>
>> For example, although my file has only 4 splits, Spark creates 16
>> partitions for it. Is it determined by the file size? Is there any way to
>> control it? (It looks like I can only tune minPartitions but not maxPartitions.)
>>
>> Thank you!
>>
>> Best,
>> Wenlei

--
Wenlei Xie (谢文磊)

Ph.D. Candidate
Department of Computer Science
456 Gates Hall, Cornell University
Ithaca, NY 14853, USA
Email: wenlei....@gmail.com
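
P.S. A rough sketch of how the local file system could account for the 16 partitions. It rests on several assumptions that nobody in this thread has confirmed: that sequenceFile goes through Hadoop's old-API FileInputFormat, that the local file system reports a 32MB default block size, that minPartitions is 2, and that the input is four separate files of exactly 100MB each. The class and method names are hypothetical; the formula follows my understanding of FileInputFormat.computeSplitSize and its 1.1 slop factor.

```java
// Hypothetical sketch: mimics how Hadoop's old-API FileInputFormat is
// understood to size splits. Not taken from Spark or Hadoop source verbatim.
class SplitSketch {
    // Assumed slop factor: a trailing chunk up to 10% over splitSize is not cut further.
    static final double SPLIT_SLOP = 1.1;

    // splitSize = max(minSize, min(goalSize, blockSize))
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Number of splits produced for one file of the given length.
    static int countSplits(long fileLength, long splitSize) {
        int splits = 0;
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++; // final, possibly smaller, chunk
        }
        return splits;
    }

    public static void main(String[] args) {
        final long MB = 1024L * 1024L;
        long fileLength = 100 * MB;  // assumption: each file is exactly 100MB
        int fileCount = 4;           // assumption: four separate files
        long blockSize = 32 * MB;    // assumption: local file system block size
        int minPartitions = 2;       // assumption: Spark's default minPartitions

        long goalSize = (fileCount * fileLength) / minPartitions;
        long splitSize = computeSplitSize(goalSize, 1L, blockSize);

        int total = fileCount * countSplits(fileLength, splitSize);
        System.out.println("INFO: expected " + total + " partitions"); // prints 16 under these assumptions
    }
}
```

If these assumptions hold, the count is driven by the 32MB block size rather than by the four files, so raising the minimum split size in the Hadoop configuration might bring it back down to one partition per file.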