Hi,

How did you check the number of splits in your file? Did you run your MR job, or did you calculate it? The formula for split size is max(minSize, min(maxSize, blockSize)). Can you check whether it holds in your case?
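To make the formula concrete, here is a small Java sketch that applies it to the numbers quoted below. It rests on several assumptions. The block size is 32MB, which I believe is Hadoop's default for the local file system (`fs.local.block.size`), but please verify this for your version. Each file is exactly 100MB. No min or max split size is configured. Finally, like Hadoop's FileInputFormat, the sketch lets the last split of a file be up to 10% larger than the split size (`SPLIT_SLOP = 1.1`). `SplitEstimate`, `computeSplitSize` and `countSplits` are hypothetical names for illustration, not Hadoop APIs.

```java
// A minimal sketch of the split arithmetic, assuming FileInputFormat-style
// behaviour: splitSize = max(minSize, min(maxSize, blockSize)), and a file's
// last split may be up to SPLIT_SLOP times the split size. Not Hadoop's code.
public class SplitEstimate {

    // Assumed to mirror FileInputFormat's SPLIT_SLOP constant (10% slack).
    static final double SPLIT_SLOP = 1.1;

    // The split-size formula from the message above.
    static long computeSplitSize(long minSize, long maxSize, long blockSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Counts how many splits a single file of fileSize bytes would produce.
    static int countSplits(long fileSize, long splitSize) {
        int splits = 0;
        long remaining = fileSize;
        // Cut full-size splits while more than 1.1 split sizes remain.
        while (((double) remaining) / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        // Whatever is left (up to 1.1 x splitSize) becomes the last split.
        if (remaining != 0) {
            splits++;
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blockSize = 32 * mb;      // assumption: local-FS default block size
        long minSize = 1;              // assumption: no minimum split size set
        long maxSize = Long.MAX_VALUE; // assumption: no maximum split size set
        long splitSize = computeSplitSize(minSize, maxSize, blockSize);

        long fileSize = 100 * mb;      // "about 100MB" per file, from the thread
        int perFile = countSplits(fileSize, splitSize);

        System.out.println("split size = " + splitSize + " bytes");
        System.out.println("splits per file = " + perFile);
        System.out.println("splits for 4 files = " + 4 * perFile);
    }
}
```

Under these assumptions it reports 4 splits per 100MB file, i.e. 16 for four files, which would match the 16 partitions you saw. The count is sensitive to exact sizes: a 96MB file would give only 3. From the formula itself, the way to get fewer, larger splits is to raise minSize above the block size. Also note that the older `org.apache.hadoop.mapred` FileInputFormat, which I believe `SparkContext.sequenceFile` goes through, uses a goal size (total size divided by the requested number of splits) in place of maxSize. So `minPartitions` can make splits smaller but not larger than a block.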
Thanks & Regards,
Archit Thakur.

On Saturday, April 25, 2015, Wenlei Xie <wenlei....@gmail.com> wrote:
> Hi,
>
> I checked the number of partitions with
>
> System.out.println("INFO: RDD with " + rdd.partitions().size() + " partitions created.");
>
> Each single split is about 100MB. I am currently loading the data from the
> local file system; would this explain this observation?
>
> Thank you!
>
> Best,
> Wenlei
>
> On Tue, Apr 21, 2015 at 6:28 AM, Archit Thakur <archit279tha...@gmail.com> wrote:
>
>> Hi,
>>
>> It should generate the same number of partitions as the number of splits.
>> How did you check the number of partitions? Also, please paste your file size,
>> hdfs-site.xml and mapred-site.xml here.
>>
>> Thanks and Regards,
>> Archit Thakur.
>>
>> On Sat, Apr 18, 2015 at 6:20 PM, Wenlei Xie <wenlei....@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I am wondering about the mechanism that determines the number of partitions
>>> created by SparkContext.sequenceFile.
>>>
>>> For example, although my file has only 4 splits, Spark creates 16
>>> partitions for it. Is it determined by the file size? Is there any way to
>>> control it? (It looks like I can only tune minPartitions, not maxPartitions.)
>>>
>>> Thank you!
>>>
>>> Best,
>>> Wenlei
>
> --
> Wenlei Xie (谢文磊)
>
> Ph.D. Candidate
> Department of Computer Science
> 456 Gates Hall, Cornell University
> Ithaca, NY 14853, USA
> Email: wenlei....@gmail.com