subject:"Number of input partitions in SparkContext.sequenceFile"

Re: Number of input partitions in SparkContext.sequenceFile

2015-05-02 Thread Archit Thakur

Hi, How did you check the number of splits in your file? Did you run your MR job, or did you calculate it? The formula for split size is max(minSize, min(maxSize, blockSize)). Can you check whether it holds in your case? Thanks & Regards, Archit Thakur. On Saturday, April 25, 2015, Wenlei Xie wrote: > Hi, > > I checked
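
For readers who want to try the formula quoted above, here is a minimal sketch in plain Java. It only evaluates max(minSize, min(maxSize, blockSize)) and derives a rough split count from it. The class and method names, the 32 MB block size, and the 400 MB file size are hypothetical values chosen for illustration; they are not taken from the poster's configuration, and the split count is a simple ceiling division that ignores any slack a real input format may allow.

```java
// Sketch only: evaluates the split-size formula quoted in the thread.
// All names and numbers here are illustrative assumptions.
public class SplitSizeSketch {

    // splitSize = max(minSize, min(maxSize, blockSize))
    public static long computeSplitSize(long minSize, long maxSize, long blockSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Rough estimate: number of splits as a ceiling division of file size by split size.
    public static long estimateSplits(long fileSize, long splitSize) {
        return (fileSize + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long minSize = 1L;               // assumed minimum split size
        long maxSize = Long.MAX_VALUE;   // assumed: no upper limit configured
        long blockSize = 32 * mb;        // hypothetical block size
        long fileSize = 400 * mb;        // hypothetical file size

        long splitSize = computeSplitSize(minSize, maxSize, blockSize);
        System.out.println("split size = " + splitSize
                + " bytes, estimated splits = " + estimateSplits(fileSize, splitSize));
    }
}
```

Plugging in the actual file size and the block size reported by the file system in use would show whether the formula accounts for the observed partition count.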

Re: Number of input partitions in SparkContext.sequenceFile

2015-04-24 Thread Wenlei Xie

Hi, I checked the number of partitions with System.out.println("INFO: RDD with " + rdd.partitions().size() + " partitions created."); Each split is about 100 MB. I am currently loading the data from the local file system; would this explain the observation? Thank you! Best, Wenlei On Tue,

Re: Number of input partitions in SparkContext.sequenceFile

2015-04-21 Thread Archit Thakur

Hi, It should generate the same number of partitions as the number of splits. How did you check the number of partitions? Also, please paste your file size, hdfs-site.xml, and mapred-site.xml here. Thanks and Regards, Archit Thakur. On Sat, Apr 18, 2015 at 6:20 PM, Wenlei Xie wrote: > Hi, > > I am wondering t

Number of input partitions in SparkContext.sequenceFile

2015-04-18 Thread Wenlei Xie

Hi, I am wondering about the mechanism that determines the number of partitions created by SparkContext.sequenceFile. For example, although my file has only 4 splits, Spark creates 16 partitions for it. Is it determined by the file size? Is there any way to control it? (Looks like I can only tune