Re: Number of input partitions in SparkContext.sequenceFile

2015-05-02 Thread Archit Thakur
Hi, how did you check the number of splits in your file? Did you run your MR job, or did you calculate it? The formula for split size is max(minSize, min(maxSize, blockSize)). Can you check whether it holds in your case? Thanks and regards, Archit Thakur. On Saturday, April 25, 2015, Wenlei Xie wenlei@gmail.com wrote:
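
For reference, the split-size formula quoted above can be sketched in a few lines of Java. This is only an illustration of the formula as stated in the message, not Hadoop's or Spark's actual code: the class and method names are hypothetical, the sizes in main are made-up example values, and the split count is a plain ceiling division (real input formats may apply extra adjustments).

```java
// Hypothetical helper illustrating max(minSize, min(maxSize, blockSize)).
public class SplitSizeSketch {

    // Split size as quoted in the thread; all arguments are in bytes.
    public static long computeSplitSize(long minSize, long maxSize, long blockSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Rough estimate of the number of splits: ceiling of fileSize / splitSize.
    // Assumption: no slack factor or per-file rounding is applied here.
    public static long estimateSplits(long fileSize, long splitSize) {
        return (fileSize + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;

        // Example values only (assumed, not taken from the thread's cluster config).
        long minSize = 1L;
        long maxSize = Long.MAX_VALUE;
        long blockSize = 32 * mb;
        long fileSize = 100 * mb;

        long splitSize = computeSplitSize(minSize, maxSize, blockSize);
        long splits = estimateSplits(fileSize, splitSize);

        System.out.println("split size (bytes): " + splitSize);
        System.out.println("estimated splits: " + splits);
    }
}
```

Plugging in the minimum size, maximum size, and block size that actually apply to the file system in use shows whether the observed partition count is consistent with the formula.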

Re: Number of input partitions in SparkContext.sequenceFile

2015-04-24 Thread Wenlei Xie
Hi, I checked the number of partitions with System.out.println("INFO: RDD with " + rdd.partitions().size() + " partitions created."); Each single split is about 100MB. I am currently loading the data from the local file system; would this explain the observation? Thank you! Best, Wenlei On Tue, Apr

Re: Number of input partitions in SparkContext.sequenceFile

2015-04-21 Thread Archit Thakur
Hi, it should generate the same number of partitions as the number of splits. How did you check the number of partitions? Also, please paste your file size, hdfs-site.xml, and mapred-site.xml here. Thanks and Regards, Archit Thakur. On Sat, Apr 18, 2015 at 6:20 PM, Wenlei Xie wenlei@gmail.com wrote: Hi,

Number of input partitions in SparkContext.sequenceFile

2015-04-18 Thread Wenlei Xie
Hi, I am wondering about the mechanism that determines the number of partitions created by SparkContext.sequenceFile. For example, although my file has only 4 splits, Spark creates 16 partitions for it. Is it determined by the file size? Is there any way to control it? (Looks like I can only