Hi,

I checked the number of partitions with:

System.out.println("INFO: RDD with " + rdd.partitions().size() + " partitions created.");


Each split is about 100MB. I am currently loading the data from the
local file system; would this explain the observation?
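
My rough understanding, which may well be wrong, is that the old-API Hadoop
FileInputFormat picks the split size as max(minSize, min(goalSize, blockSize)),
where goalSize is the total size divided by minPartitions. Below is a minimal
sketch of that calculation. The class name SplitSketch, the 400MB file size,
the minPartitions value of 2, and the 32MB and 128MB block sizes are all
assumptions for illustration, not values read from my job.

```java
public class SplitSketch {
    // Slack factor as I understand it from Hadoop's FileInputFormat:
    // the final split may be up to 10% larger than the split size.
    static final double SPLIT_SLOP = 1.1;

    // Split size rule: never below minSize, never above the block size,
    // otherwise the "goal" size derived from the requested partition count.
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Counts how many splits a single file of fileSize bytes would produce.
    static int countSplits(long fileSize, int minPartitions, long minSize, long blockSize) {
        long goalSize = fileSize / Math.max(minPartitions, 1);
        long splitSize = computeSplitSize(goalSize, minSize, blockSize);
        int splits = 0;
        long remaining = fileSize;
        // Carve off full-size splits while more than 1.1 split sizes remain.
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        // Whatever is left becomes the last split.
        if (remaining > 0) {
            splits++;
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long fileSize = 400 * mb; // assumed: 4 splits of roughly 100MB each
        // Assumed block size of 32MB, which I believe is the local file system default.
        System.out.println("32MB blocks:  " + countSplits(fileSize, 2, 1L, 32 * mb));
        // Assumed block size of 128MB, a common HDFS setting.
        System.out.println("128MB blocks: " + countSplits(fileSize, 2, 1L, 128 * mb));
    }
}
```

With these assumed numbers the sketch gives 13 splits for a 32MB block size
and 4 splits for 128MB, so a smaller block size on the local file system
would at least point in the direction of what I am seeing.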

Thank you!

Best,
Wenlei

On Tue, Apr 21, 2015 at 6:28 AM, Archit Thakur <archit279tha...@gmail.com>
wrote:

> Hi,
>
> It should generate the same number of partitions as the number of splits.
> How did you check the number of partitions? Also, please paste your file
> size, hdfs-site.xml, and mapred-site.xml here.
>
> Thanks and Regards,
> Archit Thakur.
>
> On Sat, Apr 18, 2015 at 6:20 PM, Wenlei Xie <wenlei....@gmail.com> wrote:
>
>> Hi,
>>
>> I am wondering what mechanism determines the number of partitions
>> created by SparkContext.sequenceFile.
>>
>> For example, although my file has only 4 splits, Spark creates 16
>> partitions for it. Is it determined by the file size? Is there any way to
>> control it? (It looks like I can only tune minPartitions, not maxPartitions.)
>>
>> Thank you!
>>
>> Best,
>> Wenlei
>>
>>
>>
>


-- 
Wenlei Xie (谢文磊)

Ph.D. Candidate
Department of Computer Science
456 Gates Hall, Cornell University
Ithaca, NY 14853, USA
Email: wenlei....@gmail.com
