Hi,

How did you check the number of splits in your file? Did you run your MR job,
or did you calculate it?
The formula for split size is
max(minSize, min(maxSize, blockSize)). Can you check whether it matches your
case?
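
To make the formula concrete, here is a minimal Java sketch. It is not
Hadoop's actual source: the class name, the method names, and all of the
sizes in main are made up for illustration, and the split-count estimate is
a plain ceiling division that ignores any slack Hadoop may allow on the last
split.

```java
// Sketch of the split-size rule quoted above; not Hadoop's actual source.
class SplitSizeSketch {

    // max(minSize, min(maxSize, blockSize)), all values in bytes.
    static long computeSplitSize(long minSize, long maxSize, long blockSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Rough number of splits for one file: ceiling of fileSize / splitSize.
    // Assumption: ignores any slack allowed on the final split.
    static long estimateSplits(long fileSize, long splitSize) {
        return (fileSize + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long minSize = 1;               // hypothetical minimum split size
        long maxSize = Long.MAX_VALUE;  // hypothetical maximum split size
        long blockSize = 128 * mb;      // hypothetical block size
        long fileSize = 400 * mb;       // hypothetical file size

        long splitSize = computeSplitSize(minSize, maxSize, blockSize);
        System.out.println("split size = " + (splitSize / mb) + " MB");
        System.out.println("estimated splits = " + estimateSplits(fileSize, splitSize));
    }
}
```

Plugging in your real file size, block size, and min/max split settings
should show whether the count you expect is consistent with the formula.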

Thanks & Regards,
Archit Thakur.

On Saturday, April 25, 2015, Wenlei Xie <wenlei....@gmail.com> wrote:

> Hi,
>
> I checked the number of partitions by
>
> System.out.println("INFO: RDD with " + rdd.partitions().size() + " partitions created.");
>
>
> Each split is about 100 MB. I am currently loading the data from the
> local file system; would this explain the observation?
>
> Thank you!
>
> Best,
> Wenlei
>
> On Tue, Apr 21, 2015 at 6:28 AM, Archit Thakur <archit279tha...@gmail.com>
> wrote:
>
>> Hi,
>>
>> It should generate the same number of partitions as the number of splits.
>> How did you check the number of partitions? Also, please paste your file
>> size and your hdfs-site.xml and mapred-site.xml here.
>>
>> Thanks and Regards,
>> Archit Thakur.
>>
>> On Sat, Apr 18, 2015 at 6:20 PM, Wenlei Xie <wenlei....@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I am wondering about the mechanism that determines the number of
>>> partitions created by SparkContext.sequenceFile.
>>>
>>> For example, although my file has only 4 splits, Spark would create 16
>>> partitions for it. Is it determined by the file size? Is there any way to
>>> control it? (Looks like I can only tune minPartitions but not maxPartitions)
>>>
>>> Thank you!
>>>
>>> Best,
>>> Wenlei
>>>
>>>
>>>
>>
>
>
> --
> Wenlei Xie (谢文磊)
>
> Ph.D. Candidate
> Department of Computer Science
> 456 Gates Hall, Cornell University
> Ithaca, NY 14853, USA
> Email: wenlei....@gmail.com
>
