Regarding minimum number of partitions while reading data from Hadoop

2015-02-19 Thread twinkle sachdeva
Hi,

In our job, we need to process the data in small chunks, so as to avoid GC
pressure and related issues. For this, we are using the old Hadoop API, as
it lets us specify parameters like minPartitions.
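
For reference, here is roughly what we do today. A minimal sketch, assuming
an active SparkContext sc; the path and partition count are placeholders:

  import org.apache.hadoop.io.{LongWritable, Text}
  import org.apache.hadoop.mapred.TextInputFormat

  // Old (mapred) API: Spark accepts a suggested minimum partition count.
  val lines = sc.textFile("hdfs:///data/input", 100) // minPartitions = 100

  // Equivalent explicit form with the old-API input format:
  val records = sc.hadoopFile[LongWritable, Text, TextInputFormat](
    "hdfs:///data/input", 100) // minPartitions = 100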

Does anyone know if there is a way to do the same via the new Hadoop API as
well? How would that approach differ from the old API?

I am somewhat aware of the split-size settings, but I am not sure whether
they promise that the minimum-number-of-partitions criterion will actually
be satisfied.

Any pointers would be appreciated.

Thanks,
Twinkle


Re: Regarding minimum number of partitions while reading data from Hadoop

2015-02-19 Thread Sean Owen
I think the newer Hadoop API does not expose the suggested minimum
partitions parameter the way the old one did. I believe you can instead
set mapreduce.input.fileinputformat.split.{min,max}size on the Hadoop
Configuration to suggest a min/max split size, and thereby bound the
number of partitions you get back.
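
Something like the following should work (a rough sketch, untested; the
path and the 32 MB cap are placeholders you would tune for your data):

  import org.apache.hadoop.conf.Configuration
  import org.apache.hadoop.io.{LongWritable, Text}
  import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

  val conf = new Configuration(sc.hadoopConfiguration)
  // Cap the split size at 32 MB, so a file of size S produces at least
  // ceil(S / 32 MB) splits, and hence at least that many partitions.
  conf.set("mapreduce.input.fileinputformat.split.maxsize",
    (32L * 1024 * 1024).toString)

  // New (mapreduce) API: no minPartitions argument; the split-size
  // settings on the Configuration control partitioning instead.
  val records = sc.newAPIHadoopFile(
    "hdfs:///data/input",
    classOf[TextInputFormat], classOf[LongWritable], classOf[Text],
    conf)

Note the asymmetry: split.maxsize puts a lower bound on the number of
partitions, while split.minsize puts an upper bound on it.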

