I think the newer Hadoop API does not expose a suggested minimum-partitions parameter the way the old one did. I believe you can instead set mapreduce.input.fileinputformat.split.{min,max}size on the Hadoop Configuration to suggest a min/max split size, and thereby bound the number of partitions you get back.
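For example, something along these lines (an untested sketch; the path and split sizes are placeholders, not values from your job):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

val hadoopConf = new Configuration(sc.hadoopConfiguration)
// Suggest bounds on the split size in bytes; a smaller max split size
// generally yields more (smaller) partitions.
hadoopConf.set("mapreduce.input.fileinputformat.split.minsize", (16 * 1024 * 1024).toString)
hadoopConf.set("mapreduce.input.fileinputformat.split.maxsize", (64 * 1024 * 1024).toString)

// Read with the new Hadoop API, passing the tweaked Configuration.
val rdd = sc.newAPIHadoopFile(
  "hdfs:///path/to/input",   // placeholder path
  classOf[TextInputFormat],
  classOf[LongWritable],
  classOf[Text],
  hadoopConf)

As far as I know these settings are only hints to the input format (the HDFS block size also plays a role), so I would not rely on getting an exact partition count back.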
On Thu, Feb 19, 2015 at 11:07 AM, twinkle sachdeva <twinkle.sachd...@gmail.com> wrote:
> Hi,
>
> In our job, we need to process the data in small chunks to avoid GC pressure
> and other issues. For this, we are using the old Hadoop API, as it lets us
> specify a parameter like minPartitions.
>
> Does anyone know if there is a way to do the same via the new Hadoop API?
> How would that way differ from the older API?
>
> I am a little bit aware of the split-size settings, but not much aware of
> whether there is any promise that the minimum-number-of-partitions criterion
> gets satisfied.
>
> Any pointers will be of help.
>
> Thanks,
> Twinkle