On 25 Jan 2013, at 10:37, Bertrand Dechoux wrote:

> It seems to me the question has not been answered:
> "is it possible yes or no to force a smaller split size than a block on the mappers"
>
> Not that I know of (but you could implement something to do it), but why would
> you do it?
> By default, if the split is set under the size of a block, it will be a block.
> One of the reasons is data locality. The second is that a block is written to a
> single hard drive (leaving replicas aside), so if n mappers were reading n parts
> from the same block, they would share that drive's bandwidth... so it is not a
> clear win.
>
> You can change the block size of the file you want to read, but using a smaller
> block size is really an anti-pattern; most people increase the block size.
> (Note: the block size of a file is fixed when the file is written, and it can
> differ between two files.)
>
> Are you trying to handle data that are too small?
> If Hive supports multi-threading for mappers, that might be a solution, but I
> don't know the configuration for that.
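Regarding "you could implement something to do it": is the sketch below roughly
what you mean? It is completely untested on my side, the 32 MB cap and the class
name are just placeholders, and I realise Hive reads these files through its own
input format rather than TextInputFormat:

    import org.apache.hadoop.mapred.TextInputFormat;

    // Untested sketch: force sub-block splits by capping the value that
    // the old mapred FileInputFormat.computeSplitSize() would return.
    public class SmallSplitTextInputFormat extends TextInputFormat {
        private static final long MAX_SPLIT = 32L * 1024 * 1024; // placeholder cap

        @Override
        protected long computeSplitSize(long goalSize, long minSize, long blockSize) {
            // default is max(minSize, min(goalSize, blockSize)); clamp it further
            return Math.min(MAX_SPLIT, super.computeSplitSize(goalSize, minSize, blockSize));
        }
    }

To give some context: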
The files are RCFiles with a block size of 128 MB IIRC, but the compression achieves a ratio of nearly 1:100, so a single block can represent roughly 12 GB of uncompressed data. When it goes through a mapper, there is simply not enough memory available for it. Since the compression scheme is BLOCK, I expected it would be possible to instruct Hive to process only a limited number of fragments at a time instead of everything in the file in one go.
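If there is a supported knob for this, I would have expected it to be something
along these lines (the values here are guesses on my part, not a confirmed fix):

    -- guesses, not a confirmed fix:
    set mapred.max.split.size=33554432;
    set mapred.child.java.opts=-Xmx2048m;

The first is only meaningful if the input format honours it for RCFiles; the
second just throws more heap at each mapper rather than limiting how much it
decompresses at once.

David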