Well, to be more clear: I'm wondering how hadoop-mapreduce can be optimized on a block-less filesystem... and am thinking about application-tier ways to simulate blocks - i.e., by making the granularity of partitions smaller.
Wondering if there is a way to hack an increased number of partitions as a mechanism to simulate blocks - or whether this is just a bad idea altogether :)

On Tue, Apr 30, 2013 at 2:56 PM, Mohammad Tariq <[email protected]> wrote:

> Hello Jay,
>
> What are you going to do in your custom InputFormat and partitioner? Is
> your InputFormat going to create larger splits which will overlap with
> larger blocks? If that is the case, IMHO, you are going to reduce the
> no. of mappers, thus reducing the parallelism. Also, a much larger block
> size will add overhead when it comes to disk I/O.
>
> Warm Regards,
> Tariq
> https://mtariq.jux.com/
> cloudfront.blogspot.com
>
>
> On Wed, May 1, 2013 at 12:16 AM, Jay Vyas <[email protected]> wrote:
>
>> Hi guys:
>>
>> I'm wondering - if I'm running mapreduce jobs on a cluster with large
>> block sizes - can I increase performance with either:
>>
>> 1) A custom FileInputFormat
>>
>> 2) A custom partitioner
>>
>> 3) -DnumReducers
>>
>> Clearly, (3) will be an issue because it might overload tasks and
>> network traffic... but maybe (1) or (2) would be a precise way to
>> "use" partitions as a "poor man's" block.
>>
>> Just a thought - not sure if anyone has tried (1) or (2) before in
>> order to simulate blocks and increase locality by utilizing the
>> partition API.
>>
>> --
>> Jay Vyas
>> http://jayunit100.blogspot.com
>>
>
>

--
Jay Vyas
http://jayunit100.blogspot.com
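For readers of this thread, a minimal sketch of what option (1) could look like in practice, assuming the Hadoop 2.x "mapreduce" API: instead of a fully custom InputFormat, the driver caps the maximum input split size so FileInputFormat hands out many small splits, each becoming its own map task - effectively simulating small blocks on a filesystem that doesn't expose them. The class name SimulatedBlockDriver and the 64 MB figure are illustrative only, not something proposed in the thread.

// SimulatedBlockDriver.java - hypothetical driver, Hadoop 2.x mapreduce API
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SimulatedBlockDriver {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "simulate-blocks");
        job.setJarByClass(SimulatedBlockDriver.class);

        job.setInputFormatClass(TextInputFormat.class);

        // Treat 64 MB as the simulated "block": no split may exceed it, so a
        // 1 GB input file yields ~16 map tasks even if the underlying
        // filesystem reports one giant block (or no blocks at all).
        long simulatedBlockBytes = 64L * 1024 * 1024;
        FileInputFormat.setMaxInputSplitSize(job, simulatedBlockBytes);
        // Optional floor so a directory of tiny files doesn't explode into
        // an excessive number of map tasks.
        FileInputFormat.setMinInputSplitSize(job, 1L * 1024 * 1024);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Mapper/reducer are left as the identity defaults, so the output
        // key/value types mirror TextInputFormat's (offset, line) records.
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

If the driver goes through ToolRunner/GenericOptionsParser, the same cap can usually be applied at submission time with -D mapreduce.input.fileinputformat.split.maxsize=67108864, without code changes. Note this only multiplies map tasks; it does not by itself give the data locality that real HDFS blocks provide.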
