In most cases you want bigger splits, because having lots of small tasks plays havoc with the JobTracker. I have found that jobs with thousands of short-lived map tasks tend to monopolize the slots. In other versions of Hive the default input format was not CombineHiveInputFormat; in most cases you want to make sure that is your default.
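
If it isn't, something along these lines should set it, together with the knobs CombineHiveInputFormat uses when sizing the combined splits. This is only a sketch: exact property names differ between Hadoop/Hive versions, and the byte values are placeholders to tune for your own cluster.

set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
-- upper bound on a combined split, in bytes (placeholder value)
set mapred.max.split.size=256000000;
-- minimum bytes to combine per node / per rack before spilling over (placeholder values)
set mapred.min.split.size.per.node=128000000;
set mapred.min.split.size.per.rack=128000000;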
On Fri, Jan 25, 2013 at 1:47 AM, Nitin Pawar <nitinpawar...@gmail.com> wrote:

> set mapred.min.split.size=1024000;
> set mapred.max.split.size=4096000;
> set hive.merge.mapfiles=false;
>
> I had set the values above, and lowering the max split size did increase
> my number of maps. My block size was 128MB. The only thing was that my
> files on HDFS were not heavily compressed, and I was using RCFileFormat.
>
> If you have heavily compressed files, I would suggest checking what the
> size will be after decompression and allocating more memory to the maps.
>
>
> On Fri, Jan 25, 2013 at 11:46 AM, David Morel <dmore...@gmail.com> wrote:
>
>> Hello,
>>
>> I have seen many posts on various sites and mailing lists, but didn't
>> find a firm answer anywhere: is it possible, yes or no, to force a
>> smaller split size than a block on the mappers, from the client side?
>> I'm not after pointers to the docs (unless you're very, very sure :-)
>> but after real-life experience along the lines of 'yes, it works this
>> way, I've done it like this...'
>>
>> All the parameters that I could find (especially specifying a max input
>> split size) seem to have no effect, and the files that I have are so
>> heavily compressed that they completely saturate the mappers' memory
>> when processed.
>>
>> A solution I could imagine for this specific issue is reducing the block
>> size, but for now I simply went with disabling in-file compression for
>> those files. Changing the block size on a per-file basis is something
>> I'd like to avoid if at all possible.
>>
>> All the Hive settings that we tried only got me as far as raising the
>> number of mappers from 5 to 6 (yay!), where I would have needed at least
>> ten times more.
>>
>> Thanks!
>>
>> D.Morel
>
>
>
> --
> Nitin Pawar
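
P.S. On Nitin's point about compressed input saturating map memory: one knob is the map task JVM heap. A rough sketch, using the old JobTracker-era property name; the -Xmx value is just a placeholder to size against your estimated uncompressed split size.

-- give each map task child JVM a larger heap (placeholder value)
set mapred.child.java.opts=-Xmx2048m;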