Hi guys, I use Flume to store log files and Hive to query them.
Flume always stores small files with the suffix .seq, and by now I have over 35 thousand seq files. Every time I launch my query script, 35 thousand map tasks are created, and it takes a very long time to complete.

I also tried setting CombineHiveInputFormat, but with that option the query runs slowly, because the total size of the data folder is over 700 MB and in my testing environment I only have 3 data nodes. I also tried adding mapred.map.tasks=5 after the CombineHiveInputFormat setting, but it doesn't seem to work: there is always only one map task when CombineHiveInputFormat is set.

Can you please show me a solution that lets me set the number of map tasks freely?

BTW: the Hadoop version is 0.20 and Hive is 0.5.

Richard
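For reference, this is roughly what I tried in my script (the table name here is just a placeholder, not my real table):

```sql
-- Combine the many small .seq files into fewer input splits
SET hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

-- Attempted to force more map tasks, but this seems to have no effect
-- once CombineHiveInputFormat is active:
SET mapred.map.tasks=5;

-- placeholder query over the Flume log table
SELECT COUNT(*) FROM flume_logs;
```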
