Can you try "hive.mapred.local.mem" (in MB)? It controls the heap size of the join's local child process. You can also try increasing HADOOP_HEAPSIZE for your hive client.
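As a rough sketch of how those two settings could be passed: the 2048 MB values are illustrative assumptions, not recommendations, and HiveJob.txt is the script name from the original question. The command is only printed here so it can be checked before running.

```shell
# Heap size for the Hive client JVM, in MB (illustrative value only).
export HADOOP_HEAPSIZE=2048

# Heap size for the local child process that builds the map-join
# hash table, in MB (illustrative value only).
LOCAL_MEM_MB=2048

# Dry run: print the command to check it; remove "echo" to execute it.
echo hive -f HiveJob.txt \
  -hiveconf hive.mapred.local.mem="${LOCAL_MEM_MB}" \
  -hiveconf hive.auto.convert.join=true
```

Suitable values depend on the size of the small table being loaded into memory.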
But all of this depends on how big your small file is.

thanks
yongqiang

On Thu, Mar 31, 2011 at 10:15 PM, <bejoy...@yahoo.com> wrote:
> Thanks Yongqiang for your reply. I'm running a hive script which has nearly
> 10 joins within. Of those joins, all the map joins involving smaller tables
> (9 of them involve one small table) are running fine. Just 1 join is on two
> larger tables and this map join fails; however, since the backup task (common
> join) is executed successfully, the whole hive job runs to completion
> successfully.
> In brief, my hive job is running successfully now, but I just want to get
> the failed map join running as well instead of the common join being
> executed. I'm curious to see what the performance improvement would be
> with this difference in execution.
> To get a map join executed on larger tables, do I have to go for memory
> parameters with hadoop?
> Since my entire task is already running to completion and I just want to get a
> map join working, shouldn't altering some hive map join parameters do the job?
> Please advise
>
> Regards
> Bejoy K S
>
> -----Original Message-----
> From: yongqiang he <heyongqiang...@gmail.com>
> Date: Thu, 31 Mar 2011 16:25:03
> To: <user@hive.apache.org>
> Reply-To: user@hive.apache.org
> Subject: Re: Hive map join - process a little larger tables with moderate
> number of rows
>
> You possibly got an OOM error when processing the small tables. OOM is
> a fatal error that cannot be controlled by the hive configs. So can
> you try to increase your memory setting?
>
> thanks
> yongqiang
> On Thu, Mar 31, 2011 at 7:25 AM, Bejoy Ks <bejoy...@yahoo.com> wrote:
>> Hi Experts
>> I'm currently working with hive 0.7, mostly with JOINS. In all
>> permissible cases I'm using map joins by setting the
>> hive.auto.convert.join=true parameter.
>> Usage of local map joins has made a
>> considerable performance improvement in hive queries. I have used this local
>> map join only with the default set of hive configuration parameters; now I'd
>> like to dig deeper into this. I want to try out this local map join on
>> slightly bigger tables with more rows. Given below is a failure log of
>> one of my local map tasks, which in turn executes its backup common join task:
>>
>> 2011-03-31 09:56:54 Starting to launch local task to process map join; maximum memory = 932118528
>> 2011-03-31 09:56:57 Processing rows: 200000 Hashtable size: 199999 Memory usage: 115481024 rate: 0.124
>> 2011-03-31 09:57:00 Processing rows: 300000 Hashtable size: 299999 Memory usage: 169344064 rate: 0.182
>> 2011-03-31 09:57:03 Processing rows: 400000 Hashtable size: 399999 Memory usage: 232132792 rate: 0.249
>> 2011-03-31 09:57:06 Processing rows: 500000 Hashtable size: 499999 Memory usage: 282338544 rate: 0.303
>> 2011-03-31 09:57:10 Processing rows: 600000 Hashtable size: 599999 Memory usage: 336738640 rate: 0.361
>> 2011-03-31 09:57:14 Processing rows: 700000 Hashtable size: 699999 Memory usage: 391117888 rate: 0.42
>> 2011-03-31 09:57:22 Processing rows: 800000 Hashtable size: 799999 Memory usage: 453906496 rate: 0.487
>> 2011-03-31 09:57:27 Processing rows: 900000 Hashtable size: 899999 Memory usage: 508306552 rate: 0.545
>> 2011-03-31 09:57:34 Processing rows: 1000000 Hashtable size: 999999 Memory usage: 562706496 rate: 0.604
>> FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapredLocalTask
>> ATTEMPT: Execute BackupTask: org.apache.hadoop.hive.ql.exec.MapRedTask
>> Launching Job 4 out of 6
>>
>> Here I'd like to get this local map task running; for that I tried
>> setting the following hive parameters:
>> hive -f HiveJob.txt -hiveconf hive.mapjoin.maxsize=1000000 -hiveconf
>> hive.mapjoin.smalltable.filesize=40000000 -hiveconf
>> hive.auto.convert.join=true
>>
>> But setting the two config parameters doesn't make my local map task
>> proceed beyond this stage. I didn't try overriding
>> hive.mapjoin.localtask.max.memory.usage=0.90 because my
>> task log shows that the memory usage rate is just 0.604, so I assume setting
>> it to a larger value won't solve the problem in my case. Could
>> someone please guide me on the actual parameters and values I should
>> set to get things rolling?
>>
>> Thank You
>>
>> Regards
>> Bejoy.K.S
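
A quick arithmetic check on the quoted log, as a sketch only: the rate column looks like memory usage divided by maximum memory. That reading is an assumption drawn from the numbers, not something the log states. Using the values from the last progress line before the failure:

```shell
# Values copied from the quoted log (bytes).
MAX_MEMORY=932118528     # "maximum memory" from the first log line
USED_MEMORY=562706496    # "Memory usage" at 1000000 rows

# Assumed relationship: rate = memory usage / maximum memory.
RATE=$(awk -v used="$USED_MEMORY" -v max="$MAX_MEMORY" 'BEGIN { printf "%.3f", used / max }')
echo "rate: $RATE"

# Rough average hash table cost per row at that point, in whole bytes.
BYTES_PER_ROW=$((USED_MEMORY / 1000000))
echo "bytes per row: $BYTES_PER_ROW"
```

The computed rate agrees with the 0.604 in the log, which is below the 0.90 value mentioned for hive.mapjoin.localtask.max.memory.usage.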