Thanks Yongqiang. It worked for me and I was able to evaluate the performance. It proved to be expensive :)

Regards
Bejoy K S
-----Original Message-----
From: yongqiang he <heyongqiang...@gmail.com>
Date: Thu, 31 Mar 2011 22:27:26
To: <user@hive.apache.org>; <bejoy...@yahoo.com>
Reply-To: user@hive.apache.org
Subject: Re: Hive map join - process a little larger tables with moderate number of rows

Can you try "hive.mapred.local.mem" (in MB)? It controls the heap size of the join's local child process.

You can also try to increase the HADOOP_HEAPSIZE for your hive client. But all of this depends on how big your small file is.

thanks
yongqiang

On Thu, Mar 31, 2011 at 10:15 PM, <bejoy...@yahoo.com> wrote:
> Thanks Yongqiang for your reply. I'm running a hive script which has nearly 10 joins within it.
> Of those joins, all the map joins involving smaller tables (9 of them, each with one small table) are running fine.
> Just one join is on two larger tables, and this map join fails; however, since the backup task (common join) executes successfully, the whole hive job still runs to completion.
> In brief, my hive job is running successfully now, but I want to get the failed map join running as well instead of falling back to the common join. I'm curious to see what performance improvement this difference in execution would bring.
> To get a map join executed on larger tables, do I have to go for memory parameter changes in Hadoop?
> Since my entire job already runs to completion and I just want to get this one map join working, shouldn't altering some hive map join parameters do the job?
> Please advise.
>
> Regards
> Bejoy K S
>
> -----Original Message-----
> From: yongqiang he <heyongqiang...@gmail.com>
> Date: Thu, 31 Mar 2011 16:25:03
> To: <user@hive.apache.org>
> Reply-To: user@hive.apache.org
> Subject: Re: Hive map join - process a little larger tables with moderate number of rows
>
> You possibly got an OOM error when processing the small tables. OOM is a fatal error that cannot be controlled by the hive configs. So can you try to increase your memory settings?
>
> thanks
> yongqiang
>
> On Thu, Mar 31, 2011 at 7:25 AM, Bejoy Ks <bejoy...@yahoo.com> wrote:
>> Hi Experts
>> I'm currently working with hive 0.7, mostly with JOINs. In all permissible cases I'm using map joins by setting the hive.auto.convert.join=true parameter. The use of local map joins has made a considerable performance improvement in my hive queries.
>> So far I have used this local map join only with the default set of hive configuration parameters; now I'd like to dig deeper and try it out on somewhat bigger tables with more rows.
>> Given below is a failure log from one of my local map tasks, after which its backup common join task was executed:
>>
>> 2011-03-31 09:56:54   Starting to launch local task to process map join; maximum memory = 932118528
>> 2011-03-31 09:56:57   Processing rows: 200000    Hashtable size: 199999   Memory usage: 115481024   rate: 0.124
>> 2011-03-31 09:57:00   Processing rows: 300000    Hashtable size: 299999   Memory usage: 169344064   rate: 0.182
>> 2011-03-31 09:57:03   Processing rows: 400000    Hashtable size: 399999   Memory usage: 232132792   rate: 0.249
>> 2011-03-31 09:57:06   Processing rows: 500000    Hashtable size: 499999   Memory usage: 282338544   rate: 0.303
>> 2011-03-31 09:57:10   Processing rows: 600000    Hashtable size: 599999   Memory usage: 336738640   rate: 0.361
>> 2011-03-31 09:57:14   Processing rows: 700000    Hashtable size: 699999   Memory usage: 391117888   rate: 0.42
>> 2011-03-31 09:57:22   Processing rows: 800000    Hashtable size: 799999   Memory usage: 453906496   rate: 0.487
>> 2011-03-31 09:57:27   Processing rows: 900000    Hashtable size: 899999   Memory usage: 508306552   rate: 0.545
>> 2011-03-31 09:57:34   Processing rows: 1000000   Hashtable size: 999999   Memory usage: 562706496   rate: 0.604
>> FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapredLocalTask
>> ATTEMPT: Execute BackupTask: org.apache.hadoop.hive.ql.exec.MapRedTask
>> Launching Job 4 out of 6
>>
>> Here I'd like to get this local map task running. For that I tried setting the following hive parameters:
>> hive -f HiveJob.txt -hiveconf hive.mapjoin.maxsize=1000000 -hiveconf hive.mapjoin.smalltable.filesize=40000000 -hiveconf hive.auto.convert.join=true
>> But setting these two config parameters doesn't make my local map task proceed beyond this stage. I didn't try overriding hive.mapjoin.localtask.max.memory.usage=0.90 because my task log shows that the memory usage rate is just 0.604, so I assume setting it to a larger value won't provide a solution in my case.
>> Could someone please guide me on which parameters and values I should set to get things rolling?
>>
>> Thank You
>>
>> Regards
>> Bejoy.K.S
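
For reference, a minimal sketch of the heap-related settings suggested earlier in this thread, assuming the script is launched from a shell as in the command quoted above. The 2048 MB figures are placeholders, not values from the thread, and would need to be sized to the small table being hashed:

    # Illustrative values only; both settings are in MB.
    export HADOOP_HEAPSIZE=2048                                 # heap for the hive client
    hive -f HiveJob.txt -hiveconf hive.mapred.local.mem=2048    # heap for the map join's local child process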
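
Likewise, a sketch of how the map-join parameters from the invocation quoted above could be combined with that heap setting in a single retry; the values are again illustrative and would need tuning to the actual table sizes:

    # Same flags as the quoted invocation, plus the local child heap size (MB).
    hive -f HiveJob.txt \
         -hiveconf hive.auto.convert.join=true \
         -hiveconf hive.mapjoin.maxsize=1000000 \
         -hiveconf hive.mapjoin.smalltable.filesize=40000000 \
         -hiveconf hive.mapred.local.mem=2048

Raising hive.mapjoin.localtask.max.memory.usage is left out here since, as noted above, the logged usage rate only reached 0.604 before the local task failed.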