Re: Hive map join - process a little larger tables with moderate number of rows

yongqiang he Thu, 31 Mar 2011 22:28:00 -0700

Can you try setting "hive.mapred.local.mem" (in MB)? It controls the
heap size of the join's local child process.
You can also try increasing HADOOP_HEAPSIZE for your Hive client.

But all of this depends on how big your small file is.

thanks
yongqiang
On Thu, Mar 31, 2011 at 10:15 PM,  <bejoy...@yahoo.com> wrote:
> Thanks, Yongqiang, for your reply. I'm running a Hive script that contains
> nearly 10 joins. All of the map joins on smaller tables (9 of them involve
> one small table) run fine. Only 1 join is on two larger tables, and this
> map join fails; however, since the backup task (common join) executes
> successfully, the whole Hive job runs to completion.
>      In brief, my Hive job is running successfully now, but I would like to
> get the failed map join running as well, instead of having the common join
> executed. I'm curious to see how much performance this difference in
> execution would gain.
>       To get a map join executed on larger tables, do I have to change the
> memory parameters in Hadoop?
> Since my entire task already runs to completion and I just want to get a
> map join working, shouldn't altering some Hive map join parameters do the job?
> Please advise.
>
>
> Regards
> Bejoy K S
>
> -----Original Message-----
> From: yongqiang he <heyongqiang...@gmail.com>
> Date: Thu, 31 Mar 2011 16:25:03
> To: <user@hive.apache.org>
> Reply-To: user@hive.apache.org
> Subject: Re: Hive map join - process a little larger tables with moderate
>  number of rows
>
> You may have hit an OOM error when processing the small tables. OOM is
> a fatal error that cannot be controlled by the Hive configs, so can
> you try increasing your memory settings?
>
> thanks
> yongqiang
> On Thu, Mar 31, 2011 at 7:25 AM, Bejoy Ks <bejoy...@yahoo.com> wrote:
>> Hi Experts,
>>     I'm currently working with Hive 0.7, mostly with JOINs. In all
>> permissible cases I'm using map joins by setting the
>> hive.auto.convert.join=true parameter. Using local map joins has made a
>> considerable performance improvement in my Hive queries. So far I have used
>> local map joins only with the default Hive configuration parameters; now I'd
>> like to dig deeper and try local map joins on somewhat bigger tables with
>> more rows. Given below is the failure log of one of my local map tasks,
>> after which Hive executes its backup common join task:
>>
>> 2011-03-31 09:56:54     Starting to launch local task to process map join;      maximum memory = 932118528
>> 2011-03-31 09:56:57     Processing rows:        200000  Hashtable size: 199999  Memory usage:   115481024       rate:   0.124
>> 2011-03-31 09:57:00     Processing rows:        300000  Hashtable size: 299999  Memory usage:   169344064       rate:   0.182
>> 2011-03-31 09:57:03     Processing rows:        400000  Hashtable size: 399999  Memory usage:   232132792       rate:   0.249
>> 2011-03-31 09:57:06     Processing rows:        500000  Hashtable size: 499999  Memory usage:   282338544       rate:   0.303
>> 2011-03-31 09:57:10     Processing rows:        600000  Hashtable size: 599999  Memory usage:   336738640       rate:   0.361
>> 2011-03-31 09:57:14     Processing rows:        700000  Hashtable size: 699999  Memory usage:   391117888       rate:   0.42
>> 2011-03-31 09:57:22     Processing rows:        800000  Hashtable size: 799999  Memory usage:   453906496       rate:   0.487
>> 2011-03-31 09:57:27     Processing rows:        900000  Hashtable size: 899999  Memory usage:   508306552       rate:   0.545
>> 2011-03-31 09:57:34     Processing rows:        1000000 Hashtable size: 999999  Memory usage:   562706496       rate:   0.604
>> FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapredLocalTask
>> ATTEMPT: Execute BackupTask: org.apache.hadoop.hive.ql.exec.MapRedTask
>> Launching Job 4 out of 6
>>
>>
>> Here I'd like to get this local map task running. To that end, I tried
>> setting the following Hive parameters:
>>
>> hive -f HiveJob.txt -hiveconf hive.mapjoin.maxsize=1000000 \
>>   -hiveconf hive.mapjoin.smalltable.filesize=40000000 \
>>   -hiveconf hive.auto.convert.join=true
>>
>> But setting the two config parameters doesn't make my local map task
>> proceed beyond this stage. I didn't try overriding
>> hive.mapjoin.localtask.max.memory.usage=0.90 because my task log shows
>> that the memory usage rate is just 0.604, so I assume setting it to a
>> larger value won't solve the problem in my case. Could someone please tell
>> me which parameters I should actually set, and to what values, to get
>> things rolling?
>>
>> Thank You
>>
>> Regards
>> Bejoy.K.S
>>
>>
>
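The rate column in the progress log quoted above appears to be the reported memory usage divided by the "maximum memory" printed when the local task starts (932118528 bytes); the first and last samples both match that reading, but it is inferred from the log, not taken from the Hive source. The Python sketch below reproduces the column and, assuming the hash table keeps growing by a roughly constant number of bytes per row, extrapolates where the 0.90 value of hive.mapjoin.localtask.max.memory.usage would be reached. The helper names (usage_rate, bytes_per_row, rows_at_threshold) are hypothetical and not part of Hive.

```python
"""Sketch: reproduce the 'rate' column of the quoted MapredLocalTask log and
extrapolate the row count at which a memory-usage threshold would be reached.

Assumption (inferred from the log, not from Hive's source):
    rate = memory usage / maximum memory
"""

# "maximum memory" printed when the local task started, in bytes.
MAX_MEMORY = 932_118_528

# (rows processed, memory usage in bytes): first and last progress samples from the log.
SAMPLES = [
    (200_000, 115_481_024),
    (1_000_000, 562_706_496),
]


def usage_rate(memory_usage: int, max_memory: int = MAX_MEMORY) -> float:
    """Fraction of the local task's heap in use (the log's 'rate' column)."""
    return memory_usage / max_memory


def bytes_per_row(first: tuple[int, int], last: tuple[int, int]) -> float:
    """Average heap growth per processed row between two log samples."""
    (rows_a, mem_a), (rows_b, mem_b) = first, last
    return (mem_b - mem_a) / (rows_b - rows_a)


def rows_at_threshold(threshold: float, first: tuple[int, int],
                      last: tuple[int, int], max_memory: int = MAX_MEMORY) -> int:
    """Row count at which usage_rate would reach `threshold`, assuming linear growth."""
    rows_b, mem_b = last
    remaining_bytes = threshold * max_memory - mem_b
    return int(rows_b + remaining_bytes / bytes_per_row(first, last))


if __name__ == "__main__":
    for rows, mem in SAMPLES:
        print(f"{rows:>9} rows  rate={usage_rate(mem):.3f}")
    per_row = bytes_per_row(SAMPLES[0], SAMPLES[-1])
    print(f"about {per_row:.0f} bytes per row")
    print(f"0.90 reached near {rows_at_threshold(0.90, SAMPLES[0], SAMPLES[-1])} rows")
```

On these figures the growth is about 559 bytes per row, and the 0.90 mark would not be reached until roughly 1.49 million rows, which agrees with the reading that raising that threshold alone would not help here. The sketch does not explain why the task failed after the 1,000,000-row sample; it only shows how far the memory ratio still was from the limit.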
