Thanks Yongqiang. It worked for me and I was able to evaluate the performance. It proved to be expensive :)

Regards
Bejoy K S
-----Original Message-----
From: yongqiang he <heyongqiang...@gmail.com>
Date: Thu, 31 Mar 2011 22:27:26
To: <user@hive.apache.org>; <bejoy...@yahoo.com>
Reply-To: user@hive.apache.org
Subject: Re: Hive map join - process a little larger tables with moderate number of rows

Can you try "hive.mapred.local.mem" (in MB)? It controls the heap size of the join's local child process.

You can also try to increase the HADOOP_HEAPSIZE for your hive client. But all of this depends on how big your small file is.

thanks
yongqiang

On Thu, Mar 31, 2011 at 10:15 PM, <bejoy...@yahoo.com> wrote:
> Thanks Yongqiang for your reply. I'm running a hive script which has nearly 10 joins within it.
> Of those joins, all the map joins involving smaller tables (9 of them, each with one small table) are running fine.
> Just one join is on two larger tables, and this map join fails; however, since the backup task (common join) executes successfully, the whole hive job still runs to completion.
> In brief, my hive job is running successfully now, but I want to get the failed map join running as well instead of falling back to the common join. I'm curious to see what performance improvement this difference in execution would bring.
> To get a map join executed on larger tables, do I have to go for memory parameter changes in Hadoop?
> Since my entire job already runs to completion and I just want to get this one map join working, shouldn't altering some hive map join parameters do the job?
> Please advise.
>
> Regards
> Bejoy K S
>
> -----Original Message-----
> From: yongqiang he <heyongqiang...@gmail.com>
> Date: Thu, 31 Mar 2011 16:25:03
> To: <user@hive.apache.org>
> Reply-To: user@hive.apache.org
> Subject: Re: Hive map join - process a little larger tables with moderate number of rows
>
> You possibly got an OOM error when processing the small tables. OOM is a fatal error that cannot be controlled by the hive configs. So can you try to increase your memory settings?
>
> thanks
> yongqiang
>
> On Thu, Mar 31, 2011 at 7:25 AM, Bejoy Ks <bejoy...@yahoo.com> wrote:
>> Hi Experts
>> I'm currently working with hive 0.7, mostly with JOINs. In all permissible cases I'm using map joins by setting the hive.auto.convert.join=true parameter. The use of local map joins has made a considerable performance improvement in my hive queries.
>> So far I have used this local map join only with the default set of hive configuration parameters; now I'd like to dig deeper and try it out on somewhat bigger tables with more rows.
>> Given below is a failure log from one of my local map tasks, after which its backup common join task was executed:
>>
>> 2011-03-31 09:56:54   Starting to launch local task to process map join; maximum memory = 932118528
>> 2011-03-31 09:56:57   Processing rows: 200000    Hashtable size: 199999   Memory usage: 115481024   rate: 0.124
>> 2011-03-31 09:57:00   Processing rows: 300000    Hashtable size: 299999   Memory usage: 169344064   rate: 0.182
>> 2011-03-31 09:57:03   Processing rows: 400000    Hashtable size: 399999   Memory usage: 232132792   rate: 0.249
>> 2011-03-31 09:57:06   Processing rows: 500000    Hashtable size: 499999   Memory usage: 282338544   rate: 0.303
>> 2011-03-31 09:57:10   Processing rows: 600000    Hashtable size: 599999   Memory usage: 336738640   rate: 0.361
>> 2011-03-31 09:57:14   Processing rows: 700000    Hashtable size: 699999   Memory usage: 391117888   rate: 0.42
>> 2011-03-31 09:57:22   Processing rows: 800000    Hashtable size: 799999   Memory usage: 453906496   rate: 0.487
>> 2011-03-31 09:57:27   Processing rows: 900000    Hashtable size: 899999   Memory usage: 508306552   rate: 0.545
>> 2011-03-31 09:57:34   Processing rows: 1000000   Hashtable size: 999999   Memory usage: 562706496   rate: 0.604
>> FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapredLocalTask
>> ATTEMPT: Execute BackupTask: org.apache.hadoop.hive.ql.exec.MapRedTask
>> Launching Job 4 out of 6
>>
>> Here I'd like to get this local map task running. For that I tried setting the following hive parameters:
>> hive -f HiveJob.txt -hiveconf hive.mapjoin.maxsize=1000000 -hiveconf hive.mapjoin.smalltable.filesize=40000000 -hiveconf hive.auto.convert.join=true
>> But setting these two config parameters doesn't make my local map task proceed beyond this stage. I didn't try overriding hive.mapjoin.localtask.max.memory.usage=0.90 because my task log shows that the memory usage rate is just 0.604, so I assume setting it to a larger value won't provide a solution in my case.
>> Could someone please guide me on which parameters and values I should set to get things rolling?
>>
>> Thank You
>>
>> Regards
>> Bejoy.K.S
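
For reference, a minimal sketch of the heap-related settings suggested earlier in this thread, assuming the script is launched from a shell as in the command quoted above. The 2048 MB figures are placeholders, not values from the thread, and would need to be sized to the small table being hashed:

    # Illustrative values only; both settings are in MB.
    export HADOOP_HEAPSIZE=2048                                 # heap for the hive client
    hive -f HiveJob.txt -hiveconf hive.mapred.local.mem=2048    # heap for the map join's local child process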
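
Likewise, a sketch of how the map-join parameters from the invocation quoted above could be combined with that heap setting in a single retry; the values are again illustrative and would need tuning to the actual table sizes:

    # Same flags as the quoted invocation, plus the local child heap size (MB).
    hive -f HiveJob.txt \
         -hiveconf hive.auto.convert.join=true \
         -hiveconf hive.mapjoin.maxsize=1000000 \
         -hiveconf hive.mapjoin.smalltable.filesize=40000000 \
         -hiveconf hive.mapred.local.mem=2048

Raising hive.mapjoin.localtask.max.memory.usage is left out here since, as noted above, the logged usage rate only reached 0.604 before the local task failed.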