Hello Weidong Bian

Did you see the following configuration properties in the conf directory?


<property>
  <name>mapred.reduce.tasks</name>
  <value>-1</value>
  <description>The default number of reduce tasks per job. Typically set
  to a prime close to the number of available hosts. Ignored when
  mapred.job.tracker is "local". Hadoop sets this to 1 by default, whereas
  Hive uses -1 as its default value.
  By setting this property to -1, Hive will automatically figure out what
  the number of reducers should be.
  </description>
</property>


<property>
  <name>hive.exec.reducers.max</name>
  <value>999</value>
  <description>The maximum number of reducers that will be used. If the
    value specified in the configuration parameter mapred.reduce.tasks is
    negative, Hive will use this value as the maximum number of reducers
    when automatically determining the number of reducers.</description>
</property>
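
Both properties can also be overridden per session instead of editing the
files in the conf directory. The following is only a minimal sketch, assuming
the values shown above. hive.exec.reducers.bytes.per.reducer does not appear
in the excerpts above; it is included on the assumption that it is the setting
Hive uses to size the automatic estimate, and the value given for it is purely
illustrative.

```sql
-- Let Hive pick the reducer count instead of forcing it to 5.
set mapred.reduce.tasks=-1;

-- Upper bound applied when Hive chooses the count automatically.
set hive.exec.reducers.max=999;

-- Assumption: amount of input (in bytes) assigned to each reducer.
-- A smaller value should yield more reducers for the same input.
set hive.exec.reducers.bytes.per.reducer=100000000;
```

With ~460M of input, a large bytes-per-reducer value could leave the job with
very few reducers, so it is worth checking the reducer count Hive reports when
each job is launched.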

Thanks and Regards

Jagat


On Tue, Mar 13, 2012 at 9:54 PM, Bruce Bian <[email protected]> wrote:

> Hi there,
> When I run the following query in Hive, 6 Map/Reduce jobs are
> launched, one for each join, and it processes ~460M of data in ~950 seconds,
> which I think is far too slow for a cluster with 5 slaves, each with 24GB
> of memory and 12 disks.
>
> set mapred.reduce.tasks=5;
> SELECT a.*,
>        e.code_name AS is_internet_flg,
>        f.code_name AS wb_access_tp_desc,
>        g.code_name AS free_tp_desc,
>        b.acnt_no, b.addr_id, b.postcode, b.acnt_rmnd_tp, b.print_tp, b.media_type,
>        c.cust_code, c.root_cust_code,
>        d.mdf_name, d.sub_bureau_code, d.bureau_cd, d.adm_sub_bureau_name, d.bureau_name
> FROM prc_idap_pi_root a
>  LEFT OUTER JOIN idap_pi_root_acnt b ON a.acnt_id=b.acnt_id
>  LEFT OUTER JOIN idap_pi_root_cust c ON a.cust_id=c.cust_id
>  LEFT OUTER JOIN ocrm_vt_area d ON a.dev_area_id=d.area_id
>  LEFT OUTER JOIN osor_code e ON a.data_internet_flg=e.code_val
>                              AND e.code_tp='IS_INTERNET_FLG'
>  LEFT OUTER JOIN osor_code f ON a.wb_access_tp=f.code_val
>                              AND f.code_tp='WEB_ACCESS_TP'
>  LEFT OUTER JOIN osor_code g ON a.free_tp=g.code_val
>                              AND g.code_tp='FREE_TP';
>
> For each job, most of the time is consumed by the reduce tasks. As
> idap_pi_root is very large, scanning it 6 times is quite
> inefficient. Is it possible to reduce the number of map/reduce jobs to only one?
>
> Thanks,
> Weidong Bian
>
