Sorry. That answer was not correct.
Hive uses the following settings to determine the Tez container memory size (and, correspondingly, the java opts for Tez tasks). By default these configs are unset, and Hive falls back to the *map* settings from the MapReduce configuration — so your reducer is running with 4GB. You can set these Tez-specific configs to raise the memory to 8GB, along with appropriate java opts (you can reuse the values from your reduce java opts). This will also increase the container size for the "Map" tasks, but it allows the "Reduce" tasks to reuse those containers, which brings several performance benefits.

<property>
  <name>hive.tez.container.size</name>
  <value>-1</value>
  <description>By default Tez will spawn containers of the size of a mapper. This can be used to override that.</description>
</property>
<property>
  <name>hive.tez.java.opts</name>
  <value></value>
  <description>By default Tez will use the java opts from map tasks. This can be used to override that.</description>
</property>

*From:* Bikas Saha [mailto:[email protected]]
*Sent:* Friday, May 23, 2014 6:04 PM
*To:* [email protected]; user
*Subject:* RE: Hive on Tez: Diagnosing query execution issues

If you are running Hive on Tez, then the following settings will be respected: all vertices with "Map" in their name will use mapreduce.map.memory.mb, and all vertices with "Reduce" in their name will use mapreduce.reduce.memory.mb.

Bikas

*From:* Pala M Muthaia [mailto:[email protected]]
*Sent:* Friday, May 23, 2014 5:54 PM
*To:* [email protected]; user
*Subject:* Re: Hive on Tez: Diagnosing query execution issues

Adding the right hive users alias.

On Fri, May 23, 2014 at 5:52 PM, Pala M Muthaia <[email protected]> wrote:

Hi,

I am trying to run a relatively heavy Hive query that joins 3 tables. The query succeeds on MR after increasing the mapper and reducer container memory:

set mapreduce.map.memory.mb=4096;
set mapreduce.reduce.memory.mb=8192;

However, the same query, with the same settings, on Tez seems to get stuck in Reducer 2.
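In addition to hive-site.xml, these configs can be overridden per session from the Hive CLI. A minimal sketch — the 8192 MB value mirrors the reduce-side setting discussed in this thread, while the specific -Xmx value is a hypothetical choice, not something stated here:

```sql
-- Per-session overrides for Hive on Tez (hypothetical values; tune to your workload)
set hive.tez.container.size=8192;    -- container size in MB for all Tez tasks
set hive.tez.java.opts=-Xmx6553m;    -- JVM heap for those tasks, below the container limit
```

Note that the heap (-Xmx) must be comfortably smaller than the container size, or YARN may kill the container for exceeding its memory limit.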
(The query is a join between 3 tables, hence has 3 Map and 2 Reduce nodes in the DAG.) By "stuck", I mean I see only the following in the container logs, for a long time:

2014-05-23 19:08:54,729 INFO [AMRM Callback Handler Thread] org.apache.tez.dag.app.rm.TaskScheduler: App total resource memory: 0 cpu: 0 taskAllocations: 301

I need help with the following 2 questions:

1. Is there a separate setting for Tez to specify the amount of memory for a container, equivalent to the *.memory.mb settings for MapReduce? Maybe that value needs to be updated.

2. I already looked at the logs on the AM, and I only see the above log statements. How do I get more information on why the Reduce node in the query DAG is not progressing? Can I get more info from the reduce task logs? How do I determine the machines on which the reduce tasks were scheduled, so that I can look up the task logs, if any? The YARN Resource Manager UI doesn't show such information.

When I reduced the amount of data in one of the large tables by introducing sampling, the query succeeded. I suspect a memory issue, but I am not sure how much memory was allocated in the first place.

Thanks.
-pala
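Since the answer above pairs hive.tez.container.size with hive.tez.java.opts, one practical question is how large to make the JVM heap relative to the container. As a rough sketch — the 80% heap fraction is a common rule of thumb and an assumption on my part, not something stated in this thread:

```python
def tez_java_opts_for(container_mb: int, heap_fraction: float = 0.8) -> str:
    """Derive a -Xmx flag as a fraction of the Tez container size (rule of thumb).

    The headroom left over (container minus heap) covers JVM overhead such as
    metaspace and native buffers, so YARN doesn't kill the container.
    """
    return f"-Xmx{int(container_mb * heap_fraction)}m"

# For an 8 GB container, as discussed in the thread:
print(tez_java_opts_for(8192))  # -Xmx6553m
```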
