Sorry. That answer was not correct.
Hive uses the following settings to determine the Tez container memory size (and, correspondingly, the java opts for Tez tasks). By default these configs are unset, and Hive falls back to the *map* settings from the MapReduce configuration — so your reducer is running with 4GB. You can set these Tez-specific configs to raise the memory to 8GB, along with appropriate java opts (you can reuse the values from your reduce java opts). This will also increase the container size for the "Map" tasks, but it allows the "Reduce" tasks to reuse those containers, which brings several performance benefits.

<property>
  <name>hive.tez.container.size</name>
  <value>-1</value>
  <description>By default Tez will spawn containers of the size of a mapper. This can be used to override that.</description>
</property>
<property>
  <name>hive.tez.java.opts</name>
  <value></value>
  <description>By default Tez will use the java opts from map tasks. This can be used to override that.</description>
</property>

*From:* Bikas Saha [mailto:[email protected]]
*Sent:* Friday, May 23, 2014 6:04 PM
*To:* [email protected]; user
*Subject:* RE: Hive on Tez: Diagnosing query execution issues

If you are running Hive on Tez, then the following settings will be respected: all vertices with "Map" in their name will use mapreduce.map.memory.mb, and all vertices with "Reduce" in their name will use mapreduce.reduce.memory.mb.

Bikas

*From:* Pala M Muthaia [mailto:[email protected]]
*Sent:* Friday, May 23, 2014 5:54 PM
*To:* [email protected]; user
*Subject:* Re: Hive on Tez: Diagnosing query execution issues

Adding the right hive users alias.

On Fri, May 23, 2014 at 5:52 PM, Pala M Muthaia <[email protected]> wrote:

Hi,

I am trying to run a relatively heavy Hive query that joins 3 tables. The query succeeds on MR after increasing the mapper and reducer container memory:

set mapreduce.map.memory.mb=4096;
set mapreduce.reduce.memory.mb=8192;

However, the same query, with the same settings, on Tez seems to get stuck in Reducer 2.
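In addition to hive-site.xml, these configs can be overridden per session from the Hive CLI. A minimal sketch — the 8192 MB value mirrors the reduce-side setting discussed in this thread, while the specific -Xmx value is a hypothetical choice, not something stated here:

```sql
-- Per-session overrides for Hive on Tez (hypothetical values; tune to your workload)
set hive.tez.container.size=8192;    -- container size in MB for all Tez tasks
set hive.tez.java.opts=-Xmx6553m;    -- JVM heap for those tasks, below the container limit
```

Note that the heap (-Xmx) must be comfortably smaller than the container size, or YARN may kill the container for exceeding its memory limit.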
(The query is a join between 3 tables, hence has 3 Map and 2 Reduce nodes in the DAG.) By "stuck", I mean I see only the following in the container logs, for a long time:

2014-05-23 19:08:54,729 INFO [AMRM Callback Handler Thread] org.apache.tez.dag.app.rm.TaskScheduler: App total resource memory: 0 cpu: 0 taskAllocations: 301

I need help with the following 2 questions:

1. Is there a separate setting for Tez to specify the amount of memory for a container, equivalent to the *.memory.mb settings for MapReduce? Maybe that value needs to be updated.

2. I already looked at the logs on the AM, and I only see the above log statements. How do I get more information on why the Reduce node in the query DAG is not progressing? Can I get more info from the reduce task logs? How do I determine the machines on which the reduce tasks were scheduled, so that I can look up the task logs, if any? The YARN Resource Manager UI doesn't show such information.

When I reduced the amount of data in one of the large tables by introducing sampling, the query succeeded. I suspect a memory issue, but I am not sure how much memory was allocated in the first place.

Thanks.
-pala
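Since the answer above pairs hive.tez.container.size with hive.tez.java.opts, one practical question is how large to make the JVM heap relative to the container. As a rough sketch — the 80% heap fraction is a common rule of thumb and an assumption on my part, not something stated in this thread:

```python
def tez_java_opts_for(container_mb: int, heap_fraction: float = 0.8) -> str:
    """Derive a -Xmx flag as a fraction of the Tez container size (rule of thumb).

    The headroom left over (container minus heap) covers JVM overhead such as
    metaspace and native buffers, so YARN doesn't kill the container.
    """
    return f"-Xmx{int(container_mb * heap_fraction)}m"

# For an 8 GB container, as discussed in the thread:
print(tez_java_opts_for(8192))  # -Xmx6553m
```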
