Maybe your Tez container size is smaller than the MR container size. You can try
increasing the Tez container size (you can check the container size of your job
in the RM web UI), e.g. (value in MB):

  SET hive.tez.container.size=4096;

BTW, I'm not sure what your cluster environment is. I hit a similar issue on a
cluster managed by Ambari: by default, Ambari sets the Hive (Tez) container size
smaller than the MR reducer container size. This can cause performance problems
in the reduce stage when the input data is large.
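As a rough sketch of the sizing advice above, the following Python helper
(hypothetical, not part of Hive or Ambari) builds the SET statements for a Tez
container that is at least as large as the MR reducer container
(mapreduce.reduce.memory.mb). The 80% heap-to-container ratio used for
hive.tez.java.opts is an assumed rule of thumb, not a Hive default.

```python
# Hypothetical helper: suggest Hive-on-Tez memory settings so that the Tez
# container is at least as large as the MR reducer container.
# ASSUMPTION: the JVM heap (-Xmx) is set to ~80% of the container size,
# leaving the rest for non-heap JVM memory; adjust for your cluster.

HEAP_FRACTION = 0.8  # assumed rule of thumb, not a Hive default


def tez_settings(mr_reduce_mb: int, tez_container_mb: int) -> list[str]:
    """Return Hive SET statements for the Tez container size and heap.

    mr_reduce_mb: value of mapreduce.reduce.memory.mb on the cluster (MB).
    tez_container_mb: current value of hive.tez.container.size (MB).
    """
    # Never give a Tez task less memory than an MR reducer had.
    container_mb = max(mr_reduce_mb, tez_container_mb)
    # Derive the heap from the container size using the assumed fraction.
    heap_mb = int(container_mb * HEAP_FRACTION)
    return [
        f"SET hive.tez.container.size={container_mb};",
        f"SET hive.tez.java.opts=-Xmx{heap_mb}m;",
    ]


if __name__ == "__main__":
    # Example: MR reducers get 4096 MB, but the Tez container is only 1024 MB.
    for stmt in tez_settings(mr_reduce_mb=4096, tez_container_mb=1024):
        print(stmt)
```

The printed statements can be run in the Hive session before the script; keeping
the heap below the container size leaves headroom so YARN does not kill the
container for exceeding its memory limit.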



Best regards,
Jeff Zhang


From: Juho Autio <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Friday, October 23, 2015 at 4:08 PM
To: "[email protected]" <[email protected]>
Subject: Constant Full GC making Tez Hive job take almost forever

Hi,

I'm running a Hive script with tez-0.7.0. Progress is really slow, and in the
container logs I'm seeing constant Full GC lines, so there seems to be no time
for the JVM to actually execute anything between the GC pauses.

When running the same Hive script with the mr execution engine, the job goes
through normally.

So there's something specific to Tez's memory usage that causes the Full GC 
issue.

Also, other Hive jobs have run fine with Tez on similar clusters and
configuration. This issue appears when I add just a little more data for the
script to process; with a smaller workload the script also completes on the Tez
engine in the expected time.

For example, an extract from one of the container logs:

application_1445328511212_0001/container_1445328511212_0001_01_000292/stdout.gz

791.208: [Full GC
[PSYoungGen: 58368K->56830K(116736K)]
[ParOldGen: 348914K->348909K(349184K)]
407282K->405740K(465920K)
[PSPermGen: 43413K->43413K(43520K)], 1.4063790 secs] [Times: user=5.22 sys=0.04, real=1.40 secs]
Heap
 PSYoungGen      total 116736K, used 58000K [0x00000000f5500000, 0x0000000100000000, 0x0000000100000000)
  eden space 58368K, 99% used [0x00000000f5500000,0x00000000f8da41a0,0x00000000f8e00000)
  from space 58368K, 0% used [0x00000000f8e00000,0x00000000f8e00000,0x00000000fc700000)
  to   space 58368K, 0% used [0x00000000fc700000,0x00000000fc700000,0x0000000100000000)
 ParOldGen       total 349184K, used 348909K [0x00000000e0000000, 0x00000000f5500000, 0x00000000f5500000)
  object space 349184K, 99% used [0x00000000e0000000,0x00000000f54bb4b0,0x00000000f5500000)
 PSPermGen       total 43520K, used 43413K [0x00000000d5a00000, 0x00000000d8480000, 0x00000000e0000000)
  object space 43520K, 99% used [0x00000000d5a00000,0x00000000d84657a8,0x00000000d8480000)

If I understand the GC log correctly, ParOldGen is full and the Full GC doesn't
manage to free any meaningful space there (348914K->348909K out of 349184K). So
maybe Tez has created too many objects that can't be released; it could be a
memory leak. Or maybe this heap is simply not big enough for Tez in general? I
could probably fix the problem by changing the configuration to have fewer
containers and thus a bigger heap per container. Still, moving to bigger nodes
doesn't seem like a solution that would eventually scale, so I would prefer to
resolve this properly.
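To double-check this reading of the log, here is a minimal Python sketch that
extracts the per-generation "before->after(capacity)" figures from a
Parallel-collector Full GC line and reports how much each generation freed and
how full it remains. It assumes the detailed log format shown above (joined into
a single line); gen_stats is a hypothetical helper, not part of any GC tooling.

```python
import re

# Matches a collector section such as "[ParOldGen: 348914K->348909K(349184K)]"
# in the detailed Full GC output of the Parallel collector.
SECTION = re.compile(r"\[(\w+): (\d+)K->(\d+)K\((\d+)K\)\]")


def gen_stats(gc_line: str) -> dict[str, dict[str, float]]:
    """Return KB freed and occupancy after GC for each generation in the line."""
    stats = {}
    for name, before, after, capacity in SECTION.findall(gc_line):
        before, after, capacity = int(before), int(after), int(capacity)
        stats[name] = {
            "freed_kb": before - after,
            "occupancy_after": after / capacity,
        }
    return stats


if __name__ == "__main__":
    line = ("791.208: [Full GC [PSYoungGen: 58368K->56830K(116736K)] "
            "[ParOldGen: 348914K->348909K(349184K)] 407282K->405740K(465920K) "
            "[PSPermGen: 43413K->43413K(43520K)], 1.4063790 secs] "
            "[Times: user=5.22 sys=0.04, real=1.40 secs]")
    for name, s in gen_stats(line).items():
        print(f"{name}: freed {s['freed_kb']}K, "
              f"{s['occupancy_after']:.1%} full after GC")
```

A generation that stays close to 100% full while a 1.4-second Full GC frees only
a few KB is a sign that live data, rather than garbage, fills the heap.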

Could you please help me troubleshoot and fix this issue?

Cheers,
Juho
