Constant Full GC making Tez Hive job take almost forever

Juho Autio Fri, 23 Oct 2015 01:09:07 -0700

Hi,

I'm running a Hive script with tez-0.7.0. Progress is really slow, and the
container logs show constant Full GC lines, so there seems to be no time
left for the JVM to actually execute anything between the GC pauses.


When I run the same Hive script with the mr execution engine, the job
completes normally.

So something specific to Tez's memory usage seems to cause the Full GC
issue.

Other Hive jobs have also run fine with Tez on similar clusters and
configuration. This issue appears when I add just a little more data for
the script to process. With a smaller workload, the script also completes
on the Tez engine in the expected execution time.

For example, here is an extract from one of the container logs:

application_1445328511212_0001/container_1445328511212_0001_01_000292/stdout.gz

791.208: [Full GC
[PSYoungGen: 58368K->56830K(116736K)]
[ParOldGen: 348914K->348909K(349184K)]
407282K->405740K(465920K)
[PSPermGen: 43413K->43413K(43520K)], 1.4063790 secs] [Times: user=5.22 sys=0.04, real=1.40 secs]
Heap
 PSYoungGen      total 116736K, used 58000K [0x00000000f5500000, 0x0000000100000000, 0x0000000100000000)
  eden space 58368K, 99% used [0x00000000f5500000,0x00000000f8da41a0,0x00000000f8e00000)
  from space 58368K, 0% used [0x00000000f8e00000,0x00000000f8e00000,0x00000000fc700000)
  to   space 58368K, 0% used [0x00000000fc700000,0x00000000fc700000,0x0000000100000000)
 ParOldGen       total 349184K, used 348909K [0x00000000e0000000, 0x00000000f5500000, 0x00000000f5500000)
  object space 349184K, 99% used [0x00000000e0000000,0x00000000f54bb4b0,0x00000000f5500000)
 PSPermGen       total 43520K, used 43413K [0x00000000d5a00000, 0x00000000d8480000, 0x00000000e0000000)
  object space 43520K, 99% used [0x00000000d5a00000,0x00000000d84657a8,0x00000000d8480000)
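The Full GC record above can be checked with a small Python sketch. It
assumes the record is standard Parallel collector output from
-XX:+PrintGCDetails, extracts each generation's before/after/capacity
figures, and reports how much was freed and how full the generation stays.
The regular expression and the summarize() helper are illustrative only,
not part of Hive or Tez.

```python
import re

# The Full GC record from the container log, joined onto a single line.
FULL_GC_LINE = (
    "791.208: [Full GC [PSYoungGen: 58368K->56830K(116736K)] "
    "[ParOldGen: 348914K->348909K(349184K)] 407282K->405740K(465920K) "
    "[PSPermGen: 43413K->43413K(43520K)], 1.4063790 secs] "
    "[Times: user=5.22 sys=0.04, real=1.40 secs]"
)

# Matches "[Name: beforeK->afterK(capacityK)]" for each named generation.
# The whole-heap figure (407282K->405740K) has no name, so it is skipped.
GEN_PATTERN = re.compile(r"\[(\w+): (\d+)K->(\d+)K\((\d+)K\)\]")


def summarize(line):
    """Map each generation to (freed_kb, percent_full_after_gc)."""
    summary = {}
    for name, before, after, capacity in GEN_PATTERN.findall(line):
        freed_kb = int(before) - int(after)
        percent_full = 100.0 * int(after) / int(capacity)
        summary[name] = (freed_kb, round(percent_full, 1))
    return summary


if __name__ == "__main__":
    for gen, (freed_kb, percent_full) in summarize(FULL_GC_LINE).items():
        print(f"{gen}: freed {freed_kb}K, {percent_full}% full after GC")
```

Run as a script, it reports that ParOldGen frees only 5K in a 1.4 second
Full GC and is still 99.9% full afterwards.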

If I understand the GC log correctly, ParOldGen is full (348909K of
349184K used) and the Full GC frees almost nothing from it (348914K ->
348909K). So maybe Tez keeps too many objects alive for them to be
released. It could be a memory leak. Or maybe this heap (about 455 MB in
total) is simply too small for Tez in general? I could probably work around
the problem by changing the configuration to run fewer containers, each
with a bigger heap. Still, moving to bigger nodes doesn't seem like a
solution that would scale in the long run, so I would prefer to resolve
this properly.
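A rough sketch of the "fewer containers, bigger heap" trade-off. All numbers
are hypothetical: NODE_MEMORY_MB stands for what YARN may allocate per node
(yarn.nodemanager.resource.memory-mb), and HEAP_FRACTION assumes the common
rule of thumb of setting -Xmx to about 80% of the container size. The
hive.tez.container.size and hive.tez.java.opts settings are real Hive
properties, but the values printed here are illustrations, not
recommendations for this cluster.

```python
# Assumptions (hypothetical, not taken from the cluster in this thread):
#   - NODE_MEMORY_MB is what YARN may allocate per node.
#   - The JVM heap (-Xmx) is about 80% of the container size, leaving the
#     rest for non-heap JVM memory.
NODE_MEMORY_MB = 6144
HEAP_FRACTION = 0.8


def plan(container_mb, node_memory_mb=NODE_MEMORY_MB, heap_fraction=HEAP_FRACTION):
    """Return (containers_per_node, heap_mb) for a given container size."""
    containers_per_node = node_memory_mb // container_mb
    heap_mb = int(container_mb * heap_fraction)
    return containers_per_node, heap_mb


def hive_settings(container_mb, heap_fraction=HEAP_FRACTION):
    """Hive session settings requesting this container size and heap."""
    _, heap_mb = plan(container_mb, heap_fraction=heap_fraction)
    return [
        f"SET hive.tez.container.size={container_mb};",
        f"SET hive.tez.java.opts=-Xmx{heap_mb}m;",
    ]


if __name__ == "__main__":
    for size in (512, 1024, 2048):
        count, heap = plan(size)
        print(f"{size} MB containers: {count} per node, -Xmx{heap}m each, "
              f"{count * heap} MB heap per node in total")
    print("\n".join(hive_settings(2048)))
```

With these assumed numbers the total heap per node stays at roughly 4.9 GB
whichever container size is chosen, so a bigger heap per container directly
costs concurrency on the same node.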

Could you please help me troubleshoot and fix this issue?

Cheers,
Juho
