Hi guys,

I'm having some trouble with Tez when I try to load data stored in
small JSON files in HDFS into a Hive table.

At first I got some OutOfMemory exceptions, so I kept increasing the
amount of memory allocated to Tez, until the error turned into a "GC
overhead limit exceeded" even with 10 GB of RAM allocated to Tez containers.

So I came back to my senses, restored the memory limits to a sensible
level, and now the problem I hit is the following:

INFO  : Map 1: 276(+63,-84)/339
INFO  : Map 1: 276(+63,-85)/339
INFO  : Map 1: 276(+63,-85)/339
INFO  : Map 1: 276(+0,-86)/339
INFO  : Map 1: 276(+0,-86)/339
ERROR : Status: Failed
ERROR : Status: Failed
ERROR : Vertex failed, vertexName=Map 1,
vertexId=vertex_1499426430661_0113_1_00, diagnostics=[Task failed,
taskId=task_1499426430661_0113_1_00_000241, diagnostics=[TaskAttempt 0
failed, info=[Container container_e17_1499426430661_0113_01_000170 finished
with diagnostics set to [Container failed, exitCode=-104. Container
[pid=59528,containerID=container_e17_1499426430661_0113_01_000170] is
running beyond physical memory limits. Current usage: 2.7 GB of 2.5 GB
physical memory used; 4.4 GB of 5.3 GB virtual memory used. Killing
container.

The problem is that I can't see how the container could end up using so
much memory (exitCode=-104 means YARN killed it for exceeding its physical
memory limit, and the 2.5 GB limit in the log matches the 2,560 MB Tez
container size below, but the 2.7 GB of actual usage is a mystery to me),
nor why Tez can't split the job into smaller tasks when it fails for
memory reasons.

FYI, the current configuration is:
- YARN: maximum container memory is 92,160 MB
- MR2: 4 GB per Map task, 5 GB per Reduce task
- Tez container size: 2,560 MB
- tez.grouping.max-size: 1073741824 (1 GB)
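In case it helps to reproduce, this is roughly what those settings look like when overridden from a Hive session. This is just a sketch: hive.tez.container.size, tez.grouping.max-size, and hive.tez.java.opts are the standard Hive/Tez property names, and the -Xmx value shown assumes the usual practice of sizing the JVM heap at about 80% of the container.

```sql
-- Tez container settings as seen from a Hive session (values from above)
SET hive.tez.container.size=2560;        -- MB of physical memory per Tez container
SET tez.grouping.max-size=1073741824;    -- max bytes of input grouped into one split (1 GB)
-- The JVM heap is typically sized below the container limit to leave
-- headroom for off-heap usage; ~80% of 2560 MB would be:
SET hive.tez.java.opts=-Xmx2048m;
```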

If you need more information, feel free to ask.

I am currently running out of ideas on how to debug this, as I have
limited access to the Tez container logs, so any input would be highly
appreciated.

Thanks!


Loïc

Loïc CHANEL
System Big Data engineer
MS&T - Worldline Analytics Platform - Worldline (Villeurbanne, France)
