Hi guys, I'm having some trouble with Tez when I try to load data stored in small JSON files in HDFS into a Hive table.
At first I got some OutOfMemory exceptions, so I kept increasing the memory allocated to Tez, until the problem turned into a "GC overhead limit exceeded" error even with 10 GB of RAM allocated to Tez containers. So I upgraded my common sense, put the memory limits back to a normal level, and now the problem I hit is the following:

INFO : Map 1: 276(+63,-84)/339
INFO : Map 1: 276(+63,-85)/339
INFO : Map 1: 276(+63,-85)/339
INFO : Map 1: 276(+0,-86)/339
INFO : Map 1: 276(+0,-86)/339
ERROR : Status: Failed
ERROR : Status: Failed
ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1499426430661_0113_1_00, diagnostics=[Task failed, taskId=task_1499426430661_0113_1_00_000241, diagnostics=[TaskAttempt 0 failed, info=[Container container_e17_1499426430661_0113_01_000170 finished with diagnostics set to [Container failed, exitCode=-104. Container [pid=59528,containerID=container_e17_1499426430661_0113_01_000170] is running beyond physical memory limits. Current usage: 2.7 GB of 2.5 GB physical memory used; 4.4 GB of 5.3 GB virtual memory used. Killing container.

The problem is that I can't see how the container could end up using that much memory, or why Tez can't split the job into smaller tasks when it fails for memory reasons.

FYI, in YARN the max container memory is 92160 MB; in MR2, Map can have 4 GB and Reduce 5 GB; the Tez container size is set to 2560 MB and tez.grouping.max-size is set to 1073741824 (1 GiB).

If you need more information, feel free to ask. I am currently running out of ideas on how to debug this, as I have limited access to the Tez container logs, so any input will be highly appreciated.

Thanks!
Loïc

Loïc CHANEL
System Big Data engineer
MS&T - Worldline Analytics Platform - Worldline (Villeurbanne, France)
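For reference, the settings described above can be collected as Hive session overrides. This is only a rough sketch using the standard Hive-on-Tez property names; the heap size and minimum grouping size are my assumptions, not values stated in the post:

```
-- Sketch of the configuration discussed above as Hive session overrides.
-- hive.tez.java.opts and tez.grouping.min-size values are assumptions.
SET hive.tez.container.size=2560;         -- Tez container size in MB (as described)
SET hive.tez.java.opts=-Xmx2048m;         -- JVM heap, typically ~80% of the container (assumption)
SET tez.grouping.max-size=1073741824;     -- max split group size: 1 GiB (as described)
SET tez.grouping.min-size=16777216;       -- min split group size: 16 MB (assumption)
```

Lowering tez.grouping.max-size makes Tez group fewer small input files per task, which shrinks each task's footprint; raising tez.grouping.min-size instead reduces the number of tasks when small files are the bottleneck.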