Hive, YARN, and Tez seem so intertwined that this felt like a good place
to ask, but please correct me if I am wrong.

I am writing data to a Hive table from another Hive table of size
6.29 GB (block size: 128 MB), using YARN as the resource manager and Tez
as the execution engine; the statement is a plain insert-select (sketched
below the settings). The YARN settings are:

 yarn.nodemanager.resource.memory-mb=8192
 yarn.scheduler.minimum-allocation-mb=1024
 yarn.scheduler.maximum-allocation-mb=8192
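
For illustration, the statement has this shape (the table names are
placeholders, not my real schema):

 INSERT INTO TABLE target_table
 SELECT * FROM source_table;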

After the INSERT query finishes, the logs show the following counters for
a single YARN container:

{
  counterGroupName: org.apache.tez.common.counters.FileSystemCounter,
  counterGroupDisplayName: File System Counters,
  counters: [
    { counterName: HDFS_BYTES_READ, counterValue: 536887296 },
    { counterName: HDFS_BYTES_WRITTEN, counterValue: 107265498 },
    { counterName: HDFS_READ_OPS, counterValue: 7 },
    { counterName: HDFS_WRITE_OPS, counterValue: 3 }
  ]
}
{
  counterGroupName: org.apache.tez.common.counters.TaskCounter,
  counters: [
    { counterName: GC_TIME_MILLIS, counterValue: 5450 },
    { counterName: CPU_MILLISECONDS, counterValue: 97670 },
    { counterName: PHYSICAL_MEMORY_BYTES, counterValue: 166723584 },
    { counterName: VIRTUAL_MEMORY_BYTES, counterValue: 1968496640 },
    { counterName: COMMITTED_HEAP_BYTES, counterValue: 166723584 },
    { counterName: INPUT_RECORDS_PROCESSED, counterValue: 1736321 },
    { counterName: INPUT_SPLIT_LENGTH_BYTES, counterValue: 536870912 }
  ]
}

How does Tez decide how much data to read from HDFS (HDFS_BYTES_READ)?
How did it end up reading ~537 MB in a single YARN container even though
the container is set to use 1 GB of memory?

Physical memory used = 167 MB
Virtual memory used = 1969 MB
HDFS_BYTES_READ = 537 MB
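
One thing I noticed: INPUT_SPLIT_LENGTH_BYTES = 536870912 = 4 x 134217728,
i.e. exactly four 128 MB blocks (512 MiB), which matches HDFS_BYTES_READ
almost byte for byte. So it looks as if split grouping, not container
memory, decides how much each task reads. If I understand correctly, the
size of a grouped Tez split is bounded by the properties below (the values
shown are what I believe to be the defaults, not something I have set):

 set tez.grouping.min-size=52428800;
 set tez.grouping.max-size=1073741824;

Is that the mechanism that produced the 512 MiB per container here?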
