Hi Rajesh:
Thanks for your reply. I would like to know more detail; please see my questions inline.
Sorry, I didn't explain why I care so much about these counters. I am trying to analyze a data skew issue for a Tez vertex. I can already get several related counter values, including FILE_BYTES_READ, HDFS_BYTES_READ, SHUFFLE_BYTES and so on. Which of these counters are meaningful for analyzing data skew?

Best wishes
Joe Zhang

From: Rajesh Balamohan [mailto:[email protected]]
Sent: Wednesday, July 8, 2015 4:57 PM
To: [email protected]
Cc: Xiaoyong Zhu; Yifung Lin
Subject: Re: Tez Counter question

FILE_BYTES_READ - Represents the data read from local disk.

>> Joe Zhang: When, or in which cases, does a mapper or reducer vertex need to read from or write to local disk? I am wondering why a reducer in Tez has data that is both read from local disk and shuffled from the parent node. As far as I know, the traditional reducer in MR1 only reads shuffle data (in memory and on local disk during shuffle). Did the Tez engine add some optimization for this?

HDFS_BYTES_READ - Represents data read from HDFS (does not include data read from local disk).

>> Joe Zhang: When, or in which cases, does a mapper or reducer vertex need to read from or write to HDFS?

SHUFFLE_BYTES - Represents the data that was transferred over the wire during shuffle. Downloaded data goes either into memory or to disk (depending on memory availability). So SHUFFLE_BYTES_TO_MEM and SHUFFLE_BYTES_TO_DISK would correlate with SHUFFLE_BYTES. This does not have a direct relationship with FILE_BYTES_READ. However, in case of spills and merges, FILE_BYTES_READ can be incremented correspondingly.

~Rajesh.B

On Wed, Jul 8, 2015 at 1:25 PM, Joe Zhang (SDE) <[email protected]> wrote:

Hi Tez experts:
I am using the Tez REST API to get information about running Tez tasks, but I am confused about some concepts in the counters.

<1> For file system counters:
FILE_BYTES_READ: does it mean data read from local disk, or from somewhere else?
HDFS_BYTES_READ: is it included in FILE_BYTES_READ?

<2> For org.apache.tez.common.counters.TaskCounter:
SHUFFLE_BYTES: does it have any relationship with FILE_BYTES_READ? Which data is included in it?

Best wishes
Joe Zhang
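For the skew question itself, one hedged approach is to compare a single per-task counter across all tasks of one vertex and look at how far the largest task sits above the typical one. The sketch below assumes the per-task counter values have already been fetched (for example through the Tez REST API mentioned above) into a plain dict; the function names, the task IDs, the byte values, and the max/median ratio are all hypothetical illustrations, not a Tez feature or API.

```python
from statistics import median


def skew_ratio(per_task_values):
    """Return max/median of one per-task counter for a single vertex.

    Hypothetical helper: a ratio near 1.0 means tasks handled similar amounts
    of data; a large ratio means one or a few tasks handled far more than
    the typical task.
    """
    values = list(per_task_values.values())
    if not values:
        raise ValueError("no task counters supplied")
    typical = median(values)
    if typical == 0:
        # Avoid dividing by zero: any non-zero task is infinitely skewed.
        return float("inf") if max(values) > 0 else 1.0
    return max(values) / typical


def heaviest_tasks(per_task_values, top_n=3):
    """Return the top_n task IDs ordered by counter value, largest first."""
    return sorted(per_task_values, key=per_task_values.get, reverse=True)[:top_n]


# Hypothetical SHUFFLE_BYTES values for four tasks of one reducer vertex.
shuffle_bytes = {
    "task_000": 120_000_000,
    "task_001": 110_000_000,
    "task_002": 950_000_000,
    "task_003": 130_000_000,
}

print(round(skew_ratio(shuffle_bytes), 2))  # 950M / 125M median -> 7.6
print(heaviest_tasks(shuffle_bytes, 1))     # ['task_002']
```

Which counter to feed in is left open by the thread. Going by Rajesh's description, SHUFFLE_BYTES reflects what each task fetched over the wire, while FILE_BYTES_READ can also grow with spills and merges, so it may mix input size with spill activity.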
