So, if we want to know whether a vertex has a data skew issue, which counter should we use?
Xiaoyong

-----Original Message-----
From: Hitesh Shah [mailto:[email protected]]
Sent: Thursday, July 9, 2015 1:39 PM
To: [email protected]
Cc: Xiaoyong Zhu; Yifung Lin; Zhaomin Xu
Subject: Re: Tez Counter question

For data skew, you may also want to consider enabling "tez.task.generate.counters.per.io". This enables counters on a per-edge basis, which is more helpful for complex DAGs.

- Hitesh

On Jul 8, 2015, at 10:29 PM, Joe Zhang (SDE) <[email protected]> wrote:

> Hi Rajesh:
>
> Thanks for your reply. I would like to know more detail; see inline.
>
> Sorry that I didn't explain why I care so much about these counters. I am
> trying to analyze the data skew issue for a Tez vertex. Now I can get several
> related counter values, including FILE_BYTES_READ, HDFS_BYTES_READ,
> SHUFFLE_BYTES and so on. So I want to know which counter value is meaningful
> for analyzing data skew?
>
> Best wishes
> Joe zhang
>
> From: Rajesh Balamohan [mailto:[email protected]]
> Sent: Wednesday, July 8, 2015 4:57 PM
> To: [email protected]
> Cc: Xiaoyong Zhu; Yifung Lin
> Subject: Re: Tez Counter question
>
> FILE_BYTES_READ - Represents the data read from local disk
> >>>>>>>>>>Joezhang: When, or in which cases, does a mapper or reducer vertex
> >>>>>>>>>>need to read from local disk or write to local disk? I am wondering
> >>>>>>>>>>why a reducer in Tez has data both read from local disk and shuffled
> >>>>>>>>>>from the parent node. As far as I know, the traditional reducer in
> >>>>>>>>>>MR1 reads only shuffle data (in memory and on the shuffle local
> >>>>>>>>>>disk). Does the Tez engine do some optimizations for this?
>
> HDFS_BYTES_READ - Represents data read from HDFS (does not include
> data read from disk)
> >>>>>>>>>>Joezhang: When, or in which cases, does a mapper or reducer vertex
> >>>>>>>>>>need to read from HDFS or write to HDFS?
>
> SHUFFLE_BYTES - Represents the data that was transferred over the wire while
> doing shuffle. Downloaded data either gets into memory or disk (depending on
> memory availability). So, SHUFFLE_BYTES_TO_MEM and SHUFFLE_BYTES_TO_DISK
> would have correlation with SHUFFLE_BYTES. This does not have direct
> relationship with FILE_BYTES_READ. However, in case of spills & merge,
> FILE_BYTES_READ can be incremented correspondingly.
>
> ~Rajesh.B
>
> On Wed, Jul 8, 2015 at 1:25 PM, Joe Zhang (SDE) <[email protected]> wrote:
> Hi Tez experts:
>
> I am now using the Tez REST API to get information about running Tez tasks,
> but I am confused about some concepts in the counters.
>
> <1> For File system counters:
>
> counterName: FILE_BYTES_READ - does it mean read from local disk or
> somewhere else?
>
> HDFS_BYTES_READ - is it included in FILE_BYTES_READ?
>
> <2> For org.apache.tez.common.counters.TaskCounter:
>
> counterName SHUFFLE_BYTES - does it have some relationship with
> FILE_BYTES_READ? Which data should be included in it?
>
> Best wishes
> Joe zhang
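
For illustration only: once per-task values of a counter have been collected for a vertex, one simple way to look for skew is to compare the heaviest task against the median. The sketch below assumes the values are already in a plain dict; the function name skew_ratio, the task names, and the byte counts are all hypothetical, and the thread does not settle which counter to feed in (per the descriptions above, HDFS_BYTES_READ reflects data read from HDFS and SHUFFLE_BYTES reflects data moved during shuffle).

```python
from statistics import median


def skew_ratio(per_task_values):
    """Return max/median of a counter across a vertex's tasks (1.0 means even)."""
    values = list(per_task_values)
    if not values:
        raise ValueError("need at least one task value")
    mid = median(values)
    if mid == 0:
        # Median of zero: no skew if every task read nothing,
        # otherwise a few tasks carry all the data.
        return float("inf") if max(values) > 0 else 1.0
    return max(values) / mid


# Hypothetical per-task SHUFFLE_BYTES values for one vertex (made-up numbers).
shuffle_bytes_by_task = {
    "task_0": 100,
    "task_1": 120,
    "task_2": 110,
    "task_3": 900,
}

ratio = skew_ratio(shuffle_bytes_by_task.values())
heaviest = max(shuffle_bytes_by_task, key=shuffle_bytes_by_task.get)
print(f"heaviest task: {heaviest}, max/median ratio: {ratio:.2f}")
```

A ratio close to 1 suggests the tasks of the vertex received similar amounts of data; what counts as "too high" is a judgment call and is not something the thread specifies.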
