Correct. If a processor chooses to read additional data from HDFS as part of its processing, that is also accounted for in HDFS_BYTES_READ.
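
As a rough illustration of the sum discussed in this thread (HDFS_BYTES_READ + SHUFFLE_BYTES), here is a minimal Python sketch. It assumes the counters have already been fetched and parsed into a list of groups shaped like `{"counterGroupName": ..., "counters": [{"counterName": ..., "counterValue": ...}]}`. That layout, the file system group name, and the helper names are assumptions made for illustration, so check them against the JSON your REST endpoint actually returns.

```python
# Assumed group names. TaskCounter is named in this thread; the
# FileSystemCounter group name is an assumption to verify locally.
FS_GROUP = "org.apache.tez.common.counters.FileSystemCounter"
TASK_GROUP = "org.apache.tez.common.counters.TaskCounter"


def get_counter(counter_groups, group_name, counter_name):
    """Return one counter value from the named group, or 0 if absent."""
    for group in counter_groups:
        if group.get("counterGroupName") != group_name:
            continue
        for counter in group.get("counters", []):
            if counter.get("counterName") == counter_name:
                return int(counter["counterValue"])
    return 0


def total_input_bytes(counter_groups):
    """Bytes read from HDFS plus bytes fetched over the wire in shuffle."""
    hdfs = get_counter(counter_groups, FS_GROUP, "HDFS_BYTES_READ")
    shuffle = get_counter(counter_groups, TASK_GROUP, "SHUFFLE_BYTES")
    return hdfs + shuffle


# Hypothetical counter values, for illustration only.
sample_groups = [
    {
        "counterGroupName": FS_GROUP,
        "counters": [
            {"counterName": "FILE_BYTES_READ", "counterValue": 250},
            {"counterName": "HDFS_BYTES_READ", "counterValue": 1000},
        ],
    },
    {
        "counterGroupName": TASK_GROUP,
        "counters": [
            {"counterName": "SHUFFLE_BYTES", "counterValue": 400},
        ],
    },
]

print(total_input_bytes(sample_groups))
```

The lookup is keyed on the group name as well as the counter name, so a counter that is repeated in another group is not counted twice. FILE_BYTES_READ is deliberately left out of the sum, since it reflects local disk reads (for example during spills and merges) rather than new input.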

~Rajesh.B

On Wed, Jul 8, 2015 at 2:34 PM, Xiaoyong Zhu <[email protected]> wrote:

> Thanks Rajesh. Then what’s the total amount of data for a certain task
> or vertex to read, if we want to count? HDFS_BYTES_READ (which indicates the
> data read from “global file system”) + SHUFFLE_BYTES (which indicates the
> data read from “upper task/vertex”)?
>
> Xiaoyong
>
> *From:* Rajesh Balamohan [mailto:[email protected]]
> *Sent:* Wednesday, July 8, 2015 4:57 PM
> *To:* [email protected]
> *Cc:* Xiaoyong Zhu; Yifung Lin
> *Subject:* Re: Tez Counter question
>
> FILE_BYTES_READ - Represents the data read from local disk
>
> HDFS_BYTES_READ - Represents data read from HDFS (does not include data
> read from disk)
>
> SHUFFLE_BYTES - Represents the data that was transferred over the wire
> while doing shuffle. Downloaded data either gets into memory or disk
> (depending on memory availability). So, SHUFFLE_BYTES_TO_MEM and
> SHUFFLE_BYTES_TO_DISK would have correlation with SHUFFLE_BYTES. This does
> not have direct relationship with FILE_BYTES_READ. However, in case of
> spills & merge, FILE_BYTES_READ can be incremented correspondingly.
>
> ~Rajesh.B
>
> On Wed, Jul 8, 2015 at 1:25 PM, Joe Zhang (SDE) <[email protected]>
> wrote:
>
> Hi Tez experts:
>
> Now I am using the Tez REST API to get info on running Tez tasks, but I am
> confused about some concepts in Counter
>
> <1> For File system counters:
>
> counterName : FILE_BYTES_READ ? does it mean read from local disk or
> somewhere else?
>
> HDFS_BYTES_READ ? is it included by FILE_BYTES_READ?
>
> <2> For org.apache.tez.common.counters.TaskCounter:
>
> counterName SHUFFLE_BYTES ? does it have some relationship with
> FILE_BYTES_READ? which data should be included in it?
>
> Best wishes
>
> Joe zhang
