On 7/6/14, 3:22 PM, Grandl Robert wrote:
> Is it possible to know, for a task/vertex, the input size it needs to
> transfer from each input task/vertex on every edge? And similarly, or
> the same, for the output?
Yes.
<property>
<name>tez.task.generate.counters.per.io</name>
<value>true</value>
</property>
<!-- ~4x counters due to per-io -->
<property>
<name>tez.runtime.job.counters.max</name>
<value>4096</value>
</property>
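With per-IO counters enabled, each counter group is keyed by the vertex and the edge it belongs to, so per-edge sizes can be picked out of a counter dump. A minimal sketch, assuming a group-naming scheme like `TaskCounter_<vertex>_INPUT_<source vertex>` and the `SHUFFLE_BYTES` counter (check the actual names your Tez version emits):

```python
def edge_input_bytes(counters):
    """Map (vertex, source-vertex) edges to bytes shuffled across them.

    `counters` is assumed to be {group_name: {counter_name: value}}.
    The "_INPUT_" group-name convention is an assumption; verify it
    against a real counter dump before relying on it.
    """
    edges = {}
    for group, values in counters.items():
        if "_INPUT_" not in group:
            continue  # not a per-edge input counter group
        prefix, source = group.split("_INPUT_", 1)
        vertex = prefix.split("_", 1)[-1]  # drop the counter-class prefix
        edges[(vertex, source)] = values.get("SHUFFLE_BYTES", 0)
    return edges
```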
> I know that for each task/vertex you know the input/output vertices, but
> I could not find a way to determine the input size on each edge to these
> vertices.
If you are not on Hadoop-2.4.x or do not have an Application Timeline
Server installed, you can instead log the same event stream to HDFS using
<property>
<name>tez.simple.history.logging.dir</name>
<value>${fs.default.name}/user/gopal/tez-history/</value>
</property>
This will log the JSON event stream to whichever HDFS directory you pick.
The default record separator is Ctrl+A ('\01').
The record marked DAG_FINISHED carries the full set of counters, which
should be all you need.
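Pulling the counters back out of that log is a matter of splitting on the record separator and finding the DAG_FINISHED record. A minimal sketch; the JSON field names here ("otherinfo" -> "counters") are an assumption about the event layout and may differ across Tez versions:

```python
import json

RECORD_SEP = "\x01"  # default record separator in the simple history log

def dag_finished_counters(log_text):
    """Return the counters object from the first DAG_FINISHED record.

    Returns None if no such record is found. The "otherinfo"/"counters"
    path is an assumed layout; inspect one record from your own log to
    confirm where the counters actually live.
    """
    for record in log_text.split(RECORD_SEP):
        record = record.strip()
        if not record or "DAG_FINISHED" not in record:
            continue
        event = json.loads(record)
        return event.get("otherinfo", {}).get("counters")
    return None
```

The substring check keeps the loop cheap: only the one record that mentions DAG_FINISHED gets parsed as JSON.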
I use the same data pulled off ATS to generate a Sankey diagram to
analyze slow JOINs.
http://people.apache.org/~gopalv/sankey/
https://gist.github.com/t3rmin4t0r/650d0f0fc9d0cf52b43e
Cheers,
Gopal