Not sure if you have Tez-UI; if you do, it should render this info automatically. Otherwise, you can verify it from the application logs. An example is given below.

2015-04-27 04:15:12,834 INFO [Dispatcher thread: Central] history.HistoryEventHandler: [HISTORY][DAG:dag_1429683757595_0452_1][Event:DAG_FINISHED]: dagId=dag_1429683757595_0452_1, startTime=1430133293306, finishTime=1430133312773, timeTaken=19467, status=SUCCEEDED, diagnostics=, counters=Counters: 225,
org.apache.tez.common.counters.DAGCounter, NUM_SUCCEEDED_TASKS=43, TOTAL_LAUNCHED_TASKS=43,
.....
.....
*TaskCounter_Map_4_OUTPUT_Map_1*, ADDITIONAL_SPILLS_BYTES_READ=0, ADDITIONAL_SPILLS_BYTES_WRITTEN=0, ADDITIONAL_SPILL_COUNT=0, OUTPUT_BYTES=137200, OUTPUT_BYTES_PHYSICAL=119705, OUTPUT_BYTES_WITH_OVERHEAD=548794, OUTPUT_LARGE_RECORDS=0, OUTPUT_RECORDS=27440, SPILLED_RECORDS=0,
TaskCounter_Map_5_INPUT_date_dim, INPUT_RECORDS_PROCESSED=10,
TaskCounter_Map_5_OUTPUT_Map_1, ADDITIONAL_SPILLS_BYTES_READ=0, ADDITIONAL_SPILLS_BYTES_WRITTEN=0, ADDITIONAL_SPILL_COUNT=0, OUTPUT_BYTES=1825, OUTPUT_BYTES_PHYSICAL=1505, OUTPUT_BYTES_WITH_OVERHEAD=7297, OUTPUT_LARGE_RECORDS=0, OUTPUT_RECORDS=365, SPILLED_RECORDS=0,
.......
.......
*TaskCounter_Map_6_OUTPUT_Map_1*, ADDITIONAL_SPILLS_BYTES_READ=0, ADDITIONAL_SPILLS_BYTES_WRITTEN=0, ADDITIONAL_SPILL_COUNT=0, OUTPUT_BYTES=909, OUTPUT_BYTES_PHYSICAL=464, OUTPUT_BYTES_WITH_OVERHEAD=2421, OUTPUT_LARGE_RECORDS=0, OUTPUT_RECORDS=101, SPILLED_RECORDS=0,
TaskCounter_Map_7_INPUT_item, INPUT_RECORDS_PROCESSED=47,
....
....
*TaskCounter_Map_7_OUTPUT_Map_1*, ADDITIONAL_SPILLS_BYTES_READ=0, ADDITIONAL_SPILLS_BYTES_WRITTEN=0, ADDITIONAL_SPILL_COUNT=0, OUTPUT_BYTES=1104000, OUTPUT_BYTES_PHYSICAL=341828, OUTPUT_BYTES_WITH_OVERHEAD=1727999, OUTPUT_LARGE_RECORDS=0, OUTPUT_RECORDS=48000, SPILLED_RECORDS=0,
TaskCounter_Reducer_2_INPUT_Map_1, ADDITIONAL_SPILLS_BYTES_READ=821473, ADDITIONAL_SPILLS_BYTES_WRITTEN=0, COMBINE_INPUT_RECORDS=0, FIRST_EVENT_RECEIVED=12, LAST_EVENT_RECEIVED=5049, MERGED_MAP_OUTPUTS=36, MERGE_PHASE_TIME=5070, NUM_DISK_TO_DISK_MERGES=0, NUM_FAILED_SHUFFLE_INPUTS=0, NUM_MEM_TO_DISK_MERGES=0, NUM_SHUFFLED_INPUTS=36, NUM_SKIPPED_INPUTS=0, REDUCE_INPUT_GROUPS=47999, REDUCE_INPUT_RECORDS=670353, SHUFFLE_BYTES=16402510, SHUFFLE_BYTES_DECOMPRESSED=52252736, SHUFFLE_BYTES_DISK_DIRECT=821473, SHUFFLE_BYTES_TO_DISK=0, SHUFFLE_BYTES_TO_MEM=15581037, SHUFFLE_PHASE_TIME=5056, SPILLED_RECORDS=33317,
....
....
*TaskCounter_Reducer_2_OUTPUT_Reducer_3*, ADDITIONAL_SPILLS_BYTES_READ=0, ADDITIONAL_SPILLS_BYTES_WRITTEN=0, ADDITIONAL_SPILL_COUNT=0, OUTPUT_BYTES=5600, OUTPUT_BYTES_PHYSICAL=0, OUTPUT_BYTES_WITH_OVERHEAD=0, OUTPUT_RECORDS=100, SPILLED_RECORDS=100

On Tue, Apr 28, 2015 at 4:41 AM, Xiaoyong Zhu <[email protected]> wrote:

> Btw, Rajesh, I set tez.task.generate.counters.per.io=true in my cluster
> but did not find the task counter per edge. Could you please give some
> counter examples when this is enabled so I could verify?
>
> Thanks!
>
> Xiaoyong
>
> *From:* Rajesh Balamohan [mailto:[email protected]]
> *Sent:* Friday, April 24, 2015 4:55 PM
> *To:* [email protected]
> *Subject:* Re: How to Tuning Tez Task Performance
>
> Listing some details at very high level,
>
> - Set "tez.task.generate.counters.per.io=true" to get more details on the
>   task counters. Basically this starts printing the counters per edge,
>   which can be a lot more useful for debugging.
>
> - In case you want to avoid container launches etc. when you analyze for
>   the first time, try hive.prewarm.enabled=true &
>   hive.prewarm.numcontainers=<no of containers you want in your session
>   to be prewarmed>
>
> - Container reuse is enabled by default in Tez.
>   (tez.am.container.idle.release-timeout-min.millis and
>   tez.am.container.idle.release-timeout-max.millis control the amount of
>   time a container is held by the AM before releasing it)
>
> - Set tez.runtime.io.sort.mb appropriately to avoid spills (you can check
>   the task counters in the logs to find the spills and adjust it
>   accordingly)
>
> - Set tez.runtime.sort.threads=2 to enable PipelinedSorter, which is a lot
>   more performant than DefaultSorter (this is the default in the master
>   branch, but if you are using earlier releases, you can turn it on by
>   setting tez.runtime.sort.threads=2).
>
> - Set tez.runtime.compress=true and set tez.runtime.compress.codec
>   (SnappyCodec is preferred, but it is up to you to choose)
>
> - Set tez.runtime.shuffle.keep-alive.enabled=true in case you have a
>   shuffle-heavy workload. This reduces the number of connections in
>   shuffle.
>
> - Adjust the memory allocated to different inputs/outputs based on
>   tez.task.scale.memory.ratios (but this is more of an expert-level
>   setting which you might want to touch after nailing down any memory
>   pressure)
>
> - Adjusting shuffle buffers is also possible, but I would advise it only
>   when you nail down an issue related to the shuffle/merge codepath.
>
> - Set "tez.runtime.optimize.local.fetch=true" to bypass http fetches (when
>   data is locally present)
>
> Feel free to refer to
> https://github.com/t3rmin4t0r/tez-autobuild/blob/master/tez-site.xml for
> any commonly used settings for benchmarks.
>
> On Fri, Apr 24, 2015 at 1:52 PM, [email protected] <[email protected]>
> wrote:
>
> I want to tune Tez task performance. This Tez task is created by
> Hive. How do I tune Tez task performance?
>
> Should I analyze performance using the Tez Task Counts in the Tez Log?
> Any suggestions?
>
> ------------------------------
>
> [email protected]
>
> --
>
> ~Rajesh.B
>

--
~Rajesh.B
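
P.S. If Tez-UI is not available, the per-edge counters can be pulled out of the DAG_FINISHED line with a small script. Below is a minimal Python sketch, not an official Tez tool. It assumes the counters follow "counters=Counters:" as a comma-separated list in which a token without "=" names a counter group and a token with "=" is a counter in that group, as in the log above. The function names (parse_dag_counters, per_edge_groups, edges_over) are made up for illustration, and the sample line is shortened from the figures in the log above.

```python
import re


def parse_dag_counters(line):
    """Return {group: {counter: int}} parsed from one DAG_FINISHED log line.

    Assumed layout (inferred from the log excerpt, not from a Tez spec):
    '... counters=Counters: <n>, <group>, <name>=<value>, ..., <group>, ...'
    """
    _, marker, tail = line.partition("counters=Counters:")
    if not marker:
        return {}
    groups = {}
    current = None
    for token in tail.split(","):
        token = token.strip()
        if not token:
            continue
        name, eq, value = token.partition("=")
        if eq:
            # name=value pair; keep it only if it belongs to a group
            if current is not None and re.fullmatch(r"-?\d+", value):
                groups[current][name] = int(value)
        elif not token.isdigit():
            # a token without '=' starts a new counter group;
            # the bare number after 'Counters:' is skipped
            current = token
            groups.setdefault(current, {})
    return groups


def per_edge_groups(groups):
    """Keep only the per-input/per-output groups (TaskCounter_<vertex>_...)."""
    return {g: c for g, c in groups.items() if g.startswith("TaskCounter_")}


def edges_over(groups, counter, threshold=0):
    """List per-edge groups whose given counter exceeds the threshold."""
    return sorted(
        g for g, c in per_edge_groups(groups).items()
        if c.get(counter, 0) > threshold
    )


# Hypothetical sample: a shortened version of the log line above.
sample = (
    "[HISTORY][DAG:dag_1429683757595_0452_1][Event:DAG_FINISHED]: "
    "dagId=dag_1429683757595_0452_1, status=SUCCEEDED, diagnostics=, "
    "counters=Counters: 225, org.apache.tez.common.counters.DAGCounter, "
    "NUM_SUCCEEDED_TASKS=43, TOTAL_LAUNCHED_TASKS=43, "
    "TaskCounter_Map_5_OUTPUT_Map_1, ADDITIONAL_SPILL_COUNT=0, "
    "OUTPUT_RECORDS=365, SPILLED_RECORDS=0, "
    "TaskCounter_Reducer_2_OUTPUT_Reducer_3, ADDITIONAL_SPILL_COUNT=0, "
    "OUTPUT_RECORDS=100, SPILLED_RECORDS=100"
)

if __name__ == "__main__":
    counters = parse_dag_counters(sample)
    for group, values in per_edge_groups(counters).items():
        print(group, values)
    print("edges with SPILLED_RECORDS > 0:",
          edges_over(counters, "SPILLED_RECORDS"))
```

If no TaskCounter_<vertex>_INPUT_... or TaskCounter_<vertex>_OUTPUT_... groups come back, the per-io counters are most likely not being generated for that DAG. Which counter to watch when adjusting tez.runtime.io.sort.mb is left as a parameter here, since that choice is a judgment call.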
