Re: How to measure the execution time of query on Hive on Tez
Hi There are couple of tables that you are seeing, the top one is Query Execution Summary and next one is Task Execution Summary. At the end of your query execution, you should see something like "Time taken: xx.xx seconds” in hive CLI or beeline. This represents the overall time the query took from start (start of semantic analysis) to finish (end of fetching results). This overall time is broken down in the Query Execution Summary table. When you submit the query, query goes through multiple phases before returning the results. The time for each of these phases is defined in Query Execution Summary tables. If you want the query runtime then you should look at the last line in hive CLI or beeline which represents the end-to-end time. If you are just interested in how long the query actually executed in the cluster then you should look at Run DAG time. Run DAG is NOT sum of duration of all vertices as vertices can be scheduled in parallel. Run DAG is the time that Hive computes when it knows that DAG has been accepted and started by Application Master until the completion of DAG (could be failed, errored or succeeded). Hope this helps. Thanks Prasanth > On Oct 12, 2017, at 2:46 PM, Zhang, Liyunwrote: > > Hi all: > Maybe in last mail the attached picture is not shown. > I re-described my question here. I saw following statistics about the > runtime when running query. > > The Run DAG is 318s. But it is not the sum of DURATION of all > VERTICES((59549+4069+3055+3055+1004+1006+132736+34248+11077+1003+439+140896+35260+8070)/1000=435s > Not the sum of CPU_TIME. There are several indicator "RUN > DAG","DURATION","CPU_TIME", which indicator I should use when measure the > performance? Sometimes I found there is significant improvement in sum of > (CPU_TIME) while there is no significant improvement in "RUN DAG". Is this > normal? Appreciate to get some feedback from you! > > > > 2017-10-12T16:29:39,262 INFO [main] SessionState: > -- > 2017-10-12T16:29:39,262 INFO [main] SessionState: OPERATION > DURATION > 2017-10-12T16:29:39,262 INFO [main] SessionState: > -- > 2017-10-12T16:29:39,263 INFO [main] SessionState: Compile Query > 3.72s > 2017-10-12T16:29:39,263 INFO [main] SessionState: Prepare Plan > 0.60s > 2017-10-12T16:29:39,263 INFO [main] SessionState: Submit Plan > 0.61s > 2017-10-12T16:29:39,263 INFO [main] SessionState: Start DAG > 0.52s > 2017-10-12T16:29:39,263 INFO [main] SessionState: Run DAG >318.54s > 2017-10-12T16:29:39,263 INFO [main] SessionState: > -- > 2017-10-12T16:29:39,263 INFO [main] SessionState: > 2017-10-12T16:29:39,289 INFO [cea2258c-aa47-46a1-af5b-39860a6edbb3 main] > counters.Limits: Counter limits initialized with parameters: > GROUP_NAME_MAX=256, MAX_GROUPS=500, COUNTER_NAME_MAX=64, MAX_COUNTERS=2000 > 2017-10-12T16:29:39,294 INFO [main] SessionState: Task Execution Summary > 2017-10-12T16:29:39,294 INFO [main] SessionState: > -- > 2017-10-12T16:29:39,294 INFO [main] SessionState: VERTICES > DURATION(ms) CPU_TIME(ms)GC_TIME(ms) INPUT_RECORDS OUTPUT_RECORDS > 2017-10-12T16:29:39,294 INFO [main] SessionState: > -- > 2017-10-12T16:29:39,298 INFO [main] SessionState: Map 1 > 59549.00 1,355,520 28,565 550,076,5541,602,119,842 > 2017-10-12T16:29:39,300 INFO [main] SessionState: Map 12 > 4069.00 15,670522 73,049 732 > 2017-10-12T16:29:39,300 INFO [main] SessionState: Map 13 > 3055.00 14,030567 212 212 > 2017-10-12T16:29:39,301 INFO [main] SessionState: Map 14 > 3055.00 13,820606 212 212 > 2017-10-12T16:29:39,303 INFO [main] SessionState: Reducer 10 > 1004.00 13,450265 44 > 2017-10-12T16:29:39,305 INFO [main] SessionState: Reducer 11 > 1006.00 4,290 71 216 212 > 2017-10-12T16:29:39,307 INFO [main] SessionState: Reducer 2 > 132736.00 2,362,160 83,029 537,120,745 107,740,258 > 2017-10-12T16:29:39,308 INFO [main] SessionState: Reducer 3 > 34248.00643,350 20,661
RE: How to measure the execution time of query on Hive on Tez
Hi all: Maybe in last mail the attached picture is not shown. I re-described my question here. I saw following statistics about the runtime when running query. The Run DAG is 318s. But it is not the sum of DURATION of all VERTICES((59549+4069+3055+3055+1004+1006+132736+34248+11077+1003+439+140896+35260+8070)/1000=435s Not the sum of CPU_TIME. There are several indicator "RUN DAG","DURATION","CPU_TIME", which indicator I should use when measure the performance? Sometimes I found there is significant improvement in sum of (CPU_TIME) while there is no significant improvement in "RUN DAG". Is this normal? Appreciate to get some feedback from you! 2017-10-12T16:29:39,262 INFO [main] SessionState: -- 2017-10-12T16:29:39,262 INFO [main] SessionState: OPERATION DURATION 2017-10-12T16:29:39,262 INFO [main] SessionState: -- 2017-10-12T16:29:39,263 INFO [main] SessionState: Compile Query 3.72s 2017-10-12T16:29:39,263 INFO [main] SessionState: Prepare Plan 0.60s 2017-10-12T16:29:39,263 INFO [main] SessionState: Submit Plan 0.61s 2017-10-12T16:29:39,263 INFO [main] SessionState: Start DAG 0.52s 2017-10-12T16:29:39,263 INFO [main] SessionState: Run DAG 318.54s 2017-10-12T16:29:39,263 INFO [main] SessionState: -- 2017-10-12T16:29:39,263 INFO [main] SessionState: 2017-10-12T16:29:39,289 INFO [cea2258c-aa47-46a1-af5b-39860a6edbb3 main] counters.Limits: Counter limits initialized with parameters: GROUP_NAME_MAX=256, MAX_GROUPS=500, COUNTER_NAME_MAX=64, MAX_COUNTERS=2000 2017-10-12T16:29:39,294 INFO [main] SessionState: Task Execution Summary 2017-10-12T16:29:39,294 INFO [main] SessionState: -- 2017-10-12T16:29:39,294 INFO [main] SessionState: VERTICES DURATION(ms) CPU_TIME(ms)GC_TIME(ms) INPUT_RECORDS OUTPUT_RECORDS 2017-10-12T16:29:39,294 INFO [main] SessionState: -- 2017-10-12T16:29:39,298 INFO [main] SessionState: Map 1 59549.00 1,355,520 28,565 550,076,5541,602,119,842 2017-10-12T16:29:39,300 INFO [main] SessionState: Map 12 4069.00 15,670522 73,049 732 2017-10-12T16:29:39,300 INFO [main] SessionState: Map 13 3055.00 14,030567 212 212 2017-10-12T16:29:39,301 INFO [main] SessionState: Map 14 3055.00 13,820606 212 212 2017-10-12T16:29:39,303 INFO [main] SessionState: Reducer 10 1004.00 13,450265 44 2017-10-12T16:29:39,305 INFO [main] SessionState: Reducer 11 1006.00 4,290 71 216 212 2017-10-12T16:29:39,307 INFO [main] SessionState: Reducer 2 132736.00 2,362,160 83,029 537,120,745 107,740,258 2017-10-12T16:29:39,308 INFO [main] SessionState: Reducer 3 34248.00 643,350 20,661 107,740,470 203 2017-10-12T16:29:39,310 INFO [main] SessionState: Reducer 4 11077.00 77,020 1,496 203 31 2017-10-12T16:29:39,311 INFO [main] SessionState: Reducer 5 1003.00 40,030824 10 10 2017-10-12T16:29:39,312 INFO [main] SessionState: Reducer 6439.00 590 0 100 2017-10-12T16:29:39,314 INFO [main] SessionState: Reducer 7 140896.00 1,925,760 52,784 537,120,745 107,740,258 2017-10-12T16:29:39,316 INFO [main] SessionState: Reducer 8 35260.00 590,200 22,331 107,740,470 76 2017-10-12T16:29:39,318 INFO [main] SessionState: Reducer 9 8070.00 24,630249 764 2017-10-12T16:29:39,318 INFO [main] SessionState: --- From: Zhang, Liyun [mailto:liyun.zh...@intel.com] Sent: Thursday, October 12, 2017 4:40 PM To: dev@hive.apache.org Subject: How to measure the execution time of query on Hive on Tez Hi all: Anyone knows how to view the detail execution time of every map/reduce task in hive on tez? I screenshot the result: Run DAG is 324.s . But this is not the sum of DURATION time of every
RE: How to measure the execution time of query on Hive on Tez
Hi all: I know that some map tasks can be executed parallel, for example, Map1 and Map10 are running together like, so maybe we can not sum the DURATION of every tasks as execution time. 15:07:18,917 INFO [e1890b6a-a9fe-4e0d-8bd7-167603079db1 main] monitoring.RenderStrategy$LogToFileFunction: Map 1: 23(+23)/46 Map 10: 23(+23)/46 Map 15: 1/1 Map 16: 1/1 Map 7: 1/1 Map 8: 1/1 Reducer 11: 127/127 Reducer 12: 0/35Reducer 13: Best Regards Kelly Zhang/Zhang,Liyun From: Zhang, Liyun Sent: Thursday, October 12, 2017 4:40 PM To: 'dev@hive.apache.org'Subject: How to measure the execution time of query on Hive on Tez Hi all: Anyone knows how to view the detail execution time of every map/reduce task in hive on tez? I screenshot the result: Run DAG is 324.s . But this is not the sum of DURATION time of every tasks( 665=163+22+1+1+2+3+143+31+4+0+254+29+8+2+1+1). So which parameter DURATION(ms) or CPU_TIME(ms) should be used? [cid:image001.png@01D3437B.97FA0660] Appreciate to get some feedback from you! Best Regards Kelly Zhang/Zhang,Liyun