Re: How to measure the execution time of query on Hive on Tez

2017-10-12 Thread Prasanth Jayachandran
Hi

There are couple of tables that you are seeing, the top one is Query Execution 
Summary and next one is Task Execution Summary.

At the end of your query execution, you should see something like "Time taken: 
xx.xx seconds” in hive CLI or beeline. This represents the overall time the 
query took from start (start of semantic analysis) to finish (end of fetching 
results).

This overall time is broken down in the Query Execution Summary table. When you 
submit the query, query goes through multiple phases before returning the 
results. 
The time for each of these phases is defined in Query Execution Summary tables. 

If you want the query runtime then you should look at the last line in hive CLI 
or beeline which represents the end-to-end time.
If you are just interested in how long the query actually executed in the 
cluster then you should look at Run DAG time.

Run DAG is NOT sum of duration of all vertices as vertices can be scheduled in 
parallel. 
Run DAG is the time that Hive computes when it knows that DAG has been accepted 
and started by Application Master until the completion of DAG (could be failed, 
errored or succeeded).

Hope this helps.

Thanks
Prasanth

> On Oct 12, 2017, at 2:46 PM, Zhang, Liyun  wrote:
> 
> Hi all:
>   Maybe in last mail the attached picture is not shown.
> I re-described my question here.  I saw following statistics about the 
> runtime when running query.
> 
> The Run DAG is 318s.  But it is not the sum of DURATION of all 
> VERTICES((59549+4069+3055+3055+1004+1006+132736+34248+11077+1003+439+140896+35260+8070)/1000=435s
> Not the sum of CPU_TIME.   There are several indicator "RUN 
> DAG","DURATION","CPU_TIME",  which indicator I should use when measure the 
> performance? Sometimes I found there is significant improvement in sum of 
> (CPU_TIME) while there is no significant improvement in "RUN DAG".  Is this 
> normal?  Appreciate to get some feedback from you!
> 
> 
> 
> 2017-10-12T16:29:39,262  INFO [main] SessionState: 
> --
> 2017-10-12T16:29:39,262  INFO [main] SessionState: OPERATION  
>   DURATION
> 2017-10-12T16:29:39,262  INFO [main] SessionState: 
> --
> 2017-10-12T16:29:39,263  INFO [main] SessionState: Compile Query  
>  3.72s
> 2017-10-12T16:29:39,263  INFO [main] SessionState: Prepare Plan   
>  0.60s
> 2017-10-12T16:29:39,263  INFO [main] SessionState: Submit Plan
>  0.61s
> 2017-10-12T16:29:39,263  INFO [main] SessionState: Start DAG  
>  0.52s
> 2017-10-12T16:29:39,263  INFO [main] SessionState: Run DAG
>318.54s
> 2017-10-12T16:29:39,263  INFO [main] SessionState: 
> --
> 2017-10-12T16:29:39,263  INFO [main] SessionState:
> 2017-10-12T16:29:39,289  INFO [cea2258c-aa47-46a1-af5b-39860a6edbb3 main] 
> counters.Limits: Counter limits initialized with parameters:  
> GROUP_NAME_MAX=256, MAX_GROUPS=500, COUNTER_NAME_MAX=64, MAX_COUNTERS=2000
> 2017-10-12T16:29:39,294  INFO [main] SessionState: Task Execution Summary
> 2017-10-12T16:29:39,294  INFO [main] SessionState: 
> --
> 2017-10-12T16:29:39,294  INFO [main] SessionState:   VERTICES  
> DURATION(ms)   CPU_TIME(ms)GC_TIME(ms)   INPUT_RECORDS   OUTPUT_RECORDS
> 2017-10-12T16:29:39,294  INFO [main] SessionState: 
> --
> 2017-10-12T16:29:39,298  INFO [main] SessionState:  Map 1  
> 59549.00  1,355,520 28,565 550,076,5541,602,119,842
> 2017-10-12T16:29:39,300  INFO [main] SessionState: Map 12   
> 4069.00 15,670522  73,049  732
> 2017-10-12T16:29:39,300  INFO [main] SessionState: Map 13   
> 3055.00 14,030567 212  212
> 2017-10-12T16:29:39,301  INFO [main] SessionState: Map 14   
> 3055.00 13,820606 212  212
> 2017-10-12T16:29:39,303  INFO [main] SessionState: Reducer 10   
> 1004.00 13,450265   44
> 2017-10-12T16:29:39,305  INFO [main] SessionState: Reducer 11   
> 1006.00  4,290 71 216  212
> 2017-10-12T16:29:39,307  INFO [main] SessionState:  Reducer 2 
> 132736.00  2,362,160 83,029 537,120,745  107,740,258
> 2017-10-12T16:29:39,308  INFO [main] SessionState:  Reducer 3  
> 34248.00643,350 20,661 

RE: How to measure the execution time of query on Hive on Tez

2017-10-12 Thread Zhang, Liyun
Hi all:
   Maybe in last mail the attached picture is not shown.
I re-described my question here.  I saw following statistics about the runtime 
when running query.

The Run DAG is 318s.  But it is not the sum of DURATION of all 
VERTICES((59549+4069+3055+3055+1004+1006+132736+34248+11077+1003+439+140896+35260+8070)/1000=435s
Not the sum of CPU_TIME.   There are several indicator "RUN 
DAG","DURATION","CPU_TIME",  which indicator I should use when measure the 
performance? Sometimes I found there is significant improvement in sum of 
(CPU_TIME) while there is no significant improvement in "RUN DAG".  Is this 
normal?  Appreciate to get some feedback from you!



2017-10-12T16:29:39,262  INFO [main] SessionState: 
--
2017-10-12T16:29:39,262  INFO [main] SessionState: OPERATION
DURATION
2017-10-12T16:29:39,262  INFO [main] SessionState: 
--
2017-10-12T16:29:39,263  INFO [main] SessionState: Compile Query
   3.72s
2017-10-12T16:29:39,263  INFO [main] SessionState: Prepare Plan 
   0.60s
2017-10-12T16:29:39,263  INFO [main] SessionState: Submit Plan  
   0.61s
2017-10-12T16:29:39,263  INFO [main] SessionState: Start DAG
   0.52s
2017-10-12T16:29:39,263  INFO [main] SessionState: Run DAG  
 318.54s
2017-10-12T16:29:39,263  INFO [main] SessionState: 
--
2017-10-12T16:29:39,263  INFO [main] SessionState:
2017-10-12T16:29:39,289  INFO [cea2258c-aa47-46a1-af5b-39860a6edbb3 main] 
counters.Limits: Counter limits initialized with parameters:  
GROUP_NAME_MAX=256, MAX_GROUPS=500, COUNTER_NAME_MAX=64, MAX_COUNTERS=2000
2017-10-12T16:29:39,294  INFO [main] SessionState: Task Execution Summary
2017-10-12T16:29:39,294  INFO [main] SessionState: 
--
2017-10-12T16:29:39,294  INFO [main] SessionState:   VERTICES  DURATION(ms) 
  CPU_TIME(ms)GC_TIME(ms)   INPUT_RECORDS   OUTPUT_RECORDS
2017-10-12T16:29:39,294  INFO [main] SessionState: 
--
2017-10-12T16:29:39,298  INFO [main] SessionState:  Map 1  59549.00 
 1,355,520 28,565 550,076,5541,602,119,842
2017-10-12T16:29:39,300  INFO [main] SessionState: Map 12   4069.00 
15,670522  73,049  732
2017-10-12T16:29:39,300  INFO [main] SessionState: Map 13   3055.00 
14,030567 212  212
2017-10-12T16:29:39,301  INFO [main] SessionState: Map 14   3055.00 
13,820606 212  212
2017-10-12T16:29:39,303  INFO [main] SessionState: Reducer 10   1004.00 
13,450265   44
2017-10-12T16:29:39,305  INFO [main] SessionState: Reducer 11   1006.00 
 4,290 71 216  212
2017-10-12T16:29:39,307  INFO [main] SessionState:  Reducer 2 132736.00 
 2,362,160 83,029 537,120,745  107,740,258
2017-10-12T16:29:39,308  INFO [main] SessionState:  Reducer 3  34248.00 
   643,350 20,661 107,740,470  203
2017-10-12T16:29:39,310  INFO [main] SessionState:  Reducer 4  11077.00 
77,020  1,496 203   31
2017-10-12T16:29:39,311  INFO [main] SessionState:  Reducer 5   1003.00 
40,030824  10   10
2017-10-12T16:29:39,312  INFO [main] SessionState:  Reducer 6439.00 
   590  0  100
2017-10-12T16:29:39,314  INFO [main] SessionState:  Reducer 7 140896.00 
 1,925,760 52,784 537,120,745  107,740,258
2017-10-12T16:29:39,316  INFO [main] SessionState:  Reducer 8  35260.00 
   590,200 22,331 107,740,470   76
2017-10-12T16:29:39,318  INFO [main] SessionState:  Reducer 9   8070.00 
24,630249  764
2017-10-12T16:29:39,318  INFO [main] SessionState: 
---

From: Zhang, Liyun [mailto:liyun.zh...@intel.com]
Sent: Thursday, October 12, 2017 4:40 PM
To: dev@hive.apache.org
Subject: How to measure the execution time of query on Hive on Tez

Hi  all:
  Anyone knows how to view the detail execution time of every map/reduce task 
in hive on tez?
I screenshot the result:
Run DAG is  324.s . But this is not the sum of DURATION time of every 

RE: How to measure the execution time of query on Hive on Tez

2017-10-12 Thread Zhang, Liyun
Hi all:
I know that some map tasks can be executed parallel, for example, Map1 and 
Map10 are running together like, so maybe we can not sum the DURATION of every 
tasks as execution time.

15:07:18,917  INFO [e1890b6a-a9fe-4e0d-8bd7-167603079db1 main] 
monitoring.RenderStrategy$LogToFileFunction: Map 1: 23(+23)/46 Map 10: 
23(+23)/46   Map 15: 1/1 Map 16: 1/1 Map 7: 1/1  Map 8: 1/1 
 Reducer 11: 127/127 Reducer 12: 0/35Reducer 13:



Best Regards
Kelly Zhang/Zhang,Liyun




From: Zhang, Liyun
Sent: Thursday, October 12, 2017 4:40 PM
To: 'dev@hive.apache.org' 
Subject: How to measure the execution time of query on Hive on Tez

Hi  all:
  Anyone knows how to view the detail execution time of every map/reduce task 
in hive on tez?
I screenshot the result:
Run DAG is  324.s . But this is not the sum of DURATION time of every tasks( 
665=163+22+1+1+2+3+143+31+4+0+254+29+8+2+1+1).  So which parameter DURATION(ms) 
or CPU_TIME(ms) should be used?
[cid:image001.png@01D3437B.97FA0660]

Appreciate to get some feedback from you!


Best Regards
Kelly Zhang/Zhang,Liyun