[jira] [Commented] (HIVE-27751) Log Query Compilation summary in an accumulated way

2024-01-22 Thread Ramesh Kumar Thangarajan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17809764#comment-17809764
 ] 

Ramesh Kumar Thangarajan commented on HIVE-27751:
-

Hi [~zabetak] 

Thank you very much for reviewing this. I have updated the description with the 
sample output. 

Usually the debug logs are spread across multiple places, and we do not have 
an easy way to get the details from users when they run into performance issues. 
The main idea of this PR is to also print the information in the command line 
output. This is done only if the config is turned on. That is what I meant by 
"accumulated": we get all the details related to Query Compilation in one 
place, and they are visible to the user as part of the query output.

I have also addressed your comments. Can you let me know what you think about 
the latest patch?
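
To illustrate the idea above, here is a minimal Java sketch of collecting 
per-phase compile timings in one place and printing them in the order they 
were recorded, only when a flag is on. It is not the code in the patch: the 
class name CompilationSummary, its methods, the column widths, and the choice 
to sum repeated phases are all assumptions made for illustration. Only the 
config name hive.compile.print.summary and the header text come from this 
ticket.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch; not the class used in the actual Hive patch.
final class CompilationSummary {
    // LinkedHashMap keeps phases in the order they were first recorded.
    private final Map<String, Long> phaseMillis = new LinkedHashMap<>();
    // Stands in for the value of hive.compile.print.summary.
    private final boolean enabled;

    CompilationSummary(boolean enabled) {
        this.enabled = enabled;
    }

    // Adds elapsed time to a phase; repeated calls for the same phase add up
    // (an assumption of this sketch).
    void record(String phase, long millis) {
        if (enabled) {
            phaseMillis.merge(phase, millis, Long::sum);
        }
    }

    // Returns the summary text, or an empty string when the flag is off.
    String render() {
        if (!enabled) {
            return "";
        }
        StringBuilder sb = new StringBuilder("Query Compilation Summary\n");
        for (Map.Entry<String, Long> e : phaseMillis.entrySet()) {
            sb.append(String.format(Locale.ROOT, "%-56s%4d ms\n",
                    e.getKey(), e.getValue()));
        }
        return sb.toString();
    }
}
```

A caller would create one instance per query, call record(...) after each 
compilation phase, and append render() to the query output.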

> Log Query Compilation summary in an accumulated way
> ---
>
> Key: HIVE-27751
> URL: https://issues.apache.org/jira/browse/HIVE-27751
> Project: Hive
>  Issue Type: Task
>  Components: Hive
>Reporter: Ramesh Kumar Thangarajan
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
>  Labels: pull-request-available
>
> Query Compilation summary is very useful for reading and collecting all the 
> measures of compile time in a single place. It is also useful for debugging a 
> performance issue in the query compilation phase and for reporting and 
> comparing results across runs.
> To test this, set the config hive.compile.print.summary to true in any q 
> file and run the test to see the Query Compilation Summary in the logs. One 
> example of the output is below. The order of operations is preserved when 
> printing the summary:
> {code:java}
> Query Compilation Summary
> --
> waitCompile                                                0 ms
> parse                                                      4 ms
> getTableConstraints - HS2-cache                           69 ms
> optimizer - Calcite: Plan generation                     257 ms
> optimizer - Calcite: Prejoin ordering transformation      20 ms
> optimizer - Calcite: Postjoin ordering transformation     24 ms
> optimizer                                                705 ms
> optimizer - HiveOpConverterPostProc                        0 ms
> optimizer - Generator                                     24 ms
> optimizer - PartitionColumnsSeparator                      1 ms
> optimizer - SyntheticJoinPredicate                         2 ms
> optimizer - SimplePredicatePushDown                        8 ms
> optimizer - RedundantDynamicPruningConditionsRemoval       0 ms
> optimizer - SortedDynPartitionTimeGranularityOptimizer     2 ms
> optimizer - PartitionPruner                                3 ms
> optimizer - PartitionConditionRemover                      2 ms
> optimizer - GroupByOptimizer                               2 ms
> optimizer - ColumnPruner                                  10 ms
> optimizer - CountDistinctRewriteProc                       1 ms
> optimizer - SamplePruner                                   1 ms
> optimizer - MapJoinProcessor                               2 ms
> optimizer - BucketingSortingReduceSinkOptimizer            2 ms
> optimizer - UnionProcessor                                 2 ms
> optimizer - JoinReorder                                    0 ms
> optimizer - FixedBucketPruningOptimizer                    2 ms
> optimizer - 

[jira] [Commented] (HIVE-27751) Log Query Compilation summary in an accumulated way

2024-01-03 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802175#comment-17802175
 ] 

Stamatis Zampetakis commented on HIVE-27751:


Thanks for working on this ticket [~rameshkumar]. Can you please address the 
questions/comments below?

It's a bit unclear what the query compilation summary will contain and how it 
is displayed. Please enrich the description of this JIRA ticket with some 
sample output from the new logs.

The HS2 logs already contain measurements from the compilation phase when the 
debug mode is enabled. Is the new summary redundant with respect to that or 
does it contain new information as well?

The JIRA summary mentions that logs are somehow "accumulated", but it is unclear 
what the accumulation refers to. Are we accumulating compilation times from 
multiple queries? What exactly do we accumulate?

> Log Query Compilation summary in an accumulated way
> ---
>
> Key: HIVE-27751
> URL: https://issues.apache.org/jira/browse/HIVE-27751
> Project: Hive
>  Issue Type: Task
>  Components: Hive
>Reporter: Ramesh Kumar Thangarajan
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
>  Labels: pull-request-available
>
> Query Compilation summary is very useful for reading and collecting all the 
> measures of compile time in a single place. It is also useful for debugging a 
> performance issue in the query compilation phase and for reporting and 
> comparing results across runs.


