[jira] [Updated] (HIVE-23488) Optimise PartitionManagementTask::Msck::repair

2020-05-17 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-23488:

Assignee: Rajesh Balamohan
  Status: Patch Available  (was: Open)

> Optimise PartitionManagementTask::Msck::repair
> --
>
> Key: HIVE-23488
> URL: https://issues.apache.org/jira/browse/HIVE-23488
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Major
> Attachments: HIVE-23488.1.patch, Screenshot 2020-05-18 at 5.06.15 
> AM.png
>
>
> Msck::repair ends up fetching the table information twice.
> !Screenshot 2020-05-18 at 5.06.15 AM.png|width=1084,height=754!
>  
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java#L113]
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java#L234]
>  
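A minimal sketch of the optimisation idea: fetch the table once in the repair entry point and pass the object to the checker, instead of letting the checker fetch it again by name. All class and method names here are illustrative stand-ins, not Hive's actual API.

```java
// Hypothetical sketch: Msck.repair() fetches the Table and then the checker
// fetches it again by name; threading the already-fetched object through
// avoids the second metastore round trip.
class MetastoreSketch {
    static int fetchCount = 0;
    static String getTable(String name) {   // stands in for a metastore RPC
        fetchCount++;
        return "table:" + name;
    }
}

class MsckRepairSketch {
    static String repair(String tableName) {
        String table = MetastoreSketch.getTable(tableName); // fetched once...
        return checkMetastore(table);                       // ...reused here
    }
    // the checker takes the Table object instead of re-fetching it by name
    static String checkMetastore(String table) {
        return "checked:" + table;
    }
}
```

With this shape, `fetchCount` stays at 1 per repaired table rather than 2.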



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23487) Optimise PartitionManagementTask

2020-05-17 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-23487:

Attachment: HIVE-23487.1.patch

> Optimise PartitionManagementTask
> 
>
> Key: HIVE-23487
> URL: https://issues.apache.org/jira/browse/HIVE-23487
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-23487.1.patch, Screenshot 2020-05-18 at 4.19.48 
> AM.png
>
>
> Msck.init for every table takes more time than the actual table repair. This
> was observed on a system with a large number of databases and tables.
>  
>   !Screenshot 2020-05-18 at 4.19.48 AM.png|width=1014,height=732!
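The shape of the fix can be sketched as hoisting the expensive init out of the per-table loop. This is an illustrative sketch only; the names do not match Hive's actual `Msck`/`PartitionManagementTask` API.

```java
// Hypothetical sketch: initialise the expensive Msck machinery once and
// reuse it for every table, instead of calling init() per table.
class MsckSketch {
    static int initCount = 0;
    void init() { initCount++; }             // expensive in practice
    String repair(String table) { return "repaired:" + table; }
}

class PartitionManagementTaskSketch {
    static int repairAll(String[] tables) {
        MsckSketch msck = new MsckSketch();
        msck.init();                         // once, not once per table
        int repaired = 0;
        for (String t : tables) {
            msck.repair(t);
            repaired++;
        }
        return repaired;
    }
}
```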





[jira] [Updated] (HIVE-23487) Optimise PartitionManagementTask

2020-05-17 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-23487:

Assignee: Rajesh Balamohan
  Status: Patch Available  (was: Open)

> Optimise PartitionManagementTask
> 
>
> Key: HIVE-23487
> URL: https://issues.apache.org/jira/browse/HIVE-23487
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-23487.1.patch, Screenshot 2020-05-18 at 4.19.48 
> AM.png
>
>
> Msck.init for every table takes more time than the actual table repair. This
> was observed on a system with a large number of databases and tables.
>  
>   !Screenshot 2020-05-18 at 4.19.48 AM.png|width=1014,height=732!





[jira] [Updated] (HIVE-23468) LLAP: Optimise OrcEncodedDataReader to avoid FS init to NN

2020-05-14 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-23468:

Attachment: HIVE-23468.1.patch

> LLAP: Optimise OrcEncodedDataReader to avoid FS init to NN
> --
>
> Key: HIVE-23468
> URL: https://issues.apache.org/jira/browse/HIVE-23468
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-23468.1.patch
>
>
> OrcEncodedDataReader materializes the supplier to check whether it is an HDFS
> filesystem. This causes an unwanted call to the NN even when the cache is
> completely warmed up.
> [https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/OrcEncodedDataReader.java#L540]
> [https://github.com/apache/hive/blob/9f40d7cc1d889aa3079f3f494cf810fabe326e44/ql/src/java/org/apache/hadoop/hive/ql/io/HdfsUtils.java#L107]
> A workaround is to set "hive.llap.io.use.fileid.path=false" to avoid this case.
> The IO elevator could be getting a 100% cache hit in a warmed-up scenario,
> making the FileSystem initialization unnecessary.
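The idea can be sketched as deferring the supplier so the NN round trip only happens when the fileId path is actually needed. Names below are illustrative; the real logic lives in `OrcEncodedDataReader`/`HdfsUtils`.

```java
// Hypothetical sketch: a lazily-materialised FileSystem supplier, only
// evaluated when data cannot be served from the LLAP cache.
class FsSupplierSketch {
    static int nnCalls = 0;
    private String fs;                       // created on first use only
    String get() {                           // stands in for supplier.get()
        if (fs == null) {
            nnCalls++;                       // NN round trip happens here
            fs = "hdfs";
        }
        return fs;
    }
}

class OrcReaderSketch {
    // On a full cache hit (or with fileid paths disabled), the supplier is
    // never materialised and the NN is never contacted.
    static String read(FsSupplierSketch supplier, boolean cacheHit, boolean useFileIdPath) {
        if (cacheHit || !useFileIdPath) {
            return "data-from-cache";
        }
        return "data-via-" + supplier.get();
    }
}
```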





[jira] [Updated] (HIVE-23468) LLAP: Optimise OrcEncodedDataReader to avoid FS init to NN

2020-05-14 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-23468:

Status: Patch Available  (was: Open)

> LLAP: Optimise OrcEncodedDataReader to avoid FS init to NN
> --
>
> Key: HIVE-23468
> URL: https://issues.apache.org/jira/browse/HIVE-23468
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-23468.1.patch
>
>
> OrcEncodedDataReader materializes the supplier to check whether it is an HDFS
> filesystem. This causes an unwanted call to the NN even when the cache is
> completely warmed up.
> [https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/OrcEncodedDataReader.java#L540]
> [https://github.com/apache/hive/blob/9f40d7cc1d889aa3079f3f494cf810fabe326e44/ql/src/java/org/apache/hadoop/hive/ql/io/HdfsUtils.java#L107]
> A workaround is to set "hive.llap.io.use.fileid.path=false" to avoid this case.
> The IO elevator could be getting a 100% cache hit in a warmed-up scenario,
> making the FileSystem initialization unnecessary.





[jira] [Updated] (HIVE-23446) LLAP: Reduce IPC connection misses to AM for short queries

2020-05-14 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-23446:

Attachment: HIVE-23446.3.patch

> LLAP: Reduce IPC connection misses to AM for short queries
> --
>
> Key: HIVE-23446
> URL: https://issues.apache.org/jira/browse/HIVE-23446
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Major
> Attachments: HIVE-23446.1.patch, HIVE-23446.2.patch, 
> HIVE-23446.3.patch
>
>
> [https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryInfo.java#L343]
>  
> The umbilical UGI pool is maintained at the QueryInfo level. When there are
> lots of short queries, this misses the IPC cache and ends up recreating
> threads/connections to the same AM.
> It would be good to maintain this pool in {{ContainerRunnerImpl}} instead and
> recycle it as needed.
>  
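A minimal sketch of the proposed shape, under the assumption that connections to the same AM endpoint are reusable across queries: a single daemon-level pool keyed by AM address, instead of a per-QueryInfo pool discarded with each short query. Names are illustrative, not Hive's actual API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: one pool at the ContainerRunnerImpl (daemon) level,
// keyed by AM endpoint, so short queries reuse the same IPC connection.
class UmbilicalPoolSketch {
    static int connectionsCreated = 0;
    private final Map<String, String> byAm = new HashMap<>();

    synchronized String borrow(String amAddress) {
        return byAm.computeIfAbsent(amAddress, am -> {
            connectionsCreated++;            // expensive IPC/thread setup
            return "conn:" + am;
        });
    }
}
```

Two consecutive short queries against the same AM then create one connection, not two.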





[jira] [Updated] (HIVE-23446) LLAP: Reduce IPC connection misses to AM for short queries

2020-05-13 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-23446:

Attachment: HIVE-23446.2.patch

> LLAP: Reduce IPC connection misses to AM for short queries
> --
>
> Key: HIVE-23446
> URL: https://issues.apache.org/jira/browse/HIVE-23446
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Major
> Attachments: HIVE-23446.1.patch, HIVE-23446.2.patch
>
>
> [https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryInfo.java#L343]
>  
> The umbilical UGI pool is maintained at the QueryInfo level. When there are
> lots of short queries, this misses the IPC cache and ends up recreating
> threads/connections to the same AM.
> It would be good to maintain this pool in {{ContainerRunnerImpl}} instead and
> recycle it as needed.
>  





[jira] [Commented] (HIVE-23451) FileSinkOperator calls deleteOnExit (hdfs call) twice for the same file

2020-05-12 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105941#comment-17105941
 ] 

Rajesh Balamohan commented on HIVE-23451:
-

I had the same confusion about this codepath's treatment of index "0"; the .1
patch followed the approach of reducing the duplicate invocation.

Looking at the codepath more closely reveals that {{totalFiles}} is set to 1 by
default in SemanticAnalyzer. In case of bucketing, it ends up being set to the
number of buckets.

[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L6942]
 (set to the number of buckets)

[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L6787]
 (sets it to 1 by default).

The "0" index handling isn't actually needed. I will attach the .2 version
shortly.

 

> FileSinkOperator calls deleteOnExit (hdfs call) twice for the same file
> ---
>
> Key: HIVE-23451
> URL: https://issues.apache.org/jira/browse/HIVE-23451
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-23451.1.patch, HIVE-23451.2.patch
>
>
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L826]
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L797]
> A NN call can be avoided here (mainly relevant for small queries).
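The duplicate-call guard can be sketched as remembering which paths were already registered, so the second codepath does not repeat the NN call for the same file. This is an illustrative sketch; the names do not match FileSinkOperator's actual fields.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: track paths already registered for deleteOnExit and
// skip the duplicate HDFS call for the same file.
class DeleteOnExitSketch {
    static int nnCalls = 0;
    private final Set<String> registered = new HashSet<>();

    void deleteOnExitOnce(String path) {
        if (registered.add(path)) {          // first registration only
            nnCalls++;                       // the actual HDFS deleteOnExit
        }
    }
}
```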





[jira] [Updated] (HIVE-23451) FileSinkOperator calls deleteOnExit (hdfs call) twice for the same file

2020-05-12 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-23451:

Attachment: HIVE-23451.2.patch

> FileSinkOperator calls deleteOnExit (hdfs call) twice for the same file
> ---
>
> Key: HIVE-23451
> URL: https://issues.apache.org/jira/browse/HIVE-23451
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-23451.1.patch, HIVE-23451.2.patch
>
>
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L826]
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L797]
> A NN call can be avoided here (mainly relevant for small queries).





[jira] [Commented] (HIVE-23449) LLAP: Reduce mkdir and config creations in submitWork hotpath

2020-05-12 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105932#comment-17105932
 ] 

Rajesh Balamohan commented on HIVE-23449:
-

.3 is the same as .2 with a minor comment added.

Thanks for the review, [~ashutoshc].

> LLAP: Reduce mkdir and config creations in submitWork hotpath
> -
>
> Key: HIVE-23449
> URL: https://issues.apache.org/jira/browse/HIVE-23449
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Major
> Attachments: HIVE-23449.1.patch, HIVE-23449.2.patch, 
> HIVE-23449.3.patch, Screenshot 2020-05-12 at 1.09.35 PM.png
>
>
> !Screenshot 2020-05-12 at 1.09.35 PM.png|width=885,height=558!
>  
> For short jobs, submitWork becomes a hotpath. It can lazily load the conf and
> skip the dir creations (which are needed only when DirWatcher is enabled).
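Both ideas can be sketched together: build the per-query config on first use, and create the watch directory only when DirWatcher is actually enabled. Names are illustrative, not Hive's actual code.

```java
// Hypothetical sketch: keep config creation and mkdir out of the
// submitWork hotpath for short jobs.
class SubmitWorkSketch {
    static int confsBuilt = 0;
    static int dirsCreated = 0;
    private String conf;                     // built lazily, on first use
    private final boolean dirWatcherEnabled;

    SubmitWorkSketch(boolean dirWatcherEnabled) {
        this.dirWatcherEnabled = dirWatcherEnabled;
    }

    String getConf() {
        if (conf == null) {
            confsBuilt++;                    // deferred until actually needed
            conf = "conf";
        }
        return conf;
    }

    void submitWork() {
        if (dirWatcherEnabled) {
            dirsCreated++;                   // mkdir only when the watcher needs it
        }
    }
}
```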





[jira] [Updated] (HIVE-23449) LLAP: Reduce mkdir and config creations in submitWork hotpath

2020-05-12 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-23449:

Attachment: HIVE-23449.3.patch

> LLAP: Reduce mkdir and config creations in submitWork hotpath
> -
>
> Key: HIVE-23449
> URL: https://issues.apache.org/jira/browse/HIVE-23449
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Major
> Attachments: HIVE-23449.1.patch, HIVE-23449.2.patch, 
> HIVE-23449.3.patch, Screenshot 2020-05-12 at 1.09.35 PM.png
>
>
> !Screenshot 2020-05-12 at 1.09.35 PM.png|width=885,height=558!
>  
> For short jobs, submitWork becomes a hotpath. It can lazily load the conf and
> skip the dir creations (which are needed only when DirWatcher is enabled).





[jira] [Updated] (HIVE-23449) LLAP: Reduce mkdir and config creations in submitWork hotpath

2020-05-12 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-23449:

Attachment: HIVE-23449.2.patch

> LLAP: Reduce mkdir and config creations in submitWork hotpath
> -
>
> Key: HIVE-23449
> URL: https://issues.apache.org/jira/browse/HIVE-23449
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Major
> Attachments: HIVE-23449.1.patch, HIVE-23449.2.patch, Screenshot 
> 2020-05-12 at 1.09.35 PM.png
>
>
> !Screenshot 2020-05-12 at 1.09.35 PM.png|width=885,height=558!
>  
> For short jobs, submitWork becomes a hotpath. It can lazily load the conf and
> skip the dir creations (which are needed only when DirWatcher is enabled).





[jira] [Commented] (HIVE-23449) LLAP: Reduce mkdir and config creations in submitWork hotpath

2020-05-12 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105776#comment-17105776
 ] 

Rajesh Balamohan commented on HIVE-23449:
-

RB: https://reviews.apache.org/r/72503/diff/1#index_header

> LLAP: Reduce mkdir and config creations in submitWork hotpath
> -
>
> Key: HIVE-23449
> URL: https://issues.apache.org/jira/browse/HIVE-23449
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Major
> Attachments: HIVE-23449.1.patch, Screenshot 2020-05-12 at 1.09.35 
> PM.png
>
>
> !Screenshot 2020-05-12 at 1.09.35 PM.png|width=885,height=558!
>  
> For short jobs, submitWork becomes a hotpath. It can lazily load the conf and
> skip the dir creations (which are needed only when DirWatcher is enabled).





[jira] [Commented] (HIVE-23446) LLAP: Reduce IPC connection misses to AM for short queries

2020-05-12 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105362#comment-17105362
 ] 

Rajesh Balamohan commented on HIVE-23446:
-

RB link: https://reviews.apache.org/r/72499/diff/1#index_header

> LLAP: Reduce IPC connection misses to AM for short queries
> --
>
> Key: HIVE-23446
> URL: https://issues.apache.org/jira/browse/HIVE-23446
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Major
> Attachments: HIVE-23446.1.patch
>
>
> [https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryInfo.java#L343]
>  
> The umbilical UGI pool is maintained at the QueryInfo level. When there are
> lots of short queries, this misses the IPC cache and ends up recreating
> threads/connections to the same AM.
> It would be good to maintain this pool in {{ContainerRunnerImpl}} instead and
> recycle it as needed.
>  





[jira] [Updated] (HIVE-23446) LLAP: Reduce IPC connection misses to AM for short queries

2020-05-12 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-23446:

Assignee: Rajesh Balamohan
  Status: Patch Available  (was: Open)

> LLAP: Reduce IPC connection misses to AM for short queries
> --
>
> Key: HIVE-23446
> URL: https://issues.apache.org/jira/browse/HIVE-23446
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Major
> Attachments: HIVE-23446.1.patch
>
>
> [https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryInfo.java#L343]
>  
> The umbilical UGI pool is maintained at the QueryInfo level. When there are
> lots of short queries, this misses the IPC cache and ends up recreating
> threads/connections to the same AM.
> It would be good to maintain this pool in {{ContainerRunnerImpl}} instead and
> recycle it as needed.
>  





[jira] [Updated] (HIVE-23446) LLAP: Reduce IPC connection misses to AM for short queries

2020-05-12 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-23446:

Attachment: HIVE-23446.1.patch

> LLAP: Reduce IPC connection misses to AM for short queries
> --
>
> Key: HIVE-23446
> URL: https://issues.apache.org/jira/browse/HIVE-23446
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Major
> Attachments: HIVE-23446.1.patch
>
>
> [https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryInfo.java#L343]
>  
> The umbilical UGI pool is maintained at the QueryInfo level. When there are
> lots of short queries, this misses the IPC cache and ends up recreating
> threads/connections to the same AM.
> It would be good to maintain this pool in {{ContainerRunnerImpl}} instead and
> recycle it as needed.
>  





[jira] [Updated] (HIVE-23449) LLAP: Reduce mkdir and config creations in submitWork hotpath

2020-05-12 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-23449:

Assignee: Rajesh Balamohan
  Status: Patch Available  (was: Open)

> LLAP: Reduce mkdir and config creations in submitWork hotpath
> -
>
> Key: HIVE-23449
> URL: https://issues.apache.org/jira/browse/HIVE-23449
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Major
> Attachments: HIVE-23449.1.patch, Screenshot 2020-05-12 at 1.09.35 
> PM.png
>
>
> !Screenshot 2020-05-12 at 1.09.35 PM.png|width=885,height=558!
>  
> For short jobs, submitWork becomes a hotpath. It can lazily load the conf and
> skip the dir creations (which are needed only when DirWatcher is enabled).





[jira] [Updated] (HIVE-23449) LLAP: Reduce mkdir and config creations in submitWork hotpath

2020-05-12 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-23449:

Attachment: HIVE-23449.1.patch

> LLAP: Reduce mkdir and config creations in submitWork hotpath
> -
>
> Key: HIVE-23449
> URL: https://issues.apache.org/jira/browse/HIVE-23449
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Major
> Attachments: HIVE-23449.1.patch, Screenshot 2020-05-12 at 1.09.35 
> PM.png
>
>
> !Screenshot 2020-05-12 at 1.09.35 PM.png|width=885,height=558!
>  
> For short jobs, submitWork becomes a hotpath. It can lazily load the conf and
> skip the dir creations (which are needed only when DirWatcher is enabled).





[jira] [Updated] (HIVE-23451) FileSinkOperator calls deleteOnExit (hdfs call) twice for the same file

2020-05-12 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-23451:

Assignee: Rajesh Balamohan
  Status: Patch Available  (was: Open)

> FileSinkOperator calls deleteOnExit (hdfs call) twice for the same file
> ---
>
> Key: HIVE-23451
> URL: https://issues.apache.org/jira/browse/HIVE-23451
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-23451.1.patch
>
>
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L826]
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L797]
> A NN call can be avoided here (mainly relevant for small queries).





[jira] [Updated] (HIVE-23451) FileSinkOperator calls deleteOnExit (hdfs call) twice for the same file

2020-05-12 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-23451:

Attachment: HIVE-23451.1.patch

> FileSinkOperator calls deleteOnExit (hdfs call) twice for the same file
> ---
>
> Key: HIVE-23451
> URL: https://issues.apache.org/jira/browse/HIVE-23451
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-23451.1.patch
>
>
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L826]
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L797]
> A NN call can be avoided here (mainly relevant for small queries).





[jira] [Resolved] (HIVE-23430) Optimise ASTNode::getChildren

2020-05-10 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan resolved HIVE-23430.
-
Resolution: Duplicate

> Optimise ASTNode::getChildren 
> --
>
> Key: HIVE-23430
> URL: https://issues.apache.org/jira/browse/HIVE-23430
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Trivial
> Attachments: image-2020-05-11-09-09-37-119.png
>
>
> !image-2020-05-11-09-09-37-119.png|width=1276,height=930!
>  
> Pink bars are from ASTNode::getChildren. This was observed when a large number
> of small queries were executed concurrently in HS2.





[jira] [Commented] (HIVE-23430) Optimise ASTNode::getChildren

2020-05-10 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17104047#comment-17104047
 ] 

Rajesh Balamohan commented on HIVE-23430:
-

Yes, marking this as a duplicate. Just realized it is fixed in master.

> Optimise ASTNode::getChildren 
> --
>
> Key: HIVE-23430
> URL: https://issues.apache.org/jira/browse/HIVE-23430
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Trivial
> Attachments: image-2020-05-11-09-09-37-119.png
>
>
> !image-2020-05-11-09-09-37-119.png|width=1276,height=930!
>  
> Pink bars are from ASTNode::getChildren. This was observed when a large number
> of small queries were executed concurrently in HS2.





[jira] [Updated] (HIVE-23429) LLAP: Optimize retrieving queryId details in LlapTaskCommunicator

2020-05-10 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-23429:

Assignee: Rajesh Balamohan
  Status: Patch Available  (was: Open)

> LLAP: Optimize retrieving queryId details in LlapTaskCommunicator
> -
>
> Key: HIVE-23429
> URL: https://issues.apache.org/jira/browse/HIVE-23429
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Major
>  Labels: performance
> Attachments: HIVE-23429.1.patch
>
>
>  
> [https://github.com/apache/hive/blob/master/llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskCommunicator.java#L825]
>  
> For small jobs, unpacking the entire payload just to look up the HIVEQUERYID
> parameter becomes a bottleneck. It would be good to share HIVEQUERYID in
> dagConf and retrieve it back via DagInfo.
>  
>  
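The proposal can be sketched as publishing the query id as a small dagConf entry at submission time, with the payload unpack kept only as a fallback. The config key and all names below are assumptions for illustration, not Hive's or Tez's actual API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the query id travels in dagConf, so the task
// communicator reads it from DagInfo instead of deserialising the payload.
class DagInfoSketch {
    static final String HIVE_QUERY_ID_KEY = "hive.query.id";   // assumed key
    private final Map<String, String> dagConf = new HashMap<>();
    DagInfoSketch(String queryId) { dagConf.put(HIVE_QUERY_ID_KEY, queryId); }
    String get(String key) { return dagConf.get(key); }
}

class LlapTaskCommunicatorSketch {
    static int payloadUnpacks = 0;

    static String queryIdOf(DagInfoSketch dag, String serializedPayload) {
        String id = dag.get(DagInfoSketch.HIVE_QUERY_ID_KEY);
        if (id != null) {
            return id;                       // cheap path: read from dagConf
        }
        payloadUnpacks++;                    // fallback: unpack full payload
        return parseQueryId(serializedPayload);
    }

    // stands in for deserialising the whole task payload
    static String parseQueryId(String payload) { return payload.split("=")[1]; }
}
```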





[jira] [Updated] (HIVE-23429) LLAP: Optimize retrieving queryId details in LlapTaskCommunicator

2020-05-10 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-23429:

Attachment: HIVE-23429.1.patch

> LLAP: Optimize retrieving queryId details in LlapTaskCommunicator
> -
>
> Key: HIVE-23429
> URL: https://issues.apache.org/jira/browse/HIVE-23429
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Major
>  Labels: performance
> Attachments: HIVE-23429.1.patch
>
>
>  
> [https://github.com/apache/hive/blob/master/llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskCommunicator.java#L825]
>  
> For small jobs, unpacking the entire payload just to look up the HIVEQUERYID
> parameter becomes a bottleneck. It would be good to share HIVEQUERYID in
> dagConf and retrieve it back via DagInfo.
>  
>  





[jira] [Commented] (HIVE-23429) LLAP: Optimize retrieving queryId details in LlapTaskCommunicator

2020-05-10 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17103984#comment-17103984
 ] 

Rajesh Balamohan commented on HIVE-23429:
-

Related ticket: https://issues.apache.org/jira/browse/TEZ-2672

> LLAP: Optimize retrieving queryId details in LlapTaskCommunicator
> -
>
> Key: HIVE-23429
> URL: https://issues.apache.org/jira/browse/HIVE-23429
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Major
>  Labels: performance
>
>  
> [https://github.com/apache/hive/blob/master/llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskCommunicator.java#L825]
>  
> For small jobs, unpacking the entire payload just to look up the HIVEQUERYID
> parameter becomes a bottleneck. It would be good to share HIVEQUERYID in
> dagConf and retrieve it back via DagInfo.
>  
>  





[jira] [Commented] (HIVE-23294) Remove sync bottleneck in TezConfigurationFactory

2020-04-28 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17095005#comment-17095005
 ] 

Rajesh Balamohan commented on HIVE-23294:
-

Tried out .2 in the cluster. Haven't seen major CPU burn with reflection, and it
also removes the locking issue.

+1.

> Remove sync bottleneck in TezConfigurationFactory
> -
>
> Key: HIVE-23294
> URL: https://issues.apache.org/jira/browse/HIVE-23294
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Major
> Attachments: HIVE-23294.1.patch, HIVE-23294.2.patch, Screenshot 
> 2020-04-24 at 1.53.20 PM.png
>
>
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezConfigurationFactory.java#L53]
> [https://github.com/apache/hadoop/blob/master/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java#L1628]
> It ends up locking while reading property names from the config. For
> short-running queries with concurrency, this is an issue.
>  
> !Screenshot 2020-04-24 at 1.53.20 PM.png|width=1086,height=459!
>  
>  
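The contention pattern can be sketched as follows: take one snapshot of the properties up front and serve all later lookups from the shared immutable copy, instead of every concurrent query iterating the Configuration under its internal lock. This is an illustrative sketch of the idea only, not TezConfigurationFactory's actual code.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: Configuration-style object whose property iteration
// is synchronised (the hot lock from the linked code).
class ConfSnapshotSketch {
    static int lockedIterations = 0;
    private final Map<String, String> props = new HashMap<>();
    ConfSnapshotSketch() { props.put("tez.queue.name", "default"); }

    synchronized Map<String, String> iterateUnderLock() {
        lockedIterations++;                  // contended section
        return new HashMap<>(props);
    }
}

class TezConfFactorySketch {
    private final Map<String, String> snapshot;   // shared, immutable

    TezConfFactorySketch(ConfSnapshotSketch conf) {
        snapshot = Collections.unmodifiableMap(conf.iterateUnderLock());
    }

    String lookup(String key) { return snapshot.get(key); }  // lock-free reads
}
```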





[jira] [Commented] (HIVE-23261) Check whether encryption is enabled in the cluster before moving files

2020-04-27 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17093368#comment-17093368
 ] 

Rajesh Balamohan commented on HIVE-23261:
-

It is possible that one of the dirs is encrypted and the other is unencrypted.
The current patch has to be modified to handle that case. Instead of 4 calls to
getEZPath, it would need only 2 calls to handle this.

> Check whether encryption is enabled in the cluster before moving files
> --
>
> Key: HIVE-23261
> URL: https://issues.apache.org/jira/browse/HIVE-23261
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Ramesh Kumar Thangarajan
>Priority: Minor
> Attachments: HIVE-23261.1.patch
>
>
> Similar to HIVE-23212, there is an unwanted check of encryption paths during
> the file move operation.
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L4546]
>  
>  
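The two-call shape from the comment above can be sketched as resolving the encryption zone of source and destination once each, then deriving the decision from the pair of results. The predicate and path layout below are assumptions for illustration, not Hive's actual move logic.

```java
// Hypothetical sketch: 2 getEZPath lookups (one per side) instead of 4,
// covering the mixed encrypted/unencrypted case as well.
class EzResolverSketch {
    static int getEzPathCalls = 0;

    static String getEZPath(String path) {   // stands in for the HDFS call
        getEzPathCalls++;
        return path.startsWith("/secure") ? "/secure" : null;  // null = unencrypted
    }

    static boolean needsCopyInsteadOfRename(String src, String dest) {
        String srcEz = getEZPath(src);       // call 1
        String destEz = getEZPath(dest);     // call 2
        // different zones, or only one side encrypted -> rename is not possible
        return srcEz == null ? destEz != null : !srcEz.equals(destEz);
    }
}
```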





[jira] [Commented] (HIVE-23294) Remove sync bottleneck in TezConfigurationFactory

2020-04-27 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17093054#comment-17093054
 ] 

Rajesh Balamohan commented on HIVE-23294:
-

[~ashutoshc] : Thanks for sharing the patch. The impact of the reflection
approach is that it ends up checking member access every time the field is
accessed, adding to the overhead.

> Remove sync bottleneck in TezConfigurationFactory
> -
>
> Key: HIVE-23294
> URL: https://issues.apache.org/jira/browse/HIVE-23294
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Major
> Attachments: HIVE-23294.1.patch, HIVE-23294.2.patch, Screenshot 
> 2020-04-24 at 1.53.20 PM.png
>
>
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezConfigurationFactory.java#L53]
> [https://github.com/apache/hadoop/blob/master/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java#L1628]
> It ends up locking while reading property names from the config. For
> short-running queries with concurrency, this is an issue.
>  
> !Screenshot 2020-04-24 at 1.53.20 PM.png|width=1086,height=459!
>  
>  





[jira] [Updated] (HIVE-23282) Reduce number of DB calls in ObjectStore::getPartitionsByExprInternal

2020-04-26 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-23282:

Assignee: Rajesh Balamohan
  Status: Patch Available  (was: Open)

> Reduce number of DB calls in ObjectStore::getPartitionsByExprInternal
> -
>
> Key: HIVE-23282
> URL: https://issues.apache.org/jira/browse/HIVE-23282
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Major
> Attachments: HIVE-23282.1.patch, image-2020-04-23-14-07-06-077.png
>
>
> ObjectStore::getPartitionsByExprInternal internally uses Table information to
> get the partition keys, table name, and catalog name.
>  
> For this, it ends up populating the entire table data from the DB (including
> skew columns, parameters, sort and bucket cols, etc.), which makes the call a
> lot more expensive. It would be good to check whether MTable itself can be
> used instead of Table.
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L3327]
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L3669]
>  
> !image-2020-04-23-14-07-06-077.png|width=665,height=592!
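
The lightweight-object idea above can be sketched as a minimal projection type holding only what the partition lookup needs. The TableLite name and its fields are hypothetical illustration, not Hive's actual MTable API.

```java
import java.util.List;

// Hypothetical projection: only the fields getPartitionsByExprInternal needs
// (catalog, db, table name, partition keys), avoiding the full Table fetch.
public final class TableLite {
    public final String catName;
    public final String dbName;
    public final String tableName;
    public final List<String> partitionKeys;

    public TableLite(String catName, String dbName, String tableName,
                     List<String> partitionKeys) {
        this.catName = catName;
        this.dbName = dbName;
        this.tableName = tableName;
        this.partitionKeys = partitionKeys;
    }

    // Comma-separated partition keys, e.g. for building a SQL filter prefix.
    public String partitionKeyCsv() {
        return String.join(",", partitionKeys);
    }
}
```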





[jira] [Comment Edited] (HIVE-23282) Reduce number of DB calls in ObjectStore::getPartitionsByExprInternal

2020-04-26 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17092649#comment-17092649
 ] 

Rajesh Balamohan edited comment on HIVE-23282 at 4/26/20, 10:38 AM:


[~ashutoshc]: Yes, only "PartitionKeys" are needed in sqlfilter. Posted the 
patch and the RB link.


was (Author: rajesh.balamohan):
[~ashutoshc]: PartitionKeys are needed in sqlfilter. Posted the patch and the 
RB link.

> Reduce number of DB calls in ObjectStore::getPartitionsByExprInternal
> -
>
> Key: HIVE-23282
> URL: https://issues.apache.org/jira/browse/HIVE-23282
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Rajesh Balamohan
>Priority: Major
> Attachments: HIVE-23282.1.patch, image-2020-04-23-14-07-06-077.png
>
>
> ObjectStore::getPartitionsByExprInternal internally uses Table information 
> for getting partitionKeys, table, catalog name.
>  
> For this, it ends up populating the entire table data from the DB (including 
> skew columns, parameters, sort and bucket cols, etc.). This makes it a much 
> more expensive call. It would be good to check whether MTable itself can be 
> used instead of Table.
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L3327]
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L3669]
>  
> !image-2020-04-23-14-07-06-077.png|width=665,height=592!





[jira] [Updated] (HIVE-23282) Reduce number of DB calls in ObjectStore::getPartitionsByExprInternal

2020-04-26 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-23282:

Attachment: HIVE-23282.1.patch

> Reduce number of DB calls in ObjectStore::getPartitionsByExprInternal
> -
>
> Key: HIVE-23282
> URL: https://issues.apache.org/jira/browse/HIVE-23282
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Rajesh Balamohan
>Priority: Major
> Attachments: HIVE-23282.1.patch, image-2020-04-23-14-07-06-077.png
>
>
> ObjectStore::getPartitionsByExprInternal internally uses Table information 
> for getting partitionKeys, table, catalog name.
>  
> For this, it ends up populating the entire table data from the DB (including 
> skew columns, parameters, sort and bucket cols, etc.). This makes it a much 
> more expensive call. It would be good to check whether MTable itself can be 
> used instead of Table.
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L3327]
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L3669]
>  
> !image-2020-04-23-14-07-06-077.png|width=665,height=592!





[jira] [Commented] (HIVE-23282) Reduce number of DB calls in ObjectStore::getPartitionsByExprInternal

2020-04-26 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17092649#comment-17092649
 ] 

Rajesh Balamohan commented on HIVE-23282:
-

[~ashutoshc]: PartitionKeys are needed in sqlfilter. Posted the patch and the 
RB link.

> Reduce number of DB calls in ObjectStore::getPartitionsByExprInternal
> -
>
> Key: HIVE-23282
> URL: https://issues.apache.org/jira/browse/HIVE-23282
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Rajesh Balamohan
>Priority: Major
> Attachments: HIVE-23282.1.patch, image-2020-04-23-14-07-06-077.png
>
>
> ObjectStore::getPartitionsByExprInternal internally uses Table information 
> for getting partitionKeys, table, catalog name.
>  
> For this, it ends up populating the entire table data from the DB (including 
> skew columns, parameters, sort and bucket cols, etc.). This makes it a much 
> more expensive call. It would be good to check whether MTable itself can be 
> used instead of Table.
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L3327]
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L3669]
>  
> !image-2020-04-23-14-07-06-077.png|width=665,height=592!





[jira] [Commented] (HIVE-22737) Concurrency: FunctionRegistry::getFunctionInfo is static object locked

2020-04-26 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17092597#comment-17092597
 ] 

Rajesh Balamohan commented on HIVE-22737:
-

Yeah, missed that.

LGTM. +1 pending tests.

> Concurrency: FunctionRegistry::getFunctionInfo is static object locked
> --
>
> Key: HIVE-22737
> URL: https://issues.apache.org/jira/browse/HIVE-22737
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer, UDF
>Reporter: Gopal Vijayaraghavan
>Assignee: Ashutosh Chauhan
>Priority: Major
> Attachments: FunctionRegistry-lock.png, HIVE-22737.patch
>
>
> The lock is inside a HS2-wide static object
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java#L191
> {code}
>   // registry for system functions
>   private static final Registry system = new Registry(true);
> {code}
> And this is the lock itself
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/Registry.java#L332
> {code}
>   public FunctionInfo getFunctionInfo(String functionName) throws 
> SemanticException {
> lock.lock();
> {code}
>  !FunctionRegistry-lock.png! 
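
A common way to remove a registry-wide lock from the read path is ConcurrentHashMap.computeIfAbsent. A minimal sketch under that assumption; LockFreeRegistry and getOrLoad are hypothetical names, not the actual Registry API.

```java
import java.util.Locale;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

public class LockFreeRegistry<V> {
    private final ConcurrentMap<String, V> fns = new ConcurrentHashMap<>();

    // computeIfAbsent only blocks while a missing entry is being created;
    // repeated lookups of an existing function never take a lock.
    public V getOrLoad(String name, Function<String, V> loader) {
        return fns.computeIfAbsent(name.toLowerCase(Locale.ROOT), loader);
    }
}
```

Concurrent short queries then resolve already-registered functions without contending on a single lock.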





[jira] [Commented] (HIVE-22737) Concurrency: FunctionRegistry::getFunctionInfo is static object locked

2020-04-25 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17092505#comment-17092505
 ] 

Rajesh Balamohan commented on HIVE-22737:
-

Thanks for the patch [~ashutoshc] . Would it invoke 
"getFunctionInfo::addToCurrentFunctions" for all invocations with the patch?

> Concurrency: FunctionRegistry::getFunctionInfo is static object locked
> --
>
> Key: HIVE-22737
> URL: https://issues.apache.org/jira/browse/HIVE-22737
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer, UDF
>Reporter: Gopal Vijayaraghavan
>Assignee: Ashutosh Chauhan
>Priority: Major
> Attachments: FunctionRegistry-lock.png, HIVE-22737.patch
>
>
> The lock is inside a HS2-wide static object
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java#L191
> {code}
>   // registry for system functions
>   private static final Registry system = new Registry(true);
> {code}
> And this is the lock itself
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/Registry.java#L332
> {code}
>   public FunctionInfo getFunctionInfo(String functionName) throws 
> SemanticException {
> lock.lock();
> {code}
>  !FunctionRegistry-lock.png! 





[jira] [Assigned] (HIVE-23294) Remove sync bottleneck in TezConfigurationFactory

2020-04-24 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan reassigned HIVE-23294:
---

Assignee: Rajesh Balamohan

> Remove sync bottleneck in TezConfigurationFactory
> 
>
> Key: HIVE-23294
> URL: https://issues.apache.org/jira/browse/HIVE-23294
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Major
> Attachments: HIVE-23294.1.patch, Screenshot 2020-04-24 at 1.53.20 
> PM.png
>
>
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezConfigurationFactory.java#L53]
> [https://github.com/apache/hadoop/blob/master/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java#L1628]
> It ends up locking while iterating property names in the config. For 
> short-running queries with concurrency, this is an issue.
>  
> !Screenshot 2020-04-24 at 1.53.20 PM.png|width=1086,height=459!
>  
>  





[jira] [Updated] (HIVE-23294) Remove sync bottleneck in TezConfigurationFactory

2020-04-24 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-23294:

Attachment: HIVE-23294.1.patch

> Remove sync bottleneck in TezConfigurationFactory
> 
>
> Key: HIVE-23294
> URL: https://issues.apache.org/jira/browse/HIVE-23294
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Major
> Attachments: HIVE-23294.1.patch, Screenshot 2020-04-24 at 1.53.20 
> PM.png
>
>
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezConfigurationFactory.java#L53]
> [https://github.com/apache/hadoop/blob/master/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java#L1628]
> It ends up locking while iterating property names in the config. For 
> short-running queries with concurrency, this is an issue.
>  
> !Screenshot 2020-04-24 at 1.53.20 PM.png|width=1086,height=459!
>  
>  





[jira] [Updated] (HIVE-23294) Remove sync bottleneck in TezConfigurationFactory

2020-04-24 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-23294:

Status: Patch Available  (was: Open)

> Remove sync bottleneck in TezConfigurationFactory
> 
>
> Key: HIVE-23294
> URL: https://issues.apache.org/jira/browse/HIVE-23294
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Major
> Attachments: HIVE-23294.1.patch, Screenshot 2020-04-24 at 1.53.20 
> PM.png
>
>
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezConfigurationFactory.java#L53]
> [https://github.com/apache/hadoop/blob/master/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java#L1628]
> It ends up locking while iterating property names in the config. For 
> short-running queries with concurrency, this is an issue.
>  
> !Screenshot 2020-04-24 at 1.53.20 PM.png|width=1086,height=459!
>  
>  





[jira] [Updated] (HIVE-23282) Reduce number of DB calls in ObjectStore::getPartitionsByExprInternal

2020-04-23 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-23282:

Description: 
ObjectStore::getPartitionsByExprInternal internally uses Table information for 
getting partitionKeys, table, catalog name.

 

For this, it ends up populating the entire table data from the DB (including 
skew columns, parameters, sort and bucket cols, etc.). This makes it a much more 
expensive call. It would be good to check whether MTable itself can be used 
instead of Table.

[https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L3327]

[https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L3669]

 

!image-2020-04-23-14-07-06-077.png|width=665,height=592!

  was:
ObjectStore::getPartitionsByExprInternal internally uses Table information for 
getting partitionKeys, table, catalog name.

 

For this, it ends up populating the entire table data from the DB (including 
skew columns, parameters, sort and bucket cols, etc.). This makes it a much more 
expensive call. It would be good to either have a lightweight object with the 
basic information or reduce the payload of the Table object itself.

[https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L3327]

[https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L3669]

 

!image-2020-04-23-14-07-06-077.png|width=665,height=592!


> Reduce number of DB calls in ObjectStore::getPartitionsByExprInternal
> -
>
> Key: HIVE-23282
> URL: https://issues.apache.org/jira/browse/HIVE-23282
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Rajesh Balamohan
>Priority: Major
> Attachments: image-2020-04-23-14-07-06-077.png
>
>
> ObjectStore::getPartitionsByExprInternal internally uses Table information 
> for getting partitionKeys, table, catalog name.
>  
> For this, it ends up populating the entire table data from the DB (including 
> skew columns, parameters, sort and bucket cols, etc.). This makes it a much 
> more expensive call. It would be good to check whether MTable itself can be 
> used instead of Table.
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L3327]
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L3669]
>  
> !image-2020-04-23-14-07-06-077.png|width=665,height=592!





[jira] [Updated] (HIVE-23281) ObjectStore::convertToStorageDescriptor can be optimised to reduce calls to DB for ACID tables

2020-04-23 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-23281:

Description: 
[https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L1980]

 

[https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L1982]

 

SkewInfo, bucketCols, ordering, etc. are not needed for ACID tables. It may be 
good to check for transactional tables and get rid of these calls in table lookups.

 

This should help in reducing DB calls.

!image-2020-04-23-13-56-17-210.png|width=669,height=485!

 

  was:
[https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L1980]

 

[https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L1982]

 

SkewInfo, bucketCols, ordering, etc. are not needed for ACID tables. It may be 
good to check for transactional tables and get rid of these calls in table lookups.

 

This should help in reducing DB calls.


> ObjectStore::convertToStorageDescriptor can be optimised to reduce calls to 
> DB for ACID tables
> --
>
> Key: HIVE-23281
> URL: https://issues.apache.org/jira/browse/HIVE-23281
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Major
> Attachments: image-2020-04-23-13-56-17-210.png
>
>
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L1980]
>  
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L1982]
>  
> SkewInfo, bucketCols, ordering, etc. are not needed for ACID tables. It may be 
> good to check for transactional tables and get rid of these calls in table 
> lookups.
>  
> This should help in reducing DB calls.
> !image-2020-04-23-13-56-17-210.png|width=669,height=485!
>  





[jira] [Updated] (HIVE-23281) ObjectStore::convertToStorageDescriptor can be optimised to reduce calls to DB for ACID tables

2020-04-23 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-23281:

Attachment: image-2020-04-23-13-56-17-210.png

> ObjectStore::convertToStorageDescriptor can be optimised to reduce calls to 
> DB for ACID tables
> --
>
> Key: HIVE-23281
> URL: https://issues.apache.org/jira/browse/HIVE-23281
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Major
> Attachments: image-2020-04-23-13-56-17-210.png
>
>
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L1980]
>  
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L1982]
>  
> SkewInfo, bucketCols, ordering, etc. are not needed for ACID tables. It may be 
> good to check for transactional tables and get rid of these calls in table 
> lookups.
>  
> This should help in reducing DB calls.





[jira] [Commented] (HIVE-23261) Check whether encryption is enabled in the cluster before moving files

2020-04-20 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17088285#comment-17088285
 ] 

Rajesh Balamohan commented on HIVE-23261:
-

+1 pending tests.

> Check whether encryption is enabled in the cluster before moving files
> --
>
> Key: HIVE-23261
> URL: https://issues.apache.org/jira/browse/HIVE-23261
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Ramesh Kumar Thangarajan
>Priority: Minor
> Attachments: HIVE-23261.1.patch
>
>
> Similar to HIVE-23212, there is an unwanted check of encryption paths during 
> the file move operation.
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L4546]
>  
>  





[jira] [Commented] (HIVE-23210) Fix shortestjobcomparator when jobs submitted have 1 task in their vertices

2020-04-20 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17087599#comment-17087599
 ] 

Rajesh Balamohan commented on HIVE-23210:
-

Thanks [~pgaref] . Latest patch LGTM. +1.

> Fix shortestjobcomparator when jobs submitted have 1 task in their vertices
> 
>
> Key: HIVE-23210
> URL: https://issues.apache.org/jira/browse/HIVE-23210
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23210.01.patch, HIVE-23210.02.patch, 
> HIVE-23210.03.patch, HIVE-23210.04.patch, HIVE-23210.wip.patch, 
> TestShortestJobFirstComparator.java
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In latency sensitive queries, lots of jobs can have vertices with 1 task. 
> Currently shortestjobcomparator does not work correctly and returns tasks in 
> random order.
> [https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/comparator/ShortestJobFirstComparator.java#L51]
> This causes delay in the job runtime. I will attach a simple test case 
> shortly.





[jira] [Updated] (HIVE-23195) set hive.cluster.delegation.token.gc-interval to 15 minutes instead of an hour

2020-04-19 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-23195:

Fix Version/s: 4.0.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Thanks for the patch [~rzhappy]. Committed to master. 

> set hive.cluster.delegation.token.gc-interval to 15 minutes instead of an hour
> --
>
> Key: HIVE-23195
> URL: https://issues.apache.org/jira/browse/HIVE-23195
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Richard Zhang
>Assignee: Richard Zhang
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-23195.1.patch, HIVE-23195.2.patch
>
>
> The config "hive.cluster.delegation.token.gc-interval" is set to a long 
> duration, 1 hour. This has created issues in heavily loaded clusters, in 
> which the tokens may not be cleaned up fast enough and the cleaner thread 
> may fail to clean them. This can lead to issues such as excessive space 
> usage, LLAP startup failures, or slow system startup.
> If "hive.cluster.delegation.token.gc-interval" is reduced from 1 hour to a 
> relatively shorter period such as 15 mins, the ZooKeeper tokens will be 
> cleaned up more promptly, mitigating these issues.





[jira] [Commented] (HIVE-23195) set hive.cluster.delegation.token.gc-interval to 15 minutes instead of an hour

2020-04-19 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17087291#comment-17087291
 ] 

Rajesh Balamohan commented on HIVE-23195:
-

+1

> set hive.cluster.delegation.token.gc-interval to 15 minutes instead of an hour
> --
>
> Key: HIVE-23195
> URL: https://issues.apache.org/jira/browse/HIVE-23195
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Richard Zhang
>Assignee: Richard Zhang
>Priority: Major
> Attachments: HIVE-23195.1.patch, HIVE-23195.2.patch
>
>
> The config "hive.cluster.delegation.token.gc-interval" is set to a long 
> duration, 1 hour. This has created issues in heavily loaded clusters, in 
> which the tokens may not be cleaned up fast enough and the cleaner thread 
> may fail to clean them. This can lead to issues such as excessive space 
> usage, LLAP startup failures, or slow system startup.
> If "hive.cluster.delegation.token.gc-interval" is reduced from 1 hour to a 
> relatively shorter period such as 15 mins, the ZooKeeper tokens will be 
> cleaned up more promptly, mitigating these issues.





[jira] [Commented] (HIVE-23210) Fix shortestjobcomparator when jobs submitted have 1 task in their vertices

2020-04-17 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085658#comment-17085658
 ] 

Rajesh Balamohan commented on HIVE-23210:
-

The issue is not only related to single-task vertices. It can happen whenever 
waitTime and knownPending tasks are equal and the tie has to be broken. It does 
handle the case you pointed out with a higher waitTime (in such a case it does 
not fall into this codepath).

> Fix shortestjobcomparator when jobs submitted have 1 task in their vertices
> 
>
> Key: HIVE-23210
> URL: https://issues.apache.org/jira/browse/HIVE-23210
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23210.01.patch, HIVE-23210.02.patch, 
> HIVE-23210.wip.patch, TestShortestJobFirstComparator.java
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In latency sensitive queries, lots of jobs can have vertices with 1 task. 
> Currently shortestjobcomparator does not work correctly and returns tasks in 
> random order.
> [https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/comparator/ShortestJobFirstComparator.java#L51]
> This causes delay in the job runtime. I will attach a simple test case 
> shortly.
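
The tie-break being discussed can be sketched as a comparator chain: compare the job metrics first, and fall back to the current attempt's start time when they are equal. This is a hypothetical simplification (JobInfo and its fields are illustrative), not the actual ShortestJobFirstComparator logic.

```java
import java.util.Comparator;

public class ShortestJobTieBreak {
    public static final class JobInfo {
        final int knownPending;   // tasks still pending in the vertex
        final long waitTime;      // accumulated wait time
        final long startTime;     // current attempt's start time
        public JobInfo(int knownPending, long waitTime, long startTime) {
            this.knownPending = knownPending;
            this.waitTime = waitTime;
            this.startTime = startTime;
        }
    }

    // Fewer pending tasks first, longer wait first; when both are equal
    // (common with single-task vertices), the earlier attempt wins, so the
    // ordering is deterministic instead of effectively random.
    public static final Comparator<JobInfo> ORDER =
        Comparator.<JobInfo>comparingInt(j -> j.knownPending)
                  .thenComparingLong(j -> -j.waitTime)
                  .thenComparingLong(j -> j.startTime);
}
```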





[jira] [Commented] (HIVE-23210) Fix shortestjobcomparator when jobs submitted have 1 task in their vertices

2020-04-17 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085647#comment-17085647
 ] 

Rajesh Balamohan commented on HIVE-23210:
-

[^HIVE-23210.wip.patch] : Sample patch that takes care of the corner cases. In 
this case, if "waitTimes" and "pendingTasks" are the same, it compares the 
current attempt's start time to resolve the conflict.

> Fix shortestjobcomparator when jobs submitted have 1 task in their vertices
> 
>
> Key: HIVE-23210
> URL: https://issues.apache.org/jira/browse/HIVE-23210
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23210.01.patch, HIVE-23210.wip.patch, 
> TestShortestJobFirstComparator.java
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In latency sensitive queries, lots of jobs can have vertices with 1 task. 
> Currently shortestjobcomparator does not work correctly and returns tasks in 
> random order.
> [https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/comparator/ShortestJobFirstComparator.java#L51]
> This causes delay in the job runtime. I will attach a simple test case 
> shortly.





[jira] [Updated] (HIVE-23210) Fix shortestjobcomparator when jobs submitted have 1 task in their vertices

2020-04-17 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-23210:

Attachment: HIVE-23210.wip.patch

> Fix shortestjobcomparator when jobs submitted have 1 task in their vertices
> 
>
> Key: HIVE-23210
> URL: https://issues.apache.org/jira/browse/HIVE-23210
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23210.01.patch, HIVE-23210.wip.patch, 
> TestShortestJobFirstComparator.java
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In latency sensitive queries, lots of jobs can have vertices with 1 task. 
> Currently shortestjobcomparator does not work correctly and returns tasks in 
> random order.
> [https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/comparator/ShortestJobFirstComparator.java#L51]
> This causes delay in the job runtime. I will attach a simple test case 
> shortly.





[jira] [Commented] (HIVE-23196) Reduce number of delete calls to NN during Context::clear

2020-04-16 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085288#comment-17085288
 ] 

Rajesh Balamohan commented on HIVE-23196:
-

Another possibility: Context::clear could just enqueue the paths, which would 
then be deleted by another thread. This would remove the delete calls from the 
query execution path.

> Reduce number of delete calls to NN during Context::clear
> -
>
> Key: HIVE-23196
> URL: https://issues.apache.org/jira/browse/HIVE-23196
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Attila Magyar
>Priority: Major
> Attachments: HIVE-23196.1.patch
>
>
> {{Context::clear()}} ends up deleting the same directories (or their subdirs) 
> multiple times. It would be good to reduce the number of delete calls to the 
> NN for latency-sensitive queries. This also has an impact on concurrent queries.
> {noformat}
> 2020-04-14T04:22:28,703 DEBUG [7c6a6b09-ab37-4bc8-93a5-5da6fb154899 
> HiveServer2-Handler-Pool: Thread-378] ql.Context: Deleting result dir: 
> hdfs://nn1:8020/tmp/hive/xyz/7c6a6b09-ab37-4bc8-93a5-5da6fb154899/hive_2020-04-14_04-22-24_335_8573832618972595103-13/-mr-1
> 2020-04-14T04:22:28,721 DEBUG [7c6a6b09-ab37-4bc8-93a5-5da6fb154899 
> HiveServer2-Handler-Pool: Thread-378] ql.Context: Deleting scratch dir: 
> hdfs://nn1:8020/tmp/hive/xyz/7c6a6b09-ab37-4bc8-93a5-5da6fb154899/hive_2020-04-14_04-22-24_335_8573832618972595103-13
> 2020-04-14T04:22:28,737 DEBUG [7c6a6b09-ab37-4bc8-93a5-5da6fb154899 
> HiveServer2-Handler-Pool: Thread-378] ql.Context: Deleting scratch dir: 
> hdfs://nn1:8020/tmp/hive/xyz/7c6a6b09-ab37-4bc8-93a5-5da6fb154899/hive_2020-04-14_04-22-24_335_8573832618972595103-13/-mr-1/.hive-staging_hive_2020-04-14_04-22-24_335_8573832618972595103-13{noformat}
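
The queued-deletion idea from the comment above can be sketched as a tiny producer/consumer. AsyncPathDeleter is a hypothetical name; the delete function is injected so no real filesystem (or NN) is touched in the sketch.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

public class AsyncPathDeleter {
    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();

    public AsyncPathDeleter(Consumer<String> deleteFn) {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    deleteFn.accept(pending.take());   // blocks until a path arrives
                }
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();    // shut down quietly
            }
        }, "scratch-dir-deleter");
        worker.setDaemon(true);
        worker.start();
    }

    // Called from Context::clear-like code: O(1), no NN round trip here.
    public void scheduleDelete(String path) {
        pending.add(path);
    }
}
```

The query thread only pays for an in-memory enqueue; the NN round trips happen on the background thread.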





[jira] [Updated] (HIVE-23212) SemanticAnalyzer::getStagingDirectoryPathname should check for encryption zone only when needed

2020-04-15 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-23212:

Assignee: Rajesh Balamohan
  Status: Patch Available  (was: Open)

> SemanticAnalyzer::getStagingDirectoryPathname should check for encryption 
> zone only when needed
> ---
>
> Key: HIVE-23212
> URL: https://issues.apache.org/jira/browse/HIVE-23212
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-23212.1.patch
>
>
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L2572]
>  
> When the cluster does not have encryption zones configured, this ends up 
> making 2 unnecessary calls to the NN. It would be good to guard it with a 
> config, or check the KMS config from HDFS and invoke it only when needed.
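
The guard suggested above can be sketched as a cached one-time probe. EncryptionProbe is a hypothetical name; the injected supplier stands in for the real KMS-config check, so the expensive lookup runs at most once.

```java
import java.util.function.Supplier;

public class EncryptionProbe {
    private volatile Boolean encryptionEnabled;   // null = not probed yet
    private final Supplier<Boolean> probe;

    public EncryptionProbe(Supplier<Boolean> probe) {
        this.probe = probe;
    }

    public boolean isEncryptionEnabled() {
        Boolean cached = encryptionEnabled;
        if (cached == null) {
            cached = probe.get();   // the only place the expensive check happens
            encryptionEnabled = cached;
        }
        return cached;
    }
}
```

If the probe reports no encryption, the staging-dir path logic can skip the per-query encryption-zone lookups entirely.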





[jira] [Updated] (HIVE-23212) SemanticAnalyzer::getStagingDirectoryPathname should check for encryption zone only when needed

2020-04-15 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-23212:

Attachment: HIVE-23212.1.patch

> SemanticAnalyzer::getStagingDirectoryPathname should check for encryption 
> zone only when needed
> ---
>
> Key: HIVE-23212
> URL: https://issues.apache.org/jira/browse/HIVE-23212
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-23212.1.patch
>
>
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L2572]
>  
> When the cluster does not have encryption zones configured, this ends up 
> making 2 unnecessary calls to the NN. It would be good to guard it with a 
> config, or check the KMS config from HDFS and invoke it only when needed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23218) LlapRecordReader queue limit computation is not optimal

2020-04-15 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084537#comment-17084537
 ] 

Rajesh Balamohan commented on HIVE-23218:
-

Temp workaround/hack is to set {{hive.llap.io.vrb.queue.limit.min}} to a 
higher value.

> LlapRecordReader queue limit computation is not optimal
> ---
>
> Key: HIVE-23218
> URL: https://issues.apache.org/jira/browse/HIVE-23218
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Rajesh Balamohan
>Priority: Major
>
> After decoding in {{OrcEncodedDataConsumer::decodeBatch}}, data is enqueued 
> into a queue in LlapRecordReader. The limit for this queue is determined in 
> LlapRecordReader. If it is too small, the producer ends up waiting 100ms at a 
> time until the queue has capacity.
> https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapRecordReader.java#L168
> https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapRecordReader.java#L590
> https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapRecordReader.java#L260
> {{determineQueueLimit}} takes all columns into consideration even though only 
> a few columns are needed for projection. Here is an example.
> {noformat}
> create table test_acid(a1 string, a2 string, a3 string, a4 string, a5 string, 
> a6 string, a7 string, a8 string, a9 string, a10 string,
> a11 string, a22 string, a33 string, a44 string, a55 string, a66 string, a77 
> string, a88 string, a99 string, a100 string,
> a111 decimal(25,2), a222 decimal(25,2), a333 decimal(25,2), a444 
> decimal(25,2), a555 decimal(25,2), a666 decimal(25,2), a777 decimal(25,2),
>  a888 decimal(25,2), a999 decimal(25,2), a1000 decimal(25,2)) stored as orc;
> insert into table test_acid values 
> ("a1","a2","a3","a4","a5","a6","a7","a8","a9","a10",
> "a11","a22","a33","a44","a55","a66","a77","a88","a99","a100",
> 10.23,10.23,10.23,10.23,10.23,10.23,10.23,10.23,10.23,10.23
> );
> select a44, count(*) from test_acid where a44 like "a4%" group by a44 order 
> by a44;
> {noformat}
> For this query, the predicted queue size would be "138" because all fields 
> are taken into account instead of just the 2 that are projected. This causes 
> unwanted delays in adding data to the queue.
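The fix direction the description implies can be sketched as sizing the queue from the projected column count only. The formula, byte weights, and names below are illustrative, not Hive's actual determineQueueLimit implementation; they only show how the limit changes when non-projected columns stop counting.

```java
class QueueLimitSketch {
    // Illustrative formula: a memory budget divided by the estimated bytes one
    // batch of the *projected* columns occupies, clamped to [minLimit, maxLimit].
    static int determineQueueLimit(long targetBytes, int projectedCols,
                                   int bytesPerColPerBatch, int minLimit, int maxLimit) {
        long perBatch = Math.max(1L, (long) projectedCols * bytesPerColPerBatch);
        long limit = targetBytes / perBatch;
        return (int) Math.max(minLimit, Math.min(maxLimit, limit));
    }

    public static void main(String[] args) {
        // Counting all 30 table columns vs. only the 2 columns the query projects:
        int allColumns = determineQueueLimit(10_000_000L, 30, 16_000, 10, 10_000);
        int projectedOnly = determineQueueLimit(10_000_000L, 2, 16_000, 10, 10_000);
        System.out.println(allColumns + " " + projectedOnly);
    }
}
```

With the same budget, the all-columns estimate allows far fewer queued batches than the projection-aware one, which is the delay the ticket describes.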



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23210) Fix shortestjobcomparator when jobs submitted have 1 task in their vertices

2020-04-15 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17083880#comment-17083880
 ] 

Rajesh Balamohan commented on HIVE-23210:
-

TestShortestJobFirstComparator::testWaitQueueAging should reproduce the issue. 
In some cases, tasks that were submitted first end up being picked up last.

> Fix shortestjobcomparator when jobs submitted have 1 task in their vertices
> 
>
> Key: HIVE-23210
> URL: https://issues.apache.org/jira/browse/HIVE-23210
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Major
> Attachments: TestShortestJobFirstComparator.java
>
>
> In latency-sensitive queries, many jobs can have vertices with 1 task. 
> Currently ShortestJobFirstComparator does not handle this case correctly and 
> returns tasks in effectively random order.
> [https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/comparator/ShortestJobFirstComparator.java#L51]
> This causes delays in job runtime. I will attach a simple test case 
> shortly.
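One way to make the ordering deterministic for single-task vertices is to break ties by first attempt start time, so earlier submissions are not starved. The TaskInfo class below is a stand-in for Hive's TaskWrapper, and the comparator is a sketch of the idea, not the committed fix.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

class ShortestJobFirstSketch {
    // Stand-in for Hive's TaskWrapper with the two properties the sketch uses.
    static final class TaskInfo {
        final String id;
        final int knownPending;              // remaining upstream tasks
        final long firstAttemptStartTime;    // when the task first entered the queue
        TaskInfo(String id, int pending, long start) {
            this.id = id;
            this.knownPending = pending;
            this.firstAttemptStartTime = start;
        }
    }

    // Shortest job first, with an aging tie-break: when pending counts tie
    // (e.g. every vertex has a single task), the older submission wins.
    static final Comparator<TaskInfo> SHORTEST_JOB_FIRST =
        Comparator.<TaskInfo>comparingInt(t -> t.knownPending)
                  .thenComparingLong(t -> t.firstAttemptStartTime);

    public static void main(String[] args) {
        PriorityQueue<TaskInfo> queue = new PriorityQueue<>(SHORTEST_JOB_FIRST);
        queue.add(new TaskInfo("late", 1, 2000L));
        queue.add(new TaskInfo("early", 1, 1000L));
        System.out.println(queue.poll().id);   // earliest submission wins the tie
    }
}
```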



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23210) Fix shortestjobcomparator when jobs submitted have 1 task in their vertices

2020-04-15 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-23210:

Attachment: TestShortestJobFirstComparator.java

> Fix shortestjobcomparator when jobs submitted have 1 task in their vertices
> 
>
> Key: HIVE-23210
> URL: https://issues.apache.org/jira/browse/HIVE-23210
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Major
> Attachments: TestShortestJobFirstComparator.java
>
>
> In latency-sensitive queries, many jobs can have vertices with 1 task. 
> Currently ShortestJobFirstComparator does not handle this case correctly and 
> returns tasks in effectively random order.
> [https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/comparator/ShortestJobFirstComparator.java#L51]
> This causes delays in job runtime. I will attach a simple test case 
> shortly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23208) Update guaranteed capacity in ZK only when WM is enabled

2020-04-14 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-23208:

Summary: Update guaranteed capacity in ZK only when WM is enabled  (was: 
Update guranteed capacity in ZK only when WM is enabled)

> Update guaranteed capacity in ZK only when WM is enabled
> 
>
> Key: HIVE-23208
> URL: https://issues.apache.org/jira/browse/HIVE-23208
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Rajesh Balamohan
>Priority: Major
>
> [https://github.com/apache/hive/blob/master/llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java#L1091]
>  
> Even when WM is not enabled, it ends up updating ZK for every DAG completion 
> event. For short-running queries with concurrency, this results in a large 
> number of ZK calls.
> It would be good to invoke this only when WM is enabled.
>  
>  
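The suggested guard can be sketched as below. ClusterRegistry and its method are hypothetical stand-ins for the ZK-backed registry call made by LlapTaskSchedulerService; the sketch only shows that short-circuiting on the WM flag eliminates the per-DAG ZK round trip.

```java
class WmGuardSketch {
    // Hypothetical stand-in for the ZK-backed registry update.
    interface ClusterRegistry { void updateGuaranteedCapacity(int guaranteed); }

    private final boolean wmEnabled;
    private final ClusterRegistry registry;

    WmGuardSketch(boolean wmEnabled, ClusterRegistry registry) {
        this.wmEnabled = wmEnabled;
        this.registry = registry;
    }

    void onDagComplete(int guaranteed) {
        if (!wmEnabled) {
            return;   // no workload management: skip the ZK round trip entirely
        }
        registry.updateGuaranteedCapacity(guaranteed);
    }

    public static void main(String[] args) {
        final int[] zkCalls = {0};
        WmGuardSketch wmOff = new WmGuardSketch(false, g -> zkCalls[0]++);
        for (int i = 0; i < 100; i++) {
            wmOff.onDagComplete(i);   // 100 short DAGs complete, 0 ZK updates
        }
        System.out.println(zkCalls[0]);
    }
}
```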



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23196) Reduce number of delete calls to NN during Context::clear

2020-04-14 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082987#comment-17082987
 ] 

Rajesh Balamohan commented on HIVE-23196:
-

Ref: 
[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/Context.java#L846]

[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/Context.java#L869]

> Reduce number of delete calls to NN during Context::clear
> -
>
> Key: HIVE-23196
> URL: https://issues.apache.org/jira/browse/HIVE-23196
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Major
>
> {{Context::clear()}} ends up deleting the same directories (or their 
> subdirectories) multiple times. It would be good to reduce the number of 
> delete calls to the NN for latency-sensitive queries. This also has an impact 
> on concurrent queries.
> {noformat}
> 2020-04-14T04:22:28,703 DEBUG [7c6a6b09-ab37-4bc8-93a5-5da6fb154899 
> HiveServer2-Handler-Pool: Thread-378] ql.Context: Deleting result dir: 
> hdfs://nn1:8020/tmp/hive/xyz/7c6a6b09-ab37-4bc8-93a5-5da6fb154899/hive_2020-04-14_04-22-24_335_8573832618972595103-13/-mr-1
> 2020-04-14T04:22:28,721 DEBUG [7c6a6b09-ab37-4bc8-93a5-5da6fb154899 
> HiveServer2-Handler-Pool: Thread-378] ql.Context: Deleting scratch dir: 
> hdfs://nn1:8020/tmp/hive/xyz/7c6a6b09-ab37-4bc8-93a5-5da6fb154899/hive_2020-04-14_04-22-24_335_8573832618972595103-13
> 2020-04-14T04:22:28,737 DEBUG [7c6a6b09-ab37-4bc8-93a5-5da6fb154899 
> HiveServer2-Handler-Pool: Thread-378] ql.Context: Deleting scratch dir: 
> hdfs://nn1:8020/tmp/hive/xyz/7c6a6b09-ab37-4bc8-93a5-5da6fb154899/hive_2020-04-14_04-22-24_335_8573832618972595103-13/-mr-1/.hive-staging_hive_2020-04-14_04-22-24_335_8573832618972595103-13{noformat}
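One possible way to cut the redundant calls is to prune any path covered by an already-deleted ancestor, since a recursive delete of a parent removes its children anyway. This is a sketch of the idea in plain Java, not Hive's actual Context code; the paths are shortened versions of the ones in the log above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

class DedupDeletes {
    // Returns only the paths that actually need a delete call: a path is
    // skipped if some lexicographically earlier kept path is its ancestor,
    // because a recursive delete of the ancestor already removes it.
    static List<String> pruneCoveredPaths(List<String> paths) {
        TreeSet<String> sorted = new TreeSet<>(paths); // parents sort before children
        List<String> toDelete = new ArrayList<>();
        String lastKept = null;
        for (String p : sorted) {
            if (lastKept != null && p.startsWith(lastKept + "/")) {
                continue; // covered by the recursive delete of lastKept
            }
            toDelete.add(p);
            lastKept = p;
        }
        return toDelete;
    }

    public static void main(String[] args) {
        List<String> out = pruneCoveredPaths(List.of(
            "/tmp/hive/s/q1/-mr-10001/.hive-staging",
            "/tmp/hive/s/q1",
            "/tmp/hive/s/q1/-mr-10001"));
        System.out.println(out);   // only the top-level scratch dir survives
    }
}
```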



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23196) Reduce number of delete calls to NN during Context::clear

2020-04-14 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-23196:

Description: 
{{Context::clear()}} ends up deleting the same directories (or their 
subdirectories) multiple times. It would be good to reduce the number of delete 
calls to the NN for latency-sensitive queries. This also has an impact on 
concurrent queries.
{noformat}
2020-04-14T04:22:28,703 DEBUG [7c6a6b09-ab37-4bc8-93a5-5da6fb154899 
HiveServer2-Handler-Pool: Thread-378] ql.Context: Deleting result dir: 
hdfs://nn1:8020/tmp/hive/xyz/7c6a6b09-ab37-4bc8-93a5-5da6fb154899/hive_2020-04-14_04-22-24_335_8573832618972595103-13/-mr-1
2020-04-14T04:22:28,721 DEBUG [7c6a6b09-ab37-4bc8-93a5-5da6fb154899 
HiveServer2-Handler-Pool: Thread-378] ql.Context: Deleting scratch dir: 
hdfs://nn1:8020/tmp/hive/xyz/7c6a6b09-ab37-4bc8-93a5-5da6fb154899/hive_2020-04-14_04-22-24_335_8573832618972595103-13
2020-04-14T04:22:28,737 DEBUG [7c6a6b09-ab37-4bc8-93a5-5da6fb154899 
HiveServer2-Handler-Pool: Thread-378] ql.Context: Deleting scratch dir: 
hdfs://nn1:8020/tmp/hive/xyz/7c6a6b09-ab37-4bc8-93a5-5da6fb154899/hive_2020-04-14_04-22-24_335_8573832618972595103-13/-mr-1/.hive-staging_hive_2020-04-14_04-22-24_335_8573832618972595103-13{noformat}

  was:
{{Context::clear()}} ends up deleting same directories (or its subdirs) 
multiple times. It would be good to reduce the number of delete calls to NN for 
latency sensitive queries. This also has an impact on concurrent queries.

{noformat}
2020-04-14T04:22:28,703 DEBUG [7c6a6b09-ab37-4bc8-93a5-5da6fb154899 
HiveServer2-Handler-Pool: Thread-378] ql.Context: Deleting result dir: 
hdfs://nn1:8020/tmp/hive/rbalamohan/7c6a6b09-ab37-4bc8-93a5-5da6fb154899/hive_2020-04-14_04-22-24_335_8573832618972595103-13/-mr-1
2020-04-14T04:22:28,721 DEBUG [7c6a6b09-ab37-4bc8-93a5-5da6fb154899 
HiveServer2-Handler-Pool: Thread-378] ql.Context: Deleting scratch dir: 
hdfs://nn1:8020/tmp/hive/rbalamohan/7c6a6b09-ab37-4bc8-93a5-5da6fb154899/hive_2020-04-14_04-22-24_335_8573832618972595103-13
2020-04-14T04:22:28,737 DEBUG [7c6a6b09-ab37-4bc8-93a5-5da6fb154899 
HiveServer2-Handler-Pool: Thread-378] ql.Context: Deleting scratch dir: 
hdfs://nn1:8020/tmp/hive/rbalamohan/7c6a6b09-ab37-4bc8-93a5-5da6fb154899/hive_2020-04-14_04-22-24_335_8573832618972595103-13/-mr-1/.hive-staging_hive_2020-04-14_04-22-24_335_8573832618972595103-13
{noformat}


> Reduce number of delete calls to NN during Context::clear
> -
>
> Key: HIVE-23196
> URL: https://issues.apache.org/jira/browse/HIVE-23196
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Major
>
> {{Context::clear()}} ends up deleting the same directories (or their 
> subdirectories) multiple times. It would be good to reduce the number of 
> delete calls to the NN for latency-sensitive queries. This also has an impact 
> on concurrent queries.
> {noformat}
> 2020-04-14T04:22:28,703 DEBUG [7c6a6b09-ab37-4bc8-93a5-5da6fb154899 
> HiveServer2-Handler-Pool: Thread-378] ql.Context: Deleting result dir: 
> hdfs://nn1:8020/tmp/hive/xyz/7c6a6b09-ab37-4bc8-93a5-5da6fb154899/hive_2020-04-14_04-22-24_335_8573832618972595103-13/-mr-1
> 2020-04-14T04:22:28,721 DEBUG [7c6a6b09-ab37-4bc8-93a5-5da6fb154899 
> HiveServer2-Handler-Pool: Thread-378] ql.Context: Deleting scratch dir: 
> hdfs://nn1:8020/tmp/hive/xyz/7c6a6b09-ab37-4bc8-93a5-5da6fb154899/hive_2020-04-14_04-22-24_335_8573832618972595103-13
> 2020-04-14T04:22:28,737 DEBUG [7c6a6b09-ab37-4bc8-93a5-5da6fb154899 
> HiveServer2-Handler-Pool: Thread-378] ql.Context: Deleting scratch dir: 
> hdfs://nn1:8020/tmp/hive/xyz/7c6a6b09-ab37-4bc8-93a5-5da6fb154899/hive_2020-04-14_04-22-24_335_8573832618972595103-13/-mr-1/.hive-staging_hive_2020-04-14_04-22-24_335_8573832618972595103-13{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23029) LLAP: Shuffle Handler should support Index Cache configuration

2020-04-13 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-23029:

Fix Version/s: 4.0.0
 Assignee: Rajesh Balamohan
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Thanks [~ashutoshc]. Committed to master.

> LLAP: Shuffle Handler should support Index Cache configuration
> --
>
> Key: HIVE-23029
> URL: https://issues.apache.org/jira/browse/HIVE-23029
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Trivial
> Fix For: 4.0.0
>
> Attachments: HIVE-23029.1.patch, Screenshot 2020-03-16 at 12.08.44 
> PM.jpg
>
>
> !Screenshot 2020-03-16 at 12.08.44 PM.jpg|width=1592,height=1112!
>  
> Queries like Q78 at large scale miss the index cache with unordered edges (24 
> * 1009 = 24216 entries, while the default 10 MB cache size can accommodate 
> only 400+ entries).
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23154) Fix race condition in Utilities::mvFileToFinalPath

2020-04-09 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-23154:

  Assignee: Rajesh Balamohan
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks [~ashutoshc]. Committed to master.

> Fix race condition in Utilities::mvFileToFinalPath
> --
>
> Key: HIVE-23154
> URL: https://issues.apache.org/jira/browse/HIVE-23154
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Major
> Attachments: HIVE-23154.1.patch, HIVE-23154.3.patch
>
>
> Utilities::mvFileToFinalPath is used for moving files from the "/_tmp.-ext" 
> folder to the "/-ext" folder. Tasks write data to "_tmp". Before being 
> written to the final destination, files are moved to the "-ext" folder. As 
> part of this, there are checks to ensure that run-away task outputs are not 
> copied to the "-ext" folder.
> Currently, there is a race condition between computing the snapshot of files 
> to be copied and the rename operation. The same issue exists in the "insert 
> into" case as well.
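A race-free ordering can be sketched as renaming the whole _tmp directory first and pruning non-committed files afterwards, instead of snapshotting the file list and then moving. FileOps below is an in-memory stand-in for the FileSystem calls, and the whole thing is a sketch of the idea, not the committed patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class MoveToFinalSketch {
    // In-memory stand-in for the few FileSystem calls the sketch needs.
    interface FileOps {
        boolean rename(String src, String dst);
        List<String> list(String dir);
        void delete(String path);
    }

    // Rename the tmp dir first, then prune files not in the committed set.
    // A late run-away writer now targets a directory that no longer exists,
    // closing the window between "snapshot" and "move".
    static void mvFileToFinalPath(FileOps fs, String tmpDir, String finalDir,
                                  Set<String> committedFiles) {
        if (!fs.rename(tmpDir, finalDir)) {
            throw new IllegalStateException("rename failed: " + tmpDir);
        }
        for (String f : fs.list(finalDir)) {
            if (!committedFiles.contains(f)) {
                fs.delete(finalDir + "/" + f);   // drop run-away output
            }
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> dirs = new HashMap<>();
        dirs.put("/_tmp.-ext-10000",
                 new ArrayList<>(List.of("000000_0", "000000_1_runaway")));
        FileOps fs = new FileOps() {
            public boolean rename(String src, String dst) {
                List<String> files = dirs.remove(src);
                if (files == null) return false;
                dirs.put(dst, files);
                return true;
            }
            public List<String> list(String dir) { return new ArrayList<>(dirs.get(dir)); }
            public void delete(String path) {
                int i = path.lastIndexOf('/');
                dirs.get(path.substring(0, i)).remove(path.substring(i + 1));
            }
        };
        mvFileToFinalPath(fs, "/_tmp.-ext-10000", "/-ext-10000", Set.of("000000_0"));
        System.out.println(dirs.get("/-ext-10000"));   // run-away file pruned
    }
}
```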



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23154) Fix race condition in Utilities::mvFileToFinalPath

2020-04-09 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-23154:

Attachment: HIVE-23154.3.patch

> Fix race condition in Utilities::mvFileToFinalPath
> --
>
> Key: HIVE-23154
> URL: https://issues.apache.org/jira/browse/HIVE-23154
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Priority: Major
> Attachments: HIVE-23154.1.patch, HIVE-23154.3.patch
>
>
> Utilities::mvFileToFinalPath is used for moving files from the "/_tmp.-ext" 
> folder to the "/-ext" folder. Tasks write data to "_tmp". Before being 
> written to the final destination, files are moved to the "-ext" folder. As 
> part of this, there are checks to ensure that run-away task outputs are not 
> copied to the "-ext" folder.
> Currently, there is a race condition between computing the snapshot of files 
> to be copied and the rename operation. The same issue exists in the "insert 
> into" case as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23154) Fix race condition in Utilities::mvFileToFinalPath

2020-04-08 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17077918#comment-17077918
 ] 

Rajesh Balamohan commented on HIVE-23154:
-

RB: https://reviews.apache.org/r/72333/diff/1/

> Fix race condition in Utilities::mvFileToFinalPath
> --
>
> Key: HIVE-23154
> URL: https://issues.apache.org/jira/browse/HIVE-23154
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Priority: Major
> Attachments: HIVE-23154.1.patch
>
>
> Utilities::mvFileToFinalPath is used for moving files from the "/_tmp.-ext" 
> folder to the "/-ext" folder. Tasks write data to "_tmp". Before being 
> written to the final destination, files are moved to the "-ext" folder. As 
> part of this, there are checks to ensure that run-away task outputs are not 
> copied to the "-ext" folder.
> Currently, there is a race condition between computing the snapshot of files 
> to be copied and the rename operation. The same issue exists in the "insert 
> into" case as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23154) Fix race condition in Utilities::mvFileToFinalPath

2020-04-08 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-23154:

Status: Patch Available  (was: Open)

> Fix race condition in Utilities::mvFileToFinalPath
> --
>
> Key: HIVE-23154
> URL: https://issues.apache.org/jira/browse/HIVE-23154
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Priority: Major
> Attachments: HIVE-23154.1.patch
>
>
> Utilities::mvFileToFinalPath is used for moving files from the "/_tmp.-ext" 
> folder to the "/-ext" folder. Tasks write data to "_tmp". Before being 
> written to the final destination, files are moved to the "-ext" folder. As 
> part of this, there are checks to ensure that run-away task outputs are not 
> copied to the "-ext" folder.
> Currently, there is a race condition between computing the snapshot of files 
> to be copied and the rename operation. The same issue exists in the "insert 
> into" case as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23154) Fix race condition in Utilities::mvFileToFinalPath

2020-04-08 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-23154:

Attachment: HIVE-23154.1.patch

> Fix race condition in Utilities::mvFileToFinalPath
> --
>
> Key: HIVE-23154
> URL: https://issues.apache.org/jira/browse/HIVE-23154
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Priority: Major
> Attachments: HIVE-23154.1.patch
>
>
> Utilities::mvFileToFinalPath is used for moving files from the "/_tmp.-ext" 
> folder to the "/-ext" folder. Tasks write data to "_tmp". Before being 
> written to the final destination, files are moved to the "-ext" folder. As 
> part of this, there are checks to ensure that run-away task outputs are not 
> copied to the "-ext" folder.
> Currently, there is a race condition between computing the snapshot of files 
> to be copied and the rename operation. The same issue exists in the "insert 
> into" case as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23140) Optimise file move in CTAS

2020-04-06 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-23140:

Fix Version/s: 4.0.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Thanks [~ashutoshc]. Committed to master.

> Optimise file move in CTAS 
> ---
>
> Key: HIVE-23140
> URL: https://issues.apache.org/jira/browse/HIVE-23140
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Fix For: 4.0.0
>
> Attachments: HIVE-23140.1.patch
>
>
> FileSinkOperator can be optimized to run the file move operation (/_tmp.-ext 
> --> /-ext-) in parallel. Currently it invokes 
> {{Utilities.moveSpecifiedFileStatus}} and renames files sequentially, causing 
> delays on cloud storage. FS rename can be used instead (S3A internally has a 
> parallel rename operation).
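The parallel move can be sketched with a small thread pool, where each file rename becomes an independent task. The Consumer stands in for the per-file rename call, and the pool size and structure are illustrative, not the committed patch.

```java
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

class ParallelMoveSketch {
    // Submits each rename to a pool instead of looping sequentially; this
    // matters most on object stores where every rename is a remote call.
    static void moveAll(List<String> files, Consumer<String> renameOp, int threads)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            for (String f : files) {
                pool.submit(() -> renameOp.accept(f));
            }
        } finally {
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);   // wait for all moves
        }
    }

    public static void main(String[] args) throws Exception {
        ConcurrentLinkedQueue<String> moved = new ConcurrentLinkedQueue<>();
        moveAll(List.of("000000_0", "000001_0", "000002_0"), moved::add, 4);
        System.out.println(moved.size());
    }
}
```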



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23140) Optimise file move in CTAS

2020-04-06 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17076783#comment-17076783
 ] 

Rajesh Balamohan commented on HIVE-23140:
-

"Insert overwrite/into" follows 
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L1506.
 Yes, I tried out the "CTAS and insert into/overwrite" cases and they work 
fine; when the same file name is present, it creates "00_0_copy_1" correctly.

> Optimise file move in CTAS 
> ---
>
> Key: HIVE-23140
> URL: https://issues.apache.org/jira/browse/HIVE-23140
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-23140.1.patch
>
>
> FileSinkOperator can be optimized to run the file move operation (/_tmp.-ext 
> --> /-ext-) in parallel. Currently it invokes 
> {{Utilities.moveSpecifiedFileStatus}} and renames files sequentially, causing 
> delays on cloud storage. FS rename can be used instead (S3A internally has a 
> parallel rename operation).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23140) Optimise file move in CTAS

2020-04-06 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17076388#comment-17076388
 ] 

Rajesh Balamohan commented on HIVE-23140:
-

Test case failure is not related to this patch.

{noformat}

org.apache.hadoop.hive.ql.parse.TestScheduledReplicationScenarios
Could not connect to meta store using any of the URIs provided. Most recent 
failure: org.apache.thrift.transport.TTransportException: 
java.net.ConnectException: Connection refused
 at org.apache.thrift.transport.TSocket.open(TSocket.java:226)

org.apache.hadoop.hive.ql.parse.TestScheduledReplicationScenarios
java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.parse.BaseReplicationScenariosAcidTables.classLevelTearDown(BaseReplicationScenariosAcidTables.java:106)
at sun.reflect.NativeMethodAcc 
{noformat}

> Optimise file move in CTAS 
> ---
>
> Key: HIVE-23140
> URL: https://issues.apache.org/jira/browse/HIVE-23140
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-23140.1.patch
>
>
> FileSinkOperator can be optimized to run the file move operation (/_tmp.-ext 
> --> /-ext-) in parallel. Currently it invokes 
> {{Utilities.moveSpecifiedFileStatus}} and renames files sequentially, causing 
> delays on cloud storage. FS rename can be used instead (S3A internally has a 
> parallel rename operation).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23140) Optimise file move in CTAS

2020-04-06 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-23140:

Description: FileSinkOperator can be optimized to run file move operation 
(/_tmp.-ext --> /-ext-) in parallel fashion. Currently it invokes 
{{Utilities.moveSpecifiedFileStatus}} and renames in sequential mode causing 
delays in cloud storage. FS rename can be used (S3A internally has parallel 
rename operation).   (was: FileSinkOperator can be optimized to run file move 
operation (/_tmp.-ext --> /-ext-10002) in parallel fashion. Currently it 
invokes {{Utilities.moveSpecifiedFileStatus}} and renames in sequential mode 
causing delays in cloud storage. FS rename can be used (S3A internally has 
parallel rename operation). )

> Optimise file move in CTAS 
> ---
>
> Key: HIVE-23140
> URL: https://issues.apache.org/jira/browse/HIVE-23140
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-23140.1.patch
>
>
> FileSinkOperator can be optimized to run the file move operation (/_tmp.-ext 
> --> /-ext-) in parallel. Currently it invokes 
> {{Utilities.moveSpecifiedFileStatus}} and renames files sequentially, causing 
> delays on cloud storage. FS rename can be used instead (S3A internally has a 
> parallel rename operation).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23140) Optimise file move in CTAS

2020-04-06 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-23140:

Attachment: HIVE-23140.1.patch

> Optimise file move in CTAS 
> ---
>
> Key: HIVE-23140
> URL: https://issues.apache.org/jira/browse/HIVE-23140
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-23140.1.patch
>
>
> FileSinkOperator can be optimized to run the file move operation (/_tmp.-ext 
> --> /-ext-10002) in parallel. Currently it invokes 
> {{Utilities.moveSpecifiedFileStatus}} and renames files sequentially, causing 
> delays on cloud storage. FS rename can be used instead (S3A internally has a 
> parallel rename operation).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23140) Optimise file move in CTAS

2020-04-06 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-23140:

Status: Patch Available  (was: Open)

> Optimise file move in CTAS 
> ---
>
> Key: HIVE-23140
> URL: https://issues.apache.org/jira/browse/HIVE-23140
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-23140.1.patch
>
>
> FileSinkOperator can be optimized to run the file move operation (/_tmp.-ext 
> --> /-ext-10002) in parallel. Currently it invokes 
> {{Utilities.moveSpecifiedFileStatus}} and renames files sequentially, causing 
> delays on cloud storage. FS rename can be used instead (S3A internally has a 
> parallel rename operation).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23140) Optimise file move in CTAS

2020-04-06 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan reassigned HIVE-23140:
---


> Optimise file move in CTAS 
> ---
>
> Key: HIVE-23140
> URL: https://issues.apache.org/jira/browse/HIVE-23140
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-23140.1.patch
>
>
> FileSinkOperator can be optimized to run the file move operation (/_tmp.-ext 
> --> /-ext-10002) in parallel. Currently it invokes 
> {{Utilities.moveSpecifiedFileStatus}} and renames files sequentially, causing 
> delays on cloud storage. FS rename can be used instead (S3A internally has a 
> parallel rename operation).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23122) LLAP: TaskExecutorService should log details about task eviction decision details

2020-04-02 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17073717#comment-17073717
 ] 

Rajesh Balamohan commented on HIVE-23122:
-

Given that this is in the hot path, we can move it to debug level if it adds 
significant logging in certain jobs.

+1.

> LLAP: TaskExecutorService should log details about task eviction decision 
> details
> -
>
> Key: HIVE-23122
> URL: https://issues.apache.org/jira/browse/HIVE-23122
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
> Attachments: HIVE-23122.02.patch
>
>
> TaskExecutorService maintains a waitQueue, and can evict a task in favor of 
> another. Under the hood, the queue uses a configurable 
> [comparator|https://github.com/apache/hive/tree/master/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/comparator].
>  The currently available comparators typically use the following properties 
> of a task(wrapper):
> getWithinDagPriority: related to vertex
> currentAttemptStartTime
> firstAttemptStartTime
> knownPending: remaining upstream tasks
> The problem is that when an eviction happens, the INFO-level message doesn't 
> provide any insight into the decision, only attempt ids like the following:
> {code}
> attempt_1585248378306_0010_72_02_96_8 evicted from wait queue in favor of 
> attempt_1585248378306_0003_175_02_79_175 because of lower priority
> {code}
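A more informative eviction message could carry the comparator-relevant fields of both tasks, along the lines of the sketch below. The Task class and its field names are stand-ins mirroring the properties listed above, not Hive's actual TaskWrapper.

```java
class EvictionLogSketch {
    // Stand-in for TaskWrapper, holding the fields the comparators use.
    static final class Task {
        final String attemptId;
        final int withinDagPriority;        // vertex-related priority
        final int knownPending;             // remaining upstream tasks
        final long firstAttemptStartTime;
        Task(String id, int prio, int pending, long start) {
            attemptId = id;
            withinDagPriority = prio;
            knownPending = pending;
            firstAttemptStartTime = start;
        }
        String summary() {
            return String.format("%s[dagPrio=%d, pending=%d, firstStart=%d]",
                attemptId, withinDagPriority, knownPending, firstAttemptStartTime);
        }
    }

    // Builds an eviction message that makes the decision explainable
    // from the log alone, instead of listing bare attempt ids.
    static String evictionMessage(Task evicted, Task winner) {
        return evicted.summary() + " evicted from wait queue in favor of "
            + winner.summary();
    }

    public static void main(String[] args) {
        System.out.println(evictionMessage(
            new Task("attempt_1_72_02_96_8", 5, 3, 1000L),
            new Task("attempt_1_175_02_79_175", 2, 0, 900L)));
    }
}
```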



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-23083) Enable fast serialization in xprod edge

2020-03-26 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17068253#comment-17068253
 ] 

Rajesh Balamohan edited comment on HIVE-23083 at 3/27/20, 4:19 AM:
---

Both test failures, with the following exceptions, are unrelated to this patch.
{noformat}
java.lang.AssertionError: No thread with name metastore_task_thread_test_impl_3 
found.
 at org.junit.Assert.fail(Assert.java:88)
 at org.junit.Assert.assertTrue(Assert.java:41)


Could not open client transport with JDBC Uri: 
jdbc:hive2://localhost:49903/default: java.net.ConnectException: Connection 
refused {noformat}
 


was (Author: rajesh.balamohan):
Both test failures with following exceptions are unrelated to this ticket
{noformat}
java.lang.AssertionError: No thread with name metastore_task_thread_test_impl_3 
found.
 at org.junit.Assert.fail(Assert.java:88)
 at org.junit.Assert.assertTrue(Assert.java:41)


Could not open client transport with JDBC Uri: 
jdbc:hive2://localhost:49903/default: java.net.ConnectException: Connection 
refused {noformat}
 

> Enable fast serialization in xprod edge
> ---
>
> Key: HIVE-23083
> URL: https://issues.apache.org/jira/browse/HIVE-23083
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-23083.1.patch, Screenshot 2020-03-26 at 2.28.34 
> PM.png
>
>
> {noformat}
> select count(*) from store_sales, store, customer, customer_address where  
> ss_store_sk = s_store_sk and s_market_id=10 and ss_customer_sk = 
> c_customer_sk and c_birth_country <> upper(ca_country);
> {noformat}
> This uses "org/apache/hadoop/io/serializer/WritableSerialization" instead of 
> TezBytesWritableSerialization.
>  
> !Screenshot 2020-03-26 at 2.28.34 PM.png|width=812,height=488!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23083) Enable fast serialization in xprod edge

2020-03-26 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17068253#comment-17068253
 ] 

Rajesh Balamohan commented on HIVE-23083:
-

Both test failures with following exceptions are unrelated to this ticket
{noformat}
java.lang.AssertionError: No thread with name metastore_task_thread_test_impl_3 
found.
 at org.junit.Assert.fail(Assert.java:88)
 at org.junit.Assert.assertTrue(Assert.java:41)


Could not open client transport with JDBC Uri: 
jdbc:hive2://localhost:49903/default: java.net.ConnectException: Connection 
refused {noformat}
 

> Enable fast serialization in xprod edge
> ---
>
> Key: HIVE-23083
> URL: https://issues.apache.org/jira/browse/HIVE-23083
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-23083.1.patch, Screenshot 2020-03-26 at 2.28.34 
> PM.png
>
>
> {noformat}
> select count(*) from store_sales, store, customer, customer_address where  
> ss_store_sk = s_store_sk and s_market_id=10 and ss_customer_sk = 
> c_customer_sk and c_birth_country <> upper(ca_country);
> {noformat}
> This uses "org/apache/hadoop/io/serializer/WritableSerialization" instead of 
> TezBytesWritableSerialization.
>  
> !Screenshot 2020-03-26 at 2.28.34 PM.png|width=812,height=488!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23083) Enable fast serialization in xprod edge

2020-03-26 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan reassigned HIVE-23083:
---

Assignee: Rajesh Balamohan

> Enable fast serialization in xprod edge
> ---
>
> Key: HIVE-23083
> URL: https://issues.apache.org/jira/browse/HIVE-23083
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-23083.1.patch, Screenshot 2020-03-26 at 2.28.34 
> PM.png
>
>
> {noformat}
> select count(*) from store_sales, store, customer, customer_address where  
> ss_store_sk = s_store_sk and s_market_id=10 and ss_customer_sk = 
> c_customer_sk and c_birth_country <> upper(ca_country);
> {noformat}
> This uses "org/apache/hadoop/io/serializer/WritableSerialization" instead of 
> TezBytesWritableSerialization.
>  
> !Screenshot 2020-03-26 at 2.28.34 PM.png|width=812,height=488!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23083) Enable fast serialization in xprod edge

2020-03-26 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-23083:

Status: Patch Available  (was: Open)

> Enable fast serialization in xprod edge
> ---
>
> Key: HIVE-23083
> URL: https://issues.apache.org/jira/browse/HIVE-23083
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-23083.1.patch, Screenshot 2020-03-26 at 2.28.34 
> PM.png
>
>
> {noformat}
> select count(*) from store_sales, store, customer, customer_address where  
> ss_store_sk = s_store_sk and s_market_id=10 and ss_customer_sk = 
> c_customer_sk and c_birth_country <> upper(ca_country);
> {noformat}
> This uses "org/apache/hadoop/io/serializer/WritableSerialization" instead of 
> TezBytesWritableSerialization.
>  
> !Screenshot 2020-03-26 at 2.28.34 PM.png|width=812,height=488!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23083) Enable fast serialization in xprod edge

2020-03-26 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-23083:

Attachment: HIVE-23083.1.patch

> Enable fast serialization in xprod edge
> ---
>
> Key: HIVE-23083
> URL: https://issues.apache.org/jira/browse/HIVE-23083
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-23083.1.patch, Screenshot 2020-03-26 at 2.28.34 
> PM.png
>
>
> {noformat}
> select count(*) from store_sales, store, customer, customer_address where  
> ss_store_sk = s_store_sk and s_market_id=10 and ss_customer_sk = 
> c_customer_sk and c_birth_country <> upper(ca_country);
> {noformat}
> This uses "org/apache/hadoop/io/serializer/WritableSerialization" instead of 
> TezBytesWritableSerialization.
>  
> !Screenshot 2020-03-26 at 2.28.34 PM.png|width=812,height=488!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23002) Optimise LazyBinaryUtils.writeVLong

2020-03-19 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-23002:

Fix Version/s: 4.0.0
 Assignee: Rajesh Balamohan
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Committed to master. Thanks [~ashutoshc]

> Optimise LazyBinaryUtils.writeVLong
> ---
>
> Key: HIVE-23002
> URL: https://issues.apache.org/jira/browse/HIVE-23002
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Fix For: 4.0.0
>
> Attachments: HIVE-23002.1.patch, HIVE-23002.2.patch, 
> HIVE-23002.3.patch, Screenshot 2020-03-10 at 5.01.34 AM.jpg
>
>
> [https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryUtils.java#L420]
> It would be good to add a method which accepts scratch bytes.
>  
>   !Screenshot 2020-03-10 at 5.01.34 AM.jpg|width=452,height=321!
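The scratch-bytes idea above can be sketched as follows. This is an illustration of the WritableUtils-style variable-length encoding, not the actual HIVE-23002 patch; the class name VLongScratch and both method signatures are invented for the example. The point is that the caller owns the buffer, so the hot write path allocates nothing.

```java
// Illustrative sketch (not the actual HIVE-23002 patch): encode/decode a
// WritableUtils-style vlong into a caller-provided scratch buffer.
public class VLongScratch {

  /** Writes l into buf at off; returns the number of bytes written (1..9). */
  public static int writeVLong(byte[] buf, int off, long l) {
    if (l >= -112 && l <= 127) {
      buf[off] = (byte) l;            // small values fit in a single byte
      return 1;
    }
    int len = -112;
    if (l < 0) {
      l ^= -1L;                       // one's complement; switch to negative markers
      len = -120;
    }
    long tmp = l;
    while (tmp != 0) {                // count payload bytes
      tmp >>= 8;
      len--;
    }
    buf[off] = (byte) len;            // length marker byte
    int payload = (len < -120) ? -(len + 120) : -(len + 112);
    for (int idx = payload; idx != 0; idx--) {
      int shift = (idx - 1) * 8;      // big-endian payload
      buf[off + 1 + (payload - idx)] = (byte) ((l >> shift) & 0xFF);
    }
    return 1 + payload;
  }

  /** Decodes a vlong written by writeVLong. */
  public static long readVLong(byte[] buf, int off) {
    byte first = buf[off];
    if (first >= -112) {
      return first;
    }
    boolean negative = first < -120;
    int payload = negative ? (-120 - first) : (-112 - first);
    long v = 0;
    for (int i = 0; i < payload; i++) {
      v = (v << 8) | (buf[off + 1 + i] & 0xFF);
    }
    return negative ? (v ^ -1L) : v;
  }
}
```

A caller such as a serialize-write loop would allocate one 9-byte scratch array up front and reuse it for every value.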



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-23002) Optimise LazyBinaryUtils.writeVLong

2020-03-18 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17062275#comment-17062275
 ] 

Rajesh Balamohan edited comment on HIVE-23002 at 3/19/20, 4:36 AM:
---

No, it is not allocated for every column value or for every row. The scratch 
bytes are allocated once and reused per operator.

 

e.g: 
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/reducesink/VectorReduceSinkCommonOperator.java#L113


was (Author: rajesh.balamohan):
No, it is not allocated for every column value or for every row. The scratch 
bytes are allocated once and reused per operator.

> Optimise LazyBinaryUtils.writeVLong
> ---
>
> Key: HIVE-23002
> URL: https://issues.apache.org/jira/browse/HIVE-23002
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-23002.1.patch, HIVE-23002.2.patch, 
> HIVE-23002.3.patch, Screenshot 2020-03-10 at 5.01.34 AM.jpg
>
>
> [https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryUtils.java#L420]
> It would be good to add a method which accepts scratch bytes.
>  
>   !Screenshot 2020-03-10 at 5.01.34 AM.jpg|width=452,height=321!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23002) Optimise LazyBinaryUtils.writeVLong

2020-03-18 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17062275#comment-17062275
 ] 

Rajesh Balamohan commented on HIVE-23002:
-

No, it is not allocated for every column value or for every row. The scratch 
bytes are allocated once and reused per operator.

> Optimise LazyBinaryUtils.writeVLong
> ---
>
> Key: HIVE-23002
> URL: https://issues.apache.org/jira/browse/HIVE-23002
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-23002.1.patch, HIVE-23002.2.patch, 
> HIVE-23002.3.patch, Screenshot 2020-03-10 at 5.01.34 AM.jpg
>
>
> [https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryUtils.java#L420]
> It would be good to add a method which accepts scratch bytes.
>  
>   !Screenshot 2020-03-10 at 5.01.34 AM.jpg|width=452,height=321!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23027) Fix syntax error in llap package.py

2020-03-16 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-23027:

  Assignee: Rajesh Balamohan
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Fix syntax error in llap package.py
> ---
>
> Key: HIVE-23027
> URL: https://issues.apache.org/jira/browse/HIVE-23027
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Trivial
> Attachments: HIVE-23027.1.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23027) Fix syntax error in llap package.py

2020-03-16 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17060532#comment-17060532
 ] 

Rajesh Balamohan commented on HIVE-23027:
-

Committed to master. Thanks [~gopalv] .

> Fix syntax error in llap package.py
> ---
>
> Key: HIVE-23027
> URL: https://issues.apache.org/jira/browse/HIVE-23027
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Priority: Trivial
> Attachments: HIVE-23027.1.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23027) Fix syntax error in llap package.py

2020-03-16 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-23027:

Fix Version/s: 4.0.0

> Fix syntax error in llap package.py
> ---
>
> Key: HIVE-23027
> URL: https://issues.apache.org/jira/browse/HIVE-23027
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Trivial
> Fix For: 4.0.0
>
> Attachments: HIVE-23027.1.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23002) Optimise LazyBinaryUtils.writeVLong

2020-03-16 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-23002:

Attachment: HIVE-23002.3.patch

> Optimise LazyBinaryUtils.writeVLong
> ---
>
> Key: HIVE-23002
> URL: https://issues.apache.org/jira/browse/HIVE-23002
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-23002.1.patch, HIVE-23002.2.patch, 
> HIVE-23002.3.patch, Screenshot 2020-03-10 at 5.01.34 AM.jpg
>
>
> [https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryUtils.java#L420]
> It would be good to add a method which accepts scratch bytes.
>  
>   !Screenshot 2020-03-10 at 5.01.34 AM.jpg|width=452,height=321!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23029) LLAP: Shuffle Handler should support Index Cache configuration

2020-03-16 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-23029:

Status: Patch Available  (was: Open)

> LLAP: Shuffle Handler should support Index Cache configuration
> --
>
> Key: HIVE-23029
> URL: https://issues.apache.org/jira/browse/HIVE-23029
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Rajesh Balamohan
>Priority: Trivial
> Attachments: HIVE-23029.1.patch, Screenshot 2020-03-16 at 12.08.44 
> PM.jpg
>
>
> !Screenshot 2020-03-16 at 12.08.44 PM.jpg|width=580,height=405!
>  
> Queries like Q78 at large scale miss the index cache with unordered edges 
> (24 * 1009 = 24216; with the default 10 MB cache size, it can accommodate 
> only 400+ entries).
>  
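The arithmetic in the description can be made concrete. Assuming the standard 24-byte shuffle index record (offset, raw length and partition length, 8 bytes each), one map's index for 1009 reducers takes 24 * 1009 = 24216 bytes, and a 10 MB cache then holds only ~433 such indexes, i.e. the "400+ entries" above. A small check (the record layout is an assumption, not taken from the patch):

```java
// Back-of-envelope check for the numbers in the issue description.
// Assumes the standard 24-byte shuffle index record per reducer.
public class IndexCacheMath {
  public static void main(String[] args) {
    long bytesPerRecord = 24;                              // offset + rawLength + partLength
    long reducers = 1009;
    long perMapIndexBytes = bytesPerRecord * reducers;     // 24216
    long cacheBytes = 10L * 1024 * 1024;                   // default 10 MB index cache
    long cachedMapIndexes = cacheBytes / perMapIndexBytes; // ~433
    System.out.println(perMapIndexBytes + " bytes per map index, "
        + cachedMapIndexes + " map indexes fit in the cache");
  }
}
```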



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23029) LLAP: Shuffle Handler should support Index Cache configuration

2020-03-16 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-23029:

Attachment: HIVE-23029.1.patch

> LLAP: Shuffle Handler should support Index Cache configuration
> --
>
> Key: HIVE-23029
> URL: https://issues.apache.org/jira/browse/HIVE-23029
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Rajesh Balamohan
>Priority: Trivial
> Attachments: HIVE-23029.1.patch, Screenshot 2020-03-16 at 12.08.44 
> PM.jpg
>
>
> !Screenshot 2020-03-16 at 12.08.44 PM.jpg|width=580,height=405!
>  
> Queries like Q78 at large scale miss the index cache with unordered edges 
> (24 * 1009 = 24216; with the default 10 MB cache size, it can accommodate 
> only 400+ entries).
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23029) LLAP: Shuffle Handler should support Index Cache configuration

2020-03-16 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17059987#comment-17059987
 ] 

Rajesh Balamohan commented on HIVE-23029:
-

Ref: TEZ-3887

> LLAP: Shuffle Handler should support Index Cache configuration
> --
>
> Key: HIVE-23029
> URL: https://issues.apache.org/jira/browse/HIVE-23029
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Rajesh Balamohan
>Priority: Trivial
> Attachments: Screenshot 2020-03-16 at 12.08.44 PM.jpg
>
>
> !Screenshot 2020-03-16 at 12.08.44 PM.jpg|width=580,height=405!
>  
> Queries like Q78 at large scale miss the index cache with unordered edges 
> (24 * 1009 = 24216; with the default 10 MB cache size, it can accommodate 
> only 400+ entries).
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23027) Fix syntax error in llap package.py

2020-03-15 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-23027:

Status: Patch Available  (was: Open)

> Fix syntax error in llap package.py
> ---
>
> Key: HIVE-23027
> URL: https://issues.apache.org/jira/browse/HIVE-23027
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Priority: Trivial
> Attachments: HIVE-23027.1.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23027) Fix syntax error in llap package.py

2020-03-15 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-23027:

Attachment: HIVE-23027.1.patch

> Fix syntax error in llap package.py
> ---
>
> Key: HIVE-23027
> URL: https://issues.apache.org/jira/browse/HIVE-23027
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Priority: Trivial
> Attachments: HIVE-23027.1.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23002) Optimise LazyBinaryUtils.writeVLong

2020-03-12 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-23002:

Attachment: HIVE-23002.2.patch

> Optimise LazyBinaryUtils.writeVLong
> ---
>
> Key: HIVE-23002
> URL: https://issues.apache.org/jira/browse/HIVE-23002
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-23002.1.patch, HIVE-23002.2.patch, Screenshot 
> 2020-03-10 at 5.01.34 AM.jpg
>
>
> [https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryUtils.java#L420]
> It would be good to add a method which accepts scratch bytes.
>  
>   !Screenshot 2020-03-10 at 5.01.34 AM.jpg|width=452,height=321!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23002) Optimise LazyBinaryUtils.writeVLong

2020-03-11 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057611#comment-17057611
 ] 

Rajesh Balamohan commented on HIVE-23002:
-

It is allocated per LazyBinarySerializeWrite instance and reused.

> Optimise LazyBinaryUtils.writeVLong
> ---
>
> Key: HIVE-23002
> URL: https://issues.apache.org/jira/browse/HIVE-23002
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-23002.1.patch, Screenshot 2020-03-10 at 5.01.34 
> AM.jpg
>
>
> [https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryUtils.java#L420]
> It would be good to add a method which accepts scratch bytes.
>  
>   !Screenshot 2020-03-10 at 5.01.34 AM.jpg|width=452,height=321!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23002) Optimise LazyBinaryUtils.writeVLong

2020-03-10 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-23002:

Status: Patch Available  (was: Open)

> Optimise LazyBinaryUtils.writeVLong
> ---
>
> Key: HIVE-23002
> URL: https://issues.apache.org/jira/browse/HIVE-23002
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-23002.1.patch, Screenshot 2020-03-10 at 5.01.34 
> AM.jpg
>
>
> [https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryUtils.java#L420]
> It would be good to add a method which accepts scratch bytes.
>  
>   !Screenshot 2020-03-10 at 5.01.34 AM.jpg|width=452,height=321!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23002) Optimise LazyBinaryUtils.writeVLong

2020-03-10 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-23002:

Attachment: HIVE-23002.1.patch

> Optimise LazyBinaryUtils.writeVLong
> ---
>
> Key: HIVE-23002
> URL: https://issues.apache.org/jira/browse/HIVE-23002
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-23002.1.patch, Screenshot 2020-03-10 at 5.01.34 
> AM.jpg
>
>
> [https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryUtils.java#L420]
> It would be good to add a method which accepts scratch bytes.
>  
>   !Screenshot 2020-03-10 at 5.01.34 AM.jpg|width=452,height=321!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22966) LLAP: Consider including waitTime for comparing attempts in same vertex

2020-03-05 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17052180#comment-17052180
 ] 

Rajesh Balamohan commented on HIVE-22966:
-

This patch targets starvation of tasks waiting for resources within the same 
vertex; as mentioned in the ticket, it is not trying to fix a deadlock.

Tasks across vertices do not come into the picture, as DAG priority takes care 
of them. Waiting time is intentionally used because it is part of the existing 
set of metrics available.

> LLAP: Consider including waitTime for comparing attempts in same vertex
> ---
>
> Key: HIVE-22966
> URL: https://issues.apache.org/jira/browse/HIVE-22966
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Fix For: 4.0.0
>
> Attachments: HIVE-22966.3.patch, HIVE-22966.4.patch
>
>
> When attempts are compared within the same vertex, the scheduler should pick 
> the attempt with the longest wait time to avoid starvation.
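The rule in the description can be modeled as a two-level comparator: DAG (vertex) priority decides across vertices, and within a vertex the attempt that has waited longest wins. The sketch below is illustrative only; Attempt, vertexPriority and waitStartMillis are invented names, not LLAP's actual scheduler types.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of the scheduling order described above.
public class AttemptOrder {
  static class Attempt {
    final String name;
    final int vertexPriority;    // lower value = higher DAG priority
    final long waitStartMillis;  // when the attempt entered the wait queue
    Attempt(String name, int vertexPriority, long waitStartMillis) {
      this.name = name;
      this.vertexPriority = vertexPriority;
      this.waitStartMillis = waitStartMillis;
    }
  }

  /** Across vertices DAG priority wins; within a vertex, longest wait goes first. */
  static Comparator<Attempt> schedulingOrder() {
    return Comparator.comparingInt((Attempt a) -> a.vertexPriority)
        .thenComparingLong(a -> a.waitStartMillis);  // earlier start == longer wait
  }

  public static void main(String[] args) {
    List<Attempt> pending = new ArrayList<>();
    pending.add(new Attempt("v1-t2", 1, 2000));
    pending.add(new Attempt("v2-t1", 2, 500));
    pending.add(new Attempt("v1-t1", 1, 1000));  // same vertex, waited longer
    pending.sort(schedulingOrder());
    System.out.println(pending.get(0).name);     // prints v1-t1
  }
}
```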



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22966) LLAP: Consider including waitTime for comparing attempts in same vertex

2020-03-05 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-22966:

Fix Version/s: 4.0.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Thanks [~gopalv] . Committed to master.

> LLAP: Consider including waitTime for comparing attempts in same vertex
> ---
>
> Key: HIVE-22966
> URL: https://issues.apache.org/jira/browse/HIVE-22966
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Fix For: 4.0.0
>
> Attachments: HIVE-22966.3.patch, HIVE-22966.4.patch
>
>
> When attempts are compared within the same vertex, the scheduler should pick 
> the attempt with the longest wait time to avoid starvation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-22975) Optimise TopNKeyFilter with boundary checks

2020-03-04 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17051080#comment-17051080
 ] 

Rajesh Balamohan edited comment on HIVE-22975 at 3/4/20, 2:20 PM:
--

With the patch, CPU usage for {{TopNKeyFilter.canForward}} went down from 
{{15%}} to {{8%}}.

Used a much higher value for "hive.optimize.topnkey.efficiency.check.nbatches" 
to avoid disabling the top-n filter.

With Q43 in my cluster,

DAG runtime with patch:
 R1: 88.34
 R2: 88.55
 R3: 88.07
 R4: 88.01
 R5: 87.96

DAG runtime without patch:
 R1: 95.46
 R2: 95.83
 R3: 96.77
 R4: 96.24
 R5: 95.13

~70% of the comparisons are resolved via boundary checks now. This can be 
observed from the logs (i.e., the eff and total counters below).

 
{noformat}
<14>1 2020-03-04T10:14:09.776Z query-executor-0-0 query-executor 1 
6f75c674-5e00-11ea-8e11-06a655adac02 [mdc@18060 
class="vector.VectorTopNKeyOperator" level="INFO" 
thread="TezTR-731251_9_1_2_220_0"] Closing TopNKeyFilter: 
TopNKeyFilter{id=org.apache.hadoop.hive.ql.exec.TopNKeyFilter@a336438, 
topN=100, repeated=4240631, added=200, total=14045075, eff=9804244, 
forwardingRatio=0.30194435}
<14>1 2020-03-04T10:14:10.906Z query-executor-0-0 query-executor 1 
6f75c674-5e00-11ea-8e11-06a655adac02 [mdc@18060 
class="vector.VectorTopNKeyOperator" level="INFO" 
thread="TezTR-731251_9_1_2_140_0"] Closing TopNKeyFilter: 
TopNKeyFilter{id=org.apache.hadoop.hive.ql.exec.TopNKeyFilter@1d1a5be7, 
topN=100, repeated=4925112, added=234, total=17266776, eff=12341430, 
forwardingRatio=0.2852499}
<14>1 2020-03-04T10:14:11.623Z query-executor-0-0 query-executor 1 
6f75c674-5e00-11ea-8e11-06a655adac02 [mdc@18060 
class="vector.VectorTopNKeyOperator" level="INFO" 
thread="TezTR-731251_9_1_2_210_0"] Closing TopNKeyFilter: 
TopNKeyFilter{id=org.apache.hadoop.hive.ql.exec.TopNKeyFilter@80d07c2, 
topN=100, repeated=5278609, added=221, total=16243904, eff=10965074, 
forwardingRatio=0.324973}
<14>1 2020-03-04T10:14:12.324Z query-executor-0-0 query-executor 1 
6f75c674-5e00-11ea-8e11-06a655adac02 [mdc@18060 
class="vector.VectorTopNKeyOperator" level="INFO" 
thread="TezTR-731251_9_1_2_180_0"] Closing TopNKeyFilter: 
TopNKeyFilter{id=org.apache.hadoop.hive.ql.exec.TopNKeyFilter@22401517, 
topN=100, repeated=5481458, added=222, total=17540574, eff=12058894, 
forwardingRatio=0.31251428}
<14>1 2020-03-04T10:14:29.713Z query-executor-0-0 query-executor 1 
6f75c674-5e00-11ea-8e11-06a655adac02 [mdc@18060 
class="vector.VectorTopNKeyOperator" level="INFO" 
thread="TezTR-731251_9_1_2_240_0"] Closing TopNKeyFilter: 
TopNKeyFilter{id=org.apache.hadoop.hive.ql.exec.TopNKeyFilter@68c1743b, 
topN=100, repeated=3730824, added=207, total=12332974, eff=8601943, 
forwardingRatio=0.30252483}
<14>1 2020-03-04T10:14:31.570Z query-executor-0-0 query-executor 1 
6f75c674-5e00-11ea-8e11-06a655adac02 [mdc@18060 
class="vector.VectorTopNKeyOperator" level="INFO" 
thread="TezTR-731251_9_1_2_250_0"] Closing TopNKeyFilter: 
TopNKeyFilter{id=org.apache.hadoop.hive.ql.exec.TopNKeyFilter@6b5cce3a, 
topN=100, repeated=3506980, added=225, total=12343254, eff=8836049, 
forwardingRatio=0.28413942}
<14>1 2020-03-04T10:14:32.723Z query-executor-0-0 query-executor 1 
6f75c674-5e00-11ea-8e11-06a655adac02 [mdc@18060 
class="vector.VectorTopNKeyOperator" level="INFO" 
thread="TezTR-731251_9_1_2_300_0"] Closing TopNKeyFilter: 
TopNKeyFilter{id=org.apache.hadoop.hive.ql.exec.TopNKeyFilter@3213767, 
topN=100, repeated=3707044, added=214, total=11707241, eff=783, 
forwardingRatio=0.31666368}
<14>1 2020-03-04T10:14:32.942Z query-executor-0-0 query-executor 1 
6f75c674-5e00-11ea-8e11-06a655adac02 [mdc@18060 
class="vector.VectorTopNKeyOperator" level="INFO" 
thread="TezTR-731251_9_1_2_290_0"] Closing TopNKeyFilter: 
TopNKeyFilter{id=org.apache.hadoop.hive.ql.exec.TopNKeyFilter@688ca605, 
topN=100, repeated=3794086, added=216, total=12343961, eff=8549659, 
forwardingRatio=0.30738124}
<14>1 2020-03-04T10:14:33.895Z query-executor-0-0 query-executor 1 
6f75c674-5e00-11ea-8e11-06a655adac02 [mdc@18060 
class="vector.VectorTopNKeyOperator" level="INFO" 
thread="TezTR-731251_9_1_2_270_0"] Closing TopNKeyFilter: 
TopNKeyFilter{id=org.apache.hadoop.hive.ql.exec.TopNKeyFilter@66031ad4, 
topN=100, repeated=3929489, added=214, total=13527930, eff=9598227, 
forwardingRatio=0.29048812}
<14>1 2020-03-04T10:14:34.534Z query-executor-0-0 query-executor 1 
6f75c674-5e00-11ea-8e11-06a655adac02 [mdc@18060 
class="vector.VectorTopNKeyOperator" level="INFO" 
thread="TezTR-731251_9_1_2_280_0"] Closing TopNKeyFilter: 
TopNKeyFilter{id=org.apache.hadoop.hive.ql.exec.TopNKeyFilter@11b29a09, 
topN=100, repeated=4104207, added=212, total=13709811, eff=9605392, 
forwardingRatio=0.29937825}
<14>1 2020-03-04T10:14:34.734Z query-executor-0-0 query-executor 1 
6f75c674-5e00-11ea-8e11-06a655adac02 [mdc@18060 
class="vector.VectorTopNKeyOperator" level="INFO" 
thread="TezTR-731251_9_1_2_260_0"] Closing 

[jira] [Commented] (HIVE-22975) Optimise TopNKeyFilter with boundary checks

2020-03-04 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17051080#comment-17051080
 ] 

Rajesh Balamohan commented on HIVE-22975:
-

With the patch, CPU usage for {{TopNKeyFilter.canForward}} went down from 
{{15%}} to {{8%}}.

With Q43 in my cluster,

DAG runtime with patch:
 R1: 88.34
 R2: 88.55
 R3: 88.07
 R4: 88.01
 R5: 87.96

DAG runtime without patch:
 R1: 95.46
 R2: 95.83
 R3: 96.77
 R4: 96.24
 R5: 95.13

~70% of the comparisons are resolved via boundary checks now. This can be 
observed from the logs (i.e., the eff and total counters below).

 
{noformat}
<14>1 2020-03-04T10:14:09.776Z query-executor-0-0 query-executor 1 
6f75c674-5e00-11ea-8e11-06a655adac02 [mdc@18060 
class="vector.VectorTopNKeyOperator" level="INFO" 
thread="TezTR-731251_9_1_2_220_0"] Closing TopNKeyFilter: 
TopNKeyFilter{id=org.apache.hadoop.hive.ql.exec.TopNKeyFilter@a336438, 
topN=100, repeated=4240631, added=200, total=14045075, eff=9804244, 
forwardingRatio=0.30194435}
<14>1 2020-03-04T10:14:10.906Z query-executor-0-0 query-executor 1 
6f75c674-5e00-11ea-8e11-06a655adac02 [mdc@18060 
class="vector.VectorTopNKeyOperator" level="INFO" 
thread="TezTR-731251_9_1_2_140_0"] Closing TopNKeyFilter: 
TopNKeyFilter{id=org.apache.hadoop.hive.ql.exec.TopNKeyFilter@1d1a5be7, 
topN=100, repeated=4925112, added=234, total=17266776, eff=12341430, 
forwardingRatio=0.2852499}
<14>1 2020-03-04T10:14:11.623Z query-executor-0-0 query-executor 1 
6f75c674-5e00-11ea-8e11-06a655adac02 [mdc@18060 
class="vector.VectorTopNKeyOperator" level="INFO" 
thread="TezTR-731251_9_1_2_210_0"] Closing TopNKeyFilter: 
TopNKeyFilter{id=org.apache.hadoop.hive.ql.exec.TopNKeyFilter@80d07c2, 
topN=100, repeated=5278609, added=221, total=16243904, eff=10965074, 
forwardingRatio=0.324973}
<14>1 2020-03-04T10:14:12.324Z query-executor-0-0 query-executor 1 
6f75c674-5e00-11ea-8e11-06a655adac02 [mdc@18060 
class="vector.VectorTopNKeyOperator" level="INFO" 
thread="TezTR-731251_9_1_2_180_0"] Closing TopNKeyFilter: 
TopNKeyFilter{id=org.apache.hadoop.hive.ql.exec.TopNKeyFilter@22401517, 
topN=100, repeated=5481458, added=222, total=17540574, eff=12058894, 
forwardingRatio=0.31251428}
<14>1 2020-03-04T10:14:29.713Z query-executor-0-0 query-executor 1 
6f75c674-5e00-11ea-8e11-06a655adac02 [mdc@18060 
class="vector.VectorTopNKeyOperator" level="INFO" 
thread="TezTR-731251_9_1_2_240_0"] Closing TopNKeyFilter: 
TopNKeyFilter{id=org.apache.hadoop.hive.ql.exec.TopNKeyFilter@68c1743b, 
topN=100, repeated=3730824, added=207, total=12332974, eff=8601943, 
forwardingRatio=0.30252483}
<14>1 2020-03-04T10:14:31.570Z query-executor-0-0 query-executor 1 
6f75c674-5e00-11ea-8e11-06a655adac02 [mdc@18060 
class="vector.VectorTopNKeyOperator" level="INFO" 
thread="TezTR-731251_9_1_2_250_0"] Closing TopNKeyFilter: 
TopNKeyFilter{id=org.apache.hadoop.hive.ql.exec.TopNKeyFilter@6b5cce3a, 
topN=100, repeated=3506980, added=225, total=12343254, eff=8836049, 
forwardingRatio=0.28413942}
<14>1 2020-03-04T10:14:32.723Z query-executor-0-0 query-executor 1 
6f75c674-5e00-11ea-8e11-06a655adac02 [mdc@18060 
class="vector.VectorTopNKeyOperator" level="INFO" 
thread="TezTR-731251_9_1_2_300_0"] Closing TopNKeyFilter: 
TopNKeyFilter{id=org.apache.hadoop.hive.ql.exec.TopNKeyFilter@3213767, 
topN=100, repeated=3707044, added=214, total=11707241, eff=783, 
forwardingRatio=0.31666368}
<14>1 2020-03-04T10:14:32.942Z query-executor-0-0 query-executor 1 
6f75c674-5e00-11ea-8e11-06a655adac02 [mdc@18060 
class="vector.VectorTopNKeyOperator" level="INFO" 
thread="TezTR-731251_9_1_2_290_0"] Closing TopNKeyFilter: 
TopNKeyFilter{id=org.apache.hadoop.hive.ql.exec.TopNKeyFilter@688ca605, 
topN=100, repeated=3794086, added=216, total=12343961, eff=8549659, 
forwardingRatio=0.30738124}
<14>1 2020-03-04T10:14:33.895Z query-executor-0-0 query-executor 1 
6f75c674-5e00-11ea-8e11-06a655adac02 [mdc@18060 
class="vector.VectorTopNKeyOperator" level="INFO" 
thread="TezTR-731251_9_1_2_270_0"] Closing TopNKeyFilter: 
TopNKeyFilter{id=org.apache.hadoop.hive.ql.exec.TopNKeyFilter@66031ad4, 
topN=100, repeated=3929489, added=214, total=13527930, eff=9598227, 
forwardingRatio=0.29048812}
<14>1 2020-03-04T10:14:34.534Z query-executor-0-0 query-executor 1 
6f75c674-5e00-11ea-8e11-06a655adac02 [mdc@18060 
class="vector.VectorTopNKeyOperator" level="INFO" 
thread="TezTR-731251_9_1_2_280_0"] Closing TopNKeyFilter: 
TopNKeyFilter{id=org.apache.hadoop.hive.ql.exec.TopNKeyFilter@11b29a09, 
topN=100, repeated=4104207, added=212, total=13709811, eff=9605392, 
forwardingRatio=0.29937825}
<14>1 2020-03-04T10:14:34.734Z query-executor-0-0 query-executor 1 
6f75c674-5e00-11ea-8e11-06a655adac02 [mdc@18060 
class="vector.VectorTopNKeyOperator" level="INFO" 
thread="TezTR-731251_9_1_2_260_0"] Closing TopNKeyFilter: 
TopNKeyFilter{id=org.apache.hadoop.hive.ql.exec.TopNKeyFilter@7926ef7, 
topN=100, repeated=3946203, added=209, total=13548849, eff=9602437, 

[jira] [Updated] (HIVE-22975) Optimise TopNKeyFilter with boundary checks

2020-03-04 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-22975:

Assignee: Rajesh Balamohan
  Status: Patch Available  (was: Open)

> Optimise TopNKeyFilter with boundary checks
> ---
>
> Key: HIVE-22975
> URL: https://issues.apache.org/jira/browse/HIVE-22975
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-22975.1.patch, Screenshot 2020-03-04 at 3.26.45 
> PM.jpg
>
>
> !Screenshot 2020-03-04 at 3.26.45 PM.jpg|width=507,height=322!
>  
> It would be good to add boundary checks to reduce cycles spent in the topN 
> filter. E.g., Q43 spends a good amount of time in topN.
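The boundary-check idea is: once the filter holds N keys, cache the worst key currently inside the top N and reject anything worse with a single comparison, touching the ordered structure only when a key actually displaces the boundary. A minimal sketch (not Hive's actual TopNKeyFilter; the class name and the long-key simplification are assumptions):

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Illustrative top-N (smallest keys) filter with a cached boundary key.
public class BoundedTopNFilter {
  private final int topN;
  private final PriorityQueue<Long> heap =
      new PriorityQueue<>(Comparator.reverseOrder()); // max-heap of kept keys
  private Long boundary;                              // worst key inside the top N

  public BoundedTopNFilter(int topN) {
    this.topN = topN;
  }

  /** True if the key may belong to the top N and should be forwarded. */
  public boolean canForward(long key) {
    if (heap.size() < topN) {
      heap.add(key);
      boundary = heap.peek();
      return true;
    }
    if (key > boundary) {
      return false;                 // boundary check: cheap early rejection
    }
    if (key < boundary) {
      heap.poll();                  // evict the current boundary key
      heap.add(key);
      boundary = heap.peek();
    }
    return true;                    // key == boundary also forwards
  }
}
```

With topN=100 and mostly unordered input, the vast majority of rows are handled by the single boundary compare, which is the effect the eff/total counters in the logs measure.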



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
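The HIVE-22975 description above proposes boundary checks to reduce cycles spent in the top-N filter. The idea can be sketched as follows: cache the current N-th key as a boundary so that, once the filter is full, most incoming keys are rejected with a single comparison instead of heap operations. This is an illustrative sketch only; the class and method names are assumptions, not Hive's actual TopNKeyFilter API.

```java
import java.util.Collections;
import java.util.PriorityQueue;

// Illustrative sketch of a top-N key filter with a boundary check.
// Names (TopNKeyFilterSketch, canForward) are hypothetical, not Hive's API.
class TopNKeyFilterSketch {
    private final int topN;
    // Max-heap holding the N smallest keys seen so far; the largest of them
    // sits at the top and serves as the rejection boundary.
    private final PriorityQueue<Integer> heap =
            new PriorityQueue<>(Collections.reverseOrder());
    private int boundary = Integer.MAX_VALUE; // cached largest retained key
    private long total = 0, forwarded = 0;

    TopNKeyFilterSketch(int topN) {
        this.topN = topN;
    }

    // Returns true if a row with this key may still belong to the top N.
    boolean canForward(int key) {
        total++;
        // Boundary check: once the heap is full, most keys fail this cheap
        // comparison and we skip the heap operations entirely.
        if (heap.size() >= topN && key >= boundary) {
            return false;
        }
        heap.add(key);
        if (heap.size() > topN) {
            heap.poll(); // evict the current largest key
        }
        boundary = heap.peek(); // refresh the cached boundary
        forwarded++;
        return true;
    }

    // Fraction of rows forwarded; a low ratio means the filter is effective.
    double forwardingRatio() {
        return total == 0 ? 0.0 : (double) forwarded / total;
    }
}
```

With a low forwarding ratio like the ~0.29 seen in the logs above, roughly 70% of rows would take only the cheap comparison path, which is where the cycle savings come from.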


[jira] [Updated] (HIVE-22975) Optimise TopNKeyFilter with boundary checks

2020-03-04 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-22975:

Attachment: HIVE-22975.1.patch

> Optimise TopNKeyFilter with boundary checks
> ---
>
> Key: HIVE-22975
> URL: https://issues.apache.org/jira/browse/HIVE-22975
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-22975.1.patch, Screenshot 2020-03-04 at 3.26.45 
> PM.jpg
>
>
> !Screenshot 2020-03-04 at 3.26.45 PM.jpg|width=507,height=322!
>  
> It would be good to add boundary checks to reduce cycles spent on topN 
> filter. E.g Q43 spends good amount of time in topN.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22966) LLAP: Consider including waitTime for comparing attempts in same vertex

2020-03-03 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17050680#comment-17050680
 ] 

Rajesh Balamohan commented on HIVE-22966:
-

It does, depending on the wait time. Wait time is used as a proxy when scheduling the attempts. For example, without the patch, the longest-waiting attempt in Q55 waited 1430 ms with a running time of 528 ms (1900+ ms total). With the patch, the longest-waiting attempt waited 741 ms with a running time of 700 ms (about 1500 ms total). Depending on when an attempt gets scheduled, this affects the overall runtime of the vertex.
 The patch reduces the starvation period for a task by comparing attempts fairly on wait time.

> LLAP: Consider including waitTime for comparing attempts in same vertex
> ---
>
> Key: HIVE-22966
> URL: https://issues.apache.org/jira/browse/HIVE-22966
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-22966.3.patch, HIVE-22966.4.patch
>
>
> When attempts are compared within same vertex, it should pick up the attempt 
> with longest wait time to avoid starvation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
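The comparison described in the comment above can be sketched as a comparator that, for attempts of equal priority (same vertex), favors the one that has waited longest. This is a minimal illustration; TaskAttempt and its fields are assumptions for the sketch, not LLAP's actual scheduler classes.

```java
import java.util.Comparator;

// Illustrative sketch; TaskAttempt and its fields are hypothetical,
// not LLAP's actual task-scheduler API.
class WaitTimeScheduling {
    static final class TaskAttempt {
        final String id;
        final int priority;       // lower value = higher priority (vertex level)
        final long enqueueTimeMs; // when the attempt started waiting
        TaskAttempt(String id, int priority, long enqueueTimeMs) {
            this.id = id;
            this.priority = priority;
            this.enqueueTimeMs = enqueueTimeMs;
        }
        long waitTimeMs(long nowMs) {
            return nowMs - enqueueTimeMs;
        }
    }

    // Order by priority first; within the same priority (same vertex),
    // pick the attempt with the longest wait time so no sibling starves.
    static Comparator<TaskAttempt> comparator(long nowMs) {
        return Comparator
                .comparingInt((TaskAttempt a) -> a.priority)
                .thenComparingLong(a -> -a.waitTimeMs(nowMs)); // longer wait first
    }
}
```

Feeding such a comparator into the scheduler's priority queue would make the longest-waiting attempt within a vertex the next one picked, which matches the starvation-reduction behavior the comment describes.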


[jira] [Updated] (HIVE-22966) LLAP: Consider including waitTime for comparing attempts in same vertex

2020-03-03 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-22966:

Status: Patch Available  (was: Open)

> LLAP: Consider including waitTime for comparing attempts in same vertex
> ---
>
> Key: HIVE-22966
> URL: https://issues.apache.org/jira/browse/HIVE-22966
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-22966.3.patch, HIVE-22966.4.patch
>
>
> When attempts are compared within same vertex, it should pick up the attempt 
> with longest wait time to avoid starvation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22966) LLAP: Consider including waitTime for comparing attempts in same vertex

2020-03-03 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-22966:

Attachment: HIVE-22966.4.patch

> LLAP: Consider including waitTime for comparing attempts in same vertex
> ---
>
> Key: HIVE-22966
> URL: https://issues.apache.org/jira/browse/HIVE-22966
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-22966.3.patch, HIVE-22966.4.patch
>
>
> When attempts are compared within same vertex, it should pick up the attempt 
> with longest wait time to avoid starvation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

