[jira] [Resolved] (HIVE-24426) Spark job fails with fixed LlapTaskUmbilicalServer port

2020-11-30 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran resolved HIVE-24426.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Merged the PR. Thanks [~ayushtkn] for the contribution!

> Spark job fails with fixed LlapTaskUmbilicalServer port
> ---
>
> Key: HIVE-24426
> URL: https://issues.apache.org/jira/browse/HIVE-24426
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> In cloud deployments, multiple executors are launched on the name node, and 
> if a fixed umbilical port is specified using 
> {{spark.hadoop.hive.llap.daemon.umbilical.port=30006}}, the job fails with a 
> BindException.
> {noformat}
> Caused by: java.net.BindException: Problem binding to [0.0.0.0:30006] 
> java.net.BindException: Address already in use; For more details see:  
> http://wiki.apache.org/hadoop/BindException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:840)
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:741)
>   at org.apache.hadoop.ipc.Server.bind(Server.java:605)
>   at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:1169)
>   at org.apache.hadoop.ipc.Server.<init>(Server.java:3032)
>   at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:1039)
>   at 
> org.apache.hadoop.ipc.WritableRpcEngine$Server.<init>(WritableRpcEngine.java:438)
>   at 
> org.apache.hadoop.ipc.WritableRpcEngine.getServer(WritableRpcEngine.java:332)
>   at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:848)
>   at 
> org.apache.hadoop.hive.llap.tezplugins.helpers.LlapTaskUmbilicalServer.<init>(LlapTaskUmbilicalServer.java:67)
>   at 
> org.apache.hadoop.hive.llap.ext.LlapTaskUmbilicalExternalClient$SharedUmbilicalServer.<init>(LlapTaskUmbilicalExternalClient.java:122)
>   ... 26 more
> Caused by: java.net.BindException: Address already in use
>   at sun.nio.ch.Net.bind0(Native Method)
>   at sun.nio.ch.Net.bind(Net.java:433)
>   at sun.nio.ch.Net.bind(Net.java:425)
>   at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:220)
>   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:85)
>   at org.apache.hadoop.ipc.Server.bind(Server.java:588)
>   ... 34 more{noformat}
> To counter this, it would be better to allow specifying a range of ports.
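The proposed direction (trying each port in a range instead of one fixed port) can be sketched as follows. This is an illustrative stand-alone example, not the actual Hive patch; the range bounds are made up.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class PortRangeBind {

    // Try each port in [lo, hi] and return a socket bound to the first free one.
    static ServerSocket bindFirstFree(int lo, int hi) throws IOException {
        for (int port = lo; port <= hi; port++) {
            ServerSocket s = new ServerSocket();
            try {
                s.bind(new InetSocketAddress(port));
                return s;                  // this port was free
            } catch (IOException inUse) {
                s.close();                 // port taken, try the next one
            }
        }
        throw new IOException("No free port in range " + lo + "-" + hi);
    }

    public static void main(String[] args) throws IOException {
        // Two "servers" on the same host no longer collide on one fixed port.
        ServerSocket first = bindFirstFree(30006, 30016);
        ServerSocket second = bindFirstFree(30006, 30016);
        System.out.println(first.getLocalPort() + " / " + second.getLocalPort());
        first.close();
        second.close();
    }
}
```

With a single fixed port, the second bind above would fail with exactly the BindException shown in the stack trace.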



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24426) Spark job fails with fixed LlapTaskUmbilicalServer port

2020-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24426?focusedWorklogId=518314&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-518314
 ]

ASF GitHub Bot logged work on HIVE-24426:
-

Author: ASF GitHub Bot
Created on: 01/Dec/20 07:13
Start Date: 01/Dec/20 07:13
Worklog Time Spent: 10m 
  Work Description: prasanthj merged pull request #1705:
URL: https://github.com/apache/hive/pull/1705


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 518314)
Time Spent: 50m  (was: 40m)



[jira] [Work logged] (HIVE-24426) Spark job fails with fixed LlapTaskUmbilicalServer port

2020-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24426?focusedWorklogId=518311&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-518311
 ]

ASF GitHub Bot logged work on HIVE-24426:
-

Author: ASF GitHub Bot
Created on: 01/Dec/20 07:07
Start Date: 01/Dec/20 07:07
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on pull request #1705:
URL: https://github.com/apache/hive/pull/1705#issuecomment-736269768


   Thanx @prasanthj for the review, I have added the success log as well. 
Please have a check.





Issue Time Tracking
---

Worklog Id: (was: 518311)
Time Spent: 40m  (was: 0.5h)



[jira] [Commented] (HIVE-24456) Column masking/hashing function in hive should use SHA512 if FIPS mode is enabled

2020-11-30 Thread Anishek Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17241303#comment-17241303
 ] 

Anishek Agarwal commented on HIVE-24456:


Just wondering: why not move to these hash functions by default? Is there a 
significant performance overhead here?

> Column masking/hashing function in hive should use SHA512 if FIPS mode is 
> enabled
> 
>
> Key: HIVE-24456
> URL: https://issues.apache.org/jira/browse/HIVE-24456
> Project: Hive
>  Issue Type: Wish
>  Components: HiveServer2
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> hive-site.xml should have the following property to indicate that FIPS mode 
> is enabled.
> 
> <property>
>   <name>hive.masking.algo</name>
>   <value>sha512</value>
> </property>
> 
> If this property is present, then GenericUDFMaskHash should use SHA512 
> instead of SHA256 hashing for column masking.
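A minimal sketch of configuration-driven digest selection, assuming the {{hive.masking.algo}} property from the description. This is not the GenericUDFMaskHash code itself, just an illustration of switching the mask hash to SHA-512.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class MaskHashSketch {

    // Hex digest of a column value with the given JCA algorithm name.
    static String hashHex(String value, String algorithm) {
        try {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(value.getBytes(StandardCharsets.UTF_8))) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Pretend this value came from hive-site.xml (hive.masking.algo).
        String configured = "sha512";
        String algorithm =
            "sha512".equalsIgnoreCase(configured) ? "SHA-512" : "SHA-256";
        // SHA-512 produces a 64-byte digest, i.e. 128 hex characters.
        System.out.println(hashHex("123-45-6789", algorithm).length()); // prints 128
    }
}
```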





[jira] [Commented] (HIVE-24450) DbNotificationListener Request Notification IDs in Batches

2020-11-30 Thread Anishek Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17241238#comment-17241238
 ] 

Anishek Agarwal commented on HIVE-24450:


[~belugabehr] you can't get sequence IDs in blocks; replication will not work. 
It has to be one at a time. 

cc [~thejas]/[~aasha]/[~pkumarsinha]

> DbNotificationListener Request Notification IDs in Batches
> --
>
> Key: HIVE-24450
> URL: https://issues.apache.org/jira/browse/HIVE-24450
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Every time a new notification event is logged into the database, the sequence 
> number for the ID of the event is incremented by one.  It is very standard in 
> database design to instead request a block of IDs for each fetch from the 
> database.  The sequence numbers are then handed out locally until the block 
> of IDs is exhausted.  This allows for fewer database round-trips and 
> transactions, at the expense of perhaps burning a few IDs.
> Burning of IDs happens when the server is restarted in the middle of a block 
> of sequence IDs.  That is, if the HMS requests a block of 10 ids, and only 
> three have been assigned, after the restart, the HMS will request another 
> block of 10, burning (wasting) 7 IDs.  As long as the blocks are not too 
> small, and restarts are infrequent, then few IDs are lost.
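The block-allocation scheme described above can be sketched as follows. This is illustrative only, not the DbNotificationListener implementation; the block supplier stands in for the metastore's sequence fetch.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

public class BlockIdAllocator {
    private final int blockSize;
    private final LongSupplier reserveBlock; // stands in for the DB round-trip
    private long next;   // next id to hand out
    private long limit;  // first id NOT in the current block

    BlockIdAllocator(int blockSize, LongSupplier reserveBlock) {
        this.blockSize = blockSize;
        this.reserveBlock = reserveBlock;
    }

    // Hands out ids locally; only fetches from the "database" when the
    // current block is exhausted. Ids left in the block at restart are burned.
    synchronized long nextId() {
        if (next >= limit) {
            next = reserveBlock.getAsLong(); // one round-trip per block
            limit = next + blockSize;
        }
        return next++;
    }

    public static void main(String[] args) {
        AtomicLong dbSequence = new AtomicLong(1); // simulated DB counter
        AtomicLong roundTrips = new AtomicLong();
        BlockIdAllocator alloc = new BlockIdAllocator(10, () -> {
            roundTrips.incrementAndGet();
            return dbSequence.getAndAdd(10);
        });
        for (int i = 0; i < 25; i++) {
            alloc.nextId();
        }
        System.out.println(roundTrips.get()); // prints 3: 25 ids, blocks of 10
    }
}
```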





[jira] [Work logged] (HIVE-24397) Add the projection specification to the table request object and add placeholders in ObjectStore.java

2020-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24397?focusedWorklogId=518272&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-518272
 ]

ASF GitHub Bot logged work on HIVE-24397:
-

Author: ASF GitHub Bot
Created on: 01/Dec/20 03:55
Start Date: 01/Dec/20 03:55
Worklog Time Spent: 10m 
  Work Description: vnhive commented on pull request #1681:
URL: https://github.com/apache/hive/pull/1681#issuecomment-736201182


   @vihangk1 Addressed all your comments in this push.





Issue Time Tracking
---

Worklog Id: (was: 518272)
Time Spent: 40m  (was: 0.5h)

> Add the projection specification to the table request object and add 
> placeholders in ObjectStore.java
> -
>
> Key: HIVE-24397
> URL: https://issues.apache.org/jira/browse/HIVE-24397
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Narayanan Venkateswaran
>Assignee: Narayanan Venkateswaran
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-23980) Shade guava from existing Hive versions

2020-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23980?focusedWorklogId=518262&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-518262
 ]

ASF GitHub Bot logged work on HIVE-23980:
-

Author: ASF GitHub Bot
Created on: 01/Dec/20 03:23
Start Date: 01/Dec/20 03:23
Worklog Time Spent: 10m 
  Work Description: viirya commented on pull request #1356:
URL: https://github.com/apache/hive/pull/1356#issuecomment-736192230


   > Yes, my main question is whether it is safe to skip the changes on 
`HiveSubQueryRemoveRule` and `HiveRelDecorrelator`. It looks fine to me since 
we've already shaded calcite within hive/ql.
   
   Yeah, it should be fine. Calcite uses the Guava API, so shading Guava would 
cause a NoSuchMethodError if we didn't include Calcite in the shaded jar of 
hive/ql.





Issue Time Tracking
---

Worklog Id: (was: 518262)
Time Spent: 5h 40m  (was: 5.5h)

> Shade guava from existing Hive versions
> ---
>
> Key: HIVE-23980
> URL: https://issues.apache.org/jira/browse/HIVE-23980
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.7
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23980.01.branch-2.3.patch
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> I'm trying to upgrade Guava version in Spark. The JIRA ticket is SPARK-32502.
> Running test hits an error:
> {code}
> sbt.ForkMain$ForkError: sbt.ForkMain$ForkError: java.lang.IllegalAccessError: 
> tried to access method 
> com.google.common.collect.Iterators.emptyIterator()Lcom/google/common/collect/UnmodifiableIterator;
>  from class org.apache.hadoop.hive.ql.exec.FetchOperator
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.<init>(FetchOperator.java:108)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchTask.initialize(FetchTask.java:87)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:541)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
> {code}
> I know that hive-exec doesn't shade Guava until HIVE-22126 but that work 
> targets 4.0.0. I'm wondering if there is a solution for current Hive 
> versions, e.g. Hive 2.3.7? Any ideas?
> Thanks.
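For reference, the shading approach HIVE-22126 takes in hive-exec boils down to a Guava relocation in the Maven shade plugin. The snippet below is a hand-written sketch of that idea; the shadedPattern shown is illustrative, not the exact one Hive uses.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <relocations>
      <relocation>
        <!-- Rewrite Guava packages so hive-exec carries its own copy -->
        <pattern>com.google.common</pattern>
        <shadedPattern>org.apache.hive.com.google.common</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
</plugin>
```

Once Guava is relocated this way, hive-exec no longer resolves `Iterators.emptyIterator()` against the application's (newer) Guava, which is what triggers the IllegalAccessError above.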





[jira] [Work logged] (HIVE-23980) Shade guava from existing Hive versions

2020-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23980?focusedWorklogId=518258&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-518258
 ]

ASF GitHub Bot logged work on HIVE-23980:
-

Author: ASF GitHub Bot
Created on: 01/Dec/20 03:13
Start Date: 01/Dec/20 03:13
Worklog Time Spent: 10m 
  Work Description: sunchao commented on pull request #1356:
URL: https://github.com/apache/hive/pull/1356#issuecomment-736189137


   Yes, my main question is whether it is safe to skip the changes on 
`HiveSubQueryRemoveRule` and `HiveRelDecorrelator`. It looks fine to me since 
we've already shaded calcite within hive/ql.





Issue Time Tracking
---

Worklog Id: (was: 518258)
Time Spent: 5.5h  (was: 5h 20m)



[jira] [Work logged] (HIVE-23980) Shade guava from existing Hive versions

2020-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23980?focusedWorklogId=518257&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-518257
 ]

ASF GitHub Bot logged work on HIVE-23980:
-

Author: ASF GitHub Bot
Created on: 01/Dec/20 03:11
Start Date: 01/Dec/20 03:11
Worklog Time Spent: 10m 
  Work Description: sunchao commented on a change in pull request #1356:
URL: https://github.com/apache/hive/pull/1356#discussion_r533044970



##
File path: itests/hive-blobstore/pom.xml
##
@@ -55,33 +55,33 @@
 <dependency>
   <groupId>org.apache.hive</groupId>
-  <artifactId>hive-metastore</artifactId>
+  <artifactId>hive-exec</artifactId>
   <version>${project.version}</version>
   <scope>test</scope>
 </dependency>
 <dependency>
   <groupId>org.apache.hive</groupId>
   <artifactId>hive-metastore</artifactId>
   <version>${project.version}</version>
-  <classifier>tests</classifier>
   <scope>test</scope>
 </dependency>
 <dependency>
   <groupId>org.apache.hive</groupId>
-  <artifactId>hive-it-unit</artifactId>
+  <artifactId>hive-metastore</artifactId>
Review comment:
   Oh cool. Thanks.







Issue Time Tracking
---

Worklog Id: (was: 518257)
Time Spent: 5h 20m  (was: 5h 10m)



[jira] [Assigned] (HIVE-24458) Allow access to SArgs without converting to disjunctive normal form

2020-11-30 Thread Owen O'Malley (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley reassigned HIVE-24458:



> Allow access to SArgs without converting to disjunctive normal form
> ---
>
> Key: HIVE-24458
> URL: https://issues.apache.org/jira/browse/HIVE-24458
> Project: Hive
>  Issue Type: Improvement
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>Priority: Major
>
> For some use cases, it is useful to have access to the SArg expression in a 
> non-normalized form. Currently, the SArg only provides the fully normalized 
> expression.
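As a back-of-the-envelope illustration of why the fully normalized form can be expensive: an AND of k two-way ORs expands to 2^k conjuncts in disjunctive normal form. This toy sketch is not Hive's SearchArgument API.

```java
// Toy illustration only: not Hive's SearchArgument API. The predicate
// (a1 OR b1) AND ... AND (ak OR bk) expands to 2^k DNF conjuncts,
// because each additional clause doubles the number of combinations.
public class DnfSketch {

    // Conjunct count after DNF expansion of k ANDed two-way ORs.
    static long dnfConjuncts(int k) {
        return 1L << k;
    }

    public static void main(String[] args) {
        System.out.println(dnfConjuncts(10)); // prints 1024
    }
}
```

This exponential blow-up is why consumers may prefer to walk the original, non-normalized expression tree directly.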





[jira] [Work logged] (HIVE-24436) Fix Avro NULL_DEFAULT_VALUE compatibility issue

2020-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24436?focusedWorklogId=518189&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-518189
 ]

ASF GitHub Bot logged work on HIVE-24436:
-

Author: ASF GitHub Bot
Created on: 01/Dec/20 00:14
Start Date: 01/Dec/20 00:14
Worklog Time Spent: 10m 
  Work Description: wangyum commented on pull request #1715:
URL: https://github.com/apache/hive/pull/1715#issuecomment-736131810


   This is for master branch: https://github.com/apache/hive/pull/1722





Issue Time Tracking
---

Worklog Id: (was: 518189)
Time Spent: 1h 20m  (was: 1h 10m)

> Fix Avro NULL_DEFAULT_VALUE compatibility issue
> ---
>
> Key: HIVE-24436
> URL: https://issues.apache.org/jira/browse/HIVE-24436
> Project: Hive
>  Issue Type: Improvement
>  Components: Avro
>Affects Versions: 2.3.8
>Reporter: Yuming Wang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Exception1:
> {noformat}
> - create hive serde table with Catalog
> *** RUN ABORTED ***
>   java.lang.NoSuchMethodError: 'void 
> org.apache.avro.Schema$Field.<init>(java.lang.String, org.apache.avro.Schema, 
> java.lang.String, org.codehaus.jackson.JsonNode)'
>   at 
> org.apache.hadoop.hive.serde2.avro.TypeInfoToSchema.createAvroField(TypeInfoToSchema.java:76)
>   at 
> org.apache.hadoop.hive.serde2.avro.TypeInfoToSchema.convert(TypeInfoToSchema.java:61)
>   at 
> org.apache.hadoop.hive.serde2.avro.AvroSerDe.getSchemaFromCols(AvroSerDe.java:170)
>   at 
> org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:114)
>   at 
> org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:83)
>   at 
> org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:533)
>   at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:450)
>   at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:437)
>   at 
> org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:281)
>   at org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:263)
> {noformat}
> Exception2:
> {noformat}
> - alter hive serde table add columns -- partitioned - AVRO *** FAILED ***
>   org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.avro.AvroRuntimeException: Unknown datum class: class 
> org.codehaus.jackson.node.NullNode;
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:112)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.createTable(HiveExternalCatalog.scala:245)
>   at 
> org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.createTable(ExternalCatalogWithListener.scala:94)
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.createTable(SessionCatalog.scala:346)
>   at 
> org.apache.spark.sql.execution.command.CreateTableCommand.run(tables.scala:166)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
>   at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:228)
>   at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3680)
> {noformat}





[jira] [Updated] (HIVE-24456) Column masking/hashing function in hive should use SHA512 if FIPS mode is enabled

2020-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24456:
--
Labels: pull-request-available  (was: )



[jira] [Work logged] (HIVE-24436) Fix Avro NULL_DEFAULT_VALUE compatibility issue

2020-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24436?focusedWorklogId=518188&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-518188
 ]

ASF GitHub Bot logged work on HIVE-24436:
-

Author: ASF GitHub Bot
Created on: 01/Dec/20 00:11
Start Date: 01/Dec/20 00:11
Worklog Time Spent: 10m 
  Work Description: wangyum opened a new pull request #1722:
URL: https://github.com/apache/hive/pull/1722


   ### What changes were proposed in this pull request?
   
   This PR replaces `null` with `JsonProperties.NULL_VALUE` to fix two 
compatibility issues:
   1. java.lang.NoSuchMethodError: 'void 
org.apache.avro.Schema$Field.<init>(java.lang.String, org.apache.avro.Schema, 
java.lang.String, org.codehaus.jackson.JsonNode)'
  ```
  - create hive serde table with Catalog
  *** RUN ABORTED ***
  java.lang.NoSuchMethodError: 'void 
  org.apache.avro.Schema$Field.<init>(java.lang.String, org.apache.avro.Schema, 
  java.lang.String, org.codehaus.jackson.JsonNode)'
at 
org.apache.hadoop.hive.serde2.avro.TypeInfoToSchema.createAvroField(TypeInfoToSchema.java:76)
at 
org.apache.hadoop.hive.serde2.avro.TypeInfoToSchema.convert(TypeInfoToSchema.java:61)
at 
org.apache.hadoop.hive.serde2.avro.AvroSerDe.getSchemaFromCols(AvroSerDe.java:170)
at 
org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:114)
at 
org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:83)
at 
org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:533)
at 
org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:450)
at 
org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:437)
at 
org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:281)
at 
org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:263)
  ```
   2. org.apache.avro.AvroRuntimeException: Unknown datum class: class 
org.codehaus.jackson.node.NullNode
  ```
  - alter hive serde table add columns -- partitioned - AVRO *** FAILED ***
org.apache.spark.sql.AnalysisException: 
org.apache.hadoop.hive.ql.metadata.HiveException: 
  org.apache.avro.AvroRuntimeException: Unknown datum class: class 
org.codehaus.jackson.node.NullNode;
at 
org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:112)
at 
org.apache.spark.sql.hive.HiveExternalCatalog.createTable(HiveExternalCatalog.scala:245)
at 
org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.createTable(ExternalCatalogWithListener.scala:94)
at 
org.apache.spark.sql.catalyst.catalog.SessionCatalog.createTable(SessionCatalog.scala:346)
at 
org.apache.spark.sql.execution.command.CreateTableCommand.run(tables.scala:166)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
at 
org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:228)
at 
org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3680)
  ```
   
   ### Why are the changes needed?
   
   For compatibility with Avro 1.9.x and Avro 1.10.0.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   
   ### How was this patch tested?
   
   Build and run Spark test:
   ```
   mvn -Dtest=none 
-DwildcardSuites=org.apache.spark.sql.hive.execution.HiveDDLSuite test -pl 
sql/hive
   ```
   
   





Issue Time Tracking
---

Worklog Id: (was: 518188)
Time Spent: 1h 10m  (was: 1h)

> Fix Avro NULL_DEFAULT_VALUE compatibility issue
> ---
>
> Key: HIVE-24436
> URL: https://issues.apache.org/jira/browse/HIVE-24436
> Project: Hive
>  Issue Type: Improvement
>  Components: Avro
>Affects Versions: 2.3.8
>Reporter: Yuming Wang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Exception 1:
> {noformat}
> - create hive serde table with Catalog
> *** RUN ABORTED ***
>   java.lang.NoSuchMethodError: 'void 
> org.apache.avro.Schema$Field.<init>(java.lang.String, org.apache.avro.Schema, 
> java.lang.String, 

[jira] [Work logged] (HIVE-24456) Column masking/hashing function in hive should use SHA512 if FIPS mode is enabled

2020-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24456?focusedWorklogId=518187&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-518187
 ]

ASF GitHub Bot logged work on HIVE-24456:
-

Author: ASF GitHub Bot
Created on: 01/Dec/20 00:11
Start Date: 01/Dec/20 00:11
Worklog Time Spent: 10m 
  Work Description: saihemanth-cloudera opened a new pull request #1721:
URL: https://github.com/apache/hive/pull/1721


   …lumn masking should be done with SHA512.
   
   
   
   ### What changes were proposed in this pull request?
   Column masking encoding is changed to SHA512 if FIPS mode is enabled.
   
   
   
   ### Why are the changes needed?
   For better security in FIPS mode.
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. Users should include the following property in hive-site.xml to indicate 
that FIPS mode is enabled in Hive:
   
   <property>
     <name>hive.masking.algo</name>
     <value>sha512</value>
   </property>
   
   
   
   ### How was this patch tested?
   Local cluster. 
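
   As context for the change above, the SHA-256 vs SHA-512 selection can be 
sketched with plain JDK crypto. This is an illustrative sketch only — 
`FipsMaskHash` and `hashHex` are made-up names, not the actual 
`GenericUDFMaskHash` code in hive-exec:

   ```java
   import java.nio.charset.StandardCharsets;
   import java.security.MessageDigest;
   import java.security.NoSuchAlgorithmException;

   // Illustrative sketch of the proposed behavior: pick SHA-512 when the
   // hive.masking.algo property is set to sha512, otherwise keep SHA-256.
   public class FipsMaskHash {

       public static String hashHex(String value, boolean fipsSha512) {
           String algo = fipsSha512 ? "SHA-512" : "SHA-256";
           try {
               MessageDigest md = MessageDigest.getInstance(algo);
               byte[] digest = md.digest(value.getBytes(StandardCharsets.UTF_8));
               StringBuilder hex = new StringBuilder(digest.length * 2);
               for (byte b : digest) {
                   hex.append(String.format("%02x", b));
               }
               return hex.toString();
           } catch (NoSuchAlgorithmException e) {
               throw new IllegalStateException(algo + " not available", e);
           }
       }

       public static void main(String[] args) {
           // SHA-256 digests are 32 bytes (64 hex chars); SHA-512 are 64 bytes (128).
           System.out.println(hashHex("ssn-123-45-6789", false).length()); // 64
           System.out.println(hashHex("ssn-123-45-6789", true).length());  // 128
       }
   }
   ```

   Both algorithms ship with every standard JDK provider, so switching the 
algorithm name is the whole change from the caller's point of view.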
   
   





Issue Time Tracking
---

Worklog Id: (was: 518187)
Remaining Estimate: 0h
Time Spent: 10m

> Column masking/hashing function in hive should use SHA512 if FIPS mode is 
> enabled
> 
>
> Key: HIVE-24456
> URL: https://issues.apache.org/jira/browse/HIVE-24456
> Project: Hive
>  Issue Type: Wish
>  Components: HiveServer2
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> hive-site.xml should have the following property to indicate that FIPS mode 
> is enabled.
> 
> <property>
>   <name>hive.masking.algo</name>
>   <value>sha512</value>
> </property>
> 
> If this property is present, then GenericUDFMaskHash should use SHA512 
> instead of SHA256 encoding for column masking.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24456) Column masking/hashing function in hive should use SHA512 if FIPS mode is enabled

2020-11-30 Thread Sai Hemanth Gantasala (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sai Hemanth Gantasala updated HIVE-24456:
-
Description: 
hive-site.xml should have the following property to indicate that FIPS mode is 
enabled.



<property>
  <name>hive.masking.algo</name>
  <value>sha512</value>
</property>



If this property is present, then GenericUDFMaskHash should use SHA512 instead 
of SHA256 encoding for column masking.

  was:
hive-site.xml should have the following property to indicate that FIPS mode is 
enabled.



<property>
  <name>hive.masking.algo</name>
  <value>sha256</value>
</property>



If this property is present, then GenericUDFMaskHash should use SHA512 instead 
of SHA256 encoding for column masking.


> Column masking/hashing function in hive should use SHA512 if FIPS mode is 
> enabled
> 
>
> Key: HIVE-24456
> URL: https://issues.apache.org/jira/browse/HIVE-24456
> Project: Hive
>  Issue Type: Wish
>  Components: HiveServer2
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>
> hive-site.xml should have the following property to indicate that FIPS mode 
> is enabled.
> 
> <property>
>   <name>hive.masking.algo</name>
>   <value>sha512</value>
> </property>
> 
> If this property is present, then GenericUDFMaskHash should use SHA512 
> instead of SHA256 encoding for column masking.





[jira] [Assigned] (HIVE-24456) Column masking/hashing function in hive should use SHA512 if FIPS mode is enabled

2020-11-30 Thread Sai Hemanth Gantasala (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sai Hemanth Gantasala reassigned HIVE-24456:



> Column masking/hashing function in hive should use SHA512 if FIPS mode is 
> enabled
> 
>
> Key: HIVE-24456
> URL: https://issues.apache.org/jira/browse/HIVE-24456
> Project: Hive
>  Issue Type: Wish
>  Components: HiveServer2
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>
> hive-site.xml should have the following property to indicate that FIPS mode 
> is enabled.
> 
> <property>
>   <name>hive.masking.algo</name>
>   <value>sha256</value>
> </property>
> 
> If this property is present, then GenericUDFMaskHash should use SHA512 
> instead of SHA256 encoding for column masking.





[jira] [Updated] (HIVE-24455) Fix broken junit framework in storage-api

2020-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24455:
--
Labels: pull-request-available  (was: )

> Fix broken junit framework in storage-api
> -
>
> Key: HIVE-24455
> URL: https://issues.apache.org/jira/browse/HIVE-24455
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The use of junit is broken in storage-api. It results in no tests being found.





[jira] [Work logged] (HIVE-24455) Fix broken junit framework in storage-api

2020-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24455?focusedWorklogId=518157&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-518157
 ]

ASF GitHub Bot logged work on HIVE-24455:
-

Author: ASF GitHub Bot
Created on: 30/Nov/20 22:38
Start Date: 30/Nov/20 22:38
Worklog Time Spent: 10m 
  Work Description: omalley opened a new pull request #1720:
URL: https://github.com/apache/hive/pull/1720


   Update the storage-api surefire plugin version to match the rest of hive.





Issue Time Tracking
---

Worklog Id: (was: 518157)
Remaining Estimate: 0h
Time Spent: 10m

> Fix broken junit framework in storage-api
> -
>
> Key: HIVE-24455
> URL: https://issues.apache.org/jira/browse/HIVE-24455
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The use of junit is broken in storage-api. It results in no tests being found.





[jira] [Assigned] (HIVE-24455) Fix broken junit framework in storage-api

2020-11-30 Thread Owen O'Malley (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley reassigned HIVE-24455:



> Fix broken junit framework in storage-api
> -
>
> Key: HIVE-24455
> URL: https://issues.apache.org/jira/browse/HIVE-24455
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>Priority: Major
>
> The use of junit is broken in storage-api. It results in no tests being found.





[jira] [Updated] (HIVE-24144) getIdentifierQuoteString in HiveDatabaseMetaData returns incorrect value

2020-11-30 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-24144:
---
Fix Version/s: 4.0.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> getIdentifierQuoteString in HiveDatabaseMetaData returns incorrect value
> 
>
> Key: HIVE-24144
> URL: https://issues.apache.org/jira/browse/HIVE-24144
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC, JDBC storage handler
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> {code}
>   public String getIdentifierQuoteString() throws SQLException {
> return " ";
>   }
> {code}
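
For context, the JDBC contract says `getIdentifierQuoteString` returns the 
string used to quote identifiers, and a single space means identifier quoting 
is unsupported; HiveQL actually quotes identifiers with backticks. A sketch of 
the bug versus the likely fix (illustrative class name; the JDBC 
`throws SQLException` clause is omitted here):

```java
// Sketch of the DatabaseMetaData.getIdentifierQuoteString contract:
// returning " " claims identifier quoting is unsupported, which is wrong
// for HiveQL, where identifiers are quoted with backticks.
public class QuoteStringExample {

    // Buggy behavior from the issue.
    static String buggyIdentifierQuoteString() {
        return " ";
    }

    // Likely intent: HiveQL backtick quoting.
    static String fixedIdentifierQuoteString() {
        return "`";
    }

    public static void main(String[] args) {
        String q = fixedIdentifierQuoteString();
        System.out.println("quoted: " + q + "col name" + q); // quoted: `col name`
    }
}
```

Tools that build queries from metadata (such as the JDBC storage handler) rely 
on this value to quote identifiers correctly.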





[jira] [Work logged] (HIVE-24450) DbNotificationListener Request Notification IDs in Batches

2020-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24450?focusedWorklogId=518138&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-518138
 ]

ASF GitHub Bot logged work on HIVE-24450:
-

Author: ASF GitHub Bot
Created on: 30/Nov/20 21:53
Start Date: 30/Nov/20 21:53
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on pull request #1718:
URL: https://github.com/apache/hive/pull/1718#issuecomment-736078852


   @belugabehr I am not very familiar with this area, but what happens if 
multiple HMS instances are running in HA? Wouldn't this solution mean that the 
order of the notification events could potentially change? Say two HMS 
instances are running: HMS 1 gets id range 1-10 and HMS 2 gets 11-20. Then an 
openTxn notification goes to HMS 2 and the allocateWriteId notification goes 
to HMS 1. The sequence of the ids would no longer represent the sequence of 
the events. Wouldn't this mess up acid table replication?





Issue Time Tracking
---

Worklog Id: (was: 518138)
Time Spent: 20m  (was: 10m)

> DbNotificationListener Request Notification IDs in Batches
> --
>
> Key: HIVE-24450
> URL: https://issues.apache.org/jira/browse/HIVE-24450
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Every time a new notification event is logged into the database, the sequence 
> number for the ID of the event is incremented by one.  It is very standard in 
> database design to instead request a block of IDs for each fetch from the 
> database.  The sequence numbers are then handed out locally until the block 
> of IDs is exhausted.  This allows for fewer database round-trips and 
> transactions, at the expense of perhaps burning a few IDs.
> Burning of IDs happens when the server is restarted in the middle of a block 
> of sequence IDs.  That is, if the HMS requests a block of 10 ids, and only 
> three have been assigned, after the restart, the HMS will request another 
> block of 10, burning (wasting) 7 IDs.  As long as the blocks are not too 
> small, and restarts are infrequent, then few IDs are lost.
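
The block-allocation scheme described above can be sketched as follows. 
`BlockIdAllocator` and `fetchBlockFromDb` are hypothetical names; the real 
DbNotificationListener would reserve the block from its sequence row in the 
metastore database inside a transaction:

```java
// Hypothetical sketch of block-based sequence allocation: one "database"
// round-trip reserves blockSize IDs, which are then handed out locally.
public class BlockIdAllocator {
    private final int blockSize;
    private long next;            // next ID to hand out locally
    private long blockEnd;        // exclusive end of the reserved block
    private long dbSequence = 1;  // stands in for the sequence row in the DB

    public BlockIdAllocator(int blockSize) {
        this.blockSize = blockSize;
    }

    // Stands in for "SELECT ... FOR UPDATE; UPDATE next_val = next_val + blockSize".
    private long fetchBlockFromDb() {
        long start = dbSequence;
        dbSequence += blockSize;
        return start;
    }

    public synchronized long nextId() {
        if (next >= blockEnd) {            // first call, or block exhausted
            next = fetchBlockFromDb();     // one round-trip per blockSize IDs
            blockEnd = next + blockSize;   // unused IDs here are "burned" on restart
        }
        return next++;
    }

    public static void main(String[] args) {
        BlockIdAllocator alloc = new BlockIdAllocator(10);
        for (int i = 0; i < 12; i++) {
            System.out.print(alloc.nextId() + " "); // 1 2 ... 12
        }
        System.out.println();
    }
}
```

With a block size of 10, twelve calls cost two round-trips instead of twelve; 
a restart after the third ID would waste the remaining seven, as the 
description notes.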





[jira] [Work logged] (HIVE-24144) getIdentifierQuoteString in HiveDatabaseMetaData returns incorrect value

2020-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24144?focusedWorklogId=518139&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-518139
 ]

ASF GitHub Bot logged work on HIVE-24144:
-

Author: ASF GitHub Bot
Created on: 30/Nov/20 21:53
Start Date: 30/Nov/20 21:53
Worklog Time Spent: 10m 
  Work Description: jcamachor merged pull request #1487:
URL: https://github.com/apache/hive/pull/1487


   





Issue Time Tracking
---

Worklog Id: (was: 518139)
Time Spent: 1h  (was: 50m)

> getIdentifierQuoteString in HiveDatabaseMetaData returns incorrect value
> 
>
> Key: HIVE-24144
> URL: https://issues.apache.org/jira/browse/HIVE-24144
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC, JDBC storage handler
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> {code}
>   public String getIdentifierQuoteString() throws SQLException {
> return " ";
>   }
> {code}





[jira] [Work logged] (HIVE-24073) Execution exception in sort-merge semijoin

2020-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24073?focusedWorklogId=518137&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-518137
 ]

ASF GitHub Bot logged work on HIVE-24073:
-

Author: ASF GitHub Bot
Created on: 30/Nov/20 21:52
Start Date: 30/Nov/20 21:52
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on pull request #1476:
URL: https://github.com/apache/hive/pull/1476#issuecomment-736078379


   @maheshk114, is there any work remaining here? It seems there are result 
changes. Are these correct? Thanks





Issue Time Tracking
---

Worklog Id: (was: 518137)
Time Spent: 0.5h  (was: 20m)

> Execution exception in sort-merge semijoin
> --
>
> Key: HIVE-24073
> URL: https://issues.apache.org/jira/browse/HIVE-24073
> Project: Hive
>  Issue Type: Bug
>  Components: Operators
>Reporter: Jesus Camacho Rodriguez
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Working on HIVE-24041, we trigger an additional SJ conversion that leads to 
> this exception at execution time:
> {code}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Attempting to overwrite 
> nextKeyWritables[1]
>   at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1063)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:685)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:707)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:707)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:707)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:707)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:462)
>   ... 16 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Attempting to overwrite 
> nextKeyWritables[1]
>   at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.flush(GroupByOperator.java:1037)
>   at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1060)
>   ... 22 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Attempting to 
> overwrite nextKeyWritables[1]
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.processKey(CommonMergeJoinOperator.java:564)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.process(CommonMergeJoinOperator.java:243)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:887)
>   at 
> org.apache.hadoop.hive.ql.exec.TezDummyStoreOperator.process(TezDummyStoreOperator.java:49)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:887)
>   at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:1003)
>   at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.flush(GroupByOperator.java:1020)
>   ... 23 more
> {code}
> To reproduce, just set {{hive.auto.convert.sortmerge.join}} to {{true}} in 
> the last query in {{auto_sortmerge_join_10.q}} after HIVE-24041 has been 
> merged.





[jira] [Updated] (HIVE-24453) Direct SQL error when parsing create_time value for database

2020-11-30 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-24453:
---
Status: Patch Available  (was: Open)

> Direct SQL error when parsing create_time value for database
> 
>
> Key: HIVE-24453
> URL: https://issues.apache.org/jira/browse/HIVE-24453
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HIVE-21077 introduced a {{create_time}} field for {{DBS}} table in HMS. 
> Although the value for that field is always set after that patch, the value 
> could be null if the database was created before the feature went in. 
> DirectSQL should check for null value before parsing the integer, otherwise 
> we hit an exception and fallback to ORM path:
> {code}
> 2020-11-28 09:06:05,414 WARN  org.apache.hadoop.hive.metastore.ObjectStore: 
> [pool-8-thread-194]: Falling back to ORM path due to direct SQL failure (this 
> is not an error): null at 
> org.apache.hadoop.hive.metastore.MetastoreDirectSqlUtils.extractSqlInt(MetastoreDirectSqlUtils.java:251)
>  at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getDatabase(MetaStoreDirectSql.java:420)
>  at 
> org.apache.hadoop.hive.metastore.ObjectStore$1.getSqlResult(ObjectStore.java:839)
> {code}
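
The fix described above boils down to a null check before the integer 
conversion. A minimal sketch — `extractSqlIntOrDefault` is an illustrative 
helper, not the actual `MetastoreDirectSqlUtils` signature:

```java
// Illustrative: direct-SQL result columns arrive as Object (a Number for
// populated rows, null for databases created before HIVE-21077).
public class DirectSqlIntExtract {

    // Old behavior: blows up on null, forcing the ORM fallback path.
    static int extractSqlInt(Object field) {
        return ((Number) field).intValue();
    }

    // Null-tolerant variant: substitute a default for legacy rows instead.
    static int extractSqlIntOrDefault(Object field, int defaultValue) {
        return field == null ? defaultValue : ((Number) field).intValue();
    }

    public static void main(String[] args) {
        System.out.println(extractSqlIntOrDefault(1606518365L, 0)); // 1606518365
        System.out.println(extractSqlIntOrDefault(null, 0));        // 0
    }
}
```

Keeping the null check in the direct-SQL path avoids the warning and the 
silent per-call performance hit of falling back to ORM.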





[jira] [Work logged] (HIVE-24453) Direct SQL error when parsing create_time value for database

2020-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24453?focusedWorklogId=518130&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-518130
 ]

ASF GitHub Bot logged work on HIVE-24453:
-

Author: ASF GitHub Bot
Created on: 30/Nov/20 21:22
Start Date: 30/Nov/20 21:22
Worklog Time Spent: 10m 
  Work Description: jcamachor opened a new pull request #1719:
URL: https://github.com/apache/hive/pull/1719


   





Issue Time Tracking
---

Worklog Id: (was: 518130)
Remaining Estimate: 0h
Time Spent: 10m

> Direct SQL error when parsing create_time value for database
> 
>
> Key: HIVE-24453
> URL: https://issues.apache.org/jira/browse/HIVE-24453
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HIVE-21077 introduced a {{create_time}} field for {{DBS}} table in HMS. 
> Although the value for that field is always set after that patch, the value 
> could be null if the database was created before the feature went in. 
> DirectSQL should check for null value before parsing the integer, otherwise 
> we hit an exception and fallback to ORM path:
> {code}
> 2020-11-28 09:06:05,414 WARN  org.apache.hadoop.hive.metastore.ObjectStore: 
> [pool-8-thread-194]: Falling back to ORM path due to direct SQL failure (this 
> is not an error): null at 
> org.apache.hadoop.hive.metastore.MetastoreDirectSqlUtils.extractSqlInt(MetastoreDirectSqlUtils.java:251)
>  at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getDatabase(MetaStoreDirectSql.java:420)
>  at 
> org.apache.hadoop.hive.metastore.ObjectStore$1.getSqlResult(ObjectStore.java:839)
> {code}





[jira] [Updated] (HIVE-24453) Direct SQL error when parsing create_time value for database

2020-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24453:
--
Labels: pull-request-available  (was: )

> Direct SQL error when parsing create_time value for database
> 
>
> Key: HIVE-24453
> URL: https://issues.apache.org/jira/browse/HIVE-24453
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HIVE-21077 introduced a {{create_time}} field for {{DBS}} table in HMS. 
> Although the value for that field is always set after that patch, the value 
> could be null if the database was created before the feature went in. 
> DirectSQL should check for null value before parsing the integer, otherwise 
> we hit an exception and fallback to ORM path:
> {code}
> 2020-11-28 09:06:05,414 WARN  org.apache.hadoop.hive.metastore.ObjectStore: 
> [pool-8-thread-194]: Falling back to ORM path due to direct SQL failure (this 
> is not an error): null at 
> org.apache.hadoop.hive.metastore.MetastoreDirectSqlUtils.extractSqlInt(MetastoreDirectSqlUtils.java:251)
>  at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getDatabase(MetaStoreDirectSql.java:420)
>  at 
> org.apache.hadoop.hive.metastore.ObjectStore$1.getSqlResult(ObjectStore.java:839)
> {code}





[jira] [Updated] (HIVE-24453) DirectSQL error when parsing create_time value for database

2020-11-30 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-24453:
---
Description: 
HIVE-21077 introduced a {{create_time}} field for {{DBS}} table in HMS. 
Although the value for that field is always set after that patch, the value 
could be null if the database was created before the feature went in. DirectSQL 
should check for null value before parsing the integer, otherwise we hit an 
exception and fallback to ORM path:
{code}
2020-11-28 09:06:05,414 WARN  org.apache.hadoop.hive.metastore.ObjectStore: 
[pool-8-thread-194]: Falling back to ORM path due to direct SQL failure (this 
is not an error): null at 
org.apache.hadoop.hive.metastore.MetastoreDirectSqlUtils.extractSqlInt(MetastoreDirectSqlUtils.java:251)
 at 
org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getDatabase(MetaStoreDirectSql.java:420)
 at 
org.apache.hadoop.hive.metastore.ObjectStore$1.getSqlResult(ObjectStore.java:839)
{code}

  was:
HIVE-21077 introduced a {{create_time}} field for {{DBS}} table in HMS. 
Although the value for that field is always set after that patch, the value 
could be null if the database was created before the feature went in. DirectSQL 
should check for null value before parsing the integer, otherwise we hit an 
exception and fallback to ORM path:
{noformat}
2020-11-28 09:06:05,414 WARN  org.apache.hadoop.hive.metastore.ObjectStore: 
[pool-8-thread-194]: Falling back to ORM path due to direct SQL failure (this 
is not an error): null at 
org.apache.hadoop.hive.metastore.MetastoreDirectSqlUtils.extractSqlInt(MetastoreDirectSqlUtils.java:251)
 at 
org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getDatabase(MetaStoreDirectSql.java:420)
 at 
org.apache.hadoop.hive.metastore.ObjectStore$1.getSqlResult(ObjectStore.java:839)
{noformat}


> DirectSQL error when parsing create_time value for database
> ---
>
> Key: HIVE-24453
> URL: https://issues.apache.org/jira/browse/HIVE-24453
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>
> HIVE-21077 introduced a {{create_time}} field for {{DBS}} table in HMS. 
> Although the value for that field is always set after that patch, the value 
> could be null if the database was created before the feature went in. 
> DirectSQL should check for null value before parsing the integer, otherwise 
> we hit an exception and fallback to ORM path:
> {code}
> 2020-11-28 09:06:05,414 WARN  org.apache.hadoop.hive.metastore.ObjectStore: 
> [pool-8-thread-194]: Falling back to ORM path due to direct SQL failure (this 
> is not an error): null at 
> org.apache.hadoop.hive.metastore.MetastoreDirectSqlUtils.extractSqlInt(MetastoreDirectSqlUtils.java:251)
>  at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getDatabase(MetaStoreDirectSql.java:420)
>  at 
> org.apache.hadoop.hive.metastore.ObjectStore$1.getSqlResult(ObjectStore.java:839)
> {code}





[jira] [Updated] (HIVE-24453) Direct SQL error when parsing create_time value for database

2020-11-30 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-24453:
---
Summary: Direct SQL error when parsing create_time value for database  
(was: DirectSQL error when parsing create_time value for database)

> Direct SQL error when parsing create_time value for database
> 
>
> Key: HIVE-24453
> URL: https://issues.apache.org/jira/browse/HIVE-24453
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>
> HIVE-21077 introduced a {{create_time}} field for {{DBS}} table in HMS. 
> Although the value for that field is always set after that patch, the value 
> could be null if the database was created before the feature went in. 
> DirectSQL should check for null value before parsing the integer, otherwise 
> we hit an exception and fallback to ORM path:
> {code}
> 2020-11-28 09:06:05,414 WARN  org.apache.hadoop.hive.metastore.ObjectStore: 
> [pool-8-thread-194]: Falling back to ORM path due to direct SQL failure (this 
> is not an error): null at 
> org.apache.hadoop.hive.metastore.MetastoreDirectSqlUtils.extractSqlInt(MetastoreDirectSqlUtils.java:251)
>  at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getDatabase(MetaStoreDirectSql.java:420)
>  at 
> org.apache.hadoop.hive.metastore.ObjectStore$1.getSqlResult(ObjectStore.java:839)
> {code}





[jira] [Assigned] (HIVE-24453) DirectSQL error when parsing create_time value for database

2020-11-30 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez reassigned HIVE-24453:
--


> DirectSQL error when parsing create_time value for database
> ---
>
> Key: HIVE-24453
> URL: https://issues.apache.org/jira/browse/HIVE-24453
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>
> HIVE-21077 introduced a {{create_time}} field for {{DBS}} table in HMS. 
> Although the value for that field is always set after that patch, the value 
> could be null if the database was created before the feature went in. 
> DirectSQL should check for null value before parsing the integer, otherwise 
> we hit an exception and fallback to ORM path:
> {noformat}
> 2020-11-28 09:06:05,414 WARN  org.apache.hadoop.hive.metastore.ObjectStore: 
> [pool-8-thread-194]: Falling back to ORM path due to direct SQL failure (this 
> is not an error): null at 
> org.apache.hadoop.hive.metastore.MetastoreDirectSqlUtils.extractSqlInt(MetastoreDirectSqlUtils.java:251)
>  at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getDatabase(MetaStoreDirectSql.java:420)
>  at 
> org.apache.hadoop.hive.metastore.ObjectStore$1.getSqlResult(ObjectStore.java:839)
> {noformat}





[jira] [Work logged] (HIVE-23980) Shade guava from existing Hive versions

2020-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23980?focusedWorklogId=518119&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-518119
 ]

ASF GitHub Bot logged work on HIVE-23980:
-

Author: ASF GitHub Bot
Created on: 30/Nov/20 20:26
Start Date: 30/Nov/20 20:26
Worklog Time Spent: 10m 
  Work Description: viirya commented on a change in pull request #1356:
URL: https://github.com/apache/hive/pull/1356#discussion_r532881891



##
File path: itests/hive-blobstore/pom.xml
##
@@ -55,33 +55,33 @@
     </dependency>
     <dependency>
       <groupId>org.apache.hive</groupId>
-      <artifactId>hive-metastore</artifactId>
+      <artifactId>hive-exec</artifactId>
       <version>${project.version}</version>
       <scope>test</scope>
     </dependency>
     <dependency>
       <groupId>org.apache.hive</groupId>
       <artifactId>hive-metastore</artifactId>
       <version>${project.version}</version>
-      <classifier>tests</classifier>
       <scope>test</scope>
     </dependency>
     <dependency>
       <groupId>org.apache.hive</groupId>
-      <artifactId>hive-it-unit</artifactId>
+      <artifactId>hive-metastore</artifactId>

Review comment:
   Oh, I didn't actually change it.
   
   The diff makes it look as if I added a new `hive-metastore` dependency, but 
the original pom.xml already includes two `hive-metastore` dependencies: one 
without a classifier and one with the `tests` classifier.
   
   It is just the way git renders the diff that confuses readers.
   
   







Issue Time Tracking
---

Worklog Id: (was: 518119)
Time Spent: 5h 10m  (was: 5h)

> Shade guava from existing Hive versions
> ---
>
> Key: HIVE-23980
> URL: https://issues.apache.org/jira/browse/HIVE-23980
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.7
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23980.01.branch-2.3.patch
>
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> I'm trying to upgrade Guava version in Spark. The JIRA ticket is SPARK-32502.
> Running test hits an error:
> {code}
> sbt.ForkMain$ForkError: sbt.ForkMain$ForkError: java.lang.IllegalAccessError: 
> tried to access method 
> com.google.common.collect.Iterators.emptyIterator()Lcom/google/common/collect/UnmodifiableIterator;
>  from class org.apache.hadoop.hive.ql.exec.FetchOperator
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.<init>(FetchOperator.java:108)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchTask.initialize(FetchTask.java:87)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:541)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
> {code}
> I know that hive-exec doesn't shade Guava until HIVE-22126 but that work 
> targets 4.0.0. I'm wondering if there is a solution for current Hive 
> versions, e.g. Hive 2.3.7? Any ideas?
> Thanks.
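
For context, shading guava inside hive-exec generally means a 
maven-shade-plugin relocation along these lines. This is a sketch of the 
general pattern only, not the exact configuration from HIVE-22126 or the 
branch-2.3 patch attached here:

{code:xml}
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <relocations>
      <!-- Rewrite guava packages (and all references to them) inside the
           shaded jar so callers like Spark can use a newer guava freely. -->
      <relocation>
        <pattern>com.google.common</pattern>
        <shadedPattern>org.apache.hive.com.google.common</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
</plugin>
{code}

After relocation, the `Iterators.emptyIterator()` call in FetchOperator 
resolves against the bundled, renamed guava classes instead of whatever guava 
version is on the application classpath, which is what avoids the 
IllegalAccessError above.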





[jira] [Work logged] (HIVE-23980) Shade guava from existing Hive versions

2020-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23980?focusedWorklogId=518111&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-518111
 ]

ASF GitHub Bot logged work on HIVE-23980:
-

Author: ASF GitHub Bot
Created on: 30/Nov/20 20:17
Start Date: 30/Nov/20 20:17
Worklog Time Spent: 10m 
  Work Description: viirya commented on pull request #1356:
URL: https://github.com/apache/hive/pull/1356#issuecomment-736018282


   > Thanks @viirya! The new PR looks almost good to me, except for one nit.
   > 
   > Also, comparing to the original patch, we don't have changes to
   > `HiveRelDecorrelator`, `HiveAggregate`, and `HiveSubQueryRemoveRule`. Is this
   > because they're unnecessary now that we've shaded Guava within `hive-exec`?
   > (Some of the APIs, like `operandJ`, also do not exist in the Calcite version
   > used by branch-2.3.)
   
   
   The change to `HiveAggregate` is just to remove the unused parameter 
`groupSets` in `deriveRowType`. It is not related to shading Guava, so I didn't 
apply it.
   
   The changes from `operand` to `operandJ` in `HiveSubQueryRemoveRule` and 
`HiveRelDecorrelator` cannot be applied to branch-2.3 because `operandJ` does 
not exist in Calcite 1.10.0; the API was added in Calcite 1.17.0 
(https://github.com/apache/calcite/commit/d59b639d/).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 518111)
Time Spent: 5h  (was: 4h 50m)

> Shade guava from existing Hive versions
> ---
>
> Key: HIVE-23980
> URL: https://issues.apache.org/jira/browse/HIVE-23980
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.7
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23980.01.branch-2.3.patch
>
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> I'm trying to upgrade Guava version in Spark. The JIRA ticket is SPARK-32502.
> Running test hits an error:
> {code}
> sbt.ForkMain$ForkError: sbt.ForkMain$ForkError: java.lang.IllegalAccessError: 
> tried to access method 
> com.google.common.collect.Iterators.emptyIterator()Lcom/google/common/collect/UnmodifiableIterator;
>  from class org.apache.hadoop.hive.ql.exec.FetchOperator
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.<init>(FetchOperator.java:108)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchTask.initialize(FetchTask.java:87)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:541)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
> {code}
> I know that hive-exec doesn't shade Guava until HIVE-22126 but that work 
> targets 4.0.0. I'm wondering if there is a solution for current Hive 
> versions, e.g. Hive 2.3.7? Any ideas?
> Thanks.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23980) Shade guava from existing Hive versions

2020-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23980?focusedWorklogId=518102&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-518102
 ]

ASF GitHub Bot logged work on HIVE-23980:
-

Author: ASF GitHub Bot
Created on: 30/Nov/20 19:52
Start Date: 30/Nov/20 19:52
Worklog Time Spent: 10m 
  Work Description: sunchao commented on a change in pull request #1356:
URL: https://github.com/apache/hive/pull/1356#discussion_r532848370



##
File path: itests/hive-blobstore/pom.xml
##
@@ -55,33 +55,33 @@
     <dependency>
       <groupId>org.apache.hive</groupId>
-      <artifactId>hive-metastore</artifactId>
+      <artifactId>hive-exec</artifactId>
       <version>${project.version}</version>
       <scope>test</scope>
     </dependency>
     <dependency>
       <groupId>org.apache.hive</groupId>
       <artifactId>hive-metastore</artifactId>
       <version>${project.version}</version>
-      <classifier>tests</classifier>
       <scope>test</scope>
     </dependency>
     <dependency>
       <groupId>org.apache.hive</groupId>
-      <artifactId>hive-it-unit</artifactId>
+      <artifactId>hive-metastore</artifactId>

Review comment:
   Hmm, why do we need two `hive-metastore` dependencies (one with the tests 
classifier)? I don't see this in the original patch.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 518102)
Time Spent: 4h 50m  (was: 4h 40m)

> Shade guava from existing Hive versions
> ---
>
> Key: HIVE-23980
> URL: https://issues.apache.org/jira/browse/HIVE-23980
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.7
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23980.01.branch-2.3.patch
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> I'm trying to upgrade Guava version in Spark. The JIRA ticket is SPARK-32502.
> Running test hits an error:
> {code}
> sbt.ForkMain$ForkError: sbt.ForkMain$ForkError: java.lang.IllegalAccessError: 
> tried to access method 
> com.google.common.collect.Iterators.emptyIterator()Lcom/google/common/collect/UnmodifiableIterator;
>  from class org.apache.hadoop.hive.ql.exec.FetchOperator
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.<init>(FetchOperator.java:108)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchTask.initialize(FetchTask.java:87)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:541)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
> {code}
> I know that hive-exec doesn't shade Guava until HIVE-22126 but that work 
> targets 4.0.0. I'm wondering if there is a solution for current Hive 
> versions, e.g. Hive 2.3.7? Any ideas?
> Thanks.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24452) Add a generic JDBC implementation that can be used to other JDBC DBs

2020-11-30 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17240994#comment-17240994
 ] 

David Mollitor commented on HIVE-24452:
---

Please consider using an abstraction layer that deals with the different 
vendors for you.

http://www.jooq.org/
https://blog.mybatis.org/
https://www.eclipse.org/eclipselink/
https://hibernate.org/

> Add a generic JDBC implementation that can be used to other JDBC DBs
> 
>
> Key: HIVE-24452
> URL: https://issues.apache.org/jira/browse/HIVE-24452
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Naveen Gangam
>Priority: Major
>
> Currently, we have added a custom provider for each of the JDBC DBs supported 
> by Hive (MySQL, Postgres, MSSQL (pending), Oracle (pending), and Derby 
> (pending)). But if there are other JDBC providers we want to add support for, 
> it would be useful to add a generic JDBC provider that Hive can default to.
> This means:
> 1) We have to support a means of indicating that a connector is for a JDBC 
> datasource. So maybe add a property in DCPROPERTIES on the connector to 
> indicate that the datasource supports JDBC.
> 2) If there is no custom connector for a data source, use the 
> GenericJDBCDatasource connector that is to be added as part of this jira.
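The fallback behavior described in 1) and 2) can be sketched in a few lines. Note that the class names, the DCPROPERTIES key, and the helper below are all hypothetical illustrations, not Hive's actual connector API:

```java
import java.util.Map;

// Hypothetical resolver: prefer a vendor-specific connector provider, and
// fall back to a generic JDBC provider when DCPROPERTIES mark the
// datasource as JDBC-capable. All names here are illustrative.
public class ConnectorResolver {
    private static final Map<String, String> CUSTOM_PROVIDERS = Map.of(
        "mysql", "MySQLConnectorProvider",
        "postgres", "PostgreSQLConnectorProvider");

    public static String resolve(String dbType, Map<String, String> dcProps) {
        String custom = CUSTOM_PROVIDERS.get(dbType.toLowerCase());
        if (custom != null) {
            return custom;  // vendor-specific provider exists
        }
        if ("true".equals(dcProps.get("jdbc.datasource"))) {
            return "GenericJDBCConnectorProvider";  // generic fallback
        }
        return null;  // not marked as a JDBC datasource; no provider
    }
}
```

The key design point is that the generic provider is only used when the connector explicitly declares JDBC support, so unknown datasources still fail fast.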



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24448) Support case-sensitivity for tables in REMOTE database.

2020-11-30 Thread Naveen Gangam (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17240973#comment-17240973
 ] 

Naveen Gangam commented on HIVE-24448:
--

I made a test fix in the SemanticAnalyzer.processTable() method to remove the 
toLowerCase() conversion. That fix resulted in 4 test failures.

{noformat}
Testing / split-08 / Archive / testCliDriver[reduce_deduplicate_null_keys] – 
org.apache.hadoop.hive.cli.split19.TestMiniLlapLocalCliDriver
Testing / split-20 / Archive / testActiveSessionTimeMetrics – 
org.apache.hive.service.cli.session.TestSessionManagerMetrics
Testing / split-17 / Archive / testCliDriver[cte_6] – 
org.apache.hadoop.hive.cli.split5.TestMiniLlapLocalCliDriver
Testing / split-07 / Archive / testCliDriver[dynpart_sort_optimization] – 
org.apache.hadoop.hive.cli.split7.TestMiniLlapLocalCliDriver
{noformat}

All 3 failures from the LLAP test drivers are caused by this assertion in 
the code.
{noformat}
java.lang.AssertionError
at org.apache.hadoop.hive.ql.parse.QB.rewriteViewToSubq(QB.java:256)
at org.apache.hadoop.hive.ql.parse.QB.rewriteCTEToSubq(QB.java:264)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.addCTEAsSubQuery(SemanticAnalyzer.java:1337)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2202)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2142)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:12403)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12507)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:443)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:302)
at 
org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:171)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:302)
at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223)
at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:469)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:421)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:385)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:379)
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125)
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:203)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:129)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:424)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:355)
at 
org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:744)
at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:714)
at 
org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:170)
at 
org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157)
at 
org.apache.hadoop.hive.cli.split19.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62)
at sun.reflect.GeneratedMethodAccessor171.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.apache.hadoop.hive.cli.control.CliAdapter$2$1.evaluate(CliAdapter.java:135)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
at 
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
at 

[jira] [Work logged] (HIVE-24433) AutoCompaction is not getting triggered for CamelCase Partition Values

2020-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24433?focusedWorklogId=518081&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-518081
 ]

ASF GitHub Bot logged work on HIVE-24433:
-

Author: ASF GitHub Bot
Created on: 30/Nov/20 18:49
Start Date: 30/Nov/20 18:49
Worklog Time Spent: 10m 
  Work Description: nareshpr commented on a change in pull request #1712:
URL: https://github.com/apache/hive/pull/1712#discussion_r532821552



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
##
@@ -2725,7 +2725,7 @@ private void insertTxnComponents(long txnid, LockRequest 
rqst, Connection dbConn
   }
   String dbName = normalizeCase(lc.getDbname());
   String tblName = normalizeCase(lc.getTablename());
-  String partName = normalizeCase(lc.getPartitionname());
+  String partName = lc.getPartitionname();

Review comment:
   Yes, I validated the 4 SQLs below with my patch; partition(key=value) is 
already normalized:
   
   insert into table abc PARTITION(CitY='Bangalore') values('Dan');
   insert overwrite table abc partition(CiTy='Bangalore') select Name from abc;
   update table abc set Name='xy' where CiTy='Bangalore';
   delete from abc where CiTy='Bangalore';
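The intent of the fix can be sketched in a few lines: normalize only the partition key's case while preserving the value's case. The helper below is a hypothetical illustration, not the actual TxnHandler code:

```java
// Hypothetical helper illustrating the fix's intent: lowercase only the
// key portion of "Key=Value", keeping the value's original case, so that
// TXN_COMPONENTS entries match the real partition name.
public class PartitionNameNormalizer {
    public static String normalize(String partName) {
        if (partName == null) {
            return null;
        }
        int eq = partName.indexOf('=');
        if (eq < 0) {
            return partName;  // not in key=value form; leave untouched
        }
        return partName.substring(0, eq).toLowerCase() + partName.substring(eq);
    }

    public static void main(String[] args) {
        // prints "city=Bangalore": key lowercased, value case preserved
        System.out.println(normalize("CitY=Bangalore"));
    }
}
```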





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 518081)
Time Spent: 1h 10m  (was: 1h)

> AutoCompaction is not getting triggered for CamelCase Partition Values
> --
>
> Key: HIVE-24433
> URL: https://issues.apache.org/jira/browse/HIVE-24433
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The partition key value is getting converted to lowercase in the 2 places below.
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L2728]
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L2851]
> Because of this, the TXN_COMPONENTS & HIVE_LOCKS tables do not have entries 
> with the proper partition values.
> When the query completes, the entry moves from TXN_COMPONENTS to 
> COMPLETED_TXN_COMPONENTS. Hive AutoCompaction will not recognize the 
> partition and considers it an invalid partition.
> {code:java}
> create table abc(name string) partitioned by(city string) stored as orc 
> tblproperties('transactional'='true');
> insert into abc partition(city='Bangalore') values('aaa');
> {code}
> Example entry in COMPLETED_TXN_COMPONENTS
> {noformat}
> +---+--++---+-+-+---+
> | CTC_TXNID | CTC_DATABASE | CTC_TABLE          | CTC_PARTITION     | 
> CTC_TIMESTAMP       | CTC_WRITEID | CTC_UPDATE_DELETE |
> +---+--++---+-+-+---+
> |         2 | default      | abc    | city=bangalore    | 2020-11-25 09:26:59 
> |           1 | N                 |
> +---+--++---+-+-+---+
> {noformat}
>  
> AutoCompaction then fails to trigger, with the error below:
> {code:java}
> 2020-11-25T09:35:10,364 INFO [Thread-9]: compactor.Initiator 
> (Initiator.java:run(98)) - Checking to see if we should compact 
> default.abc.city=bangalore
> 2020-11-25T09:35:10,380 INFO [Thread-9]: compactor.Initiator 
> (Initiator.java:run(155)) - Can't find partition 
> default.compaction_test.city=bangalore, assuming it has been dropped and 
> moving on{code}
> I verified the 4 SQLs below with my PR; they all produced the correct 
> partition key value, 
> i.e., COMPLETED_TXN_COMPONENTS.CTC_PARTITION="city=Bangalore"
> {code:java}
> insert into table abc PARTITION(CitY='Bangalore') values('Dan');
> insert overwrite table abc partition(CiTy='Bangalore') select Name from abc;
> update table abc set Name='xy' where CiTy='Bangalore';
> delete from abc where CiTy='Bangalore';{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24450) DbNotificationListener Request Notification IDs in Batches

2020-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24450?focusedWorklogId=518075&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-518075
 ]

ASF GitHub Bot logged work on HIVE-24450:
-

Author: ASF GitHub Bot
Created on: 30/Nov/20 18:36
Start Date: 30/Nov/20 18:36
Worklog Time Spent: 10m 
  Work Description: belugabehr opened a new pull request #1718:
URL: https://github.com/apache/hive/pull/1718


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 518075)
Remaining Estimate: 0h
Time Spent: 10m

> DbNotificationListener Request Notification IDs in Batches
> --
>
> Key: HIVE-24450
> URL: https://issues.apache.org/jira/browse/HIVE-24450
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Every time a new notification event is logged into the database, the sequence 
> number for the event's ID is incremented by one.  It is very standard in 
> database design to instead request a block of IDs for each fetch from the 
> database.  The sequence numbers are then handed out locally until the block 
> of IDs is exhausted.  This allows for fewer database round-trips and 
> transactions, at the expense of perhaps burning a few IDs.
> Burning of IDs happens when the server is restarted in the middle of a block 
> of sequence IDs.  That is, if the HMS requests a block of 10 IDs, and only 
> three have been assigned, then after the restart the HMS will request another 
> block of 10, burning (wasting) 7 IDs.  As long as the blocks are not too 
> small, and restarts are infrequent, then few IDs are lost.
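The block-allocation scheme described above can be sketched as follows. The database round trip is simulated with an in-memory sequence, and all names are illustrative rather than the actual DbNotificationListener code:

```java
// Hypothetical sketch of block-based ID allocation: one "database" round
// trip reserves a block of blockSize IDs; IDs are then handed out locally
// until the block is exhausted. A restart discards any unused remainder
// of the current block (those IDs are "burned").
public class IdBlockAllocator {
    private final int blockSize;
    private long next;      // next ID to hand out
    private long blockEnd;  // exclusive end of the current block
    private long dbSequence = 1;  // stands in for the DB-backed sequence

    public IdBlockAllocator(int blockSize) {
        this.blockSize = blockSize;
    }

    // Simulates one DB round trip that reserves the next block.
    private long fetchNextBlockStart() {
        long start = dbSequence;
        dbSequence += blockSize;
        return start;
    }

    public synchronized long nextId() {
        if (next >= blockEnd) {  // first call, or current block exhausted
            next = fetchNextBlockStart();
            blockEnd = next + blockSize;
        }
        return next++;
    }
}
```

Only one database transaction occurs per `blockSize` IDs; every other `nextId()` call is a purely local increment.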



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24450) DbNotificationListener Request Notification IDs in Batches

2020-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24450:
--
Labels: pull-request-available  (was: )

> DbNotificationListener Request Notification IDs in Batches
> --
>
> Key: HIVE-24450
> URL: https://issues.apache.org/jira/browse/HIVE-24450
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Every time a new notification event is logged into the database, the sequence 
> number for the event's ID is incremented by one.  It is very standard in 
> database design to instead request a block of IDs for each fetch from the 
> database.  The sequence numbers are then handed out locally until the block 
> of IDs is exhausted.  This allows for fewer database round-trips and 
> transactions, at the expense of perhaps burning a few IDs.
> Burning of IDs happens when the server is restarted in the middle of a block 
> of sequence IDs.  That is, if the HMS requests a block of 10 IDs, and only 
> three have been assigned, then after the restart the HMS will request another 
> block of 10, burning (wasting) 7 IDs.  As long as the blocks are not too 
> small, and restarts are infrequent, then few IDs are lost.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24450) DbNotificationListener Request Notification IDs in Batches

2020-11-30 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor reassigned HIVE-24450:
-


> DbNotificationListener Request Notification IDs in Batches
> --
>
> Key: HIVE-24450
> URL: https://issues.apache.org/jira/browse/HIVE-24450
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>
> Every time a new notification event is logged into the database, the sequence 
> number for the event's ID is incremented by one.  It is very standard in 
> database design to instead request a block of IDs for each fetch from the 
> database.  The sequence numbers are then handed out locally until the block 
> of IDs is exhausted.  This allows for fewer database round-trips and 
> transactions, at the expense of perhaps burning a few IDs.
> Burning of IDs happens when the server is restarted in the middle of a block 
> of sequence IDs.  That is, if the HMS requests a block of 10 IDs, and only 
> three have been assigned, then after the restart the HMS will request another 
> block of 10, burning (wasting) 7 IDs.  As long as the blocks are not too 
> small, and restarts are infrequent, then few IDs are lost.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24436) Fix Avro NULL_DEFAULT_VALUE compatibility issue

2020-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24436?focusedWorklogId=518070&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-518070
 ]

ASF GitHub Bot logged work on HIVE-24436:
-

Author: ASF GitHub Bot
Created on: 30/Nov/20 18:31
Start Date: 30/Nov/20 18:31
Worklog Time Spent: 10m 
  Work Description: sunchao edited a comment on pull request #1715:
URL: https://github.com/apache/hive/pull/1715#issuecomment-735961948


   Yes @dongjoon-hyun, these test failures have been there since the 2.3.7 
release. I do plan to take a look at them later.
   
   @wangyum I believe the issue exists in the master branch as well. If so, can 
we make this PR against master and backport it to branch-2.3/branch-3.1 later 
once that is merged?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 518070)
Time Spent: 1h  (was: 50m)

> Fix Avro NULL_DEFAULT_VALUE compatibility issue
> ---
>
> Key: HIVE-24436
> URL: https://issues.apache.org/jira/browse/HIVE-24436
> Project: Hive
>  Issue Type: Improvement
>  Components: Avro
>Affects Versions: 2.3.8
>Reporter: Yuming Wang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Exception1:
> {noformat}
> - create hive serde table with Catalog
> *** RUN ABORTED ***
>   java.lang.NoSuchMethodError: 'void 
> org.apache.avro.Schema$Field.<init>(java.lang.String, org.apache.avro.Schema, 
> java.lang.String, org.codehaus.jackson.JsonNode)'
>   at 
> org.apache.hadoop.hive.serde2.avro.TypeInfoToSchema.createAvroField(TypeInfoToSchema.java:76)
>   at 
> org.apache.hadoop.hive.serde2.avro.TypeInfoToSchema.convert(TypeInfoToSchema.java:61)
>   at 
> org.apache.hadoop.hive.serde2.avro.AvroSerDe.getSchemaFromCols(AvroSerDe.java:170)
>   at 
> org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:114)
>   at 
> org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:83)
>   at 
> org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:533)
>   at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:450)
>   at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:437)
>   at 
> org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:281)
>   at org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:263)
> {noformat}
> Exception2:
> {noformat}
> - alter hive serde table add columns -- partitioned - AVRO *** FAILED ***
>   org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.avro.AvroRuntimeException: Unknown datum class: class 
> org.codehaus.jackson.node.NullNode;
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:112)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.createTable(HiveExternalCatalog.scala:245)
>   at 
> org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.createTable(ExternalCatalogWithListener.scala:94)
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.createTable(SessionCatalog.scala:346)
>   at 
> org.apache.spark.sql.execution.command.CreateTableCommand.run(tables.scala:166)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
>   at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:228)
>   at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3680)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24436) Fix Avro NULL_DEFAULT_VALUE compatibility issue

2020-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24436?focusedWorklogId=518067&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-518067
 ]

ASF GitHub Bot logged work on HIVE-24436:
-

Author: ASF GitHub Bot
Created on: 30/Nov/20 18:28
Start Date: 30/Nov/20 18:28
Worklog Time Spent: 10m 
  Work Description: sunchao commented on pull request #1715:
URL: https://github.com/apache/hive/pull/1715#issuecomment-735961948


   Yes @dongjoon-hyun, these test failures have been there since the 2.3.7 
release. I do plan to take a look at them later. Can we make this PR against 
master?
   
   @wangyum I believe the issue exists in the master branch as well. If so, can 
we make this PR against master and backport it to branch-2.3/branch-3.1 later 
once that is merged?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 518067)
Time Spent: 50m  (was: 40m)

> Fix Avro NULL_DEFAULT_VALUE compatibility issue
> ---
>
> Key: HIVE-24436
> URL: https://issues.apache.org/jira/browse/HIVE-24436
> Project: Hive
>  Issue Type: Improvement
>  Components: Avro
>Affects Versions: 2.3.8
>Reporter: Yuming Wang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Exception1:
> {noformat}
> - create hive serde table with Catalog
> *** RUN ABORTED ***
>   java.lang.NoSuchMethodError: 'void 
> org.apache.avro.Schema$Field.<init>(java.lang.String, org.apache.avro.Schema, 
> java.lang.String, org.codehaus.jackson.JsonNode)'
>   at 
> org.apache.hadoop.hive.serde2.avro.TypeInfoToSchema.createAvroField(TypeInfoToSchema.java:76)
>   at 
> org.apache.hadoop.hive.serde2.avro.TypeInfoToSchema.convert(TypeInfoToSchema.java:61)
>   at 
> org.apache.hadoop.hive.serde2.avro.AvroSerDe.getSchemaFromCols(AvroSerDe.java:170)
>   at 
> org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:114)
>   at 
> org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:83)
>   at 
> org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:533)
>   at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:450)
>   at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:437)
>   at 
> org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:281)
>   at org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:263)
> {noformat}
> Exception2:
> {noformat}
> - alter hive serde table add columns -- partitioned - AVRO *** FAILED ***
>   org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.avro.AvroRuntimeException: Unknown datum class: class 
> org.codehaus.jackson.node.NullNode;
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:112)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.createTable(HiveExternalCatalog.scala:245)
>   at 
> org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.createTable(ExternalCatalogWithListener.scala:94)
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.createTable(SessionCatalog.scala:346)
>   at 
> org.apache.spark.sql.execution.command.CreateTableCommand.run(tables.scala:166)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
>   at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:228)
>   at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3680)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24424) Use PreparedStatements in DbNotificationListener getNextNLId

2020-11-30 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor resolved HIVE-24424.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Merged to master.  Thank you [~abstractdog] and [~mgergely] for the reviews!!

> Use PreparedStatements in DbNotificationListener getNextNLId
> 
>
> Key: HIVE-24424
> URL: https://issues.apache.org/jira/browse/HIVE-24424
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Simplify the code, remove debug logging concatenation, and make it more 
> readable.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24424) Use PreparedStatements in DbNotificationListener getNextNLId

2020-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24424?focusedWorklogId=518066&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-518066
 ]

ASF GitHub Bot logged work on HIVE-24424:
-

Author: ASF GitHub Bot
Created on: 30/Nov/20 18:20
Start Date: 30/Nov/20 18:20
Worklog Time Spent: 10m 
  Work Description: belugabehr merged pull request #1704:
URL: https://github.com/apache/hive/pull/1704


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 518066)
Time Spent: 1h 20m  (was: 1h 10m)

> Use PreparedStatements in DbNotificationListener getNextNLId
> 
>
> Key: HIVE-24424
> URL: https://issues.apache.org/jira/browse/HIVE-24424
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Simplify the code, remove debug logging concatenation, and make it more 
> readable.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23980) Shade guava from existing Hive versions

2020-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23980?focusedWorklogId=518061&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-518061
 ]

ASF GitHub Bot logged work on HIVE-23980:
-

Author: ASF GitHub Bot
Created on: 30/Nov/20 18:07
Start Date: 30/Nov/20 18:07
Worklog Time Spent: 10m 
  Work Description: viirya commented on pull request #1356:
URL: https://github.com/apache/hive/pull/1356#issuecomment-735950137


   Internally we tested this patch and it passes all Spark tests. I think it 
gives us more confidence to have this. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 518061)
Time Spent: 4h 40m  (was: 4.5h)

> Shade guava from existing Hive versions
> ---
>
> Key: HIVE-23980
> URL: https://issues.apache.org/jira/browse/HIVE-23980
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.7
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23980.01.branch-2.3.patch
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> I'm trying to upgrade the Guava version in Spark. The JIRA ticket is SPARK-32502.
> Running test hits an error:
> {code}
> sbt.ForkMain$ForkError: sbt.ForkMain$ForkError: java.lang.IllegalAccessError: 
> tried to access method 
> com.google.common.collect.Iterators.emptyIterator()Lcom/google/common/collect/UnmodifiableIterator;
>  from class org.apache.hadoop.hive.ql.exec.FetchOperator
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.<init>(FetchOperator.java:108)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchTask.initialize(FetchTask.java:87)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:541)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
> {code}
> I know that hive-exec doesn't shade Guava until HIVE-22126 but that work 
> targets 4.0.0. I'm wondering if there is a solution for current Hive 
> versions, e.g. Hive 2.3.7? Any ideas?
> Thanks.
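For context, a minimal illustration of the failure mode and of the JDK alternative to the Guava internal. This is not the HIVE-22126 fix, which instead shades (relocates) Guava's packages inside hive-exec so the old and new versions cannot collide; it only shows why the call site breaks and how code can avoid Guava internals entirely.

```java
import java.util.Collections;
import java.util.Iterator;

// FetchOperator was compiled against an old Guava whose
// Iterators.emptyIterator() was public; newer Guava hides it, so the call
// fails at runtime with IllegalAccessError. The JDK has carried an
// equivalent since Java 7 that involves no Guava at all.
public class EmptyIteratorSketch {
    public static void main(String[] args) {
        Iterator<String> it = Collections.emptyIterator();
        System.out.println(it.hasNext()); // prints "false"
    }
}
```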



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24444) compactor.Cleaner should not set state "mark cleaned" if there are obsolete files in the FS

2020-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24444?focusedWorklogId=518060&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-518060
 ]

ASF GitHub Bot logged work on HIVE-24444:
-

Author: ASF GitHub Bot
Created on: 30/Nov/20 18:06
Start Date: 30/Nov/20 18:06
Worklog Time Spent: 10m 
  Work Description: klcopp commented on a change in pull request #1716:
URL: https://github.com/apache/hive/pull/1716#discussion_r532794904



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##
@@ -265,13 +267,15 @@ private static String idWatermark(CompactionInfo ci) {
   }
 
   /**
-   * @return true if any files were removed
+   * @return true if the cleaner has removed all files rendered obsolete by 
compaction
*/
   private boolean removeFiles(String location, ValidWriteIdList writeIdList, 
CompactionInfo ci)
   throws IOException, NoSuchObjectException, MetaException {
 Path locPath = new Path(location);
+FileSystem fs = locPath.getFileSystem(conf);
+Map<Path, AcidUtils.HdfsDirSnapshot> dirSnapshots = 
AcidUtils.getHdfsDirSnapshots(fs, locPath);
 AcidUtils.Directory dir = 
AcidUtils.getAcidState(locPath.getFileSystem(conf), locPath, conf, writeIdList, 
Ref.from(

Review comment:
   No, not with HIVE-24291.
   Without HIVE-24291 (which might not be usable if, for example, HMS schema 
changes are out of the question) we could still have a pileup of the same 
table/partition in "ready for cleaning" in the queue.
   Without this change (HIVE-24444) some of them might not be deleted.
   The goal of this change is that, when the table does get cleaned, all of the 
records will be deleted.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 518060)
Time Spent: 3h 20m  (was: 3h 10m)

> compactor.Cleaner should not set state "mark cleaned" if there are obsolete 
> files in the FS
> ---
>
> Key: HIVE-24444
> URL: https://issues.apache.org/jira/browse/HIVE-24444
> Project: Hive
>  Issue Type: Bug
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> This is an improvement on HIVE-24314, in which markCleaned() is called only 
> if +any+ files are deleted by the cleaner. This could cause a problem in the 
> following case:
> Say for table_1 compaction1 cleaning was blocked by an open txn, and 
> compaction is run again on the same table (compaction2). Both compaction1 and 
> compaction2 could be in "ready for cleaning" at the same time. By this time 
> the blocking open txn could be committed. When the cleaner runs, one of 
> compaction1 and compaction2 will remain in the "ready for cleaning" state:
> Say compaction2 is picked up by the cleaner first. The Cleaner deletes all 
> obsolete files.  Then compaction1 is picked up by the cleaner; the cleaner 
> doesn't remove any files and compaction1 will stay in the queue in a "ready 
> for cleaning" state.
> HIVE-24291 already solves this issue but if it isn't usable (for example if 
> HMS schema changes are out of the question) then HIVE-24314 + this change will 
> fix the issue of the Cleaner not removing all obsolete files.
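The retry behavior described in this ticket can be modeled in a few lines. The names and the txn-watermark model below are illustrative assumptions, not Hive's actual Cleaner API: a cleaning pass deletes every obsolete directory below the lowest open transaction and reports "cleaned" only when nothing obsolete remains.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of the HIVE-24444 rule: each obsolete directory is represented
// by the highest txn id that wrote it. A cleaning pass may only delete
// directories below the lowest open txn, and the compaction entry is
// marked cleaned only when nothing obsolete is left behind.
public class CleanerSketch {

    /** Deletes what it can; returns true only if no obsolete dirs remain. */
    static boolean removeFiles(List<Long> obsoleteDirTxns, long minOpenTxn) {
        obsoleteDirTxns.removeIf(txn -> txn < minOpenTxn); // deletable now
        return obsoleteDirTxns.isEmpty();                  // the new condition
    }

    public static void main(String[] args) {
        List<Long> dirs = new ArrayList<>(Arrays.asList(5L, 9L));
        // An open txn with id 7 still reads the dir written by txn 9, so
        // this pass is partial and the entry stays "ready for cleaning".
        System.out.println(removeFiles(dirs, 7L));   // prints "false"
        // Once the reader commits, a later pass finishes and marks cleaned.
        System.out.println(removeFiles(dirs, 100L)); // prints "true"
    }
}
```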



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24444) compactor.Cleaner should not set state "mark cleaned" if there are obsolete files in the FS

2020-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24444?focusedWorklogId=518059&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-518059
 ]

ASF GitHub Bot logged work on HIVE-24444:
-

Author: ASF GitHub Bot
Created on: 30/Nov/20 18:04
Start Date: 30/Nov/20 18:04
Worklog Time Spent: 10m 
  Work Description: klcopp commented on a change in pull request #1716:
URL: https://github.com/apache/hive/pull/1716#discussion_r532794904



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##
@@ -265,13 +267,15 @@ private static String idWatermark(CompactionInfo ci) {
   }
 
   /**
-   * @return true if any files were removed
+   * @return true if the cleaner has removed all files rendered obsolete by 
compaction
*/
   private boolean removeFiles(String location, ValidWriteIdList writeIdList, 
CompactionInfo ci)
   throws IOException, NoSuchObjectException, MetaException {
 Path locPath = new Path(location);
+FileSystem fs = locPath.getFileSystem(conf);
+Map<Path, AcidUtils.HdfsDirSnapshot> dirSnapshots = 
AcidUtils.getHdfsDirSnapshots(fs, locPath);
 AcidUtils.Directory dir = 
AcidUtils.getAcidState(locPath.getFileSystem(conf), locPath, conf, writeIdList, 
Ref.from(

Review comment:
   No, not with HIVE-24291.
   Without HIVE-24291 (which might not be usable if, for example, HMS schema 
changes are out of the question) and without this change, we could still have a 
pileup of the same table/partition in "ready for cleaning" in the queue.
   Without this change (HIVE-24444) some of them might not be deleted.
   The goal of this change is that, when the table does get cleaned, all of the 
records will be deleted.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 518059)
Time Spent: 3h 10m  (was: 3h)

> compactor.Cleaner should not set state "mark cleaned" if there are obsolete 
> files in the FS
> ---
>
> Key: HIVE-24444
> URL: https://issues.apache.org/jira/browse/HIVE-24444
> Project: Hive
>  Issue Type: Bug
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> This is an improvement on HIVE-24314, in which markCleaned() is called only 
> if +any+ files are deleted by the cleaner. This could cause a problem in the 
> following case:
> Say for table_1 compaction1 cleaning was blocked by an open txn, and 
> compaction is run again on the same table (compaction2). Both compaction1 and 
> compaction2 could be in "ready for cleaning" at the same time. By this time 
> the blocking open txn could be committed. When the cleaner runs, one of 
> compaction1 and compaction2 will remain in the "ready for cleaning" state:
> Say compaction2 is picked up by the cleaner first. The Cleaner deletes all 
> obsolete files.  Then compaction1 is picked up by the cleaner; the cleaner 
> doesn't remove any files and compaction1 will stay in the queue in a "ready 
> for cleaning" state.
> HIVE-24291 already solves this issue but if it isn't usable (for example if 
> HMS schema changes are out of the question) then HIVE-24314 + this change will 
> fix the issue of the Cleaner not removing all obsolete files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24444) compactor.Cleaner should not set state "mark cleaned" if there are obsolete files in the FS

2020-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24444?focusedWorklogId=518057&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-518057
 ]

ASF GitHub Bot logged work on HIVE-24444:
-

Author: ASF GitHub Bot
Created on: 30/Nov/20 18:00
Start Date: 30/Nov/20 18:00
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1716:
URL: https://github.com/apache/hive/pull/1716#discussion_r532776190



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##
@@ -316,6 +314,30 @@ private boolean removeFiles(String location, 
ValidWriteIdList writeIdList, Compa
   }
   fs.delete(dead, true);
 }
-return true;
+// Check if there will be more obsolete directories to clean when 
possible. We will only mark cleaned when this
+// number reaches 0.
+return getNumEventuallyObsoleteDirs(location, dirSnapshots) == 0;
+  }
+
+  /**
+   * Get the number of base/delta directories the Cleaner should remove 
eventually. If we check this after cleaning
+   * we can see if the Cleaner has further work to do in this table/partition 
directory that it hasn't been able to
+   * finish, e.g. because of an open transaction at the time of compaction.
+   * We do this by assuming that there are no open transactions anywhere and 
then calling getAcidState. If there are
+   * obsolete directories, then the Cleaner has more work to do.
+   * @param location location of table
+   * @return number of dirs left for the cleaner to clean – eventually
+   * @throws IOException
+   */
+  private int getNumEventuallyObsoleteDirs(String location, Map<Path, AcidUtils.HdfsDirSnapshot> dirSnapshots)
+  throws IOException {
+ValidTxnList validTxnList = new ValidReadTxnList();

Review comment:
   Will we consider all writes as valid here? Shouldn't we limit at least 
by compactor txnId?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 518057)
Time Spent: 3h  (was: 2h 50m)

> compactor.Cleaner should not set state "mark cleaned" if there are obsolete 
> files in the FS
> ---
>
> Key: HIVE-24444
> URL: https://issues.apache.org/jira/browse/HIVE-24444
> Project: Hive
>  Issue Type: Bug
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> This is an improvement on HIVE-24314, in which markCleaned() is called only 
> if +any+ files are deleted by the cleaner. This could cause a problem in the 
> following case:
> Say for table_1 compaction1 cleaning was blocked by an open txn, and 
> compaction is run again on the same table (compaction2). Both compaction1 and 
> compaction2 could be in "ready for cleaning" at the same time. By this time 
> the blocking open txn could be committed. When the cleaner runs, one of 
> compaction1 and compaction2 will remain in the "ready for cleaning" state:
> Say compaction2 is picked up by the cleaner first. The Cleaner deletes all 
> obsolete files.  Then compaction1 is picked up by the cleaner; the cleaner 
> doesn't remove any files and compaction1 will stay in the queue in a "ready 
> for cleaning" state.
> HIVE-24291 already solves this issue but if it isn't usable (for example if 
> HMS schema changes are out of the question) then HIVE-24314 + this change will 
> fix the issue of the Cleaner not removing all obsolete files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24444) compactor.Cleaner should not set state "mark cleaned" if there are obsolete files in the FS

2020-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24444?focusedWorklogId=518048&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-518048
 ]

ASF GitHub Bot logged work on HIVE-24444:
-

Author: ASF GitHub Bot
Created on: 30/Nov/20 17:35
Start Date: 30/Nov/20 17:35
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1716:
URL: https://github.com/apache/hive/pull/1716#discussion_r532776190



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##
@@ -316,6 +314,30 @@ private boolean removeFiles(String location, 
ValidWriteIdList writeIdList, Compa
   }
   fs.delete(dead, true);
 }
-return true;
+// Check if there will be more obsolete directories to clean when 
possible. We will only mark cleaned when this
+// number reaches 0.
+return getNumEventuallyObsoleteDirs(location, dirSnapshots) == 0;
+  }
+
+  /**
+   * Get the number of base/delta directories the Cleaner should remove 
eventually. If we check this after cleaning
+   * we can see if the Cleaner has further work to do in this table/partition 
directory that it hasn't been able to
+   * finish, e.g. because of an open transaction at the time of compaction.
+   * We do this by assuming that there are no open transactions anywhere and 
then calling getAcidState. If there are
+   * obsolete directories, then the Cleaner has more work to do.
+   * @param location location of table
+   * @return number of dirs left for the cleaner to clean – eventually
+   * @throws IOException
+   */
+  private int getNumEventuallyObsoleteDirs(String location, Map<Path, AcidUtils.HdfsDirSnapshot> dirSnapshots)
+  throws IOException {
+ValidTxnList validTxnList = new ValidReadTxnList();

Review comment:
   Will we consider all writes as valid here?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 518048)
Time Spent: 2h 50m  (was: 2h 40m)

> compactor.Cleaner should not set state "mark cleaned" if there are obsolete 
> files in the FS
> ---
>
> Key: HIVE-24444
> URL: https://issues.apache.org/jira/browse/HIVE-24444
> Project: Hive
>  Issue Type: Bug
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> This is an improvement on HIVE-24314, in which markCleaned() is called only 
> if +any+ files are deleted by the cleaner. This could cause a problem in the 
> following case:
> Say for table_1 compaction1 cleaning was blocked by an open txn, and 
> compaction is run again on the same table (compaction2). Both compaction1 and 
> compaction2 could be in "ready for cleaning" at the same time. By this time 
> the blocking open txn could be committed. When the cleaner runs, one of 
> compaction1 and compaction2 will remain in the "ready for cleaning" state:
> Say compaction2 is picked up by the cleaner first. The Cleaner deletes all 
> obsolete files.  Then compaction1 is picked up by the cleaner; the cleaner 
> doesn't remove any files and compaction1 will stay in the queue in a "ready 
> for cleaning" state.
> HIVE-24291 already solves this issue but if it isn't usable (for example if 
> HMS schema changes are out of the question) then HIVE-24314 + this change will 
> fix the issue of the Cleaner not removing all obsolete files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24444) compactor.Cleaner should not set state "mark cleaned" if there are obsolete files in the FS

2020-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24444?focusedWorklogId=518046&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-518046
 ]

ASF GitHub Bot logged work on HIVE-24444:
-

Author: ASF GitHub Bot
Created on: 30/Nov/20 17:31
Start Date: 30/Nov/20 17:31
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1716:
URL: https://github.com/apache/hive/pull/1716#discussion_r532772806



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##
@@ -265,13 +267,15 @@ private static String idWatermark(CompactionInfo ci) {
   }
 
   /**
-   * @return true if any files were removed
+   * @return true if the cleaner has removed all files rendered obsolete by 
compaction
*/
   private boolean removeFiles(String location, ValidWriteIdList writeIdList, 
CompactionInfo ci)
   throws IOException, NoSuchObjectException, MetaException {
 Path locPath = new Path(location);
+FileSystem fs = locPath.getFileSystem(conf);
+Map<Path, AcidUtils.HdfsDirSnapshot> dirSnapshots = 
AcidUtils.getHdfsDirSnapshots(fs, locPath);
 AcidUtils.Directory dir = 
AcidUtils.getAcidState(locPath.getFileSystem(conf), locPath, conf, writeIdList, 
Ref.from(

Review comment:
   What would happen if there is a long-running read-only txn + multiple 
compaction attempts? Are we going to have multiple records in a queue for the 
same table/partition pending cleanup? 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 518046)
Time Spent: 2h 40m  (was: 2.5h)

> compactor.Cleaner should not set state "mark cleaned" if there are obsolete 
> files in the FS
> ---
>
> Key: HIVE-24444
> URL: https://issues.apache.org/jira/browse/HIVE-24444
> Project: Hive
>  Issue Type: Bug
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> This is an improvement on HIVE-24314, in which markCleaned() is called only 
> if +any+ files are deleted by the cleaner. This could cause a problem in the 
> following case:
> Say for table_1 compaction1 cleaning was blocked by an open txn, and 
> compaction is run again on the same table (compaction2). Both compaction1 and 
> compaction2 could be in "ready for cleaning" at the same time. By this time 
> the blocking open txn could be committed. When the cleaner runs, one of 
> compaction1 and compaction2 will remain in the "ready for cleaning" state:
> Say compaction2 is picked up by the cleaner first. The Cleaner deletes all 
> obsolete files.  Then compaction1 is picked up by the cleaner; the cleaner 
> doesn't remove any files and compaction1 will stay in the queue in a "ready 
> for cleaning" state.
> HIVE-24291 already solves this issue but if it isn't usable (for example if 
> HMS schema changes are out of the question) then HIVE-24314 + this change will 
> fix the issue of the Cleaner not removing all obsolete files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24444) compactor.Cleaner should not set state "mark cleaned" if there are obsolete files in the FS

2020-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24444?focusedWorklogId=518045&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-518045
 ]

ASF GitHub Bot logged work on HIVE-24444:
-

Author: ASF GitHub Bot
Created on: 30/Nov/20 17:30
Start Date: 30/Nov/20 17:30
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1716:
URL: https://github.com/apache/hive/pull/1716#discussion_r532772806



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##
@@ -265,13 +267,15 @@ private static String idWatermark(CompactionInfo ci) {
   }
 
   /**
-   * @return true if any files were removed
+   * @return true if the cleaner has removed all files rendered obsolete by 
compaction
*/
   private boolean removeFiles(String location, ValidWriteIdList writeIdList, 
CompactionInfo ci)
   throws IOException, NoSuchObjectException, MetaException {
 Path locPath = new Path(location);
+FileSystem fs = locPath.getFileSystem(conf);
+Map<Path, AcidUtils.HdfsDirSnapshot> dirSnapshots = 
AcidUtils.getHdfsDirSnapshots(fs, locPath);
 AcidUtils.Directory dir = 
AcidUtils.getAcidState(locPath.getFileSystem(conf), locPath, conf, writeIdList, 
Ref.from(

Review comment:
   What would happen if there is a long-running read-only txn + multiple 
compaction attempts? Are we going to have multiple records in a queue for the 
same table/partition pending cleanup? 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 518045)
Time Spent: 2.5h  (was: 2h 20m)

> compactor.Cleaner should not set state "mark cleaned" if there are obsolete 
> files in the FS
> ---
>
> Key: HIVE-24444
> URL: https://issues.apache.org/jira/browse/HIVE-24444
> Project: Hive
>  Issue Type: Bug
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> This is an improvement on HIVE-24314, in which markCleaned() is called only 
> if +any+ files are deleted by the cleaner. This could cause a problem in the 
> following case:
> Say for table_1 compaction1 cleaning was blocked by an open txn, and 
> compaction is run again on the same table (compaction2). Both compaction1 and 
> compaction2 could be in "ready for cleaning" at the same time. By this time 
> the blocking open txn could be committed. When the cleaner runs, one of 
> compaction1 and compaction2 will remain in the "ready for cleaning" state:
> Say compaction2 is picked up by the cleaner first. The Cleaner deletes all 
> obsolete files.  Then compaction1 is picked up by the cleaner; the cleaner 
> doesn't remove any files and compaction1 will stay in the queue in a "ready 
> for cleaning" state.
> HIVE-24291 already solves this issue but if it isn't usable (for example if 
> HMS schema changes are out of the question) then HIVE-24314 + this change will 
> fix the issue of the Cleaner not removing all obsolete files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24444) compactor.Cleaner should not set state "mark cleaned" if there are obsolete files in the FS

2020-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24444?focusedWorklogId=518043&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-518043
 ]

ASF GitHub Bot logged work on HIVE-24444:
-

Author: ASF GitHub Bot
Created on: 30/Nov/20 17:18
Start Date: 30/Nov/20 17:18
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1716:
URL: https://github.com/apache/hive/pull/1716#discussion_r532764005



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##
@@ -265,13 +267,15 @@ private static String idWatermark(CompactionInfo ci) {
   }
 
   /**
-   * @return true if any files were removed
+   * @return true if the cleaner has removed all files rendered obsolete by 
compaction
*/
   private boolean removeFiles(String location, ValidWriteIdList writeIdList, 
CompactionInfo ci)
   throws IOException, NoSuchObjectException, MetaException {
 Path locPath = new Path(location);
+FileSystem fs = locPath.getFileSystem(conf);
+Map<Path, AcidUtils.HdfsDirSnapshot> dirSnapshots = 
AcidUtils.getHdfsDirSnapshots(fs, locPath);
 AcidUtils.Directory dir = 
AcidUtils.getAcidState(locPath.getFileSystem(conf), locPath, conf, writeIdList, 
Ref.from(

Review comment:
   This could be replaced with the `fs` variable declared above.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 518043)
Time Spent: 2h 20m  (was: 2h 10m)

> compactor.Cleaner should not set state "mark cleaned" if there are obsolete 
> files in the FS
> ---
>
> Key: HIVE-24444
> URL: https://issues.apache.org/jira/browse/HIVE-24444
> Project: Hive
>  Issue Type: Bug
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> This is an improvement on HIVE-24314, in which markCleaned() is called only 
> if +any+ files are deleted by the cleaner. This could cause a problem in the 
> following case:
> Say for table_1 compaction1 cleaning was blocked by an open txn, and 
> compaction is run again on the same table (compaction2). Both compaction1 and 
> compaction2 could be in "ready for cleaning" at the same time. By this time 
> the blocking open txn could be committed. When the cleaner runs, one of 
> compaction1 and compaction2 will remain in the "ready for cleaning" state:
> Say compaction2 is picked up by the cleaner first. The Cleaner deletes all 
> obsolete files.  Then compaction1 is picked up by the cleaner; the cleaner 
> doesn't remove any files and compaction1 will stay in the queue in a "ready 
> for cleaning" state.
> HIVE-24291 already solves this issue but if it isn't usable (for example if 
> HMS schema changes are out of the question) then HIVE-24314 + this change will 
> fix the issue of the Cleaner not removing all obsolete files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24449) Implement connector provider for Derby DB

2020-11-30 Thread Naveen Gangam (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam reassigned HIVE-24449:



> Implement connector provider for Derby DB
> -
>
> Key: HIVE-24449
> URL: https://issues.apache.org/jira/browse/HIVE-24449
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Major
>
> Provide an implementation of Connector provider for Derby DB.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24447) Move create/drop/alter table to the provider interface

2020-11-30 Thread Naveen Gangam (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam reassigned HIVE-24447:



> Move create/drop/alter table to the provider interface
> --
>
> Key: HIVE-24447
> URL: https://issues.apache.org/jira/browse/HIVE-24447
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Major
>
> The support for such operations on a table in a REMOTE database will be left 
> to the discretion of the providers to support/implement.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24446) Materialized View plan alters explicit cast type in query

2020-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24446?focusedWorklogId=517979&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-517979
 ]

ASF GitHub Bot logged work on HIVE-24446:
-

Author: ASF GitHub Bot
Created on: 30/Nov/20 14:55
Start Date: 30/Nov/20 14:55
Worklog Time Spent: 10m 
  Work Description: kasakrisz opened a new pull request #1717:
URL: https://github.com/apache/hive/pull/1717


   
   
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 517979)
Remaining Estimate: 0h
Time Spent: 10m

> Materialized View plan alters explicit cast type in query
> -
>
> Key: HIVE-24446
> URL: https://issues.apache.org/jira/browse/HIVE-24446
> Project: Hive
>  Issue Type: Bug
>  Components: Materialized views, Types
>Affects Versions: 4.0.0
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code:java}
> create materialized view mv_tv_view_data_av2 stored as orc TBLPROPERTIES 
> ('transactional'='true') as
> select
>   total_views `total_views`,
>   sum(cast(1.5 as decimal(9,4))) over (order by total_views) as quartile,
>   program
> from tv_view_data;
> {code}
> {code:java}
> LogicalProject(quartile=[CAST($0):DECIMAL(12, 1)], total=[$1])
>   HiveTableScan(table=[[arc_view, mv_tv_view_data_av1]], 
> table:alias=[mv_tv_view_data_av1])
> {code}
> Some constant decimal values are not padded in the result set.
> {code}
> POSTHOOK: query: select
>   t.quartile,
>   max(t.total_views) total
> from wealth t2,
> (select
>   total_views `total_views`,
>   sum(cast(1.5 as decimal(9,4))) over (order by total_views) as quartile,
>   program
> from tv_view_data) t
> where t.program=t2.watches
> group by quartile
> order by quartile
> {code}
> {code}
> 1.5   130
> 4.5   1500
> 6.0   2000
> {code}
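To make the "padding" remark concrete, a small illustration (this only shows what scale does to the printed value of a decimal, not the materialized-view rewrite itself):

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// A value kept at the declared decimal(9,4) scale prints with trailing
// zeros; if the plan re-derives the type as decimal(12, 1), as in the
// LogicalProject above, those zeros are dropped.
public class DecimalPadding {
    public static void main(String[] args) {
        BigDecimal v = new BigDecimal("1.5");
        System.out.println(v.setScale(4, RoundingMode.UNNECESSARY).toPlainString()); // prints "1.5000"
        System.out.println(v.setScale(1, RoundingMode.UNNECESSARY).toPlainString()); // prints "1.5"
    }
}
```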



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24446) Materialized View plan alters explicit cast type in query

2020-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24446:
--
Labels: pull-request-available  (was: )

> Materialized View plan alters explicit cast type in query
> -
>
> Key: HIVE-24446
> URL: https://issues.apache.org/jira/browse/HIVE-24446
> Project: Hive
>  Issue Type: Bug
>  Components: Materialized views, Types
>Affects Versions: 4.0.0
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code:java}
> create materialized view mv_tv_view_data_av2 stored as orc TBLPROPERTIES 
> ('transactional'='true') as
> select
>   total_views `total_views`,
>   sum(cast(1.5 as decimal(9,4))) over (order by total_views) as quartile,
>   program
> from tv_view_data;
> {code}
> {code:java}
> LogicalProject(quartile=[CAST($0):DECIMAL(12, 1)], total=[$1])
>   HiveTableScan(table=[[arc_view, mv_tv_view_data_av1]], 
> table:alias=[mv_tv_view_data_av1])
> {code}
> Some constant decimal values are not padded in the result set.
> {code}
> select
>   t.quartile,
>   max(t.total_views) total
> from wealth t2,
> (select
>   total_views `total_views`,
>   sum(cast(1.5 as decimal(9,4))) over (order by total_views) as quartile,
>   program
> from tv_view_data) t
> where t.program=t2.watches
> group by quartile
> order by quartile
> {code}
> {code}
> 1.5   130
> 4.5   1500
> 6.0   2000
> {code}





[jira] [Updated] (HIVE-24446) Materialized View plan alters explicit cast type in query

2020-11-30 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-24446:
--
Component/s: Materialized views

> Materialized View plan alters explicit cast type in query
> -
>
> Key: HIVE-24446
> URL: https://issues.apache.org/jira/browse/HIVE-24446
> Project: Hive
>  Issue Type: Bug
>  Components: Materialized views
>Affects Versions: 4.0.0
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>
> {code:java}
> create materialized view mv_tv_view_data_av2 stored as orc TBLPROPERTIES 
> ('transactional'='true') as
> select
>   total_views `total_views`,
>   sum(cast(1.5 as decimal(9,4))) over (order by total_views) as quartile,
>   program
> from tv_view_data;
> {code}
> {code:java}
> LogicalProject(quartile=[CAST($0):DECIMAL(12, 1)], total=[$1])
>   HiveTableScan(table=[[arc_view, mv_tv_view_data_av1]], 
> table:alias=[mv_tv_view_data_av1])
> {code}
> Some constant decimal values are not padded in the result set.
> {code}
> select
>   t.quartile,
>   max(t.total_views) total
> from wealth t2,
> (select
>   total_views `total_views`,
>   sum(cast(1.5 as decimal(9,4))) over (order by total_views) as quartile,
>   program
> from tv_view_data) t
> where t.program=t2.watches
> group by quartile
> order by quartile
> {code}
> {code}
> 1.5   130
> 4.5   1500
> 6.0   2000
> {code}





[jira] [Updated] (HIVE-24446) Materialized View plan alters explicit cast type in query

2020-11-30 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-24446:
--
Component/s: Types

> Materialized View plan alters explicit cast type in query
> -
>
> Key: HIVE-24446
> URL: https://issues.apache.org/jira/browse/HIVE-24446
> Project: Hive
>  Issue Type: Bug
>  Components: Materialized views, Types
>Affects Versions: 4.0.0
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>
> {code:java}
> create materialized view mv_tv_view_data_av2 stored as orc TBLPROPERTIES 
> ('transactional'='true') as
> select
>   total_views `total_views`,
>   sum(cast(1.5 as decimal(9,4))) over (order by total_views) as quartile,
>   program
> from tv_view_data;
> {code}
> {code:java}
> LogicalProject(quartile=[CAST($0):DECIMAL(12, 1)], total=[$1])
>   HiveTableScan(table=[[arc_view, mv_tv_view_data_av1]], 
> table:alias=[mv_tv_view_data_av1])
> {code}
> Some constant decimal values are not padded in the result set.
> {code}
> select
>   t.quartile,
>   max(t.total_views) total
> from wealth t2,
> (select
>   total_views `total_views`,
>   sum(cast(1.5 as decimal(9,4))) over (order by total_views) as quartile,
>   program
> from tv_view_data) t
> where t.program=t2.watches
> group by quartile
> order by quartile
> {code}
> {code}
> 1.5   130
> 4.5   1500
> 6.0   2000
> {code}





[jira] [Updated] (HIVE-24446) Materialized View plan alters explicit cast type in query

2020-11-30 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-24446:
--
Description: 
{code:java}
create materialized view mv_tv_view_data_av2 stored as orc TBLPROPERTIES 
('transactional'='true') as
select
  total_views `total_views`,
  sum(cast(1.5 as decimal(9,4))) over (order by total_views) as quartile,
  program
from tv_view_data;
{code}
{code:java}
LogicalProject(quartile=[CAST($0):DECIMAL(12, 1)], total=[$1])
  HiveTableScan(table=[[arc_view, mv_tv_view_data_av1]], 
table:alias=[mv_tv_view_data_av1])
{code}
Some constant decimal values are not padded in the result set.

{code}
select
  t.quartile,
  max(t.total_views) total
from wealth t2,
(select
  total_views `total_views`,
  sum(cast(1.5 as decimal(9,4))) over (order by total_views) as quartile,
  program
from tv_view_data) t
where t.program=t2.watches
group by quartile
order by quartile
{code}
{code}
1.5 130
4.5 1500
6.0 2000
{code}


  was:
{code}
create materialized view mv_tv_view_data_av2 stored as orc TBLPROPERTIES 
('transactional'='true') as
select
  total_views `total_views`,
  sum(cast(1.5 as decimal(9,4))) over (order by total_views) as quartile,
  program
from tv_view_data;
{code}
{code}
LogicalProject(quartile=[CAST($0):DECIMAL(12, 1)], total=[$1])
  HiveTableScan(table=[[arc_view, mv_tv_view_data_av1]], 
table:alias=[mv_tv_view_data_av1])
{code}


> Materialized View plan alters explicit cast type in query
> -
>
> Key: HIVE-24446
> URL: https://issues.apache.org/jira/browse/HIVE-24446
> Project: Hive
>  Issue Type: Bug
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>
> {code:java}
> create materialized view mv_tv_view_data_av2 stored as orc TBLPROPERTIES 
> ('transactional'='true') as
> select
>   total_views `total_views`,
>   sum(cast(1.5 as decimal(9,4))) over (order by total_views) as quartile,
>   program
> from tv_view_data;
> {code}
> {code:java}
> LogicalProject(quartile=[CAST($0):DECIMAL(12, 1)], total=[$1])
>   HiveTableScan(table=[[arc_view, mv_tv_view_data_av1]], 
> table:alias=[mv_tv_view_data_av1])
> {code}
> Some constant decimal values are not padded in the result set.
> {code}
> select
>   t.quartile,
>   max(t.total_views) total
> from wealth t2,
> (select
>   total_views `total_views`,
>   sum(cast(1.5 as decimal(9,4))) over (order by total_views) as quartile,
>   program
> from tv_view_data) t
> where t.program=t2.watches
> group by quartile
> order by quartile
> {code}
> {code}
> 1.5   130
> 4.5   1500
> 6.0   2000
> {code}





[jira] [Updated] (HIVE-24446) Materialized View plan alters explicit cast type in query

2020-11-30 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-24446:
--
Affects Version/s: 4.0.0

> Materialized View plan alters explicit cast type in query
> -
>
> Key: HIVE-24446
> URL: https://issues.apache.org/jira/browse/HIVE-24446
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>
> {code:java}
> create materialized view mv_tv_view_data_av2 stored as orc TBLPROPERTIES 
> ('transactional'='true') as
> select
>   total_views `total_views`,
>   sum(cast(1.5 as decimal(9,4))) over (order by total_views) as quartile,
>   program
> from tv_view_data;
> {code}
> {code:java}
> LogicalProject(quartile=[CAST($0):DECIMAL(12, 1)], total=[$1])
>   HiveTableScan(table=[[arc_view, mv_tv_view_data_av1]], 
> table:alias=[mv_tv_view_data_av1])
> {code}
> Some constant decimal values are not padded in the result set.
> {code}
> select
>   t.quartile,
>   max(t.total_views) total
> from wealth t2,
> (select
>   total_views `total_views`,
>   sum(cast(1.5 as decimal(9,4))) over (order by total_views) as quartile,
>   program
> from tv_view_data) t
> where t.program=t2.watches
> group by quartile
> order by quartile
> {code}
> {code}
> 1.5   130
> 4.5   1500
> 6.0   2000
> {code}





[jira] [Resolved] (HIVE-24423) Improve DbNotificationListener Thread

2020-11-30 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor resolved HIVE-24423.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

HIVE-24423: Improve DbNotificationListener Thread (David Mollitor reviewed by 
Naveen Gangam, Miklos Gergely)

Thanks [~ngangam] and [~mgergely] for the review!

> Improve DbNotificationListener Thread
> -
>
> Key: HIVE-24423
> URL: https://issues.apache.org/jira/browse/HIVE-24423
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.1.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Clean up and simplify {{DbNotificationListener}} thread class.
> Most importantly, stop the thread and wait for it to finish before launching 
> a new thread.
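The stop-and-wait pattern this issue describes can be sketched generically. The class and method names below are illustrative, not the actual DbNotificationListener API:

```java
public class CleanerThreadDemo {
    private Thread cleaner;

    // Replace any running cleaner thread: interrupt it and join it
    // before the new one starts, so two cleaners never run concurrently.
    synchronized Thread restartCleaner(Runnable work) throws InterruptedException {
        Thread previous = cleaner;
        if (previous != null) {
            previous.interrupt();  // ask the old thread to stop
            previous.join();       // wait until it has actually finished
        }
        cleaner = new Thread(work, "db-notification-cleaner");
        cleaner.setDaemon(true);
        cleaner.start();
        return previous;           // returned for inspection; null on first call
    }
}
```

Without the `join()`, a second call could leave the old thread running alongside the new one — exactly the overlap the fix avoids.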





[jira] [Work logged] (HIVE-24423) Improve DbNotificationListener Thread

2020-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24423?focusedWorklogId=517965&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-517965
 ]

ASF GitHub Bot logged work on HIVE-24423:
-

Author: ASF GitHub Bot
Created on: 30/Nov/20 14:30
Start Date: 30/Nov/20 14:30
Worklog Time Spent: 10m 
  Work Description: belugabehr merged pull request #1703:
URL: https://github.com/apache/hive/pull/1703


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 517965)
Time Spent: 1h  (was: 50m)

> Improve DbNotificationListener Thread
> -
>
> Key: HIVE-24423
> URL: https://issues.apache.org/jira/browse/HIVE-24423
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.1.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Clean up and simplify {{DbNotificationListener}} thread class.
> Most importantly, stop the thread and wait for it to finish before launching 
> a new thread.





[jira] [Updated] (HIVE-24446) Materialized View plan alters explicit cast type in query

2020-11-30 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-24446:
--
Summary: Materialized View plan alters explicit cast type in query  (was: 
Materialized View plan remove explicit cast from query)

> Materialized View plan alters explicit cast type in query
> -
>
> Key: HIVE-24446
> URL: https://issues.apache.org/jira/browse/HIVE-24446
> Project: Hive
>  Issue Type: Bug
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>
> {code}
> create materialized view mv_tv_view_data_av2 stored as orc TBLPROPERTIES 
> ('transactional'='true') as
> select
>   total_views `total_views`,
>   sum(cast(1.5 as decimal(9,4))) over (order by total_views) as quartile,
>   program
> from tv_view_data;
> {code}
> {code}
> LogicalProject(quartile=[CAST($0):DECIMAL(12, 1)], total=[$1])
>   HiveTableScan(table=[[arc_view, mv_tv_view_data_av1]], 
> table:alias=[mv_tv_view_data_av1])
> {code}





[jira] [Assigned] (HIVE-24446) Materialized View plan remove explicit cast from query

2020-11-30 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa reassigned HIVE-24446:
-


> Materialized View plan remove explicit cast from query
> --
>
> Key: HIVE-24446
> URL: https://issues.apache.org/jira/browse/HIVE-24446
> Project: Hive
>  Issue Type: Bug
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>
> {code}
> create materialized view mv_tv_view_data_av2 stored as orc TBLPROPERTIES 
> ('transactional'='true') as
> select
>   total_views `total_views`,
>   sum(cast(1.5 as decimal(9,4))) over (order by total_views) as quartile,
>   program
> from tv_view_data;
> {code}
> {code}
> LogicalProject(quartile=[CAST($0):DECIMAL(12, 1)], total=[$1])
>   HiveTableScan(table=[[arc_view, mv_tv_view_data_av1]], 
> table:alias=[mv_tv_view_data_av1])
> {code}





[jira] [Work logged] (HIVE-21843) UNION query with regular expressions for column name does not work

2020-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21843?focusedWorklogId=517945&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-517945
 ]

ASF GitHub Bot logged work on HIVE-21843:
-

Author: ASF GitHub Bot
Created on: 30/Nov/20 13:57
Start Date: 30/Nov/20 13:57
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on a change in pull request #1684:
URL: https://github.com/apache/hive/pull/1684#discussion_r532610692



##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
##
@@ -4453,7 +4452,7 @@ private boolean isAggregateInSelect(Node node, 
Collection aggregateFunc
* Returns whether the pattern is a regex expression (instead of a normal
* string). Normal string is a string with all alphabets/digits and "_".
*/
-  boolean isRegex(String pattern, HiveConf conf) {
+  static boolean isRegex(String pattern, HiveConf conf) {

Review comment:
   This function is called from `SemanticAnalyzer` and its subclasses. I 
haven't found any other invocations. Is it necessary to change this to `static` 
?
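The javadoc quoted above defines the contract: a pattern is treated as a regex only if it contains characters outside letters, digits, and "_". A stand-alone sketch of that check — the real method also consults HiveConf (e.g. whether quoted identifiers are enabled), which is omitted here:

```java
public class RegexCheckDemo {
    // A pattern is a "normal" column name if it consists solely of
    // letters, digits, and '_'; anything else makes it a regex pattern.
    static boolean isRegex(String pattern) {
        for (char c : pattern.toCharArray()) {
            if (!Character.isLetterOrDigit(c) && c != '_') {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isRegex("total_views")); // false: plain identifier
        System.out.println(isRegex("col.*"));       // true: contains '.' and '*'
    }
}
```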

##
File path: 
ql/src/test/org/apache/hadoop/hive/ql/parse/authorization/PrivilegesTestBase.java
##
@@ -35,8 +37,9 @@
 
   public static void grantUserTable(String privStr, PrivilegeType privType, 
QueryState queryState, Hive db)
   throws Exception {
+Context ctx=new Context(new HiveConf());
 DDLWork work = AuthorizationTestUtil.analyze(
-"GRANT " + privStr + " ON TABLE " + TABLE + " TO USER " + USER, 
queryState, db);
+"GRANT " + privStr + " ON TABLE " + TABLE + " TO USER " + USER, 
queryState, db,ctx);

Review comment:
   nit: space
   `queryState, db, ctx);`

##
File path: ql/src/test/org/apache/hadoop/hive/ql/tool/TestLineageInfo.java
##
@@ -58,7 +76,7 @@ public void testSimpleQuery() {
 try {
   lep.getLineageInfo("INSERT OVERWRITE TABLE dest1 partition (ds = '111')  
" 
   + "SELECT s.* FROM srcpart TABLESAMPLE (BUCKET 1 OUT OF 1) s " 
-  + "WHERE s.ds='2008-04-08' and s.hr='11'");
+  + "WHERE s.ds='2008-04-08' and s.hr='11'",ctx);

Review comment:
   nit: space
   `s.hr='11'", ctx);`

##
File path: ql/src/test/org/apache/hadoop/hive/ql/tool/TestLineageInfo.java
##
@@ -128,7 +136,7 @@ public void testSimpleQuery5() {
 LineageInfo lep = new LineageInfo();
 try {
   lep.getLineageInfo("insert overwrite table x select a.y, b.y " 
-  + "from a a full outer join b b on (a.x = b.y)");
+  + "from a a full outer join b b on (a.x = b.y)",ctx);

Review comment:
   `(a.x = b.y)", ctx);`

##
File path: ql/src/test/org/apache/hadoop/hive/ql/tool/TestLineageInfo.java
##
@@ -71,47 +89,37 @@ public void testSimpleQuery() {
   }
 
   @Test
-  public void testSimpleQuery2() {
+  public void testSimpleQuery2() throws Exception {
 LineageInfo lep = new LineageInfo();
-try {
-  lep.getLineageInfo("FROM (FROM src select src.key, src.value " 
-  + "WHERE src.key < 10 UNION ALL FROM src SELECT src.* WHERE src.key 
> 10 ) unioninput " 
-  + "INSERT OVERWRITE DIRECTORY 
'../../../../build/contrib/hive/ql/test/data/warehouse/union.out' " 
-  + "SELECT unioninput.*");
-  TreeSet i = new TreeSet();
-  TreeSet o = new TreeSet();
-  i.add("src");
-  checkOutput(lep, i, o);
-} catch (Exception e) {
-  e.printStackTrace();
-  fail("Failed");
-}
+lep.getLineageInfo("FROM (FROM src select src.key, src.value "
++ "WHERE src.key < 10 UNION ALL FROM src SELECT src.* WHERE src.key > 
10 ) unioninput "
++ "INSERT OVERWRITE DIRECTORY 
'../../../../build/contrib/hive/ql/test/data/warehouse/union.out' "
++ "SELECT unioninput.*",ctx);
+TreeSet i = new TreeSet();
+TreeSet o = new TreeSet();
+i.add("src");
+checkOutput(lep, i, o);
   }
 
   @Test
-  public void testSimpleQuery3() {
+  public void testSimpleQuery3() throws Exception {
 LineageInfo lep = new LineageInfo();
-try {
-  lep.getLineageInfo("FROM (FROM src select src.key, src.value " 
-  + "WHERE src.key < 10 UNION ALL FROM src1 SELECT src1.* WHERE 
src1.key > 10 ) unioninput " 
-  + "INSERT OVERWRITE DIRECTORY 
'../../../../build/contrib/hive/ql/test/data/warehouse/union.out' " 
-  + "SELECT unioninput.*");
-  TreeSet i = new TreeSet();
-  TreeSet o = new TreeSet();
-  i.add("src");
-  i.add("src1");
-  checkOutput(lep, i, o);
-} catch (Exception e) {
-  e.printStackTrace();
-  fail("Failed");
-}
+lep.getLineageInfo("FROM (FROM src select src.key, src.value "
++ "WHERE src.key < 10 UNION ALL FROM src1 SELECT src1.* WHERE src1.key 
> 10 ) unioninput "
++ "INSERT OVERWRITE DIRECTORY 
'../../../../build/contrib/hive/ql/test/data/warehouse/union.out' "
+   

[jira] [Assigned] (HIVE-24445) Non blocking DROP table implementation

2020-11-30 Thread Zoltan Chovan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Chovan reassigned HIVE-24445:



> Non blocking DROP table implementation
> --
>
> Key: HIVE-24445
> URL: https://issues.apache.org/jira/browse/HIVE-24445
> Project: Hive
>  Issue Type: New Feature
>  Components: Hive
>Reporter: Zoltan Chovan
>Assignee: Zoltan Chovan
>Priority: Major
>
> Implement a way to execute drop table operations in a way that doesn't have 
> to wait for currently running read operations to be finished.





[jira] [Work logged] (HIVE-24444) compactor.Cleaner should not set state "mark cleaned" if there are obsolete files in the FS

2020-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24444?focusedWorklogId=517915&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-517915
 ]

ASF GitHub Bot logged work on HIVE-24444:
-

Author: ASF GitHub Bot
Created on: 30/Nov/20 13:01
Start Date: 30/Nov/20 13:01
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #1716:
URL: https://github.com/apache/hive/pull/1716#discussion_r532579637



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##
@@ -265,13 +267,14 @@ private static String idWatermark(CompactionInfo ci) {
   }
 
   /**
-   * @return true if any files were removed
+   * @return true if the cleaner has removed all files rendered obsolete by 
compaction
*/
   private boolean removeFiles(String location, ValidWriteIdList writeIdList, 
CompactionInfo ci)
   throws IOException, NoSuchObjectException, MetaException {
 Path locPath = new Path(location);
+Map dirSnapshots = null;
 AcidUtils.Directory dir = 
AcidUtils.getAcidState(locPath.getFileSystem(conf), locPath, conf, writeIdList, 
Ref.from(
-false), false);
+false), false, dirSnapshots);

Review comment:
   Oh, sorry. I was talking nonsense. Yes, we should not pass null in. I 
thought an empty map, but checked the code that would not get filled up either. 
There is a working example in AcidUtils.getAcidFilesForStats, you should create 
the snapshot beforehand







Issue Time Tracking
---

Worklog Id: (was: 517915)
Time Spent: 2h 10m  (was: 2h)

> compactor.Cleaner should not set state "mark cleaned" if there are obsolete 
> files in the FS
> ---
>
> Key: HIVE-24444
> URL: https://issues.apache.org/jira/browse/HIVE-24444
> Project: Hive
>  Issue Type: Bug
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> This is an improvement on HIVE-24314, in which markCleaned() is called only 
> if +any+ files are deleted by the cleaner. This could cause a problem in the 
> following case:
> Say for table_1 compaction1 cleaning was blocked by an open txn, and 
> compaction is run again on the same table (compaction2). Both compaction1 and 
> compaction2 could be in "ready for cleaning" at the same time. By this time 
> the blocking open txn could be committed. When the cleaner runs, one of 
> compaction1 and compaction2 will remain in the "ready for cleaning" state:
> Say compaction2 is picked up by the cleaner first. The Cleaner deletes all 
> obsolete files.  Then compaction1 is picked up by the cleaner; the cleaner 
> doesn't remove any files and compaction1 will stay in the queue in a "ready 
> for cleaning" state.
> HIVE-24291 already solves this issue but if it isn't usable (for example if 
> HMS schema changes are out the question) then HIVE-24314 + this change will 
> fix the issue of the Cleaner not removing all obsolete files.
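The intended behaviour — only mark a compaction cleaned once no obsolete directories remain — can be modelled without Hive. This is a toy simulation: `removeFiles`, `clean`, and the directory lists are stand-ins, not the Cleaner's real API:

```java
import java.util.ArrayList;
import java.util.List;

public class CleanerSimulation {
    // Delete the obsolete directories we are currently allowed to remove
    // and report how many remain (e.g. still protected by an open txn).
    static int removeFiles(List<String> deletable, List<String> obsolete) {
        obsolete.removeAll(deletable);
        return obsolete.size();
    }

    // Mark the compaction cleaned only when nothing obsolete remains,
    // instead of whenever at least one file was deleted.
    static boolean clean(List<String> deletable, List<String> obsolete) {
        return removeFiles(deletable, obsolete) == 0;
    }

    public static void main(String[] args) {
        List<String> obsolete = new ArrayList<>(List.of("base_1", "delta_2_3"));
        // An open txn still blocks delta_2_3: do not mark cleaned yet.
        System.out.println(clean(List.of("base_1"), obsolete));    // false
        // Once the txn commits, the rest is removed and the state flips.
        System.out.println(clean(List.of("delta_2_3"), obsolete)); // true
    }
}
```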





[jira] [Work logged] (HIVE-24444) compactor.Cleaner should not set state "mark cleaned" if there are obsolete files in the FS

2020-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24444?focusedWorklogId=517913&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-517913
 ]

ASF GitHub Bot logged work on HIVE-24444:
-

Author: ASF GitHub Bot
Created on: 30/Nov/20 12:59
Start Date: 30/Nov/20 12:59
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1716:
URL: https://github.com/apache/hive/pull/1716#discussion_r532577307



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##
@@ -316,6 +314,30 @@ private boolean removeFiles(String location, 
ValidWriteIdList writeIdList, Compa
   }
   fs.delete(dead, true);
 }
-return true;
+// Check if there will be more obsolete directories to clean when 
possible. We will only mark cleaned when this
+// number reaches 0.
+return getNumEventuallyObsoleteDirs(location, dirSnapshots) == 0;
+  }
+
+  /**
+   * Get the number of base/delta directories the Cleaner should remove 
eventually. If we check this after cleaning
+   * we can see if the Cleaner has further work to do in this table/partition 
directory that it hasn't been able to
+   * finish, e.g. because of an open transaction at the time of compaction.
+   * We do this by assuming that there are no open transactions anywhere and 
then calling getAcidState. If there are
+   * obsolete directories, then the Cleaner has more work to do.
+   * @param location location of table
+   * @return number of dirs left for the cleaner to clean – eventually
+   * @throws IOException
+   */
+  private int getNumEventuallyObsoleteDirs(String location, Map dirSnapshots)
+  throws IOException {
+ValidTxnList validTxnList = new ValidReadTxnList();
+//save it so that getAcidState() sees it
+conf.set(ValidTxnList.VALID_TXNS_KEY, validTxnList.writeToString());
+ValidReaderWriteIdList validWriteIdList = new ValidReaderWriteIdList();
+Path locPath = new Path(location);
+AcidUtils.Directory dir = 
AcidUtils.getAcidState(locPath.getFileSystem(conf), locPath, conf, 
validWriteIdList,
+Ref.from(false), false, dirSnapshots);
+return dir.getObsolete().size();

Review comment:
   there could be deltas with higher txnId than compaction :) 
   If we won't handle this, we might get uncleaned aborts 
   







Issue Time Tracking
---

Worklog Id: (was: 517913)
Time Spent: 2h  (was: 1h 50m)

> compactor.Cleaner should not set state "mark cleaned" if there are obsolete 
> files in the FS
> ---
>
> Key: HIVE-24444
> URL: https://issues.apache.org/jira/browse/HIVE-24444
> Project: Hive
>  Issue Type: Bug
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> This is an improvement on HIVE-24314, in which markCleaned() is called only 
> if +any+ files are deleted by the cleaner. This could cause a problem in the 
> following case:
> Say for table_1 compaction1 cleaning was blocked by an open txn, and 
> compaction is run again on the same table (compaction2). Both compaction1 and 
> compaction2 could be in "ready for cleaning" at the same time. By this time 
> the blocking open txn could be committed. When the cleaner runs, one of 
> compaction1 and compaction2 will remain in the "ready for cleaning" state:
> Say compaction2 is picked up by the cleaner first. The Cleaner deletes all 
> obsolete files.  Then compaction1 is picked up by the cleaner; the cleaner 
> doesn't remove any files and compaction1 will stay in the queue in a "ready 
> for cleaning" state.
> HIVE-24291 already solves this issue but if it isn't usable (for example if 
> HMS schema changes are out the question) then HIVE-24314 + this change will 
> fix the issue of the Cleaner not removing all obsolete files.





[jira] [Work logged] (HIVE-24444) compactor.Cleaner should not set state "mark cleaned" if there are obsolete files in the FS

2020-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24444?focusedWorklogId=517912&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-517912
 ]

ASF GitHub Bot logged work on HIVE-24444:
-

Author: ASF GitHub Bot
Created on: 30/Nov/20 12:57
Start Date: 30/Nov/20 12:57
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1716:
URL: https://github.com/apache/hive/pull/1716#discussion_r532577307



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##
@@ -316,6 +314,30 @@ private boolean removeFiles(String location, 
ValidWriteIdList writeIdList, Compa
   }
   fs.delete(dead, true);
 }
-return true;
+// Check if there will be more obsolete directories to clean when 
possible. We will only mark cleaned when this
+// number reaches 0.
+return getNumEventuallyObsoleteDirs(location, dirSnapshots) == 0;
+  }
+
+  /**
+   * Get the number of base/delta directories the Cleaner should remove 
eventually. If we check this after cleaning
+   * we can see if the Cleaner has further work to do in this table/partition 
directory that it hasn't been able to
+   * finish, e.g. because of an open transaction at the time of compaction.
+   * We do this by assuming that there are no open transactions anywhere and 
then calling getAcidState. If there are
+   * obsolete directories, then the Cleaner has more work to do.
+   * @param location location of table
+   * @return number of dirs left for the cleaner to clean – eventually
+   * @throws IOException
+   */
+  private int getNumEventuallyObsoleteDirs(String location, Map dirSnapshots)
+  throws IOException {
+ValidTxnList validTxnList = new ValidReadTxnList();
+//save it so that getAcidState() sees it
+conf.set(ValidTxnList.VALID_TXNS_KEY, validTxnList.writeToString());
+ValidReaderWriteIdList validWriteIdList = new ValidReaderWriteIdList();
+Path locPath = new Path(location);
+AcidUtils.Directory dir = 
AcidUtils.getAcidState(locPath.getFileSystem(conf), locPath, conf, 
validWriteIdList,
+Ref.from(false), false, dirSnapshots);
+return dir.getObsolete().size();

Review comment:
   there could be deltas with higher txnId than compaction :) 
   If we won't handle this, we might get uncleaned aborts 
   







Issue Time Tracking
---

Worklog Id: (was: 517912)
Time Spent: 1h 50m  (was: 1h 40m)

> compactor.Cleaner should not set state "mark cleaned" if there are obsolete 
> files in the FS
> ---
>
> Key: HIVE-24444
> URL: https://issues.apache.org/jira/browse/HIVE-24444
> Project: Hive
>  Issue Type: Bug
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> This is an improvement on HIVE-24314, in which markCleaned() is called only 
> if +any+ files are deleted by the cleaner. This could cause a problem in the 
> following case:
> Say for table_1 compaction1 cleaning was blocked by an open txn, and 
> compaction is run again on the same table (compaction2). Both compaction1 and 
> compaction2 could be in "ready for cleaning" at the same time. By this time 
> the blocking open txn could be committed. When the cleaner runs, one of 
> compaction1 and compaction2 will remain in the "ready for cleaning" state:
> Say compaction2 is picked up by the cleaner first. The Cleaner deletes all 
> obsolete files.  Then compaction1 is picked up by the cleaner; the cleaner 
> doesn't remove any files and compaction1 will stay in the queue in a "ready 
> for cleaning" state.
> HIVE-24291 already solves this issue but if it isn't usable (for example if 
> HMS schema changes are out of the question) then HIVE-24314 + this change will 
> fix the issue of the Cleaner not removing all obsolete files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24426) Spark job fails with fixed LlapTaskUmbilicalServer port

2020-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24426?focusedWorklogId=517910&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-517910
 ]

ASF GitHub Bot logged work on HIVE-24426:
-

Author: ASF GitHub Bot
Created on: 30/Nov/20 12:56
Start Date: 30/Nov/20 12:56
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on pull request #1705:
URL: https://github.com/apache/hive/pull/1705#issuecomment-735769256


   Thanx @prasanthj for the review, I have handled the review comments. Please 
have a look





Issue Time Tracking
---

Worklog Id: (was: 517910)
Time Spent: 0.5h  (was: 20m)

> Spark job fails with fixed LlapTaskUmbilicalServer port
> ---
>
> Key: HIVE-24426
> URL: https://issues.apache.org/jira/browse/HIVE-24426
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In case of cloud deployments, multiple executors are launched on name node, 
> and in case a fixed umbilical port is specified using 
> {{spark.hadoop.hive.llap.daemon.umbilical.port=30006}}
> The job fails with BindException.
> {noformat}
> Caused by: java.net.BindException: Problem binding to [0.0.0.0:30006] 
> java.net.BindException: Address already in use; For more details see:  
> http://wiki.apache.org/hadoop/BindException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:840)
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:741)
>   at org.apache.hadoop.ipc.Server.bind(Server.java:605)
>   at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:1169)
>   at org.apache.hadoop.ipc.Server.<init>(Server.java:3032)
>   at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:1039)
>   at 
> org.apache.hadoop.ipc.WritableRpcEngine$Server.<init>(WritableRpcEngine.java:438)
>   at 
> org.apache.hadoop.ipc.WritableRpcEngine.getServer(WritableRpcEngine.java:332)
>   at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:848)
>   at 
> org.apache.hadoop.hive.llap.tezplugins.helpers.LlapTaskUmbilicalServer.<init>(LlapTaskUmbilicalServer.java:67)
>   at 
> org.apache.hadoop.hive.llap.ext.LlapTaskUmbilicalExternalClient$SharedUmbilicalServer.<init>(LlapTaskUmbilicalExternalClient.java:122)
>   ... 26 more
> Caused by: java.net.BindException: Address already in use
>   at sun.nio.ch.Net.bind0(Native Method)
>   at sun.nio.ch.Net.bind(Net.java:433)
>   at sun.nio.ch.Net.bind(Net.java:425)
>   at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:220)
>   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:85)
>   at org.apache.hadoop.ipc.Server.bind(Server.java:588)
>   ... 34 more{noformat}
> To counter this, better to provide a range of ports



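The fix direction quoted above ("better to provide a range of ports") can be sketched as a bind-with-fallback loop. The following is a minimal, self-contained illustration using plain java.net; the PortRangeBinder class and its helper are hypothetical and not part of the Hive or Hadoop codebase:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class PortRangeBinder {

    /** Try each port in [lo, hi]; return a socket bound to the first free one. */
    static ServerSocket bindInRange(int lo, int hi) throws IOException {
        IOException last = null;
        for (int port = lo; port <= hi; port++) {
            ServerSocket s = new ServerSocket();
            try {
                s.bind(new InetSocketAddress(port));
                return s;                 // bound successfully
            } catch (IOException e) {
                s.close();
                last = e;                 // port in use; try the next one
            }
        }
        throw last != null ? last : new IOException("empty port range");
    }

    public static void main(String[] args) throws IOException {
        // Occupy an ephemeral port, then bind over a range starting at that port:
        // the loop must skip the taken port and settle on a later one.
        ServerSocket taken = new ServerSocket(0);
        int p = taken.getLocalPort();
        ServerSocket bound = bindInRange(p, p + 20);
        System.out.println(bound.getLocalPort() != p); // prints "true"
        bound.close();
        taken.close();
    }
}
```

With this shape, a single busy port (the multiple-executors-per-host case described above) degrades into trying the next port in the range instead of a fatal BindException.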


[jira] [Work logged] (HIVE-24444) compactor.Cleaner should not set state "mark cleaned" if there are obsolete files in the FS

2020-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24444?focusedWorklogId=517908&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-517908
 ]

ASF GitHub Bot logged work on HIVE-24444:
-

Author: ASF GitHub Bot
Created on: 30/Nov/20 12:54
Start Date: 30/Nov/20 12:54
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1716:
URL: https://github.com/apache/hive/pull/1716#discussion_r532575138



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##
@@ -265,13 +267,14 @@ private static String idWatermark(CompactionInfo ci) {
   }
 
   /**
-   * @return true if any files were removed
+   * @return true if the cleaner has removed all files rendered obsolete by compaction
    */
   private boolean removeFiles(String location, ValidWriteIdList writeIdList, CompactionInfo ci)
       throws IOException, NoSuchObjectException, MetaException {
     Path locPath = new Path(location);
+    Map<Path, AcidUtils.HdfsDirSnapshot> dirSnapshots = null;
     AcidUtils.Directory dir = AcidUtils.getAcidState(locPath.getFileSystem(conf), locPath, conf, writeIdList, Ref.from(
-        false), false);
+        false), false, dirSnapshots);

Review comment:
   how exactly? 







Issue Time Tracking
---

Worklog Id: (was: 517908)
Time Spent: 1h 40m  (was: 1.5h)






[jira] [Work logged] (HIVE-24444) compactor.Cleaner should not set state "mark cleaned" if there are obsolete files in the FS

2020-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24444?focusedWorklogId=517897&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-517897
 ]

ASF GitHub Bot logged work on HIVE-24444:
-

Author: ASF GitHub Bot
Created on: 30/Nov/20 12:36
Start Date: 30/Nov/20 12:36
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #1716:
URL: https://github.com/apache/hive/pull/1716#discussion_r532564972



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##
@@ -316,6 +314,30 @@ private boolean removeFiles(String location, ValidWriteIdList writeIdList, Compa
       }
       fs.delete(dead, true);
     }
-    return true;
+    // Check if there will be more obsolete directories to clean when possible. We will only mark cleaned when this
+    // number reaches 0.
+    return getNumEventuallyObsoleteDirs(location, dirSnapshots) == 0;
+  }
+
+  /**
+   * Get the number of base/delta directories the Cleaner should remove eventually. If we check this after cleaning
+   * we can see if the Cleaner has further work to do in this table/partition directory that it hasn't been able to
+   * finish, e.g. because of an open transaction at the time of compaction.
+   * We do this by assuming that there are no open transactions anywhere and then calling getAcidState. If there are
+   * obsolete directories, then the Cleaner has more work to do.
+   * @param location location of table
+   * @return number of dirs left for the cleaner to clean – eventually
+   * @throws IOException
+   */
+  private int getNumEventuallyObsoleteDirs(String location, Map<Path, AcidUtils.HdfsDirSnapshot> dirSnapshots)
+      throws IOException {
+    ValidTxnList validTxnList = new ValidReadTxnList();
+    //save it so that getAcidState() sees it
+    conf.set(ValidTxnList.VALID_TXNS_KEY, validTxnList.writeToString());
+    ValidReaderWriteIdList validWriteIdList = new ValidReaderWriteIdList();
+    Path locPath = new Path(location);
+    AcidUtils.Directory dir = AcidUtils.getAcidState(locPath.getFileSystem(conf), locPath, conf, validWriteIdList,
+        Ref.from(false), false, dirSnapshots);
+    return dir.getObsolete().size();

Review comment:
   You should not: there can be aborts with a higher txnId than the compaction. 
This check will see those, but the cleaner will never see them, so it would 
never finish its job







Issue Time Tracking
---

Worklog Id: (was: 517897)
Time Spent: 1.5h  (was: 1h 20m)






[jira] [Work logged] (HIVE-24444) compactor.Cleaner should not set state "mark cleaned" if there are obsolete files in the FS

2020-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24444?focusedWorklogId=517891&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-517891
 ]

ASF GitHub Bot logged work on HIVE-24444:
-

Author: ASF GitHub Bot
Created on: 30/Nov/20 12:34
Start Date: 30/Nov/20 12:34
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #1716:
URL: https://github.com/apache/hive/pull/1716#discussion_r532563919



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##
@@ -265,13 +267,14 @@ private static String idWatermark(CompactionInfo ci) {
   }
 
   /**
-   * @return true if any files were removed
+   * @return true if the cleaner has removed all files rendered obsolete by compaction
    */
   private boolean removeFiles(String location, ValidWriteIdList writeIdList, CompactionInfo ci)
       throws IOException, NoSuchObjectException, MetaException {
     Path locPath = new Path(location);
+    Map<Path, AcidUtils.HdfsDirSnapshot> dirSnapshots = null;
     AcidUtils.Directory dir = AcidUtils.getAcidState(locPath.getFileSystem(conf), locPath, conf, writeIdList, Ref.from(
-        false), false);
+        false), false, dirSnapshots);

Review comment:
   This will fill up the snapshot, so it can be used later







Issue Time Tracking
---

Worklog Id: (was: 517891)
Time Spent: 1h 10m  (was: 1h)






[jira] [Work logged] (HIVE-24444) compactor.Cleaner should not set state "mark cleaned" if there are obsolete files in the FS

2020-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24444?focusedWorklogId=517892&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-517892
 ]

ASF GitHub Bot logged work on HIVE-24444:
-

Author: ASF GitHub Bot
Created on: 30/Nov/20 12:34
Start Date: 30/Nov/20 12:34
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1716:
URL: https://github.com/apache/hive/pull/1716#discussion_r532556849



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##
@@ -265,13 +267,14 @@ private static String idWatermark(CompactionInfo ci) {
   }
 
   /**
-   * @return true if any files were removed
+   * @return true if the cleaner has removed all files rendered obsolete by compaction
    */
   private boolean removeFiles(String location, ValidWriteIdList writeIdList, CompactionInfo ci)
       throws IOException, NoSuchObjectException, MetaException {
     Path locPath = new Path(location);
+    Map<Path, AcidUtils.HdfsDirSnapshot> dirSnapshots = null;
     AcidUtils.Directory dir = AcidUtils.getAcidState(locPath.getFileSystem(conf), locPath, conf, writeIdList, Ref.from(
-        false), false);
+        false), false, dirSnapshots);

Review comment:
   What's your expectation here? dirSnapshots would always be null. I think 
what you wanted to do is:
   `dirSnapshots = getHdfsDirSnapshots(locPath.getFileSystem(conf), locPath)`







Issue Time Tracking
---

Worklog Id: (was: 517892)
Time Spent: 1h 20m  (was: 1h 10m)






[jira] [Work logged] (HIVE-24444) compactor.Cleaner should not set state "mark cleaned" if there are obsolete files in the FS

2020-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24444?focusedWorklogId=517886&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-517886
 ]

ASF GitHub Bot logged work on HIVE-24444:
-

Author: ASF GitHub Bot
Created on: 30/Nov/20 12:21
Start Date: 30/Nov/20 12:21
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1716:
URL: https://github.com/apache/hive/pull/1716#discussion_r532556849



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##
@@ -265,13 +267,14 @@ private static String idWatermark(CompactionInfo ci) {
   }
 
   /**
-   * @return true if any files were removed
+   * @return true if the cleaner has removed all files rendered obsolete by compaction
    */
   private boolean removeFiles(String location, ValidWriteIdList writeIdList, CompactionInfo ci)
       throws IOException, NoSuchObjectException, MetaException {
     Path locPath = new Path(location);
+    Map<Path, AcidUtils.HdfsDirSnapshot> dirSnapshots = null;
     AcidUtils.Directory dir = AcidUtils.getAcidState(locPath.getFileSystem(conf), locPath, conf, writeIdList, Ref.from(
-        false), false);
+        false), false, dirSnapshots);

Review comment:
   What's your expectation here? dirSnapshots would always be null.







Issue Time Tracking
---

Worklog Id: (was: 517886)
Time Spent: 1h  (was: 50m)






[jira] [Work logged] (HIVE-24444) compactor.Cleaner should not set state "mark cleaned" if there are obsolete files in the FS

2020-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24444?focusedWorklogId=517882&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-517882
 ]

ASF GitHub Bot logged work on HIVE-24444:
-

Author: ASF GitHub Bot
Created on: 30/Nov/20 12:12
Start Date: 30/Nov/20 12:12
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1716:
URL: https://github.com/apache/hive/pull/1716#discussion_r532552240



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##
@@ -316,6 +314,30 @@ private boolean removeFiles(String location, ValidWriteIdList writeIdList, Compa
       }
       fs.delete(dead, true);
     }
-    return true;
+    // Check if there will be more obsolete directories to clean when possible. We will only mark cleaned when this
+    // number reaches 0.
+    return getNumEventuallyObsoleteDirs(location, dirSnapshots) == 0;
+  }
+
+  /**
+   * Get the number of base/delta directories the Cleaner should remove eventually. If we check this after cleaning
+   * we can see if the Cleaner has further work to do in this table/partition directory that it hasn't been able to
+   * finish, e.g. because of an open transaction at the time of compaction.
+   * We do this by assuming that there are no open transactions anywhere and then calling getAcidState. If there are
+   * obsolete directories, then the Cleaner has more work to do.
+   * @param location location of table
+   * @return number of dirs left for the cleaner to clean – eventually
+   * @throws IOException
+   */
+  private int getNumEventuallyObsoleteDirs(String location, Map<Path, AcidUtils.HdfsDirSnapshot> dirSnapshots)
+      throws IOException {
+    ValidTxnList validTxnList = new ValidReadTxnList();
+    //save it so that getAcidState() sees it
+    conf.set(ValidTxnList.VALID_TXNS_KEY, validTxnList.writeToString());
+    ValidReaderWriteIdList validWriteIdList = new ValidReaderWriteIdList();
+    Path locPath = new Path(location);
+    AcidUtils.Directory dir = AcidUtils.getAcidState(locPath.getFileSystem(conf), locPath, conf, validWriteIdList,
+        Ref.from(false), false, dirSnapshots);
+    return dir.getObsolete().size();

Review comment:
   should we consider aborts as well?







Issue Time Tracking
---

Worklog Id: (was: 517882)
Time Spent: 50m  (was: 40m)




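The review thread above debates a check that pretends every transaction is committed, counts what would then be obsolete, and marks the compaction cleaned only when that count reaches zero. A toy model of that decision follows; all names (newestBase, countEventuallyObsolete, the naive base_N/delta_N_N parsing) are hypothetical stand-ins, not the AcidUtils API:

```java
import java.util.List;

public class CleanerDecision {

    /** Highest write id among base_N directories, or -1 if there is no base. */
    static long newestBase(List<String> dirs) {
        return dirs.stream()
            .filter(d -> d.startsWith("base_"))
            .mapToLong(d -> Long.parseLong(d.substring(5)))
            .max().orElse(-1);
    }

    /** Deltas covered by the newest base: obsolete once nothing can still read them. */
    static long countEventuallyObsolete(List<String> dirs) {
        long base = newestBase(dirs);
        return dirs.stream()
            .filter(d -> d.startsWith("delta_"))
            .filter(d -> Long.parseLong(d.split("_")[1]) <= base)
            .count();
    }

    public static void main(String[] args) {
        // delta_3_3 is covered by base_5 but survived cleaning (say an open txn
        // blocked its removal), so the entry must stay "ready for cleaning".
        List<String> dirs = List.of("base_5", "delta_3_3", "delta_6_6");
        boolean markCleaned = countEventuallyObsolete(dirs) == 0;
        System.out.println(markCleaned); // prints "false"
    }
}
```

In the toy run the leftover covered delta keeps the count above zero, so the compaction is not marked cleaned; this mirrors the `getNumEventuallyObsoleteDirs(...) == 0` condition under review.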


[jira] [Work logged] (HIVE-24444) compactor.Cleaner should not set state "mark cleaned" if there are obsolete files in the FS

2020-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24444?focusedWorklogId=517827&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-517827
 ]

ASF GitHub Bot logged work on HIVE-24444:
-

Author: ASF GitHub Bot
Created on: 30/Nov/20 10:19
Start Date: 30/Nov/20 10:19
Worklog Time Spent: 10m 
  Work Description: klcopp commented on a change in pull request #1716:
URL: https://github.com/apache/hive/pull/1716#discussion_r532486047



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##
@@ -316,6 +312,29 @@ private boolean removeFiles(String location, ValidWriteIdList writeIdList, Compa
       }
       fs.delete(dead, true);
     }
-    return true;
+    // Check if there will be more obsolete directories to clean when possible. We will only mark cleaned when this
+    // number reaches 0.
+    return getNumEventuallyObsoleteDirs(location) == 0;
+  }
+
+  /**
+   * Get the number of base/delta directories the Cleaner should remove eventually. If we check this after cleaning
+   * we can see if the Cleaner has further work to do in this table/partition directory that it hasn't been able to
+   * finish, e.g. because of an open transaction at the time of compaction.
+   * We do this by assuming that there are no open transactions anywhere and then calling getAcidState. If there are
+   * obsolete directories, then the Cleaner has more work to do.
+   * @param location location of table
+   * @return number of dirs left for the cleaner to clean – eventually
+   * @throws IOException
+   */
+  private int getNumEventuallyObsoleteDirs(String location) throws IOException {
+    ValidTxnList validTxnList = new ValidReadTxnList();
+    //save it so that getAcidState() sees it
+    conf.set(ValidTxnList.VALID_TXNS_KEY, validTxnList.writeToString());
+    ValidReaderWriteIdList validWriteIdList = new ValidReaderWriteIdList();
+    Path locPath = new Path(location);
+    AcidUtils.Directory dir = AcidUtils.getAcidState(locPath.getFileSystem(conf), locPath, conf, validWriteIdList,

Review comment:
   Good idea, thanks!







Issue Time Tracking
---

Worklog Id: (was: 517827)
Time Spent: 40m  (was: 0.5h)






[jira] [Work logged] (HIVE-24444) compactor.Cleaner should not set state "mark cleaned" if there are obsolete files in the FS

2020-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24444?focusedWorklogId=517825&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-517825
 ]

ASF GitHub Bot logged work on HIVE-24444:
-

Author: ASF GitHub Bot
Created on: 30/Nov/20 10:13
Start Date: 30/Nov/20 10:13
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #1716:
URL: https://github.com/apache/hive/pull/1716#discussion_r532482152



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##
@@ -316,6 +312,29 @@ private boolean removeFiles(String location, ValidWriteIdList writeIdList, Compa
       }
       fs.delete(dead, true);
     }
-    return true;
+    // Check if there will be more obsolete directories to clean when possible. We will only mark cleaned when this
+    // number reaches 0.
+    return getNumEventuallyObsoleteDirs(location) == 0;
+  }
+
+  /**
+   * Get the number of base/delta directories the Cleaner should remove eventually. If we check this after cleaning
+   * we can see if the Cleaner has further work to do in this table/partition directory that it hasn't been able to
+   * finish, e.g. because of an open transaction at the time of compaction.
+   * We do this by assuming that there are no open transactions anywhere and then calling getAcidState. If there are
+   * obsolete directories, then the Cleaner has more work to do.
+   * @param location location of table
+   * @return number of dirs left for the cleaner to clean – eventually
+   * @throws IOException
+   */
+  private int getNumEventuallyObsoleteDirs(String location) throws IOException {
+    ValidTxnList validTxnList = new ValidReadTxnList();
+    //save it so that getAcidState() sees it
+    conf.set(ValidTxnList.VALID_TXNS_KEY, validTxnList.writeToString());
+    ValidReaderWriteIdList validWriteIdList = new ValidReaderWriteIdList();
+    Path locPath = new Path(location);
+    AcidUtils.Directory dir = AcidUtils.getAcidState(locPath.getFileSystem(conf), locPath, conf, validWriteIdList,

Review comment:
   You could pass an empty dirSnapshot to the first getAcidState in 
removeFiles; that will fill up the snapshot, and you can reuse it here, so 
it won't make a second listing on the FS







Issue Time Tracking
---

Worklog Id: (was: 517825)
Time Spent: 0.5h  (was: 20m)

> compactor.Cleaner should not set state "mark cleaned" if there are obsolete 
> files in the FS
> ---
>
> Key: HIVE-24444
> URL: https://issues.apache.org/jira/browse/HIVE-24444
> Project: Hive
>  Issue Type: Bug
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This is an improvement on HIVE-24314, in which markCleaned() is called only 
> if +any+ files are deleted by the cleaner. This could cause a problem in the 
> following case:
> Say for table_1 compaction1 cleaning was blocked by an open txn, and 
> compaction is run again on the same table (compaction2). Both compaction1 and 
> compaction2 could be in "ready for cleaning" at the same time. By this time 
> the blocking open txn could be committed. When the cleaner runs, one of 
> compaction1 and compaction2 will remain in the "ready for cleaning" state:
> Say compaction2 is picked up by the cleaner first. The Cleaner deletes all 
> obsolete files.  Then compaction1 is picked up by the cleaner; the cleaner 
> doesn't remove any files and compaction1 will stay in the queue in a "ready 
> for cleaning" state.
> HIVE-24291 already solves this issue but if it isn't usable (for example if 
> HMS schema changes are out of the question) then HIVE-24314 + this change will 
> fix the issue of the Cleaner not removing all obsolete files.
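The interplay described above can be sketched as a toy model (illustrative names only, not the actual Cleaner or AcidUtils API): each "ready for cleaning" entry deletes the obsolete directories it knows about, but is marked cleaned only when a fresh listing shows nothing obsolete left, so the entry that finds nothing to delete no longer stays stuck in the queue.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class CleanerMarkDemo {

  // Toy rule: a directory is obsolete once it is older than the newest base.
  // Real Hive derives this from getAcidState(); the layout here is simplified.
  static int countObsoleteDirs(Set<String> dirs) {
    int maxBase = dirs.stream()
        .filter(d -> d.startsWith("base_"))
        .mapToInt(d -> Integer.parseInt(d.substring("base_".length())))
        .max().orElse(-1);
    int obsolete = 0;
    for (String d : dirs) {
      if (d.startsWith("base_") && Integer.parseInt(d.substring("base_".length())) < maxBase) {
        obsolete++;
      } else if (d.startsWith("delta_") && Integer.parseInt(d.split("_")[2]) <= maxBase) {
        obsolete++;
      }
    }
    return obsolete;
  }

  // HIVE-24314 deletes whatever this queue entry knows about; the change here
  // marks the entry cleaned only when nothing obsolete remains on the FS.
  static boolean clean(Set<String> fsDirs, Set<String> obsoleteKnownToEntry) {
    fsDirs.removeAll(obsoleteKnownToEntry);
    return countObsoleteDirs(fsDirs) == 0;
  }

  public static void main(String[] args) {
    Set<String> fs = new HashSet<>(Arrays.asList("base_5", "delta_6_6", "base_7"));
    // compaction2 is picked up first and deletes every obsolete dir:
    boolean cleaned2 = clean(fs, new HashSet<>(Arrays.asList("base_5", "delta_6_6")));
    // compaction1 deletes nothing, but the FS is already clean, so it can be
    // marked cleaned too instead of lingering in "ready for cleaning":
    boolean cleaned1 = clean(fs, Collections.emptySet());
    System.out.println(cleaned2 + " " + cleaned1); // true true
  }
}
```

Under the pre-HIVE-24314 behavior, compaction1 would have returned "cleaned" unconditionally; under HIVE-24314 alone it would have returned "not cleaned" because it deleted nothing, which is the stuck-entry case this change removes.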



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24433) AutoCompaction is not getting triggered for CamelCase Partition Values

2020-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24433?focusedWorklogId=517823&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-517823
 ]

ASF GitHub Bot logged work on HIVE-24433:
-

Author: ASF GitHub Bot
Created on: 30/Nov/20 09:57
Start Date: 30/Nov/20 09:57
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1712:
URL: https://github.com/apache/hive/pull/1712#discussion_r532463117



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
##
@@ -2725,7 +2725,7 @@ private void insertTxnComponents(long txnid, LockRequest 
rqst, Connection dbConn
   }
   String dbName = normalizeCase(lc.getDbname());
   String tblName = normalizeCase(lc.getTablename());
-  String partName = normalizeCase(lc.getPartitionname());
+  String partName = lc.getPartitionname();

Review comment:
   @nareshpr , do you know if partition key name (`name`=value) is already 
normalized here?







Issue Time Tracking
---

Worklog Id: (was: 517823)
Time Spent: 1h  (was: 50m)

> AutoCompaction is not getting triggered for CamelCase Partition Values
> --
>
> Key: HIVE-24433
> URL: https://issues.apache.org/jira/browse/HIVE-24433
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> PartitionKeyValue is converted into lowercase in the 2 places below.
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L2728]
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L2851]
> Because of this, the TXN_COMPONENTS & HIVE_LOCKS tables do not have entries 
> with the proper partition values.
> When the query completes, the entry moves from TXN_COMPONENTS to 
> COMPLETED_TXN_COMPONENTS. Hive AutoCompaction does not recognize the 
> partition & considers it an invalid partition
> {code:java}
> create table abc(name string) partitioned by(city string) stored as orc 
> tblproperties('transactional'='true');
> insert into abc partition(city='Bangalore') values('aaa');
> {code}
> Example entry in COMPLETED_TXN_COMPONENTS
> {noformat}
> +-----------+--------------+-----------+----------------+---------------------+-------------+-------------------+
> | CTC_TXNID | CTC_DATABASE | CTC_TABLE | CTC_PARTITION  | CTC_TIMESTAMP       | CTC_WRITEID | CTC_UPDATE_DELETE |
> +-----------+--------------+-----------+----------------+---------------------+-------------+-------------------+
> |         2 | default      | abc       | city=bangalore | 2020-11-25 09:26:59 |           1 | N                 |
> +-----------+--------------+-----------+----------------+---------------------+-------------+-------------------+
> {noformat}
>  
> AutoCompaction fails to get triggered with below error
> {code:java}
> 2020-11-25T09:35:10,364 INFO [Thread-9]: compactor.Initiator 
> (Initiator.java:run(98)) - Checking to see if we should compact 
> default.abc.city=bangalore
> 2020-11-25T09:35:10,380 INFO [Thread-9]: compactor.Initiator 
> (Initiator.java:run(155)) - Can't find partition 
> default.compaction_test.city=bangalore, assuming it has been dropped and 
> moving on{code}
> I verified the 4 SQLs below with my PR; they all produced the correct 
> PartitionKeyValue, 
> i.e. COMPLETED_TXN_COMPONENTS.CTC_PARTITION="city=Bangalore"
> {code:java}
> insert into table abc PARTITION(CitY='Bangalore') values('Dan');
> insert overwrite table abc partition(CiTy='Bangalore') select Name from abc;
> update abc set Name='xy' where CiTy='Bangalore';
> delete from abc where CiTy='Bangalore';{code}
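The essence of the fix can be sketched with a hypothetical helper (illustrative names, not the TxnHandler code): lowercase only the identifier side of a `key=value` partition name and leave the user-supplied value untouched, so `CitY=Bangalore` normalizes to `city=Bangalore` rather than `city=bangalore`.

```java
import java.util.Locale;

public class PartitionNameCase {

  // Hive identifiers (db, table, partition *key*) are case-insensitive,
  // so lowercasing them is safe; partition *values* are user data.
  static String normalizeIdentifier(String s) {
    return s == null ? null : s.toLowerCase(Locale.ROOT);
  }

  // A partition name looks like "key=value" (or "k1=v1/k2=v2" for multi-level
  // partitions). Lowercase only the key side of each component.
  static String normalizePartitionName(String partName) {
    if (partName == null) {
      return null;
    }
    StringBuilder out = new StringBuilder();
    for (String part : partName.split("/")) {
      if (out.length() > 0) {
        out.append('/');
      }
      int eq = part.indexOf('=');
      if (eq < 0) {
        out.append(part); // no key=value shape; leave untouched
        continue;
      }
      out.append(normalizeIdentifier(part.substring(0, eq)))
         .append('=')
         .append(part.substring(eq + 1));
    }
    return out.toString();
  }

  public static void main(String[] args) {
    // Lowercasing the whole name produced "city=bangalore", which no longer
    // matched the real partition and broke AutoCompaction.
    System.out.println(normalizePartitionName("CitY=Bangalore")); // city=Bangalore
  }
}
```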





[jira] [Work logged] (HIVE-24433) AutoCompaction is not getting triggered for CamelCase Partition Values

2020-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24433?focusedWorklogId=517817&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-517817
 ]

ASF GitHub Bot logged work on HIVE-24433:
-

Author: ASF GitHub Bot
Created on: 30/Nov/20 09:44
Start Date: 30/Nov/20 09:44
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1712:
URL: https://github.com/apache/hive/pull/1712#discussion_r532463117



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
##
@@ -2725,7 +2725,7 @@ private void insertTxnComponents(long txnid, LockRequest 
rqst, Connection dbConn
   }
   String dbName = normalizeCase(lc.getDbname());
   String tblName = normalizeCase(lc.getTablename());
-  String partName = normalizeCase(lc.getPartitionname());
+  String partName = lc.getPartitionname();

Review comment:
   @nareshpr , do you know if partition name part (`name`=value) is already 
normalized here?







Issue Time Tracking
---

Worklog Id: (was: 517817)
Time Spent: 40m  (was: 0.5h)

> AutoCompaction is not getting triggered for CamelCase Partition Values
> --
>
> Key: HIVE-24433
> URL: https://issues.apache.org/jira/browse/HIVE-24433
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> PartitionKeyValue is converted into lowercase in the 2 places below.
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L2728]
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L2851]
> Because of this, the TXN_COMPONENTS & HIVE_LOCKS tables do not have entries 
> with the proper partition values.
> When the query completes, the entry moves from TXN_COMPONENTS to 
> COMPLETED_TXN_COMPONENTS. Hive AutoCompaction does not recognize the 
> partition & considers it an invalid partition
> {code:java}
> create table abc(name string) partitioned by(city string) stored as orc 
> tblproperties('transactional'='true');
> insert into abc partition(city='Bangalore') values('aaa');
> {code}
> Example entry in COMPLETED_TXN_COMPONENTS
> {noformat}
> +-----------+--------------+-----------+----------------+---------------------+-------------+-------------------+
> | CTC_TXNID | CTC_DATABASE | CTC_TABLE | CTC_PARTITION  | CTC_TIMESTAMP       | CTC_WRITEID | CTC_UPDATE_DELETE |
> +-----------+--------------+-----------+----------------+---------------------+-------------+-------------------+
> |         2 | default      | abc       | city=bangalore | 2020-11-25 09:26:59 |           1 | N                 |
> +-----------+--------------+-----------+----------------+---------------------+-------------+-------------------+
> {noformat}
>  
> AutoCompaction fails to get triggered with below error
> {code:java}
> 2020-11-25T09:35:10,364 INFO [Thread-9]: compactor.Initiator 
> (Initiator.java:run(98)) - Checking to see if we should compact 
> default.abc.city=bangalore
> 2020-11-25T09:35:10,380 INFO [Thread-9]: compactor.Initiator 
> (Initiator.java:run(155)) - Can't find partition 
> default.compaction_test.city=bangalore, assuming it has been dropped and 
> moving on{code}
> I verified the 4 SQLs below with my PR; they all produced the correct 
> PartitionKeyValue, 
> i.e. COMPLETED_TXN_COMPONENTS.CTC_PARTITION="city=Bangalore"
> {code:java}
> insert into table abc PARTITION(CitY='Bangalore') values('Dan');
> insert overwrite table abc partition(CiTy='Bangalore') select Name from abc;
> update abc set Name='xy' where CiTy='Bangalore';
> delete from abc where CiTy='Bangalore';{code}





[jira] [Work logged] (HIVE-24433) AutoCompaction is not getting triggered for CamelCase Partition Values

2020-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24433?focusedWorklogId=517818&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-517818
 ]

ASF GitHub Bot logged work on HIVE-24433:
-

Author: ASF GitHub Bot
Created on: 30/Nov/20 09:44
Start Date: 30/Nov/20 09:44
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1712:
URL: https://github.com/apache/hive/pull/1712#discussion_r532463117



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
##
@@ -2725,7 +2725,7 @@ private void insertTxnComponents(long txnid, LockRequest 
rqst, Connection dbConn
   }
   String dbName = normalizeCase(lc.getDbname());
   String tblName = normalizeCase(lc.getTablename());
-  String partName = normalizeCase(lc.getPartitionname());
+  String partName = lc.getPartitionname();

Review comment:
   @nareshpr , do you know if partition name part (`name`=value) is already 
normalized here?







Issue Time Tracking
---

Worklog Id: (was: 517818)
Time Spent: 50m  (was: 40m)

> AutoCompaction is not getting triggered for CamelCase Partition Values
> --
>
> Key: HIVE-24433
> URL: https://issues.apache.org/jira/browse/HIVE-24433
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> PartitionKeyValue is converted into lowercase in the 2 places below.
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L2728]
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L2851]
> Because of this, the TXN_COMPONENTS & HIVE_LOCKS tables do not have entries 
> with the proper partition values.
> When the query completes, the entry moves from TXN_COMPONENTS to 
> COMPLETED_TXN_COMPONENTS. Hive AutoCompaction does not recognize the 
> partition & considers it an invalid partition
> {code:java}
> create table abc(name string) partitioned by(city string) stored as orc 
> tblproperties('transactional'='true');
> insert into abc partition(city='Bangalore') values('aaa');
> {code}
> Example entry in COMPLETED_TXN_COMPONENTS
> {noformat}
> +-----------+--------------+-----------+----------------+---------------------+-------------+-------------------+
> | CTC_TXNID | CTC_DATABASE | CTC_TABLE | CTC_PARTITION  | CTC_TIMESTAMP       | CTC_WRITEID | CTC_UPDATE_DELETE |
> +-----------+--------------+-----------+----------------+---------------------+-------------+-------------------+
> |         2 | default      | abc       | city=bangalore | 2020-11-25 09:26:59 |           1 | N                 |
> +-----------+--------------+-----------+----------------+---------------------+-------------+-------------------+
> {noformat}
>  
> AutoCompaction fails to get triggered with below error
> {code:java}
> 2020-11-25T09:35:10,364 INFO [Thread-9]: compactor.Initiator 
> (Initiator.java:run(98)) - Checking to see if we should compact 
> default.abc.city=bangalore
> 2020-11-25T09:35:10,380 INFO [Thread-9]: compactor.Initiator 
> (Initiator.java:run(155)) - Can't find partition 
> default.compaction_test.city=bangalore, assuming it has been dropped and 
> moving on{code}
> I verified the 4 SQLs below with my PR; they all produced the correct 
> PartitionKeyValue, 
> i.e. COMPLETED_TXN_COMPONENTS.CTC_PARTITION="city=Bangalore"
> {code:java}
> insert into table abc PARTITION(CitY='Bangalore') values('Dan');
> insert overwrite table abc partition(CiTy='Bangalore') select Name from abc;
> update abc set Name='xy' where CiTy='Bangalore';
> delete from abc where CiTy='Bangalore';{code}





[jira] [Work logged] (HIVE-24444) compactor.Cleaner should not set state "mark cleaned" if there are obsolete files in the FS

2020-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24444?focusedWorklogId=517811&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-517811
 ]

ASF GitHub Bot logged work on HIVE-24444:
-

Author: ASF GitHub Bot
Created on: 30/Nov/20 09:12
Start Date: 30/Nov/20 09:12
Worklog Time Spent: 10m 
  Work Description: klcopp commented on pull request #1716:
URL: https://github.com/apache/hive/pull/1716#issuecomment-735657907


   @pvargacl would you mind taking a look too?





Issue Time Tracking
---

Worklog Id: (was: 517811)
Time Spent: 20m  (was: 10m)

> compactor.Cleaner should not set state "mark cleaned" if there are obsolete 
> files in the FS
> ---
>
> Key: HIVE-24444
> URL: https://issues.apache.org/jira/browse/HIVE-24444
> Project: Hive
>  Issue Type: Bug
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This is an improvement on HIVE-24314, in which markCleaned() is called only 
> if +any+ files are deleted by the cleaner. This could cause a problem in the 
> following case:
> Say for table_1 compaction1 cleaning was blocked by an open txn, and 
> compaction is run again on the same table (compaction2). Both compaction1 and 
> compaction2 could be in "ready for cleaning" at the same time. By this time 
> the blocking open txn could be committed. When the cleaner runs, one of 
> compaction1 and compaction2 will remain in the "ready for cleaning" state:
> Say compaction2 is picked up by the cleaner first. The Cleaner deletes all 
> obsolete files.  Then compaction1 is picked up by the cleaner; the cleaner 
> doesn't remove any files and compaction1 will stay in the queue in a "ready 
> for cleaning" state.
> HIVE-24291 already solves this issue but if it isn't usable (for example if 
> HMS schema changes are out of the question) then HIVE-24314 + this change will 
> fix the issue of the Cleaner not removing all obsolete files.





[jira] [Updated] (HIVE-24444) compactor.Cleaner should not set state "mark cleaned" if there are obsolete files in the FS

2020-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24444:
--
Labels: pull-request-available  (was: )

> compactor.Cleaner should not set state "mark cleaned" if there are obsolete 
> files in the FS
> ---
>
> Key: HIVE-24444
> URL: https://issues.apache.org/jira/browse/HIVE-24444
> Project: Hive
>  Issue Type: Bug
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This is an improvement on HIVE-24314, in which markCleaned() is called only 
> if +any+ files are deleted by the cleaner. This could cause a problem in the 
> following case:
> Say for table_1 compaction1 cleaning was blocked by an open txn, and 
> compaction is run again on the same table (compaction2). Both compaction1 and 
> compaction2 could be in "ready for cleaning" at the same time. By this time 
> the blocking open txn could be committed. When the cleaner runs, one of 
> compaction1 and compaction2 will remain in the "ready for cleaning" state:
> Say compaction2 is picked up by the cleaner first. The Cleaner deletes all 
> obsolete files.  Then compaction1 is picked up by the cleaner; the cleaner 
> doesn't remove any files and compaction1 will stay in the queue in a "ready 
> for cleaning" state.
> HIVE-24291 already solves this issue but if it isn't usable (for example if 
> HMS schema changes are out of the question) then HIVE-24314 + this change will 
> fix the issue of the Cleaner not removing all obsolete files.





[jira] [Work logged] (HIVE-24444) compactor.Cleaner should not set state "mark cleaned" if there are obsolete files in the FS

2020-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24444?focusedWorklogId=517809&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-517809
 ]

ASF GitHub Bot logged work on HIVE-24444:
-

Author: ASF GitHub Bot
Created on: 30/Nov/20 09:09
Start Date: 30/Nov/20 09:09
Worklog Time Spent: 10m 
  Work Description: klcopp opened a new pull request #1716:
URL: https://github.com/apache/hive/pull/1716


   ### What changes were proposed in this pull request?
   
   ### Why are the changes needed?
   
   ### Does this PR introduce _any_ user-facing change?
   
   See HIVE-24444
   
   ### How was this patch tested?
   Unit test
   





Issue Time Tracking
---

Worklog Id: (was: 517809)
Remaining Estimate: 0h
Time Spent: 10m

> compactor.Cleaner should not set state "mark cleaned" if there are obsolete 
> files in the FS
> ---
>
> Key: HIVE-24444
> URL: https://issues.apache.org/jira/browse/HIVE-24444
> Project: Hive
>  Issue Type: Bug
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This is an improvement on HIVE-24314, in which markCleaned() is called only 
> if +any+ files are deleted by the cleaner. This could cause a problem in the 
> following case:
> Say for table_1 compaction1 cleaning was blocked by an open txn, and 
> compaction is run again on the same table (compaction2). Both compaction1 and 
> compaction2 could be in "ready for cleaning" at the same time. By this time 
> the blocking open txn could be committed. When the cleaner runs, one of 
> compaction1 and compaction2 will remain in the "ready for cleaning" state:
> Say compaction2 is picked up by the cleaner first. The Cleaner deletes all 
> obsolete files.  Then compaction1 is picked up by the cleaner; the cleaner 
> doesn't remove any files and compaction1 will stay in the queue in a "ready 
> for cleaning" state.
> HIVE-24291 already solves this issue but if it isn't usable (for example if 
> HMS schema changes are out of the question) then HIVE-24314 + this change will 
> fix the issue of the Cleaner not removing all obsolete files.





[jira] [Assigned] (HIVE-24444) compactor.Cleaner should not set state "mark cleaned" if there are obsolete files in the FS

2020-11-30 Thread Karen Coppage (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage reassigned HIVE-24444:



> compactor.Cleaner should not set state "mark cleaned" if there are obsolete 
> files in the FS
> ---
>
> Key: HIVE-24444
> URL: https://issues.apache.org/jira/browse/HIVE-24444
> Project: Hive
>  Issue Type: Bug
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>
> This is an improvement on HIVE-24314, in which markCleaned() is called only 
> if +any+ files are deleted by the cleaner. This could cause a problem in the 
> following case:
> Say for table_1 compaction1 cleaning was blocked by an open txn, and 
> compaction is run again on the same table (compaction2). Both compaction1 and 
> compaction2 could be in "ready for cleaning" at the same time. By this time 
> the blocking open txn could be committed. When the cleaner runs, one of 
> compaction1 and compaction2 will remain in the "ready for cleaning" state:
> Say compaction2 is picked up by the cleaner first. The Cleaner deletes all 
> obsolete files.  Then compaction1 is picked up by the cleaner; the cleaner 
> doesn't remove any files and compaction1 will stay in the queue in a "ready 
> for cleaning" state.
> HIVE-24291 already solves this issue but if it isn't usable (for example if 
> HMS schema changes are out of the question) then HIVE-24314 + this change will 
> fix the issue of the Cleaner not removing all obsolete files.





[jira] [Work logged] (HIVE-24423) Improve DbNotificationListener Thread

2020-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24423?focusedWorklogId=517797&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-517797
 ]

ASF GitHub Bot logged work on HIVE-24423:
-

Author: ASF GitHub Bot
Created on: 30/Nov/20 08:39
Start Date: 30/Nov/20 08:39
Worklog Time Spent: 10m 
  Work Description: miklosgergely commented on a change in pull request 
#1703:
URL: https://github.com/apache/hive/pull/1703#discussion_r532424432



##
File path: 
hcatalog/server-extensions/src/main/java/org/apache/hive/hcatalog/listener/DbNotificationListener.java
##
@@ -1242,64 +1244,50 @@ private void process(NotificationEvent event, ListenerEvent listenerEvent) throw
   }
 
   private static class CleanerThread extends Thread {
-    private RawStore rs;
+    private final RawStore rs;
     private int ttl;
-    private boolean shouldRun = true;
     private long sleepTime;
 
     CleanerThread(Configuration conf, RawStore rs) {
       super("DB-Notification-Cleaner");
-      this.rs = rs;
-      boolean isReplEnabled = MetastoreConf.getBoolVar(conf, ConfVars.REPLCMENABLED);
-      if(isReplEnabled){
-        setTimeToLive(MetastoreConf.getTimeVar(conf, ConfVars.REPL_EVENT_DB_LISTENER_TTL,
-            TimeUnit.SECONDS));
-      }
-      else {
-        setTimeToLive(MetastoreConf.getTimeVar(conf, ConfVars.EVENT_DB_LISTENER_TTL,
-            TimeUnit.SECONDS));
-      }
-      setCleanupInterval(MetastoreConf.getTimeVar(conf, ConfVars.EVENT_DB_LISTENER_CLEAN_INTERVAL,
-          TimeUnit.MILLISECONDS));
       setDaemon(true);
+      this.rs = Objects.requireNonNull(rs);
+
+      boolean isReplEnabled = MetastoreConf.getBoolVar(conf, ConfVars.REPLCMENABLED);
+      ConfVars ttlConf = (isReplEnabled) ? ConfVars.REPL_EVENT_DB_LISTENER_TTL : ConfVars.EVENT_DB_LISTENER_TTL;
+      setTimeToLive(MetastoreConf.getTimeVar(conf, ttlConf, TimeUnit.SECONDS));
+      setCleanupInterval(
+          MetastoreConf.getTimeVar(conf, ConfVars.EVENT_DB_LISTENER_CLEAN_INTERVAL, TimeUnit.MILLISECONDS));
     }
 
     @Override
     public void run() {
-      while (shouldRun) {
+      while (true) {
+        LOG.debug("Cleaner thread running");
         try {
           rs.cleanNotificationEvents(ttl);
           rs.cleanWriteNotificationEvents(ttl);
         } catch (Exception ex) {
-          //catching exceptions here makes sure that the thread doesn't die in case of unexpected
-          //exceptions
-          LOG.warn("Exception received while cleaning notifications: ", ex);
+          LOG.warn("Exception received while cleaning notifications", ex);

Review comment:
   No, go ahead and merge it





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 517797)
Time Spent: 50m  (was: 40m)

> Improve DbNotificationListener Thread
> -
>
> Key: HIVE-24423
> URL: https://issues.apache.org/jira/browse/HIVE-24423
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.1.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Clean up and simplify {{DbNotificationListener}} thread class.
> Most importantly, stop the thread and wait for it to finish before launching 
> a new thread.
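The "stop the thread and wait for it to finish before launching a new thread" lifecycle can be sketched as follows, assuming an interrupt-based shutdown signal (illustrative names, not the DbNotificationListener API): the loop survives unexpected exceptions, treats interruption during sleep as the stop signal, and the owner joins the old thread before starting a replacement.

```java
public class CleanerThreadDemo {

  // Minimal sketch of the pattern in the PR: a daemon thread that loops,
  // survives exceptions, and can be stopped and joined deterministically.
  static final class Cleaner extends Thread {
    private final long sleepMs;

    Cleaner(long sleepMs) {
      super("db-notification-cleaner");
      setDaemon(true);
      this.sleepMs = sleepMs;
    }

    @Override
    public void run() {
      while (true) {
        try {
          cleanOnce();
        } catch (Exception ex) {
          // Catching here keeps the thread alive on unexpected failures.
          System.err.println("cleanup failed: " + ex);
        }
        try {
          Thread.sleep(sleepMs);
        } catch (InterruptedException ie) {
          return; // interruption is the shutdown signal
        }
      }
    }

    void cleanOnce() {
      // placeholder for deleting expired notification events
    }
  }

  public static void main(String[] args) throws InterruptedException {
    Cleaner old = new Cleaner(50);
    old.start();
    // Stop the old thread and wait for it to finish...
    old.interrupt();
    old.join();
    System.out.println(old.isAlive()); // false
    // ...before launching the new one.
    new Cleaner(50).start();
  }
}
```

Joining before restart avoids two cleaners briefly racing over the same notification-log rows, which is the overlap the removed `shouldRun` flag could not rule out.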





[jira] [Issue Comment Deleted] (HIVE-24437) Add more removed configs for(Don't fail config validation for removed configs)

2020-11-30 Thread JiangZhu (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

JiangZhu updated HIVE-24437:

Comment: was deleted

(was: Need code review.)

> Add more removed configs for(Don't fail config validation for removed configs)
> --
>
> Key: HIVE-24437
> URL: https://issues.apache.org/jira/browse/HIVE-24437
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.3.7
>Reporter: JiangZhu
>Assignee: JiangZhu
>Priority: Major
> Attachments: HIVE-24437.1.patch
>
>
> Add more removed configs for(HIVE-14132 Don't fail config validation for 
> removed configs)





[jira] [Work logged] (HIVE-24426) Spark job fails with fixed LlapTaskUmbilicalServer port

2020-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24426?focusedWorklogId=517785&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-517785
 ]

ASF GitHub Bot logged work on HIVE-24426:
-

Author: ASF GitHub Bot
Created on: 30/Nov/20 08:15
Start Date: 30/Nov/20 08:15
Worklog Time Spent: 10m 
  Work Description: prasanthj commented on a change in pull request #1705:
URL: https://github.com/apache/hive/pull/1705#discussion_r532410351



##
File path: 
llap-client/src/java/org/apache/hadoop/hive/llap/tezplugins/helpers/LlapTaskUmbilicalServer.java
##
@@ -54,27 +56,54 @@
 
   public LlapTaskUmbilicalServer(Configuration conf, LlapTaskUmbilicalProtocol umbilical, int numHandlers) throws IOException {
     jobTokenSecretManager = new JobTokenSecretManager();
-    int umbilicalPort = HiveConf.getIntVar(conf, HiveConf.ConfVars.LLAP_TASK_UMBILICAL_SERVER_PORT);
-    if (umbilicalPort <= 0) {
-      umbilicalPort = 0;
+
+    String[] portRange =
+        conf.get(HiveConf.ConfVars.LLAP_TASK_UMBILICAL_SERVER_PORT.varname)
+            .split("-");
+
+    int minPort = Integer.parseInt(portRange[0]);
+    boolean portFound = false;
+    IOException e = null;
+    if (portRange.length == 1) {
+      // Single port specified, not Range.
+      startServer(conf, umbilical, numHandlers, minPort);
+      portFound = true;
+    } else {
+      int maxPort = Integer.parseInt(portRange[1]);
+      for (int i = minPort; i < maxPort; i++) {
+        try {
+          startServer(conf, umbilical, numHandlers, i);
+          portFound = true;
+          break;
+        } catch (BindException be) {
+          // Ignore and move ahead, in search of a free port.

Review comment:
   Log at warn level to say which port is being tried and what error message was received, for debugging.
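The pattern under review (try each port in a `min-max` range, logging failed attempts before moving on) can be sketched with plain sockets; names and port numbers are illustrative, and the real code binds a Hadoop RPC server rather than a ServerSocket:

```java
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;

public class PortRangeBind {

  // Accepts "30006" (single port, fail fast) or "30006-30010" (scan the range,
  // upper bound exclusive, mirroring the loop in the patch).
  static ServerSocket bindInRange(String rangeSpec) throws IOException {
    String[] range = rangeSpec.split("-");
    int minPort = Integer.parseInt(range[0]);
    if (range.length == 1) {
      return new ServerSocket(minPort); // single port: a conflict is an error
    }
    int maxPort = Integer.parseInt(range[1]);
    IOException last = null;
    for (int p = minPort; p < maxPort; p++) {
      try {
        return new ServerSocket(p);
      } catch (BindException be) {
        // The reviewer's point: record which port failed and why before moving on.
        System.err.println("WARN: port " + p + " unavailable: " + be.getMessage());
        last = be;
      }
    }
    throw last != null ? last : new BindException("No free port in range " + rangeSpec);
  }

  public static void main(String[] args) throws IOException {
    // Two servers asked for the same range land on different free ports,
    // which is the multi-executor-per-host scenario from HIVE-24426.
    try (ServerSocket a = bindInRange("47851-47855");
         ServerSocket b = bindInRange("47851-47855")) {
      System.out.println(a.getLocalPort() != b.getLocalPort()); // true
    }
  }
}
```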

##
File path: 
llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskCommunicator.java
##
@@ -257,23 +259,32 @@ protected void startRpcServer() {
 
       int numHandlers =
           HiveConf.getIntVar(conf, ConfVars.LLAP_TASK_COMMUNICATOR_LISTENER_THREAD_COUNT);
-      int umbilicalPort = HiveConf.getIntVar(conf, ConfVars.LLAP_TASK_UMBILICAL_SERVER_PORT);
-      if (umbilicalPort <= 0) {
-        umbilicalPort = 0;
+      String[] portRange =
+          conf.get(HiveConf.ConfVars.LLAP_TASK_UMBILICAL_SERVER_PORT.varname)
+              .split("-");
+      boolean portFound = false;
+      IOException ioe = null;
+      int minPort = Integer.parseInt(portRange[0]);
+      if (portRange.length == 1) {
+        // Single port specified, not range.
+        startServerInternal(conf, minPort, numHandlers, jobTokenSecretManager);
+        portFound = true;
+      } else {
+        int maxPort = Integer.parseInt(portRange[1]);
+        for (int i = minPort; i < maxPort; i++) {
+          try {
+            startServerInternal(conf, i, numHandlers, jobTokenSecretManager);
+            portFound = true;
+            break;
+          } catch (BindException be) {
+            // Ignore and move ahead, in search of a free port.

Review comment:
   same here for logging

##
File path: 
llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskCommunicator.java
##
@@ -283,6 +294,23 @@ protected void startRpcServer() {
     }
   }
 
+  private void startServerInternal(Configuration conf, int umbilicalPort,
+      int numHandlers, JobTokenSecretManager jobTokenSecretManager)
+      throws IOException {
+    server = new RPC.Builder(conf).setProtocol(LlapTaskUmbilicalProtocol.class)
+        .setBindAddress("0.0.0.0").setPort(umbilicalPort).setInstance(umbilical)
+        .setNumHandlers(numHandlers).setSecretManager(jobTokenSecretManager)
+        .build();
+
+    if (conf

Review comment:
   nit: same here to move this to private variable. 

##
File path: 
llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskCommunicator.java
##
@@ -257,23 +259,32 @@ protected void startRpcServer() {
 
       int numHandlers =
           HiveConf.getIntVar(conf, ConfVars.LLAP_TASK_COMMUNICATOR_LISTENER_THREAD_COUNT);
-      int umbilicalPort = HiveConf.getIntVar(conf, ConfVars.LLAP_TASK_UMBILICAL_SERVER_PORT);
-      if (umbilicalPort <= 0) {
-        umbilicalPort = 0;
+      String[] portRange =
+          conf.get(HiveConf.ConfVars.LLAP_TASK_UMBILICAL_SERVER_PORT.varname)
+              .split("-");
+      boolean portFound = false;
+      IOException ioe = null;
+      int minPort = Integer.parseInt(portRange[0]);
+      if (portRange.length == 1) {
+        // Single port specified, not range.
+        startServerInternal(conf, minPort, numHandlers, jobTokenSecretManager);
+        portFound = true;
+      } else {
+        int maxPort = Integer.parseInt(portRange[1]);

Review comment:
   same here (use RangeValidator)

##
File path: 

[jira] [Commented] (HIVE-24437) Add more removed configs for(Don't fail config validation for removed configs)

2020-11-30 Thread JiangZhu (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17240548#comment-17240548
 ] 

JiangZhu commented on HIVE-24437:
-

Need code review.

> Add more removed configs for(Don't fail config validation for removed configs)
> --
>
> Key: HIVE-24437
> URL: https://issues.apache.org/jira/browse/HIVE-24437
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.3.7
>Reporter: JiangZhu
>Assignee: JiangZhu
>Priority: Major
> Attachments: HIVE-24437.1.patch
>
>
> Add more removed configs for(HIVE-14132 Don't fail config validation for 
> removed configs)





[jira] [Commented] (HIVE-24437) Add more removed configs for(Don't fail config validation for removed configs)

2020-11-30 Thread JiangZhu (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17240547#comment-17240547
 ] 

JiangZhu commented on HIVE-24437:
-

Need code review.

> Add more removed configs for(Don't fail config validation for removed configs)
> --
>
> Key: HIVE-24437
> URL: https://issues.apache.org/jira/browse/HIVE-24437
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.3.7
>Reporter: JiangZhu
>Assignee: JiangZhu
>Priority: Major
> Attachments: HIVE-24437.1.patch
>
>
> Add more removed configs for(HIVE-14132 Don't fail config validation for 
> removed configs)


