[jira] [Commented] (HIVE-26905) Backport HIVE-25173 to 3.2.0: Exclude pentaho-aggdesigner-algorithm from upgrade-acid build.

2023-03-28 Thread Chris Nauroth (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17706073#comment-17706073
 ] 

Chris Nauroth commented on HIVE-26905:
--

[~zabetak], no worries, and thanks for the review!

> Backport HIVE-25173 to 3.2.0: Exclude pentaho-aggdesigner-algorithm from 
> upgrade-acid build.
> 
>
> Key: HIVE-26905
> URL: https://issues.apache.org/jira/browse/HIVE-26905
> Project: Hive
>  Issue Type: Bug
>  Components: Build Infrastructure
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Major
>  Labels: hive-3.2.0-must, pull-request-available
> Fix For: 3.2.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> In the current branch-3, upgrade-acid has a dependency on an old hive-exec 
> version that has a transitive dependency to 
> org.pentaho:pentaho-aggdesigner-algorithm. This artifact is no longer 
> available in commonly supported Maven repositories, which causes a build 
> failure. We can safely exclude the dependency, as was originally done in 
> HIVE-25173.
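
For reference, the kind of change being backported is a standard Maven dependency
exclusion on the old hive-exec dependency. A minimal sketch (the version shown is
a placeholder, not the exact coordinates used by the upgrade-acid module):

{code:xml}
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-exec</artifactId>
  <!-- Placeholder: upgrade-acid pins an older hive-exec release here. -->
  <version>OLD_HIVE_EXEC_VERSION</version>
  <exclusions>
    <!-- No longer resolvable from commonly supported Maven repositories,
         so drop the transitive dependency from the build. -->
    <exclusion>
      <groupId>org.pentaho</groupId>
      <artifactId>pentaho-aggdesigner-algorithm</artifactId>
    </exclusion>
  </exclusions>
</dependency>
{code}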



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26748) Prepare for Hive 3.2.0 Release

2023-01-24 Thread Chris Nauroth (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680370#comment-17680370
 ] 

Chris Nauroth commented on HIVE-26748:
--

If we're going with Hadoop 3.3.4, then that's going to switch to a custom 
shaded dependency for protobuf, so I don't expect problems there.

https://github.com/apache/hadoop/blob/rel/release-3.3.4/hadoop-project/pom.xml#L264-L268
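
For context, that section pins Hadoop's relocated protobuf artifact rather than a
plain com.google.protobuf dependency, which is why Hive's own protobuf choice
should not clash with it. Roughly (a from-memory sketch of the shape of the
declaration, not copied verbatim from that pom; version omitted):

{code:xml}
<!-- Hadoop 3.3.x consumes protobuf through the shaded hadoop-thirdparty artifact. -->
<dependency>
  <groupId>org.apache.hadoop.thirdparty</groupId>
  <artifactId>hadoop-shaded-protobuf_3_7</artifactId>
</dependency>
{code}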

However, note that it would be leapfrogging the master branch, which is 
currently on Hadoop 3.3.1. (That version still has the shaded dependency.)

Regarding Tez, I suppose the upgrade ends up working fine on master because the 
Tez protobuf messages get serialized/deserialized end-to-end in the context of 
a single Hive/Tez session, so the encoding is consistent even if Hive declares 
a different version dependency than Tez. There is never a case of passing one 
of these messages externally where another client might expect the earlier 
protobuf version.

To build confidence, it would be great to hear from anyone who has tested a 
full distro with the new Hive 4 alphas, which I haven't tried yet. If so, were 
there any surprises on protobuf compatibility?

> Prepare for Hive 3.2.0 Release
> --
>
> Key: HIVE-26748
> URL: https://issues.apache.org/jira/browse/HIVE-26748
> Project: Hive
>  Issue Type: Task
>  Components: Hive
>Affects Versions: 3.1.3
>Reporter: Aman Raj
>Assignee: Aman Raj
>Priority: Major
>  Labels: hive-3.2.0-must
>
> This is the Umbrella Jira to track all the commits that would go on top of 
> current branch-3 in this new 3.2.0 Hive release. I will add all the JIRAs 
> that will be cherry picked as part of this commit by defining subtasks or 
> linking the JIRAs.
>  
> *Please note that this is an Open forum and I welcome all responses for the 
> same from the community with regards to any new bug fixes that should be 
> cherry picked.*



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26892) Backport HIVE-25243 to 3.2.0: Handle nested values in null struct.

2023-01-10 Thread Chris Nauroth (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17656725#comment-17656725
 ] 

Chris Nauroth commented on HIVE-26892:
--

[~abstractdog], thank you for the commit, and no worries on the commit message.

> Backport HIVE-25243 to 3.2.0: Handle nested values in null struct.
> --
>
> Key: HIVE-26892
> URL: https://issues.apache.org/jira/browse/HIVE-26892
> Project: Hive
>  Issue Type: Sub-task
>  Components: Serializers/Deserializers
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Critical
>  Labels: hive-3.2.0-must, pull-request-available
> Fix For: 3.2.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> On branch-3, we've seen a failure in {{TestArrowColumnarBatchSerDe}} while 
> trying to serialize a row of null values. It fails while trying to serialize 
> the fields of a null struct. This was fixed in 4.0 by HIVE-25243. This issue 
> tracks a backport to branch-3.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26869) Backport of HIVE-19104: When test MetaStore is started with retry the instances should be independent

2023-01-08 Thread Chris Nauroth (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17655847#comment-17655847
 ] 

Chris Nauroth commented on HIVE-26869:
--

[~abstractdog], thank you for reviewing and committing this.

> Backport of HIVE-19104: When test MetaStore is started with retry the 
> instances should be independent
> -
>
> Key: HIVE-26869
> URL: https://issues.apache.org/jira/browse/HIVE-26869
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Aman Raj
>Assignee: Aman Raj
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.2.0
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> This fixes TestHS2ImpersonationWithRemoteMS which was failing with the 
> following error :
> {code:java}
> [ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 
> 703.357 s <<< FAILURE! - in 
> org.apache.hive.service.TestHS2ImpersonationWithRemoteMS
> [ERROR] 
> testImpersonation(org.apache.hive.service.TestHS2ImpersonationWithRemoteMS)  
> Time elapsed: 668.923 s  <<< FAILURE!
> java.lang.AssertionError: Unexpected table directory '34015' in warehouse
> at org.junit.Assert.fail(Assert.java:88)
> at 
> org.apache.hive.service.TestHS2ImpersonationWithRemoteMS.testImpersonation(TestHS2ImpersonationWithRemoteMS.java:115)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43){code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26892) Backport HIVE-25243 to 3.2.0: Handle nested values in null struct.

2023-01-06 Thread Chris Nauroth (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17655548#comment-17655548
 ] 

Chris Nauroth commented on HIVE-26892:
--

Hello [~brahmareddy]. This backport was spun off from HIVE-26840 (Netty and 
Arrow upgrades) when we identified an additional fix needed even after the 
upgrades:

https://github.com/apache/hive/pull/3859#issuecomment-1366907555

> Backport HIVE-25243 to 3.2.0: Handle nested values in null struct.
> --
>
> Key: HIVE-26892
> URL: https://issues.apache.org/jira/browse/HIVE-26892
> Project: Hive
>  Issue Type: Sub-task
>  Components: Serializers/Deserializers
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Critical
>  Labels: hive-3.2.0-must, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> On branch-3, we've seen a failure in {{TestArrowColumnarBatchSerDe}} while 
> trying to serialize a row of null values. It fails while trying to serialize 
> the fields of a null struct. This was fixed in 4.0 by HIVE-25243. This issue 
> tracks a backport to branch-3.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-26910) Backport HIVE-19104: Use independent warehouse directories in test metastores.

2023-01-05 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HIVE-26910.
--
Resolution: Duplicate

> Backport HIVE-19104: Use independent warehouse directories in test metastores.
> --
>
> Key: HIVE-26910
> URL: https://issues.apache.org/jira/browse/HIVE-26910
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Major
>  Labels: hive-3.2.0-must, pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> {{TestHS2ImpersonationWithRemoteMS}} fails on branch-3. It makes assertions 
> about the state of the warehouse directory, but it doesn't account for a part 
> of metastore initialization that updates the warehouse directory to 
> parameterize it by port number for test isolation.
> {{MetaStoreTestUtils#startMetaStoreWithRetry}} sets the warehouse directory 
> as the new {{metastore.warehouse.dir}} property. 
> {{AbstractHiveService#get/setWareHouseDir}} later works with the deprecated 
> {{hive.metastore.warehouse.dir}} property. {{MetastoreConf}} will take care 
> of resolving requests for the new property to values under the old property, 
> but not vice versa.
> On master, HIVE-19104 included an additional line in {{MiniHs2}} to make sure 
> these 2 properties would stay in sync for test runs. This issue tracks a 
> slightly modified backport of that patch to branch-3.
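
To make the mismatch concrete, here is an illustrative Hadoop-style configuration
sketch (not taken from the patch) of what "in sync" means: both the new key and
the deprecated key must point at the same per-test warehouse path, because
MetastoreConf resolves requests for the new key against values under the old key,
but not vice versa. The ${metastore.port} placeholder below is hypothetical.

{code:xml}
<property>
  <name>metastore.warehouse.dir</name>
  <value>/tmp/hive/warehouse-${metastore.port}</value>
</property>
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/tmp/hive/warehouse-${metastore.port}</value>
</property>
{code}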



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26910) Backport HIVE-19104: Use independent warehouse directories in test metastores.

2023-01-05 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HIVE-26910:
-
Parent: HIVE-26836
Issue Type: Sub-task  (was: Bug)

> Backport HIVE-19104: Use independent warehouse directories in test metastores.
> --
>
> Key: HIVE-26910
> URL: https://issues.apache.org/jira/browse/HIVE-26910
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Major
>  Labels: hive-3.2.0-must
>
> {{TestHS2ImpersonationWithRemoteMS}} fails on branch-3. It makes assertions 
> about the state of the warehouse directory, but it doesn't account for a part 
> of metastore initialization that updates the warehouse directory to 
> parameterize it by port number for test isolation.
> {{MetaStoreTestUtils#startMetaStoreWithRetry}} sets the warehouse directory 
> as the new {{metastore.warehouse.dir}} property. 
> {{AbstractHiveService#get/setWareHouseDir}} later works with the deprecated 
> {{hive.metastore.warehouse.dir}} property. {{MetastoreConf}} will take care 
> of resolving requests for the new property to values under the old property, 
> but not vice versa.
> On master, HIVE-19104 included an additional line in {{MiniHs2}} to make sure 
> these 2 properties would stay in sync for test runs. This issue tracks a 
> slightly modified backport of that patch to branch-3.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-26910) Backport HIVE-19104: Use independent warehouse directories in test metastores.

2023-01-05 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth reassigned HIVE-26910:



> Backport HIVE-19104: Use independent warehouse directories in test metastores.
> --
>
> Key: HIVE-26910
> URL: https://issues.apache.org/jira/browse/HIVE-26910
> Project: Hive
>  Issue Type: Bug
>  Components: Test
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Major
>  Labels: hive-3.2.0-must
>
> {{TestHS2ImpersonationWithRemoteMS}} fails on branch-3. It makes assertions 
> about the state of the warehouse directory, but it doesn't account for a part 
> of metastore initialization that updates the warehouse directory to 
> parameterize it by port number for test isolation.
> {{MetaStoreTestUtils#startMetaStoreWithRetry}} sets the warehouse directory 
> as the new {{metastore.warehouse.dir}} property. 
> {{AbstractHiveService#get/setWareHouseDir}} later works with the deprecated 
> {{hive.metastore.warehouse.dir}} property. {{MetastoreConf}} will take care 
> of resolving requests for the new property to values under the old property, 
> but not vice versa.
> On master, HIVE-19104 included an additional line in {{MiniHs2}} to make sure 
> these 2 properties would stay in sync for test runs. This issue tracks a 
> slightly modified backport of that patch to branch-3.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-26905) Backport HIVE-25173 to 3.2.0: Exclude pentaho-aggdesigner-algorithm from upgrade-acid build.

2023-01-04 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth reassigned HIVE-26905:



> Backport HIVE-25173 to 3.2.0: Exclude pentaho-aggdesigner-algorithm from 
> upgrade-acid build.
> 
>
> Key: HIVE-26905
> URL: https://issues.apache.org/jira/browse/HIVE-26905
> Project: Hive
>  Issue Type: Bug
>  Components: Build Infrastructure
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Major
>  Labels: hive-3.2.0-must
>
> In the current branch-3, upgrade-acid has a dependency on an old hive-exec 
> version that has a transitive dependency to 
> org.pentaho:pentaho-aggdesigner-algorithm. This artifact is no longer 
> available in commonly supported Maven repositories, which causes a build 
> failure. We can safely exclude the dependency, as was originally done in 
> HIVE-25173.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26880) Upgrade Apache Directory Server to 1.5.7 for release 3.2.

2023-01-04 Thread Chris Nauroth (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17654647#comment-17654647
 ] 

Chris Nauroth commented on HIVE-26880:
--

Thanks very much, [~zabetak]!

> Upgrade Apache Directory Server to 1.5.7 for release 3.2.
> -
>
> Key: HIVE-26880
> URL: https://issues.apache.org/jira/browse/HIVE-26880
> Project: Hive
>  Issue Type: Improvement
>  Components: Test
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Minor
>  Labels: hive-3.2.0-must, pull-request-available
> Fix For: 3.2.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> branch-3 uses Apache Directory Server in some tests. It currently uses 
> version 1.5.6. This version has a transitive dependency to a SNAPSHOT, making 
> it awkward to build and release. We can upgrade to 1.5.7 to remove the 
> SNAPSHOT dependency.
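
The change itself is a test-scoped version bump along these lines (the artifactId
below is an assumption about which ApacheDS module the tests declare, shown only
to illustrate the shape of the change):

{code:xml}
<dependency>
  <groupId>org.apache.directory.server</groupId>
  <!-- Assumption: whichever ApacheDS test artifact branch-3 actually declares. -->
  <artifactId>apacheds-server-integ</artifactId>
  <!-- 1.5.6 pulled in a SNAPSHOT transitively; 1.5.7 does not. -->
  <version>1.5.7</version>
  <scope>test</scope>
</dependency>
{code}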



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26892) Backport HIVE-25243 to 3.2.0: Handle nested values in null struct.

2022-12-28 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HIVE-26892:
-
Summary: Backport HIVE-25243 to 3.2.0: Handle nested values in null struct. 
 (was: Backport HIVE-25243: Handle nested values in null struct.)

> Backport HIVE-25243 to 3.2.0: Handle nested values in null struct.
> --
>
> Key: HIVE-26892
> URL: https://issues.apache.org/jira/browse/HIVE-26892
> Project: Hive
>  Issue Type: Sub-task
>  Components: Serializers/Deserializers
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Major
>
> On branch-3, we've seen a failure in {{TestArrowColumnarBatchSerDe}} while 
> trying to serialize a row of null values. It fails while trying to serialize 
> the fields of a null struct. This was fixed in 4.0 by HIVE-25243. This issue 
> tracks a backport to branch-3.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-26892) Backport HIVE-25243: Handle nested values in null struct.

2022-12-28 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth reassigned HIVE-26892:



> Backport HIVE-25243: Handle nested values in null struct.
> -
>
> Key: HIVE-26892
> URL: https://issues.apache.org/jira/browse/HIVE-26892
> Project: Hive
>  Issue Type: Sub-task
>  Components: Serializers/Deserializers
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Major
>
> On branch-3, we've seen a failure in {{TestArrowColumnarBatchSerDe}} while 
> trying to serialize a row of null values. It fails while trying to serialize 
> the fields of a null struct. This was fixed in 4.0 by HIVE-25243. This issue 
> tracks a backport to branch-3.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26702) Backport HIVE-17317 (DBCP and HikariCP property configuration support) to 3.2.0.

2022-12-28 Thread Chris Nauroth (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17652545#comment-17652545
 ] 

Chris Nauroth commented on HIVE-26702:
--

Thank you, [~ayushtkn]!

> Backport HIVE-17317 (DBCP and HikariCP property configuration support) to 
> 3.2.0.
> 
>
> Key: HIVE-26702
> URL: https://issues.apache.org/jira/browse/HIVE-26702
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.2.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> HIVE-17317 added support for metastore connection pooling configuration, 
> integration with DBCP and an important capability to the HikariCP 
> integration: passthrough configuration of any additional HikariCP 
> configurations, such as {{{}minimumIdle{}}}. This issue proposes to backport 
> HIVE-17317 for inclusion in the upcoming 3.2.0 release.
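
As a concrete illustration of the passthrough capability (a sketch only; the
exact property prefix is defined by HIVE-17317, so treat the key names below as
assumptions rather than verbatim configuration):

{code:xml}
<!-- Select the pooling implementation for the metastore backing database. -->
<property>
  <name>datanucleus.connectionPoolingType</name>
  <value>HikariCP</value>
</property>
<!-- Pass an extra HikariCP setting straight through to the pool.
     The "hikaricp." prefix is an assumption; see HIVE-17317 for the exact key. -->
<property>
  <name>hikaricp.minimumIdle</name>
  <value>2</value>
</property>
{code}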



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26886) Backport of HIVE-23621 Enforce ASF headers on source files

2022-12-27 Thread Chris Nauroth (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17652336#comment-17652336
 ] 

Chris Nauroth commented on HIVE-26886:
--

From a quick scan of apache-rat-plugin output on branch-3, there are multiple 
files flagged as missing license headers, even more than what we discussed in 
code review of HIVE-26879. This might need multiple backports to get a clean 
run.

> Backport of HIVE-23621 Enforce ASF headers on source files
> --
>
> Key: HIVE-26886
> URL: https://issues.apache.org/jira/browse/HIVE-26886
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Aman Raj
>Assignee: Aman Raj
>Priority: Critical
>
> Cherry pick this commit to branch-3



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-26880) Upgrade Apache Directory Server to 1.5.7 for release 3.2.

2022-12-20 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth reassigned HIVE-26880:



> Upgrade Apache Directory Server to 1.5.7 for release 3.2.
> -
>
> Key: HIVE-26880
> URL: https://issues.apache.org/jira/browse/HIVE-26880
> Project: Hive
>  Issue Type: Improvement
>  Components: Test
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Minor
>
> branch-3 uses Apache Directory Server in some tests. It currently uses 
> version 1.5.6. This version has a transitive dependency to a SNAPSHOT, making 
> it awkward to build and release. We can upgrade to 1.5.7 to remove the 
> SNAPSHOT dependency.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26637) Test error in TestTxnCommandsForMmTable#testInsertOverwriteForPartitionedMmTable()

2022-12-15 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HIVE-26637:
-
 Target Version/s: 3.2.0  (was: 3.1.2, 4.0.0-alpha-1)
Affects Version/s: (was: All Versions)
  Summary: Test error in 
TestTxnCommandsForMmTable#testInsertOverwriteForPartitionedMmTable()  (was: 
package failed without -DskipTests=true)

> Test error in 
> TestTxnCommandsForMmTable#testInsertOverwriteForPartitionedMmTable()
> --
>
> Key: HIVE-26637
> URL: https://issues.apache.org/jira/browse/HIVE-26637
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.2
>Reporter: happyziqi2
>Assignee: happyziqi2
>Priority: Trivial
> Attachments: Hive-26637-1.patch, Hive-26637.patch
>
>
> mvn package fails because testInsertOverwriteForPartitionedMmTable() in 
> org.apache.hadoop.hive.ql.TestTxnCommandsForMmTable fails.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26637) Test error in TestTxnCommandsForMmTable#testInsertOverwriteForPartitionedMmTable()

2022-12-15 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HIVE-26637:
-
Component/s: Test
 (was: Hive)

> Test error in 
> TestTxnCommandsForMmTable#testInsertOverwriteForPartitionedMmTable()
> --
>
> Key: HIVE-26637
> URL: https://issues.apache.org/jira/browse/HIVE-26637
> Project: Hive
>  Issue Type: Bug
>  Components: Test
>Affects Versions: 3.1.2
>Reporter: happyziqi2
>Assignee: happyziqi2
>Priority: Trivial
> Attachments: Hive-26637-1.patch, Hive-26637.patch
>
>
> mvn package fails because testInsertOverwriteForPartitionedMmTable() in 
> org.apache.hadoop.hive.ql.TestTxnCommandsForMmTable fails.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26637) package failed without -DskipTests=true

2022-12-15 Thread Chris Nauroth (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17648252#comment-17648252
 ] 

Chris Nauroth commented on HIVE-26637:
--

Hello [~happyziqi2]. It looks like this applies to branch-3 and branch-3.1, but 
not the master branch. The change looks right to me. If you open a GitHub pull 
request against branch-3, we can trigger CI tests and complete review there. 
Thank you!

> package failed without -DskipTests=true
> ---
>
> Key: HIVE-26637
> URL: https://issues.apache.org/jira/browse/HIVE-26637
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: All Versions, 3.1.2
>Reporter: happyziqi2
>Assignee: happyziqi2
>Priority: Trivial
> Attachments: Hive-26637-1.patch, Hive-26637.patch
>
>
> mvn package fails because testInsertOverwriteForPartitionedMmTable() in 
> org.apache.hadoop.hive.ql.TestTxnCommandsForMmTable fails.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26843) Filter all dependency module descriptors from shaded jars.

2022-12-14 Thread Chris Nauroth (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17647661#comment-17647661
 ] 

Chris Nauroth commented on HIVE-26843:
--

Thank you, [~zabetak]!

> Filter all dependency module descriptors from shaded jars.
> --
>
> Key: HIVE-26843
> URL: https://issues.apache.org/jira/browse/HIVE-26843
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline, Hive, JDBC
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> HIVE-26813 upgraded HikariCP from 2.6.1 to 4.0.3. During review of 
> [PR#3839|https://github.com/apache/hive/pull/3839], we discussed the need to 
> omit its module descriptor (module-info.class) from shaded jars. However, it 
> turns out there are also existing instances of module-info.class files from 
> other dependencies like Jackson and Log4J leaking into the shaded jars. We 
> can update the shading filters with wildcards to exclude these and also make 
> it future-proof against any other dependencies that start including a module 
> descriptor.
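
The wildcard filtering described above is plain maven-shade-plugin configuration;
a minimal sketch of the shape of the filter:

{code:xml}
<filters>
  <!-- Apply to every shaded dependency (*:*), not just HikariCP, so any future
       dependency that ships a module descriptor is covered as well. -->
  <filter>
    <artifact>*:*</artifact>
    <excludes>
      <exclude>module-info.class</exclude>
      <exclude>META-INF/versions/*/module-info.class</exclude>
    </excludes>
  </filter>
</filters>
{code}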



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-26843) Filter all dependency module descriptors from shaded jars.

2022-12-13 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth reassigned HIVE-26843:



> Filter all dependency module descriptors from shaded jars.
> --
>
> Key: HIVE-26843
> URL: https://issues.apache.org/jira/browse/HIVE-26843
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline, Hive, JDBC
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Major
>
> HIVE-26813 upgraded HikariCP from 2.6.1 to 4.0.3. During review of 
> [PR#3839|https://github.com/apache/hive/pull/3839], we discussed the need to 
> omit its module descriptor (module-info.class) from shaded jars. However, it 
> turns out there are also existing instances of module-info.class files from 
> other dependencies like Jackson and Log4J leaking into the shaded jars. We 
> can update the shading filters with wildcards to exclude these and also make 
> it future-proof against any other dependencies that start including a module 
> descriptor.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26834) Hive Iceberg Storage Handler tests are ignored

2022-12-13 Thread Chris Nauroth (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17646744#comment-17646744
 ] 

Chris Nauroth commented on HIVE-26834:
--

[~InvisibleProgrammer], LOL, we have all been there. :D

> Hive Iceberg Storage Handler tests are ignored
> --
>
> Key: HIVE-26834
> URL: https://issues.apache.org/jira/browse/HIVE-26834
> Project: Hive
>  Issue Type: Test
>  Components: HiveServer2, Iceberg integration
>Reporter: Zsolt Miskolczi
>Priority: Critical
>
> I wanted to run the following test locally: `mvn test 
> -Dtest="TestHiveIcebergStorageHandlerNoScan#testIcebergAndHmsTableProperties"`
> And it was just skipped. 
> I have checked the latest run on the CI server and it ignored it as well: 
> Link: 
> http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/master/1527/artifacts/
> ```
> [2022-12-12T09:11:50.841Z] [INFO] Running 
> org.apache.iceberg.mr.hive.TestHiveIcebergStorageHandlerNoScan
> [2022-12-12T09:11:50.886Z] [INFO] No tests to run.
> ```
> Additional info about this class: 
> The class is annotated as a Parameterised test. But I see no usage of any 
> parameters at the test cases. I suppose it is a left over. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26834) Hive Iceberg Storage Handler tests are ignored

2022-12-12 Thread Chris Nauroth (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17646336#comment-17646336
 ] 

Chris Nauroth commented on HIVE-26834:
--

Hello [~InvisibleProgrammer]. Regarding the {{Parameterized}} annotation, I do 
see some test parameterization on table type here:

[https://github.com/apache/hive/blob/master/iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerNoScan.java#L143]

I think your {{mvn test}} example would not trigger any of the tests in the 
suite, because the parameterized runner results in tests with different naming 
suffixes appended to the test method name. For example, if I run it this way, I 
see it runs 4 tests, one for each table type: {{{}mvn test 
-Dtest='TestHiveIcebergStorageHandlerNoScan#testIcebergAndHmsTableProperties[*]'{}}}.

> Hive Iceberg Storage Handler tests are ignored
> --
>
> Key: HIVE-26834
> URL: https://issues.apache.org/jira/browse/HIVE-26834
> Project: Hive
>  Issue Type: Test
>  Components: HiveServer2, Iceberg integration
>Reporter: Zsolt Miskolczi
>Priority: Critical
>
> I wanted to run the following test locally: `mvn test 
> -Dtest="TestHiveIcebergStorageHandlerNoScan#testIcebergAndHmsTableProperties"`
> And it was just skipped. 
> I have checked the latest run on the CI server and it ignored it as well: 
> Link: 
> http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/master/1527/artifacts/
> ```
> [2022-12-12T09:11:50.841Z] [INFO] Running 
> org.apache.iceberg.mr.hive.TestHiveIcebergStorageHandlerNoScan
> [2022-12-12T09:11:50.886Z] [INFO] No tests to run.
> ```
> Additional info about this class: 
> The class is annotated as a Parameterised test. But I see no usage of any 
> parameters at the test cases. I suppose it is a left over. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26813) Upgrade HikariCP from 2.6.1 to 4.0.3.

2022-12-09 Thread Chris Nauroth (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17645438#comment-17645438
 ] 

Chris Nauroth commented on HIVE-26813:
--

Great, thank you for the commit [~zabetak], and thank you to all of the 
reviewers!

> Upgrade HikariCP from 2.6.1 to 4.0.3.
> -
>
> Key: HIVE-26813
> URL: https://issues.apache.org/jira/browse/HIVE-26813
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The Hive Metastore currently integrates with HikariCP 2.6.1 for database 
> connection pooling. This version was released in 2017. The most recent Java 
> 8-compatible release is 4.0.3, released earlier this year. This bug proposes 
> to upgrade so that we can include the past few years of development and bug 
> fixes in the 4.0.0 GA release.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-26813) Upgrade HikariCP from 2.6.1 to 4.0.3.

2022-12-06 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth reassigned HIVE-26813:



> Upgrade HikariCP from 2.6.1 to 4.0.3.
> -
>
> Key: HIVE-26813
> URL: https://issues.apache.org/jira/browse/HIVE-26813
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Major
>
> The Hive Metastore currently integrates with HikariCP 2.6.1 for database 
> connection pooling. This version was released in 2017. The most recent Java 
> 8-compatible release is 4.0.3, released earlier this year. This bug proposes 
> to upgrade so that we can include the past few years of development and bug 
> fixes in the 4.0.0 GA release.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26702) Backport HIVE-17317 (DBCP and HikariCP property configuration support) to 3.2.0.

2022-11-29 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HIVE-26702:
-
Description: HIVE-17317 added support for metastore connection pooling 
configuration, integration with DBCP and an important capability to the 
HikariCP integration: passthrough configuration of any additional HikariCP 
configurations, such as {{{}minimumIdle{}}}. This issue proposes to backport 
HIVE-17317 for inclusion in the upcoming 3.2.0 release.  (was: HIVE-17315 added 
support for more flexible metastore connection pooling configuration, 
integration with DBCP and an important capability to the HikariCP integration: 
passthrough configuration of any additional HikariCP configurations, such as 
{{{}minimumIdle{}}}. This issue proposes to backport all 5 sub-tasks of 
HIVE-17315 for inclusion in the upcoming 3.2.0 release.)
Summary: Backport HIVE-17317 (DBCP and HikariCP property configuration 
support) to 3.2.0.  (was: Backport HIVE-17315 (more flexible metastore database 
connection pooling) to 3.2.0.)

On further investigation, the only patch still missing from branch-3 is 
HIVE-17317. I updated the title and description accordingly.

> Backport HIVE-17317 (DBCP and HikariCP property configuration support) to 
> 3.2.0.
> 
>
> Key: HIVE-26702
> URL: https://issues.apache.org/jira/browse/HIVE-26702
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Major
> Fix For: 3.2.0
>
>
> HIVE-17317 added support for metastore connection pooling configuration, 
> integration with DBCP and an important capability to the HikariCP 
> integration: passthrough configuration of any additional HikariCP 
> configurations, such as {{{}minimumIdle{}}}. This issue proposes to backport 
> HIVE-17317 for inclusion in the upcoming 3.2.0 release.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26712) HCatMapReduceTest writes test files in project base directory instead of build directory.

2022-11-18 Thread Chris Nauroth (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17636053#comment-17636053
 ] 

Chris Nauroth commented on HIVE-26712:
--

[~abstractdog] and [~ayushsaxena], thanks for the review and commit!

> HCatMapReduceTest writes test files in project base directory instead of 
> build directory.
> -
>
> Key: HIVE-26712
> URL: https://issues.apache.org/jira/browse/HIVE-26712
> Project: Hive
>  Issue Type: Bug
>  Components: Test
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Subclasses of {{HCatMapReduceTest}} produce files under {{hcatalog/core}}. 
> This causes a few minor irritations:
> # There is a separate {{.gitignore}} maintained just for the sake of these 
> files.
> # They are not removed by an {{mvn clean}}.
> # During release verification, while doing multiple {{mvn}} runs, the extra 
> files cause the RAT check to fail.
> This can be fixed by moving the files under {{hcatalog/core/target}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26677) Constrain available processors to Jetty during test runs to prevent thread exhaustion.

2022-11-13 Thread Chris Nauroth (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17633421#comment-17633421
 ] 

Chris Nauroth commented on HIVE-26677:
--

[~szita], thank you!

> Constrain available processors to Jetty during test runs to prevent thread 
> exhaustion.
> --
>
> Key: HIVE-26677
> URL: https://issues.apache.org/jira/browse/HIVE-26677
> Project: Hive
>  Issue Type: Test
>  Components: Test
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> As described during a [release candidate 
> vote|https://lists.apache.org/thread/8qjf7x9t9v09d79hlzh712ls4zthdwrh]:
> HIVE-24484 introduced a change to limit {{hive.server2.webui.max.threads}} to 
> 4. Jetty enforces thread leasing to warn or abort if there aren't enough 
> threads available [1]. During startup, it attempts to lease a thread per NIO 
> selector [2]. By default, the number of NIO selectors to use is determined 
> based on available CPUs [3]. This is mostly a passthrough to 
> {{Runtime.availableProcessors()}} [4]. In my case, running on a machine with 
> 16 CPUs, this ended up creating more than 4 selectors, therefore requiring 
> more than 4 threads and violating the lease check. I was able to work around 
> this by passing the {{JETTY_AVAILABLE_PROCESSORS}} system property to 
> constrain the number of CPUs available to Jetty.
> Since we are intentionally constraining the pool to 4 threads during itests, 
> let's also limit {{JETTY_AVAILABLE_PROCESSORS}} in {{maven.test.jvm.args}} of 
> the root pom.xml, so that others don't run into this problem later.
> [1] 
> https://github.com/eclipse/jetty.project/blob/jetty-9.4.40.v20210413/jetty-util/src/main/java/org/eclipse/jetty/util/thread/ThreadPoolBudget.java#L165
> [2] 
> https://github.com/eclipse/jetty.project/blob/jetty-9.4.40.v20210413/jetty-io/src/main/java/org/eclipse/jetty/io/SelectorManager.java#L255
> [3] 
> https://github.com/eclipse/jetty.project/blob/jetty-9.4.40.v20210413/jetty-io/src/main/java/org/eclipse/jetty/io/SelectorManager.java#L79
> [4] 
> https://github.com/eclipse/jetty.project/blob/jetty-9.4.40.v20210413/jetty-util/src/main/java/org/eclipse/jetty/util/ProcessorUtils.java#L45
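
A sketch of the kind of root pom.xml change described above (illustrative only;
the existing contents of maven.test.jvm.args are represented by a placeholder):

{code:xml}
<properties>
  <!-- Cap Jetty's perceived processor count so its NIO selector thread leases
       fit within the 4-thread hive.server2.webui.max.threads budget in itests. -->
  <maven.test.jvm.args>EXISTING_TEST_JVM_ARGS -DJETTY_AVAILABLE_PROCESSORS=4</maven.test.jvm.args>
</properties>
{code}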



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-26712) HCatMapReduceTest writes test files in project base directory instead of build directory.

2022-11-07 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth reassigned HIVE-26712:


Assignee: Chris Nauroth

> HCatMapReduceTest writes test files in project base directory instead of 
> build directory.
> -
>
> Key: HIVE-26712
> URL: https://issues.apache.org/jira/browse/HIVE-26712
> Project: Hive
>  Issue Type: Bug
>  Components: Test
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Minor
>
> Subclasses of {{HCatMapReduceTest}} produce files under {{hcatalog/core}}. 
> This causes a few minor irritations:
> # There is a separate {{.gitignore}} maintained just for the sake of these 
> files.
> # They are not removed by an {{mvn clean}}.
> # During release verification, while doing multiple {{mvn}} runs, the extra 
> files cause the RAT check to fail.
> This can be fixed by moving the files under {{hcatalog/core/target}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26702) Backport HIVE-17315 (more flexible metastore database connection pooling) to 3.2.0.

2022-11-03 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HIVE-26702:
-
Description: HIVE-17315 added support for more flexible metastore 
connection pooling configuration, integration with DBCP and an important 
capability to the HikariCP integration: passthrough configuration of any 
additional HikariCP configurations, such as {{{}minimumIdle{}}}. This issue 
proposes to backport all 5 sub-tasks of HIVE-17315 for inclusion in the 
upcoming 3.2.0 release.  (was: HIVE-17317 added support for using DBCP as the 
metastore's database connection pooling implementation. It also added an 
important capability to the HikariCP integration: passthrough configuration of 
any additional HikariCP configurations, such as {{{}minimumIdle{}}}. This issue 
proposes to backport this for the upcoming 3.2.0 release.)
Summary: Backport HIVE-17315 (more flexible metastore database 
connection pooling) to 3.2.0.  (was: Backport HIVE-17317 (metastore DBCP and 
HikariCP configuration support) to 3.x.)

> Backport HIVE-17315 (more flexible metastore database connection pooling) to 
> 3.2.0.
> ---
>
> Key: HIVE-26702
> URL: https://issues.apache.org/jira/browse/HIVE-26702
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Major
>
> HIVE-17315 added support for more flexible metastore connection pooling 
> configuration, integration with DBCP and an important capability to the 
> HikariCP integration: passthrough configuration of any additional HikariCP 
> configurations, such as {{{}minimumIdle{}}}. This issue proposes to backport 
> all 5 sub-tasks of HIVE-17315 for inclusion in the upcoming 3.2.0 release.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-26702) Backport HIVE-17317 (metastore DBCP and HikariCP configuration support) to 3.x.

2022-11-03 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth reassigned HIVE-26702:



> Backport HIVE-17317 (metastore DBCP and HikariCP configuration support) to 
> 3.x.
> ---
>
> Key: HIVE-26702
> URL: https://issues.apache.org/jira/browse/HIVE-26702
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Major
>
> HIVE-17317 added support for using DBCP as the metastore's database 
> connection pooling implementation. It also added an important capability to 
> the HikariCP integration: passthrough configuration of any additional 
> HikariCP configurations, such as {{{}minimumIdle{}}}. This issue proposes to 
> backport this for the upcoming 3.2.0 release.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26684) Upgrade maven-shade-plugin from 3.4.1 for bug fixes.

2022-11-03 Thread Chris Nauroth (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17628426#comment-17628426
 ] 

Chris Nauroth commented on HIVE-26684:
--

Thank you, [~ngangam] and [~ayushtkn]!

> Upgrade maven-shade-plugin from 3.4.1 for bug fixes.
> 
>
> Key: HIVE-26684
> URL: https://issues.apache.org/jira/browse/HIVE-26684
> Project: Hive
>  Issue Type: Bug
>  Components: Build Infrastructure
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The Hive build currently runs with maven-shade-plugin version 3.1.1, released 
> in April 2018. This issue proposes to upgrade to the latest version, 3.4.1, 
> released in October 2022. See HIVE-26648 for an example of another patch that 
> is blocked due to a bug in version 3.1.1.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-13288) Confusing exception message in DagUtils.localizeResource

2022-11-03 Thread Chris Nauroth (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-13288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17628425#comment-17628425
 ] 

Chris Nauroth commented on HIVE-13288:
--

Thank you, [~rbalamohan] and [~abstractdog]!

> Confusing exception message in DagUtils.localizeResource
> 
>
> Key: HIVE-13288
> URL: https://issues.apache.org/jira/browse/HIVE-13288
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 1.2.1
>Reporter: Jeff Zhang
>Assignee: Chris Nauroth
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> I got the following exception when querying through HiveServer2. Checking the 
> source code, it is due to an error when copying data from local to HDFS. 
> But the IOException is ignored, on the assumption that another thread is 
> also writing the same file. I don't think it makes sense to assume that; at 
> the very least, the IOException should be logged. 
> {code}
> LOG.info("Localizing resource because it does not exist: " + src + " to dest: 
> " + dest);
>   try {
> destFS.copyFromLocalFile(false, false, src, dest);
>   } catch (IOException e) {
> LOG.info("Looks like another thread is writing the same file will 
> wait.");
> int waitAttempts =
> 
> conf.getInt(HiveConf.ConfVars.HIVE_LOCALIZE_RESOURCE_NUM_WAIT_ATTEMPTS.varname,
> 
> HiveConf.ConfVars.HIVE_LOCALIZE_RESOURCE_NUM_WAIT_ATTEMPTS.defaultIntVal);
> long sleepInterval = HiveConf.getTimeVar(
> conf, HiveConf.ConfVars.HIVE_LOCALIZE_RESOURCE_WAIT_INTERVAL,
> TimeUnit.MILLISECONDS);
> LOG.info("Number of wait attempts: " + waitAttempts + ". Wait 
> interval: "
> + sleepInterval);
> boolean found = false;
> {code}
> {noformat}
> 2016-03-15 11:25:39,921 INFO  [HiveServer2-Background-Pool: Thread-249]: 
> tez.DagUtils (DagUtils.java:getHiveJarDirectory(876)) - Jar dir is 
> null/directory doesn't exist. Choosing HIVE_INSTALL_DIR - /user/jeff/.hiveJars
> 2016-03-15 11:25:40,058 INFO  [HiveServer2-Background-Pool: Thread-249]: 
> tez.DagUtils (DagUtils.java:localizeResource(952)) - Localizing resource 
> because it does not exist: 
> file:/usr/hdp/2.3.2.0-2950/hive/lib/hive-exec-1.2.1.2.3.2.0-2950.jar to dest: 
> hdfs://sandbox.hortonworks.com:8020/user/jeff/.hiveJars/hive-exec-1.2.1.2.3.2.0-2950-a97c953db414a4f792d868e2b0417578a61ccfa368048016926117b641b07f34.jar
> 2016-03-15 11:25:40,063 INFO  [HiveServer2-Background-Pool: Thread-249]: 
> tez.DagUtils (DagUtils.java:localizeResource(956)) - Looks like another 
> thread is writing the same file will wait.
> 2016-03-15 11:25:40,064 INFO  [HiveServer2-Background-Pool: Thread-249]: 
> tez.DagUtils (DagUtils.java:localizeResource(963)) - Number of wait attempts: 
> 5. Wait interval: 5000
> 2016-03-15 11:25:53,548 INFO  [HiveServer2-Handler-Pool: Thread-48]: 
> thrift.ThriftCLIService (ThriftCLIService.java:OpenSession(294)) - Client 
> protocol version: HIVE_CLI_SERVICE_PROTOCOL_V8
> 2016-03-15 11:25:53,548 INFO  [HiveServer2-Handler-Pool: Thread-48]: 
> metastore.HiveMetaStore (HiveMetaStore.java:logInfo(747)) - 1: Shutting down 
> the object store...
> 2016-03-15 11:25:53,549 INFO  [HiveServer2-Handler-Pool: Thread-48]: 
> HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(372)) - 
> ugi=hive/sandbox.hortonworks@example.com   ip=unknown-ip-addr  
> cmd=Shutting down the object store...
> 2016-03-15 11:25:53,549 INFO  [HiveServer2-Handler-Pool: Thread-48]: 
> metastore.HiveMetaStore (HiveMetaStore.java:logInfo(747)) - 1: Metastore 
> shutdown complete.
> 2016-03-15 11:25:53,549 INFO  [HiveServer2-Handler-Pool: Thread-48]: 
> HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(372)) - 
> ugi=hive/sandbox.hortonworks@example.com   ip=unknown-ip-addr  
> cmd=Metastore shutdown complete.
> 2016-03-15 11:25:53,573 INFO  [HiveServer2-Handler-Pool: Thread-48]: 
> session.SessionState (SessionState.java:createPath(641)) - Created local 
> directory: /tmp/e43fbaab-a659-4331-90cb-0ea0b2098e25_resources
> 2016-03-15 11:25:53,577 INFO  [HiveServer2-Handler-Pool: Thread-48]: 
> session.SessionState (SessionState.java:createPath(641)) - Created HDFS 
> directory: /tmp/hive/ambari-qa/e43fbaab-a659-4331-90cb-0ea0b2098e25
> 2016-03-15 11:25:53,582 INFO  [HiveServer2-Handler-Pool: Thread-48]: 
> session.SessionState (SessionState.java:createPath(641)) - Created local 
> directory: /tmp/hive/e43fbaab-a659-4331-90cb-0ea0b2098e25
> 2016-03-15 11:25:53,587 INFO  [HiveServer2-Handler-Pool: Thread-48]: 
> session.SessionState (SessionState.java:createPath(641)) - Created HDFS 
> directory: 
> 

[jira] [Assigned] (HIVE-13288) Confusing exception message in DagUtils.localizeResource

2022-11-01 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-13288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth reassigned HIVE-13288:


 Component/s: HiveServer2
  (was: Clients)
Target Version/s: 4.0.0-alpha-2
Assignee: Chris Nauroth

I'd like to take this issue, because I've encountered similar problems in 
production incidents. I sent in a pull request.

I'd also like to describe the root cause of my production incident for anyone 
watching this issue, in case it's helpful for diagnosing any of your own future 
issues.

While creating a Tez session, {{DagUtils#localizeResource}} is responsible for 
copying the client's hive-exec.jar into HDFS ({{hive.jar.directory}}). This 
process can be triggered from multiple threads concurrently, in which case one 
thread performs the copy while the others wait, polling for arrival of the 
destination file.

If there is an {{IOException}} during this process, it's assumed that the 
thread attempting the write failed, and all others abort. No information about 
the underlying {{IOException}} is logged. Instead, the log states "previous 
writer likely failed to write." In some cases though, the {{IOException}} can 
occur on a polling thread for reasons unrelated to what happened in a writing 
thread. For example, in a production incident, the root cause was really that 
an external process had corrupted the copy of hive-exec.jar in 
{{hive.jar.directory}}, causing failure of the file length validation check in 
{{DagUtils#checkPreExisting}}. Since the logs didn't say anything about this, 
it made it much more difficult to troubleshoot.

This patch clarifies the logging by stating that a failure on the writing 
thread is just one possible reason for the error. It also logs the exception 
stack trace to make it easier to find the real root cause. This is a patch I 
ran to help recover from the production incident.

> Confusing exception message in DagUtils.localizeResource
> 
>
> Key: HIVE-13288
> URL: https://issues.apache.org/jira/browse/HIVE-13288
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 1.2.1
>Reporter: Jeff Zhang
>Assignee: Chris Nauroth
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I got the following exception when querying through HiveServer2. Checking the 
> source code, it is due to an error when copying data from local to HDFS. 
> But the IOException is ignored, on the assumption that another thread is 
> also writing the same file. I don't think it makes sense to assume that; at 
> the very least, the IOException should be logged. 
> {code}
> LOG.info("Localizing resource because it does not exist: " + src + " to dest: 
> " + dest);
>   try {
> destFS.copyFromLocalFile(false, false, src, dest);
>   } catch (IOException e) {
> LOG.info("Looks like another thread is writing the same file will 
> wait.");
> int waitAttempts =
> 
> conf.getInt(HiveConf.ConfVars.HIVE_LOCALIZE_RESOURCE_NUM_WAIT_ATTEMPTS.varname,
> 
> HiveConf.ConfVars.HIVE_LOCALIZE_RESOURCE_NUM_WAIT_ATTEMPTS.defaultIntVal);
> long sleepInterval = HiveConf.getTimeVar(
> conf, HiveConf.ConfVars.HIVE_LOCALIZE_RESOURCE_WAIT_INTERVAL,
> TimeUnit.MILLISECONDS);
> LOG.info("Number of wait attempts: " + waitAttempts + ". Wait 
> interval: "
> + sleepInterval);
> boolean found = false;
> {code}
> {noformat}
> 2016-03-15 11:25:39,921 INFO  [HiveServer2-Background-Pool: Thread-249]: 
> tez.DagUtils (DagUtils.java:getHiveJarDirectory(876)) - Jar dir is 
> null/directory doesn't exist. Choosing HIVE_INSTALL_DIR - /user/jeff/.hiveJars
> 2016-03-15 11:25:40,058 INFO  [HiveServer2-Background-Pool: Thread-249]: 
> tez.DagUtils (DagUtils.java:localizeResource(952)) - Localizing resource 
> because it does not exist: 
> file:/usr/hdp/2.3.2.0-2950/hive/lib/hive-exec-1.2.1.2.3.2.0-2950.jar to dest: 
> hdfs://sandbox.hortonworks.com:8020/user/jeff/.hiveJars/hive-exec-1.2.1.2.3.2.0-2950-a97c953db414a4f792d868e2b0417578a61ccfa368048016926117b641b07f34.jar
> 2016-03-15 11:25:40,063 INFO  [HiveServer2-Background-Pool: Thread-249]: 
> tez.DagUtils (DagUtils.java:localizeResource(956)) - Looks like another 
> thread is writing the same file will wait.
> 2016-03-15 11:25:40,064 INFO  [HiveServer2-Background-Pool: Thread-249]: 
> tez.DagUtils (DagUtils.java:localizeResource(963)) - Number of wait attempts: 
> 5. Wait interval: 5000
> 2016-03-15 11:25:53,548 INFO  [HiveServer2-Handler-Pool: Thread-48]: 
> thrift.ThriftCLIService (ThriftCLIService.java:OpenSession(294)) - Client 
> protocol version: HIVE_CLI_SERVICE_PROTOCOL_V8
> 2016-03-15 

[jira] [Assigned] (HIVE-26668) Upgrade ORC version to 1.6.11

2022-11-01 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth reassigned HIVE-26668:


Assignee: Sungwoo Park

> Upgrade ORC version to 1.6.11
> -
>
> Key: HIVE-26668
> URL: https://issues.apache.org/jira/browse/HIVE-26668
> Project: Hive
>  Issue Type: Bug
>Reporter: Sungwoo Park
>Assignee: Sungwoo Park
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> With ORC 1.6.9, setting hive.exec.orc.default.compress to ZSTD can generate 
> IllegalStateException (e.g., when loading ORC tables). This is fixed in 
> ORC-965.
> {code:java}
> Caused by: java.lang.IllegalStateException: Overflow detected
>   at io.airlift.compress.zstd.Util.checkState(Util.java:59)
>   at 
> io.airlift.compress.zstd.BitOutputStream.close(BitOutputStream.java:85){code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-26684) Upgrade maven-shade-plugin from 3.4.1 for bug fixes.

2022-10-31 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth reassigned HIVE-26684:



> Upgrade maven-shade-plugin from 3.4.1 for bug fixes.
> 
>
> Key: HIVE-26684
> URL: https://issues.apache.org/jira/browse/HIVE-26684
> Project: Hive
>  Issue Type: Bug
>  Components: Build Infrastructure
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Major
>
> The Hive build currently runs with maven-shade-plugin version 3.1.1, released 
> in April 2018. This issue proposes to upgrade to the latest version, 3.4.1, 
> released in October 2022. See HIVE-26648 for an example of another patch that 
> is blocked due to a bug in version 3.1.1.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Reopened] (HIVE-26678) In the filter criteria associated with multiple tables, the filter result of the subquery by not in or in is incorrect.

2022-10-30 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth reopened HIVE-26678:
--

> In the filter criteria associated with multiple tables, the filter result of 
> the subquery by not in or in is incorrect.
> ---
>
> Key: HIVE-26678
> URL: https://issues.apache.org/jira/browse/HIVE-26678
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 3.1.0
>Reporter: lotan
>Priority: Major
>
> create test tables as follows:
> create table test101 (id string,id2 string);
> create table test102 (id string,id2 string);
> create table test103 (id string,id2 string);
> create table test104 (id string,id2 string);
> when cbo is false, run the following SQL statement:
> explain select count(1) from test101 t1 
> left join test102 t2 on t1.id=t2.id
> left join test103 t3 on t1.id=t3.id2
> where t1.id in (select s.id from test104 s)
> and t3.id2='123';
> you will see:
> The filter criteria in the right table are lost.
> The execution plan is as follows:
> +-+
> |                                               Explain                       
>                         |
> +-+
> | STAGE DEPENDENCIES:                                                         
>                         |
> |   Stage-9 is a root stage                                                   
>                         |
> |   Stage-3 depends on stages: Stage-9                                        
>                         |
> |   Stage-0 depends on stages: Stage-3                                        
>                         |
> |                                                                             
>                         |
> | STAGE PLANS:                                                                
>                         |
> |   Stage: Stage-9                                                            
>                         |
> |     Map Reduce Local Work                                                   
>                         |
> |       Alias -> Map Local Tables:                                            
>                         |
> |         sq_1:s                                                              
>                         |
> |           Fetch Operator                                                    
>                         |
> |             limit: -1                                                       
>                         |
> |         t2                                                                  
>                         |
> |           Fetch Operator                                                    
>                         |
> |             limit: -1                                                       
>                         |
> |         t3                                                                  
>                         |
> |           Fetch Operator                                                    
>                         |
> |             limit: -1                                                       
>                         |
> |       Alias -> Map Local Operator Tree:                                     
>                         |
> |         sq_1:s                                                              
>                         |
> |           TableScan                                                         
>                         |
> |             alias: s                                                        
>                         |
> |             Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL 
> Column stats: NONE            |
> |             Filter Operator                                                 
>                         |
> |               predicate: id is not null (type: boolean)                     
>                         |
> |               Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL 
> Column stats: NONE          |
> |               Select Operator                                               
>                         |
> |                 expressions: id (type: string)                              
>                         |
> |                 outputColumnNames: _col0                                    
>                         |
> |                 Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL 
> Column stats: NONE        |
> |                 Group By Operator                                           
>                         |
> |          

[jira] [Updated] (HIVE-26678) In the filter criteria associated with multiple tables, the filter result of the subquery by not in or in is incorrect.

2022-10-30 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HIVE-26678:
-
Hadoop Flags:   (was: Incompatible change,Reviewed)
Release Note:   (was: About Me I’m Evens max pierrelouis the Chairman of 
the Board of Max web TV it’s on Wear TV Edit MB and it is a Haiti BUSINESS for 
Artificial Intelligence Deep learning Robotics Enterprise)
Tags:   (was: MaxwebTVLive )

> In the filter criteria associated with multiple tables, the filter result of 
> the subquery by not in or in is incorrect.
> ---
>
> Key: HIVE-26678
> URL: https://issues.apache.org/jira/browse/HIVE-26678
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 3.1.0
>Reporter: lotan
>Priority: Major
>
> create testtable as follow:
> create table test101 (id string,id2 string);
> create table test102 (id string,id2 string);
> create table test103 (id string,id2 string);
> create table test104 (id string,id2 string);
> when cbo is false, run the following SQL statement:
> explain select count(1) from test101 t1 
> left join test102 t2 on t1.id=t2.id
> left join test103 t3 on t1.id=t3.id2
> where t1.id in (select s.id from test104 s)
> and t3.id2='123';
> you will see:
> The filter criteria in the right table are lost.
> The execution plan is as follows:
> +-+
> |                                               Explain                       
>                         |
> +-+
> | STAGE DEPENDENCIES:                                                         
>                         |
> |   Stage-9 is a root stage                                                   
>                         |
> |   Stage-3 depends on stages: Stage-9                                        
>                         |
> |   Stage-0 depends on stages: Stage-3                                        
>                         |
> |                                                                             
>                         |
> | STAGE PLANS:                                                                
>                         |
> |   Stage: Stage-9                                                            
>                         |
> |     Map Reduce Local Work                                                   
>                         |
> |       Alias -> Map Local Tables:                                            
>                         |
> |         sq_1:s                                                              
>                         |
> |           Fetch Operator                                                    
>                         |
> |             limit: -1                                                       
>                         |
> |         t2                                                                  
>                         |
> |           Fetch Operator                                                    
>                         |
> |             limit: -1                                                       
>                         |
> |         t3                                                                  
>                         |
> |           Fetch Operator                                                    
>                         |
> |             limit: -1                                                       
>                         |
> |       Alias -> Map Local Operator Tree:                                     
>                         |
> |         sq_1:s                                                              
>                         |
> |           TableScan                                                         
>                         |
> |             alias: s                                                        
>                         |
> |             Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL 
> Column stats: NONE            |
> |             Filter Operator                                                 
>                         |
> |               predicate: id is not null (type: boolean)                     
>                         |
> |               Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL 
> Column stats: NONE          |
> |               Select Operator                                               
>                         |
> |                 expressions: id (type: string)                              
>                         |
> |                 

[jira] [Assigned] (HIVE-26677) Constrain available processors to Jetty during test runs to prevent thread exhaustion.

2022-10-29 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth reassigned HIVE-26677:


Assignee: Chris Nauroth

> Constrain available processors to Jetty during test runs to prevent thread 
> exhaustion.
> --
>
> Key: HIVE-26677
> URL: https://issues.apache.org/jira/browse/HIVE-26677
> Project: Hive
>  Issue Type: Test
>  Components: Test
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Major
>
> As described during a [release candidate 
> vote|https://lists.apache.org/thread/8qjf7x9t9v09d79hlzh712ls4zthdwrh]:
> HIVE-24484 introduced a change to limit {{hive.server2.webui.max.threads}} to 
> 4. Jetty enforces thread leasing to warn or abort if there aren't enough 
> threads available [1]. During startup, it attempts to lease a thread per NIO 
> selector [2]. By default, the number of NIO selectors to use is determined 
> based on available CPUs [3]. This is mostly a passthrough to 
> {{Runtime.availableProcessors()}} [4]. In my case, running on a machine with 
> 16 CPUs, this ended up creating more than 4 selectors, therefore requiring 
> more than 4 threads and violating the lease check. I was able to work around 
> this by passing the {{JETTY_AVAILABLE_PROCESSORS}} system property to 
> constrain the number of CPUs available to Jetty.
> Since we are intentionally constraining the pool to 4 threads during itests, 
> let's also limit {{JETTY_AVAILABLE_PROCESSORS}} in {{maven.test.jvm.args}} of 
> the root pom.xml, so that others don't run into this problem later.
> [1] 
> https://github.com/eclipse/jetty.project/blob/jetty-9.4.40.v20210413/jetty-util/src/main/java/org/eclipse/jetty/util/thread/ThreadPoolBudget.java#L165
> [2] 
> https://github.com/eclipse/jetty.project/blob/jetty-9.4.40.v20210413/jetty-io/src/main/java/org/eclipse/jetty/io/SelectorManager.java#L255
> [3] 
> https://github.com/eclipse/jetty.project/blob/jetty-9.4.40.v20210413/jetty-io/src/main/java/org/eclipse/jetty/io/SelectorManager.java#L79
> [4] 
> https://github.com/eclipse/jetty.project/blob/jetty-9.4.40.v20210413/jetty-util/src/main/java/org/eclipse/jetty/util/ProcessorUtils.java#L45
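A minimal sketch of that workaround, assuming the property is set before any Jetty classes are loaded (which is what putting it into {{maven.test.jvm.args}} guarantees); the class name below is only illustrative:

{code}
public class JettyProcessorsSketch {
  public static void main(String[] args) {
    // Pin Jetty's detected processor count so the 4-thread web UI pool satisfies
    // Jetty's ThreadPoolBudget lease check; equivalent to passing
    // -DJETTY_AVAILABLE_PROCESSORS=4 on the test JVM command line.
    System.setProperty("JETTY_AVAILABLE_PROCESSORS", "4");
    // ... start HiveServer2 / the embedded web UI after this point (omitted).
  }
}
{code}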



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-26669) Hive Metastore become unresponsive

2022-10-27 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HIVE-26669.
--
  Assignee: Chris Nauroth
Resolution: Not A Problem

Hello [~sandygade].

Thank you for the full thread dump. I can see from the example thread below 
that Hikari's connection adder threads are blocked in I/O in the Oracle JDBC 
driver indefinitely. It also appears that the behavior of Hikari is to create 
new connections on a dedicated thread pool (per connection pool):

[https://github.com/brettwooldridge/HikariCP/blob/dev/src/main/java/com/zaxxer/hikari/pool/HikariPool.java#L115]

Additionally, this thread pool is hard-coded to a size of 1 thread:

[https://github.com/brettwooldridge/HikariCP/blob/dev/src/main/java/com/zaxxer/hikari/util/UtilityElf.java#L139]

This would mean that if the metastore's threads need a new database connection, 
and if the adder thread blocks indefinitely in the connection attempt, then all 
of the other threads are going to get blocked behind that. Effectively, 
HiveMetaStore won't be able to make progress for clients until after a restart, 
just like you described.

Here are some recommended next steps:
 * Like you said, this could indicate a networking error (e.g. high packet 
loss), so that's worth investigating.
 * It is suspicious that the socket connections do not time out and report an 
error back to the caller. That would at least give an opportunity for retries 
instead of hanging the whole process. I don't know Oracle myself, but I'm 
seeing some indications online that the Oracle JDBC driver supports a 
{{CONNECT_TIMEOUT}} property. Perhaps it would help to get that into the 
connection string in hive-site.xml {{javax.jdo.option.ConnectionURL}} with a 
relatively short value, like 10-30 seconds (a rough sketch follows below).
 * There is also support for [Apache Commons 
DBCP|https://commons.apache.org/proper/commons-dbcp/] as the connection pool, 
as documented at [Hive Metastore Connection Pooling 
Configuration|https://cwiki.apache.org/confluence/display/hive/configuration+properties#ConfigurationProperties-HiveMetastoreConnectionPoolingConfiguration.1].
 I'm not certain, but perhaps you'd see different results with that, if it 
doesn't have the behavior of blocking new connection attempts in a single 
thread.
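For the connection string suggestion, a rough sketch (host, port, and service name are placeholders, and the exact timeout parameter name and units should be verified against the Oracle driver in use; the same value would normally go into hive-site.xml rather than code):

{code}
import org.apache.hadoop.conf.Configuration;

public class ConnectTimeoutSketch {
  public static void main(String[] args) {
    // Placeholder host/port/service; a connect timeout lets a hung TCP connect fail
    // fast instead of blocking Hikari's single connection-adder thread indefinitely.
    // CONNECT_TIMEOUT value and units should be checked against the driver docs.
    Configuration conf = new Configuration();
    conf.set("javax.jdo.option.ConnectionURL",
        "jdbc:oracle:thin:@//db.example.com:1521/metastore_svc?CONNECT_TIMEOUT=15");
    System.out.println(conf.get("javax.jdo.option.ConnectionURL"));
  }
}
{code}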

I'm going to close out this issue as there doesn't appear to be a Hive bug. I 
hope these suggestions help.

> Hive Metastore become unresponsive
> --
>
> Key: HIVE-26669
> URL: https://issues.apache.org/jira/browse/HIVE-26669
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.1.0
>Reporter: Sandeep Gade
>Assignee: Chris Nauroth
>Priority: Critical
> Attachments: metastore-server1
>
>
> We are experiencing issues with Hive Metastore where it goes unresponsive. 
> Initial investigation shows thousands of threads in WAITING (parking) state as 
> shown below:
> 1java.lang.Thread.State: BLOCKED (on object monitor)
> 772java.lang.Thread.State: RUNNABLE
>   2java.lang.Thread.State: TIMED_WAITING (on object monitor)
>  13java.lang.Thread.State: TIMED_WAITING (parking)
>   5java.lang.Thread.State: TIMED_WAITING (sleeping)
>   3java.lang.Thread.State: WAITING (on object monitor)
>   14308java.lang.Thread.State: WAITING (parking)
> ==
> Almost all of the threads are stuck at 'parking to wait for  
> <0x7f9ad0795c48> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)'
>  
>  15 - parking to wait for  <0x7f9ad06c9c10> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   14288 - parking to wait for  <0x7f9ad0795c48> (a 
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
>   1 - parking to wait for  <0x7f9ad0a161f8> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   1 - parking to wait for  <0x7f9ad0a39248> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   1 - parking to wait for  <0x7f9ad0adb0a0> (a 
> java.util.concurrent.SynchronousQueue$TransferQueue)
>   5 - parking to wait for  <0x7f9ad0b12278> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   1 - parking to wait for  <0x7f9ad0b12518> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   1 - parking to wait for  <0x7f9ad0b44878> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   1 - parking to wait for  <0x7f9ad0cbe8f0> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   1 - parking to wait for  <0x7f9ad1318d60> (a 
> 

[jira] [Commented] (HIVE-26669) Hive Metastore become unresponsive

2022-10-26 Thread Chris Nauroth (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17624727#comment-17624727
 ] 

Chris Nauroth commented on HIVE-26669:
--

Thank you for sharing the additional information. The other thread is blocked 
attempting to check out a database connection from the Hikari connection pool. 
This most likely indicates that other threads have checked out all available 
connections and are using them for long-running operations. It could be that 
you could avoid the problem by tuning up the maximum number of connections 
allowed in the pool, using hive-site.xml property 
{{datanucleus.connectionPool.maxPoolSize}}. The default value is 10, documented 
here:

https://cwiki.apache.org/confluence/display/hive/configuration+properties
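As a rough sketch, the knob can be set like this (50 is only an example value, not a recommendation; in a real deployment it belongs in hive-site.xml):

{code}
import org.apache.hadoop.conf.Configuration;

public class PoolSizeSketch {
  public static void main(String[] args) {
    // Raise the metastore connection pool ceiling from the default of 10.
    // 50 is an arbitrary example; size it to the workload and to the number
    // of metastore instances sharing the database.
    Configuration conf = new Configuration();
    conf.setInt("datanucleus.connectionPool.maxPoolSize", 50);
    System.out.println(conf.getInt("datanucleus.connectionPool.maxPoolSize", 10));
  }
}
{code}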

Alternatively, it might mean that something else is wrong, and tuning up the 
connection pool size would just delay the problem. Consider looking in the full 
thread dump to find out what other database operations are happening. Does it 
look like those connections are hanging indefinitely? Does the database itself 
appear to be overloaded?

> Hive Metastore become unresponsive
> --
>
> Key: HIVE-26669
> URL: https://issues.apache.org/jira/browse/HIVE-26669
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.1.0
>Reporter: Sandeep Gade
>Priority: Critical
>
> We are experiencing issues with Hive Metastore where it goes unresponsive. 
> Initial investigation shows thousands of threads in WAITING (parking) state as 
> shown below:
> 1java.lang.Thread.State: BLOCKED (on object monitor)
> 772java.lang.Thread.State: RUNNABLE
>   2java.lang.Thread.State: TIMED_WAITING (on object monitor)
>  13java.lang.Thread.State: TIMED_WAITING (parking)
>   5java.lang.Thread.State: TIMED_WAITING (sleeping)
>   3java.lang.Thread.State: WAITING (on object monitor)
>   14308java.lang.Thread.State: WAITING (parking)
> ==
> Almost all of the threads are stuck at 'parking to wait for  
> <0x7f9ad0795c48> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)'
>  
>  15 - parking to wait for  <0x7f9ad06c9c10> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   14288 - parking to wait for  <0x7f9ad0795c48> (a 
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
>   1 - parking to wait for  <0x7f9ad0a161f8> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   1 - parking to wait for  <0x7f9ad0a39248> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   1 - parking to wait for  <0x7f9ad0adb0a0> (a 
> java.util.concurrent.SynchronousQueue$TransferQueue)
>   5 - parking to wait for  <0x7f9ad0b12278> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   1 - parking to wait for  <0x7f9ad0b12518> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   1 - parking to wait for  <0x7f9ad0b44878> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   1 - parking to wait for  <0x7f9ad0cbe8f0> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   1 - parking to wait for  <0x7f9ad1318d60> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   1 - parking to wait for  <0x7f9ad1478c10> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   5 - parking to wait for  <0x7f9ad1494ff8> (a 
> java.util.concurrent.SynchronousQueue$TransferQueue)
> ==
> complete stack:
> "pool-8-thread-62238" #3582305 prio=5 os_prio=0 tid=0x7f977bfc9800 
> nid=0x62011 waiting on condition [0x7f959d917000]
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x7f9ad0795c48> (a 
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
> at 
> java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
> at 
> java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
> at 
> 

[jira] [Commented] (HIVE-26669) Hive Metastore become unresponsive

2022-10-26 Thread Chris Nauroth (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17624650#comment-17624650
 ] 

Chris Nauroth commented on HIVE-26669:
--

It appears these threads are stuck trying to initialize the raw client, while 
trying to acquire a lock for safely updating configuration:

https://github.com/apache/hive/blob/rel/release-3.1.0/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L344

There must be some other thread already holding the lock (0x7f9ad0795c48). 
Do you have a full thread dump? Finding the thread that already holds the lock 
would be the next best step for troubleshooting. The other thread could be 
holding the lock for a long time for numerous reasons (hanging socket 
connection to the database, spinning in a loop due to some bug, etc.).

BTW, the line numbers in the stack trace don't seem to line up exactly with 
version 3.1.0, which you indicated in the Affects Version field, so I wonder if 
this is really a different version or perhaps something with custom patches.

> Hive Metastore become unresponsive
> --
>
> Key: HIVE-26669
> URL: https://issues.apache.org/jira/browse/HIVE-26669
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.1.0
>Reporter: Sandeep Gade
>Priority: Critical
>
> We are experiencing issues with Hive Metastore where it goes unresponsive. 
> Initial investigation shows thousands of threads in WAITING (parking) state as 
> shown below:
> 1java.lang.Thread.State: BLOCKED (on object monitor)
> 772java.lang.Thread.State: RUNNABLE
>   2java.lang.Thread.State: TIMED_WAITING (on object monitor)
>  13java.lang.Thread.State: TIMED_WAITING (parking)
>   5java.lang.Thread.State: TIMED_WAITING (sleeping)
>   3java.lang.Thread.State: WAITING (on object monitor)
>   14308java.lang.Thread.State: WAITING (parking)
> ==
> Almost all of the threads are stuck at 'parking to wait for  
> <0x7f9ad0795c48> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)'
>  
>  15 - parking to wait for  <0x7f9ad06c9c10> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   14288 - parking to wait for  <0x7f9ad0795c48> (a 
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
>   1 - parking to wait for  <0x7f9ad0a161f8> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   1 - parking to wait for  <0x7f9ad0a39248> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   1 - parking to wait for  <0x7f9ad0adb0a0> (a 
> java.util.concurrent.SynchronousQueue$TransferQueue)
>   5 - parking to wait for  <0x7f9ad0b12278> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   1 - parking to wait for  <0x7f9ad0b12518> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   1 - parking to wait for  <0x7f9ad0b44878> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   1 - parking to wait for  <0x7f9ad0cbe8f0> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   1 - parking to wait for  <0x7f9ad1318d60> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   1 - parking to wait for  <0x7f9ad1478c10> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   5 - parking to wait for  <0x7f9ad1494ff8> (a 
> java.util.concurrent.SynchronousQueue$TransferQueue)
> ==
> complete stack:
> "pool-8-thread-62238" #3582305 prio=5 os_prio=0 tid=0x7f977bfc9800 
> nid=0x62011 waiting on condition [0x7f959d917000]
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x7f9ad0795c48> (a 
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
> at 
> java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
> at 
> java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:351)
> 

[jira] [Commented] (HIVE-26632) Update DelegationTokenSecretManager current key ID to prevent erroneous database updates.

2022-10-26 Thread Chris Nauroth (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17624614#comment-17624614
 ] 

Chris Nauroth commented on HIVE-26632:
--

[~ayushtkn] and [~dengzh], thank you both!

> Update DelegationTokenSecretManager current key ID to prevent erroneous 
> database updates.
> -
>
> Key: HIVE-26632
> URL: https://issues.apache.org/jira/browse/HIVE-26632
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> While rolling a new master key, {{TokenStoreDelegationTokenSecretManager}} 
> does not update a base class member variable that tracks the current key ID. 
> This can cause situations later where it attempts to update a key using an 
> incorrect ID. This update attempt fails, even though the process had 
> successfully generated a new master key. Since it appears to be a failure 
> though, the thread immediately attempts to roll a new master key again, 
> resulting in excess database load.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26632) Update DelegationTokenSecretManager current key ID to prevent erroneous database updates.

2022-10-14 Thread Chris Nauroth (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17617984#comment-17617984
 ] 

Chris Nauroth commented on HIVE-26632:
--

[~yigress] , [~vinayakumarb] , FYI.

> Update DelegationTokenSecretManager current key ID to prevent erroneous 
> database updates.
> -
>
> Key: HIVE-26632
> URL: https://issues.apache.org/jira/browse/HIVE-26632
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> While rolling a new master key, {{TokenStoreDelegationTokenSecretManager}} 
> does not update a base class member variable that tracks the current key ID. 
> This can cause situations later where it attempts to update a key using an 
> incorrect ID. This update attempt fails, even though the process had 
> successfully generated a new master key. Since it appears to be a failure 
> though, the thread immediately attempts to roll a new master key again, 
> resulting in excess database load.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26632) Update DelegationTokenSecretManager current key ID to prevent erroneous database updates.

2022-10-14 Thread Chris Nauroth (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17617980#comment-17617980
 ] 

Chris Nauroth commented on HIVE-26632:
--

Update DelegationTokenSecretManager current key ID to prevent erroneous 
database updates.

{{TokenStoreDelegationTokenSecretManager#logUpdateMasterKey}}, used in 
combination with {{DBTokenStore}}, inserts a new master key to the 
{{MASTER_KEYS}} table. The serialized {{DelegationKey}} stored in the 
{{MASTER_KEY}} column initially will contain an ID with the value of the base 
class member {{AbstractDelegationTokenSecretManager#currentId}}. For example, 
for a freshly started HiveMetaStore process, this will use a value of 1. Then, 
there is a second update performed on the database row, setting a new 
serialized {{DelegationKey}} with an ID that matches the value of the 
auto-incrementing {{KEY_ID}} column. Note that this method does not update 
{{AbstractDelegationTokenSecretManager#currentId}} for agreement with the new 
key ID.

{{TokenStoreDelegationTokenSecretManager#rollMasterKeyExt}} scans all rows in 
{{MASTER_KEYS}}. If it finds a serialized {{MASTER_KEY}} with an ID that 
matches its current value for 
{{AbstractDelegationTokenSecretManager#currentId}}, then it will update the 
database row.

We have observed a race condition while running multiple HiveMetaStore 
instances sharing the same database. The steps performed in 
{{logUpdateMasterKey}} are not transactional. It's possible that 
{{rollMasterKeyExt}} running in HiveMetaStore A scans a newly inserted row from 
{{logUpdateMasterKey}} running in HiveMetaStore B that has not had the ID 
updated to the correct value yet. If that ID matches the {{currentId}} in 
HiveMetaStore A, then it will attempt to update, but the ID in the update query 
won't match any row's {{KEY_ID}}. The update fails with an exception:

{code}
2022-08-03T00:09:48,744 ERROR [Thread[Thread-9,5,main]] 
thrift.TokenStoreDelegationTokenSecretManager: ExpiredTokenRemover thread 
received unexpected exception. 
org.apache.hadoop.hive.thrift.DelegationTokenStore$TokenStoreException: 
NoSuchObjectException(message:No key found with keyId: 1)
org.apache.hadoop.hive.thrift.DelegationTokenStore$TokenStoreException: 
NoSuchObjectException(message:No key found with keyId: 1)
at 
org.apache.hadoop.hive.thrift.DBTokenStore.invokeOnTokenStore(DBTokenStore.java:170)
 ~[hive-exec-2.3.7.jar:2.3.7]
at 
org.apache.hadoop.hive.thrift.DBTokenStore.updateMasterKey(DBTokenStore.java:51)
 ~[hive-exec-2.3.7.jar:2.3.7]
at 
org.apache.hadoop.hive.thrift.TokenStoreDelegationTokenSecretManager.rollMasterKeyExt(TokenStoreDelegationTokenSecretManager.java:269)
 ~[hive-exec-2.3.7.jar:2.3.7]
at 
org.apache.hadoop.hive.thrift.TokenStoreDelegationTokenSecretManager$ExpiredTokenRemover.run(TokenStoreDelegationTokenSecretManager.java:301)
 [hive-exec-2.3.7.jar:2.3.7]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_312]
Caused by: org.apache.hadoop.hive.metastore.api.NoSuchObjectException: No key 
found with keyId: 1
at 
org.apache.hadoop.hive.metastore.ObjectStore.updateMasterKey(ObjectStore.java:7727)
 ~[hive-exec-2.3.7.jar:2.3.7]
{code}

When this exception happens, {{ExpiredTokenRemoverThread}} will not update 
{{lastMasterKeyUpdate}}, so it immediately tries again to create a new master 
key, causing increased database load and extraneous rows in the {{MASTER_KEYS}} 
table.

This problem can be prevented if {{logUpdateMasterKey}} also updates the base 
class {{currentId}} to the correct ID value.
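In sketch form (abbreviated and not the committed patch; {{encode}} stands in for the real serialization helper, and {{tokenStore}} / {{currentId}} are the fields of the classes named above):

{code}
// Simplified sketch of TokenStoreDelegationTokenSecretManager#logUpdateMasterKey.
protected void logUpdateMasterKey(DelegationKey key) throws IOException {
  // Insert the key; the store assigns the auto-incrementing KEY_ID.
  int keySeq = tokenStore.addMasterKey(encode(key));
  // Second write: re-serialize the key so its embedded ID matches KEY_ID.
  DelegationKey keyWithSeq = new DelegationKey(keySeq, key.getExpiryDate(), key.getKey());
  tokenStore.updateMasterKey(keySeq, encode(keyWithSeq));
  // Proposed fix: keep the base class's current key ID in sync, so that a
  // concurrent or later rollMasterKeyExt() targets the row that actually exists.
  currentId = keySeq;
}
{code}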


> Update DelegationTokenSecretManager current key ID to prevent erroneous 
> database updates.
> -
>
> Key: HIVE-26632
> URL: https://issues.apache.org/jira/browse/HIVE-26632
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Major
>
> While rolling a new master key, {{TokenStoreDelegationTokenSecretManager}} 
> does not update a base class member variable that tracks the current key ID. 
> This can cause situations later where it attempts to update a key using an 
> incorrect ID. This update attempt fails, even though the process had 
> successfully generated a new master key. Since it appears to be a failure 
> though, the thread immediately attempts to roll a new master key again, 
> resulting in excess database load.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-26632) Update DelegationTokenSecretManager current key ID to prevent erroneous database updates.

2022-10-14 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth reassigned HIVE-26632:



> Update DelegationTokenSecretManager current key ID to prevent erroneous 
> database updates.
> -
>
> Key: HIVE-26632
> URL: https://issues.apache.org/jira/browse/HIVE-26632
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Major
>
> While rolling a new master key, {{TokenStoreDelegationTokenSecretManager}} 
> does not update a base class member variable that tracks the current key ID. 
> This can cause situations later where it attempts to update a key using an 
> incorrect ID. This update attempt fails, even though the process had 
> successfully generated a new master key. Since it appears to be a failure 
> though, the thread immediately attempts to roll a new master key again, 
> resulting in excess database load.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26322) Upgrade gson to 2.9.0 due to CVE

2022-07-15 Thread Chris Nauroth (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17567336#comment-17567336
 ] 

Chris Nauroth commented on HIVE-26322:
--

Thank you, [~dengzh]. This issue states upgrading to GSON 2.9.0 instead of 
2.8.9. Although it isn't addressing CVEs, I'm hoping to upgrade to 2.9.0 for 
the bug fixes documented in the [release 
notes|https://github.com/google/gson/releases/tag/gson-parent-2.9.0]. Hadoop 
has done the upgrade too in HADOOP-18300.

Can we reopen this issue and get the linked pull request in?

> Upgrade gson to 2.9.0 due to CVE
> 
>
> Key: HIVE-26322
> URL: https://issues.apache.org/jira/browse/HIVE-26322
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26322) Upgrade gson to 2.9.0 due to CVE

2022-07-14 Thread Chris Nauroth (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17566916#comment-17566916
 ] 

Chris Nauroth commented on HIVE-26322:
--

Hello [~dengzh]. This was resolved as duplicate, but which issue does it 
duplicate? I still see the gson version is 2.8.9 in the repo, and I'd like to 
upgrade it. Thanks!

> Upgrade gson to 2.9.0 due to CVE
> 
>
> Key: HIVE-26322
> URL: https://issues.apache.org/jira/browse/HIVE-26322
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-14050) Hive attempts to 'chgrp' files on s3a://

2016-09-30 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HIVE-14050:
-
Assignee: (was: Chris Nauroth)

I'm not actively working on this, so I'm unassigning.

> Hive attempts to 'chgrp' files on s3a://
> 
>
> Key: HIVE-14050
> URL: https://issues.apache.org/jira/browse/HIVE-14050
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Sean Roberts
>  Labels: s3
>
> When inserting to a table on s3a://, Hive attempts to `chgrp` the files but 
> files in s3a:// do not have group ownership.
> {code}
> hive> insert into INVENTORY select * from INVENTORY_Q1_2006;
> -chgrp: '' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> {code}
> Full output of the query here:
> {code}
> hive> insert into INVENTORY select * from INVENTORY_Q1_2006;
> -chgrp: '' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> Query ID = admin_20160617201151_5f953fbe-acde-4774-9ad7-06cffc76dd72
> Total jobs = 1
> Launching Job 1 out of 1
> Status: Running (Executing on YARN cluster with App id 
> application_1466165341299_0011)
> 
> VERTICES  STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  
> KILLED
> 
> Map 1 ..   SUCCEEDED  1  100   0  
>  0
> 
> VERTICES: 01/01  [==>>] 100%  ELAPSED TIME: 8.71 s
> 
> Loading data to table mydb.inventory
> -chgrp: '' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> Table mydb.inventory stats: [numFiles=12, numRows=6020352, 
> totalSize=25250706, rawDataSize=96325632]
> OK
> Time taken: 19.123 seconds
> {code}
> The table:
> {code}
> CREATE TABLE IF NOT EXISTS inventory
>(
> MONTH_ID int,
> ITEM_ID int,
> BOH_QTY float,
> EOH_QTY float
>) row format delimited fields terminated by '|' escaped by '\\' stored as 
> ORC
> LOCATION 's3a://mybucket/hive/warehouse/mydb.db/inventory'
> tblproperties ("orc.compress"="SNAPPY");
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14050) Hive attempts to 'chgrp' files on s3a://

2016-09-30 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15536067#comment-15536067
 ] 

Chris Nauroth commented on HIVE-14050:
--

Linking to HADOOP-13309, which will document the current limitations of the S3A 
ownership and permissions model, and HADOOP-13310, which might start returning 
a non-empty stub value for the group.

> Hive attempts to 'chgrp' files on s3a://
> 
>
> Key: HIVE-14050
> URL: https://issues.apache.org/jira/browse/HIVE-14050
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Sean Roberts
>Assignee: Chris Nauroth
>  Labels: s3
>
> When inserting to a table on s3a://, Hive attempts to `chgrp` the files but 
> files in s3a:// do not have group ownership.
> {code}
> hive> insert into INVENTORY select * from INVENTORY_Q1_2006;
> -chgrp: '' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> {code}
> Full output of the query here:
> {code}
> hive> insert into INVENTORY select * from INVENTORY_Q1_2006;
> -chgrp: '' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> Query ID = admin_20160617201151_5f953fbe-acde-4774-9ad7-06cffc76dd72
> Total jobs = 1
> Launching Job 1 out of 1
> Status: Running (Executing on YARN cluster with App id 
> application_1466165341299_0011)
> 
> VERTICES  STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  
> KILLED
> 
> Map 1 ..   SUCCEEDED  1  100   0  
>  0
> 
> VERTICES: 01/01  [==>>] 100%  ELAPSED TIME: 8.71 s
> 
> Loading data to table mydb.inventory
> -chgrp: '' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> Table mydb.inventory stats: [numFiles=12, numRows=6020352, 
> totalSize=25250706, rawDataSize=96325632]
> OK
> Time taken: 19.123 seconds
> {code}
> The table:
> {code}
> CREATE TABLE IF NOT EXISTS inventory
>(
> MONTH_ID int,
> ITEM_ID int,
> BOH_QTY float,
> EOH_QTY float
>) row format delimited fields terminated by '|' escaped by '\\' stored as 
> ORC
> LOCATION 's3a://mybucket/hive/warehouse/mydb.db/inventory'
> tblproperties ("orc.compress"="SNAPPY");
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14739) Replace runnables directly added to runtime shutdown hooks to avoid deadlock

2016-09-12 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15486315#comment-15486315
 ] 

Chris Nauroth commented on HIVE-14739:
--

[~prasanth_j], thank you for the updated patch.  +1 (non-binding) from me.

> Replace runnables directly added to runtime shutdown hooks to avoid deadlock
> 
>
> Key: HIVE-14739
> URL: https://issues.apache.org/jira/browse/HIVE-14739
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Deepesh Khandelwal
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14739.1.patch, HIVE-14739.2.patch
>
>
> [~deepesh] reported that a deadlock can occur when running queries through 
> hive cli. [~cnauroth] analyzed it and reported that hive adds shutdown hooks 
> directly to java Runtime which may execute in non-deterministic order causing 
> deadlocks with hadoop's shutdown hooks. In one case, hadoop shutdown locked 
> FileSystem#Cache and FileSystem.close whereas hive shutdown hook locked 
> FileSystem.close and FileSystem#Cache, in that order, causing a deadlock. 
> Hive and Hadoop each have a ShutdownHookManager that runs the shutdown hooks in 
> deterministic order based on priority. We should use that to avoid deadlock 
> throughout the code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14739) Replace runnables directly added to runtime shutdown hooks to avoid deadlock

2016-09-12 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15485485#comment-15485485
 ] 

Chris Nauroth commented on HIVE-14739:
--

[~prasanth_j], thank you for sharing this patch.  It's interesting for me to 
see that Hive appears to have forked its own copy of {{ShutdownHookManager}} 
from Hadoop.  I don't know the background on this.  The code is similar, but 
not identical, between the two codebases.  Perhaps that's because the Hive 
version was not updated to match recent changes in Hadoop, like HADOOP-12950.

In order to fully prevent deadlocks between different shutdown hooks, there 
really needs to be a single {{ShutdownHookManager}} in the process.  If Hadoop 
and Hive each have their own implementation, and a Hive process instantiates 
one of each and registers different shutdown hooks with each one, then there 
will be 2 threads executing different shutdown hooks concurrently, which could 
still cause a deadlock.

Would it make sense to eliminate the forked {{ShutdownHookManager}} class in 
Hive and instead rely completely on using the one from Hadoop?
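For example, a minimal sketch using Hadoop's {{ShutdownHookManager}} (the priority value is arbitrary):

{code}
import org.apache.hadoop.util.ShutdownHookManager;

public class ShutdownHookSketch {
  public static void main(String[] args) {
    // Register a plain Runnable with Hadoop's ShutdownHookManager, which runs hooks
    // in priority order on a single thread, instead of handing a new Thread directly
    // to Runtime.addShutdownHook, where ordering is non-deterministic.
    ShutdownHookManager.get().addShutdownHook(
        () -> System.out.println("example cleanup hook"),
        10 /* arbitrary example priority */);
  }
}
{code}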

Also, a minor nit: maybe all calls to {{new Thread()}} could be converted to 
{{new Runnable()}}.  The {{Runnable}} interface is sufficient, and it won't 
make use of any additional functionality provided by the {{Thread}} 
implementation.

> Replace runnables directly added to runtime shutdown hooks to avoid deadlock
> 
>
> Key: HIVE-14739
> URL: https://issues.apache.org/jira/browse/HIVE-14739
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Deepesh Khandelwal
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14739.1.patch
>
>
> [~deepesh] reported that a deadlock can occur when running queries through 
> hive cli. [~cnauroth] analyzed it and reported that hive adds shutdown hooks 
> directly to java Runtime which may execute in non-deterministic order causing 
> deadlocks with hadoop's shutdown hooks. In one case, hadoop shutdown locked 
> FileSystem#Cache and FileSystem.close whereas hive shutdown hook locked 
> FileSystem.close and FileSystem#Cache, in that order, causing a deadlock. 
> Hive and Hadoop each have a ShutdownHookManager that runs the shutdown hooks in 
> deterministic order based on priority. We should use that to avoid deadlock 
> throughout the code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14423) S3: Fetching partition sizes from FS can be expensive when stats are not available in metastore

2016-08-05 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15409639#comment-15409639
 ] 

Chris Nauroth commented on HIVE-14423:
--

[~rajesh.balamohan], thank you for patch 2.  This looks good to me, and I like 
the idea of optimizing getContentSummary within S3A.  My only other suggestion 
for this patch is that best practice for handling {{InterruptedException}} is 
to re-raise the interrupted flag by calling 
{{Thread.currentThread().interrupt()}}, so that any other layers of code that 
need to handle the interruption continue to work as expected.
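The idiom looks roughly like this (a generic illustration, not the patch itself):

{code}
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

public class InterruptIdiomSketch {
  static long await(Future<Long> pendingSize) {
    try {
      return pendingSize.get();
    } catch (InterruptedException e) {
      // Restore the interrupted status so callers further up the stack still see it.
      Thread.currentThread().interrupt();
      return -1L; // hypothetical sentinel for "size unavailable"
    } catch (ExecutionException e) {
      throw new RuntimeException(e.getCause());
    }
  }
}
{code}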

> S3: Fetching partition sizes from FS can be expensive when stats are not 
> available in metastore 
> 
>
> Key: HIVE-14423
> URL: https://issues.apache.org/jira/browse/HIVE-14423
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-14423.1.patch, HIVE-14423.2.patch
>
>
> When partition stats are not available in metastore, it tries to get the file 
> sizes from FS.
> e.g
> {noformat}
> at 
> org.apache.hadoop.fs.FileSystem.getContentSummary(FileSystem.java:1487)
> at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.getFileSizeForPartitions(StatsUtils.java:598)
> at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:235)
> at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:144)
> at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:132)
> at 
> org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$TableScanStatsRule.process(StatsRulesProcFactory.java:126)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
> {noformat}
> This can be quite expensive in some FS like S3. Especially when table is 
> partitioned (e.g TPC-DS store_sales which has 1000s of partitions), query can 
> spend 1000s of seconds just waiting for this information to be pulled in.
> Also, it would be good to remove FS.getContentSummary usage to find out file 
> sizes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14423) S3: Fetching partition sizes from FS can be expensive when stats are not available in metastore

2016-08-04 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15408281#comment-15408281
 ] 

Chris Nauroth commented on HIVE-14423:
--

Hello [~rajesh.balamohan].  Thank you for the patch.  This looks really 
valuable for S3A, WASB and other file systems backed by blob stores, but I have 
a question about whether it will change load patterns and performance 
characteristics when running on HDFS.

For HDFS, {{getContentSummary}} is a single RPC to the NameNode.  It's possibly 
the most expensive NameNode RPC, at least among the read APIs, because the 
NameNode needs to hold a lock while traversing the entire inode sub-tree.  
However, it does have the benefit of getting all of the calculation done for a 
single path/partition in a single network call, so overall, this Hive algorithm 
is O(N) where N = # partitions.

With this patch, it starts using {{FileSystem#listFiles}} with the recursive 
option, which turns into multiple {{getListing}} NameNode RPCs, one for each 
sub-directory.  The {{getListing}} RPC is less expensive for the NameNode to 
execute compared to {{getContentSummary}}, but overall this algorithm requires 
many more network round-trips: O(N * M) where N = # partitions and M = average 
# directories per partition.
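Roughly, the two access patterns being compared (the path is hypothetical and error handling is omitted):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public class PartitionSizeSketch {
  public static void main(String[] args) throws Exception {
    Path partition = new Path("s3a://bucket/warehouse/store_sales/part=1"); // hypothetical
    FileSystem fs = partition.getFileSystem(new Configuration());

    // Existing code: one getContentSummary call per partition.
    long viaSummary = fs.getContentSummary(partition).getLength();

    // Patch: recursive listing, roughly one listing call per sub-directory on HDFS,
    // though object stores can serve it as a bulk flat listing.
    long viaListing = 0L;
    RemoteIterator<LocatedFileStatus> files = fs.listFiles(partition, true);
    while (files.hasNext()) {
      viaListing += files.next().getLen();
    }
    System.out.println(viaSummary + " vs " + viaListing);
  }
}
{code}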

At this point in the Hive code, is it possible that the partitions refer to 
directories in the file system that are multiple levels deep with nested 
sub-directories?  I suspect the answer is yes, because the existing code used 
{{getContentSummary}}, and your patch used the recursive option for 
{{listFiles}}.

Do you think an alternative approach would be to override {{getContentSummary}} 
in {{S3AFileSystem}} and optimize it?  That might look similar to other 
optimizations that are making use of S3 bulk listings, such as HADOOP-13208 and 
HADOOP-13371.

Parallelizing the calls for all partitions looks valuable regardless of which 
approach we take.

Cc [~ste...@apache.org] FYI for when he returns.

> S3: Fetching partition sizes from FS can be expensive when stats are not 
> available in metastore 
> 
>
> Key: HIVE-14423
> URL: https://issues.apache.org/jira/browse/HIVE-14423
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-14423.1.patch
>
>
> When partition stats are not available in metastore, it tries to get the file 
> sizes from FS.
> e.g
> {noformat}
> at 
> org.apache.hadoop.fs.FileSystem.getContentSummary(FileSystem.java:1487)
> at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.getFileSizeForPartitions(StatsUtils.java:598)
> at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:235)
> at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:144)
> at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:132)
> at 
> org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$TableScanStatsRule.process(StatsRulesProcFactory.java:126)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
> {noformat}
> This can be quite expensive in some FS like S3. Especially when table is 
> partitioned (e.g TPC-DS store_sales which has 1000s of partitions), query can 
> spend 1000s of seconds just waiting for this information to be pulled in.
> Also, it would be good to remove FS.getContentSummary usage to find out file 
> sizes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14270) Write temporary data to HDFS when doing inserts on tables located on S3

2016-07-28 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15397859#comment-15397859
 ] 

Chris Nauroth commented on HIVE-14270:
--

Separating development of a richer Hive-on-S3 integration test suite into a 
separate JIRA sounds reasonable to me.  I expect the initial bootstrapping 
would be a large effort on its own.  If you'd like more details on how Hadoop 
is handling that, please feel free to notify those of us from Hadoop who do a 
lot of object store integration work when you file that new JIRA.

Steve is out for several weeks now, so I don't expect further responses from 
him for a while.

> Write temporary data to HDFS when doing inserts on tables located on S3
> ---
>
> Key: HIVE-14270
> URL: https://issues.apache.org/jira/browse/HIVE-14270
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Attachments: HIVE-14270.1.patch
>
>
> Currently, when doing INSERT statements on tables located at S3, Hive writes 
> and reads temporary (or intermediate) files to S3 as well. 
> If HDFS is still the default filesystem on Hive, then we can keep such 
> temporary files on HDFS to keep things run faster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14270) Write temporary data to HDFS when doing inserts on tables located on S3

2016-07-27 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15396090#comment-15396090
 ] 

Chris Nauroth commented on HIVE-14270:
--

Any approach that involves matching on scheme is going to be incomplete and 
error-prone, regardless of whether the logic lives in Hive or Hadoop Common.  
Users have flexibility to define new schemes or even remap existing schemes in 
their runtime configuration by setting configuration property 
{{fs.<scheme>.impl}}.  In practice, it's rare, but I have seen it done.

An API or an {{instanceof}} check to identify an object store would be more 
reliable, but then there is the additional challenge of 
[ViewFs|http://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/ViewFs.html]
 defining a client-side mount table.  In that case, there is a single 
{{FileSystem}} instance visible to the caller, but it may route different 
{{Path}} instances to HDFS vs. S3A vs. something else.  This is something else 
that is a bit rare in practice, but I know at least Twitter does it.  This 
might imply that the HADOOP-9565 API needs to be sensitive to {{Path}}, not 
only the {{FileSystem}} instance.
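A small sketch of why the scheme string is unreliable (the {{mystore}} remapping and path are hypothetical):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class SchemeCheckSketch {
  public static void main(String[] args) {
    // Hypothetical remap: any scheme can be bound to any FileSystem implementation.
    Configuration conf = new Configuration();
    conf.set("fs.mystore.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem");

    Path p = new Path("mystore://bucket/warehouse/table"); // hypothetical path
    // A scheme match concludes this is not an object store, even though the
    // configuration above routes it to S3A. Checking the resolved FileSystem
    // (instanceof, or a HADOOP-9565 style API) is more reliable, though ViewFs
    // mount tables mean the answer can differ per Path.
    boolean looksLikeObjectStore = "s3a".equals(p.toUri().getScheme());
    System.out.println(looksLikeObjectStore); // false
  }
}
{code}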

> Write temporary data to HDFS when doing inserts on tables located on S3
> ---
>
> Key: HIVE-14270
> URL: https://issues.apache.org/jira/browse/HIVE-14270
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Attachments: HIVE-14270.1.patch
>
>
> Currently, when doing INSERT statements on tables located at S3, Hive writes 
> and reads temporary (or intermediate) files to S3 as well. 
> If HDFS is still the default filesystem on Hive, then we can keep such 
> temporary files on HDFS to keep things run faster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14323) Reduce number of FS permissions and redundant FS operations

2016-07-26 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15394202#comment-15394202
 ] 

Chris Nauroth commented on HIVE-14323:
--

+1 (non-binding) from me too.  Thank you, Rajesh.

> Reduce number of FS permissions and redundant FS operations
> ---
>
> Key: HIVE-14323
> URL: https://issues.apache.org/jira/browse/HIVE-14323
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-14323.1.patch
>
>
> Some examples are given below.
> 1. When creating stage directory, FileUtils sets the directory permissions by 
> running a set of chgrp and chmod commands. In systems like S3, this would not 
> be relevant.
> 2. In some cases, fs.delete() is followed by fs.exists(). In this case, it 
> might be redundant to check for exists() (lookup ops are expensive in systems 
> like S3). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14323) Reduce number of FS permissions and redundant FS operations

2016-07-25 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392555#comment-15392555
 ] 

Chris Nauroth commented on HIVE-14323:
--

[~rajesh.balamohan], thank you for the patch.

Is the change in {{FileUtils#mkdir}} required?  It appears that the 
{{inheritPerms}} argument is already intended to capture the setting of 
{{HIVE_WAREHOUSE_SUBDIR_INHERIT_PERMS}}, so looking it up again within the 
method might be confusing.  I see some call sites pass along the value of that 
property and others hard-code it.  I see your patch is also updating some of 
those call sites to respect the configuration.  Do you think this change should 
be handled completely by updating the call sites?

{code}
-if (fs.exists(ptnPath)){
-  fs.delete(ptnPath,true);
+try {
+  fs.delete(ptnPath, true);
+} catch (IOException ioe) {
+  //ignore
 }
{code}

I think the intent here is "try the delete, and if the path doesn't exist, just 
keep going."  Catching every {{IOException}} could mask other I/O errors 
though.  Right now, exceptions would propagate out to a wider {{catch 
(Exception)}} block, where there is additional cleanup logic.  I wonder if 
catching every {{IOException}} would harm this cleanup logic.

According to the [FileSystem 
Specification|http://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/filesystem/filesystem.html]
 for delete, if there is a recursive delete attempted on a path that doesn't 
exist, then it fails by returning {{false}}, not throwing an exception.  There 
are contract tests that verify this behavior too.
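Given that, a narrower variant might be enough, reusing the {{fs}}, {{ptnPath}}, and {{LOG}} names from the hunk above (just a sketch):

{code}
// Skip the exists() probe, but let real I/O errors propagate instead of
// swallowing every IOException.
if (!fs.delete(ptnPath, true)) {
  // Per the FileSystem specification, recursive delete of a missing path
  // returns false rather than throwing, so "already absent" lands here.
  LOG.debug("Nothing to delete at " + ptnPath);
}
{code}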

{code}
  LOG.info("Patch..checking isEmptyPath for : " + dirPath);
{code}

Is this a leftover log statement from debugging, or is it intentional to 
include it in the patch?

I don't feel confident commenting on the logic in {{Hive#replaceFiles}}, so 
I'll defer to others more familiar with Hive to review that part.

> Reduce number of FS permissions and redundant FS operations
> ---
>
> Key: HIVE-14323
> URL: https://issues.apache.org/jira/browse/HIVE-14323
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-14323.1.patch
>
>
> Some examples are given below.
> 1. When creating stage directory, FileUtils sets the directory permissions by 
> running a set of chgrp and chmod commands. In systems like S3, this would not 
> be relevant.
> 2. In some cases, fs.delete() is followed by fs.exists(). In this case, it 
> might be redundant to check for exists() (lookup ops are expensive in systems 
> like S3). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14271) FileSinkOperator should not rename files to final paths when S3 is the default destination

2016-07-22 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15389835#comment-15389835
 ] 

Chris Nauroth commented on HIVE-14271:
--

If I understand correctly, then approach b) sounds like the "direct output 
committer" strategy that has been discussed in a few other contexts.  Please be 
aware that this is unsafe in the presence of certain kinds of network 
partitions. It might be a rare case, but the consequences are disastrous: 
data loss or corruption.  For example, Spark highly discourages a direct write 
strategy.  (See SPARK-10063.)

> FileSinkOperator should not rename files to final paths when S3 is the 
> default destination
> --
>
> Key: HIVE-14271
> URL: https://issues.apache.org/jira/browse/HIVE-14271
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergio Peña
>Assignee: Abdullah Yousufi
>
> FileSinkOperator does a rename of {{outPaths -> finalPaths}} when it finished 
> writing all rows to a temporary path. The problem is that S3 does not support 
> renaming.
> Two options can be considered:
> a. Use a copy operation instead. After FileSinkOperator writes all rows to 
> outPaths, then the commit method will do a copy() call instead of move().
> b. Write row by row directly to the S3 path (see HIVE-1620). This may add 
> better performance calls, but we should take care of the cleanup part in case 
> of writing errors.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14301) insert overwrite fails for nonpartitioned tables in s3

2016-07-21 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15388554#comment-15388554
 ] 

Chris Nauroth commented on HIVE-14301:
--

[~ayousufi], thank you for pointing out the issue with rename to root.  I'm 
going to propose that we change that behavior in S3A within scope of an issue I 
just filed: HADOOP-13402.  I think this is simply pre-validation logic that 
didn't fully consider the case of renaming to root.

I agree with Rajesh that it isn't a common case, but I'd still like to fix it 
in S3A for the sake of consistency in semantics.

> insert overwrite fails for nonpartitioned tables in s3
> --
>
> Key: HIVE-14301
> URL: https://issues.apache.org/jira/browse/HIVE-14301
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Fix For: 2.2.0, 2.1.1
>
> Attachments: HIVE-14301.1.patch
>
>
> {noformat}
> hive> insert overwrite table s3_2 select * from default.test2;
> Query ID = hrt_qa_20160719164737_90fb1f30-0ade-4a64-ab65-a6a7550be25a
> Total jobs = 1
> Launching Job 1 out of 1
> Status: Running (Executing on YARN cluster with App id 
> application_1468941549982_0010)
> 
> VERTICES  STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  
> KILLED
> 
> Map 1 ..   SUCCEEDED  1  100   0  
>  0
> 
> VERTICES: 01/01  [==>>] 100%  ELAPSED TIME: 11.90 s   
>  
> 
> Loading data to table default.s3_2
> Failed with exception java.io.IOException: rename for src path: 
> s3a://test-ks/test2/.hive-staging_hive_2016-07-19_16-47-37_787_4725676452829013403-1/-ext-1/00_0.deflate
>  to dest path:s3a://test-ks/test2/00_0.deflate returned false
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.MoveTask
> 2016-07-19 16:43:46,244 ERROR [main]: exec.Task 
> (SessionState.java:printError(948)) - Failed with exception 
> java.io.IOException: rename for src path: 
> s3a://test-ks/testing/.hive-staging_hive_2016-07-19_16-42-20_739_1716954454570249450-1/-ext-1/00_0.deflate
>  to dest path:s3a://test-ks/testing/00_0.deflate returned false
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: rename 
> for src path: 
> s3a://test-ks/testing/.hive-staging_hive_2016-07-19_16-42-20_739_1716954454570249450-1/-ext-1/00_0.deflate
>  to dest path:s3a://test-ks/testing/00_0.deflate returned false
>   at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2856)
>   at org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:3113)
>   at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1700)
>   at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:328)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1726)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1472)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1271)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1138)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1128)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:216)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:168)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:379)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:739)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:684)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:624)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> Caused by: java.io.IOException: rename for src path: 
> s3a://test-ks/testing/.hive-staging_hive_2016-07-19_16-42-20_739_1716954454570249450-1/-ext-1/00_0.deflate
>  to dest 

[jira] [Commented] (HIVE-13990) Client should not check dfs.namenode.acls.enabled to determine if extended ACLs are supported

2016-07-18 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15382878#comment-15382878
 ] 

Chris Nauroth commented on HIVE-13990:
--

[~thejas], is this possibly a duplicate of HIVE-9182?  There was an uncommitted 
patch on that one.  During code review for that patch, I gave feedback that you 
can avoid {{getAclStatus}} calls by checking {{FsPermission#getAclBit}}.  For 
any {{FileSystem}} that doesn't implement ACLs, the ACL bit will always be 
false.  I expect this would work for all {{FileSystem}} implementations and 
avoid tight coupling to HDFS server-side configuration.  I think the same 
feedback applies here.
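
Roughly, as a hedged sketch (the class and method are hypothetical, not part of any patch here), the guard looks like this: the {{getAclStatus}} RPC only happens when the ACL bit is set, and no server-side configuration is consulted.

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.permission.AclStatus;

public class AclBitGuardSketch {
  // Hypothetical guard: the ACL bit in FsPermission is false for any
  // FileSystem that does not implement ACLs, so checking it first avoids
  // both the getAclStatus RPC and any coupling to dfs.namenode.acls.enabled.
  static AclStatus getAclStatusIfPresent(FileSystem fs, FileStatus status)
      throws IOException {
    if (!status.getPermission().getAclBit()) {
      return null;  // no extended ACL entries to fetch
    }
    return fs.getAclStatus(status.getPath());
  }
}
{code}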

> Client should not check dfs.namenode.acls.enabled to determine if extended 
> ACLs are supported
> -
>
> Key: HIVE-13990
> URL: https://issues.apache.org/jira/browse/HIVE-13990
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 1.2.1
>Reporter: Chris Drome
>Assignee: Chris Drome
> Attachments: HIVE-13990-branch-1.patch, HIVE-13990.1-branch-1.patch, 
> HIVE-13990.1.patch
>
>
> dfs.namenode.acls.enabled is a server side configuration and the client 
> should not presume to know how the server is configured. Barring a method for 
> querying the NN whether ACLs are supported the client should try and catch 
> the appropriate exception.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13008) WebHcat DDL commands in secure mode NPE when default FileSystem doesn't support delegation tokens

2016-02-05 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15134432#comment-15134432
 ] 

Chris Nauroth commented on HIVE-13008:
--

+1 (non-binding), pending pre-commit run.  This is the right thing to do from 
the perspective of integration with [HDFS 
Federation|http://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/Federation.html]
 and use of 
[ViewFs|http://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/ViewFs.html]
 as an abstraction over multiple file systems.

> WebHcat DDL commands in secure mode NPE when default FileSystem doesn't 
> support delegation tokens
> -
>
> Key: HIVE-13008
> URL: https://issues.apache.org/jira/browse/HIVE-13008
> Project: Hive
>  Issue Type: Bug
>  Components: WebHCat
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-13008.patch
>
>
> {noformat}
> ERROR | 11 Jan 2016 20:19:02,781 | 
> org.apache.hive.hcatalog.templeton.CatchallExceptionMapper |
> java.lang.NullPointerException
> at 
> org.apache.hive.hcatalog.templeton.SecureProxySupport$2.run(SecureProxySupport.java:171)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at 
> org.apache.hive.hcatalog.templeton.SecureProxySupport.writeProxyDelegationTokens(SecureProxySupport.java:168)
> at 
> org.apache.hive.hcatalog.templeton.SecureProxySupport.open(SecureProxySupport.java:95)
> at 
> org.apache.hive.hcatalog.templeton.HcatDelegator.run(HcatDelegator.java:63)
> at org.apache.hive.hcatalog.templeton.Server.ddl(Server.java:217)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
> at 
> com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
> at 
> com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
> at 
> com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302)
> at 
> com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
> at 
> com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
> at 
> com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
> at 
> com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
> at 
> com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1480)
> at 
> com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1411)
> at 
> com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1360)
> at 
> com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1350)
> at 
> com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
> at 
> com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:538)
> at 
> com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:716)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> at 
> org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:565)
> at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1360)
> at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:615)
> at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:574)
> at org.apache.hadoop.hdfs.web.AuthFilter.doFilter(AuthFilter.java:88)
> at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1331)
> at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:477)
> at 
> 

[jira] [Commented] (HIVE-9736) StorageBasedAuthProvider should batch namenode-calls where possible.

2015-05-06 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14531961#comment-14531961
 ] 

Chris Nauroth commented on HIVE-9736:
-

I apologize for missing this in my code review.  I'm +1 (non-binding) for patch 
v7 pending a fresh test run.  I reran these tests locally and they passed, 
although they were also passing with the prior patch for me.  Mithun, thank you 
for updating the patch.

 StorageBasedAuthProvider should batch namenode-calls where possible.
 

 Key: HIVE-9736
 URL: https://issues.apache.org/jira/browse/HIVE-9736
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Security
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan
 Fix For: 1.2.0

 Attachments: HIVE-9736.1.patch, HIVE-9736.2.patch, HIVE-9736.3.patch, 
 HIVE-9736.4.patch, HIVE-9736.5.patch, HIVE-9736.6.patch, HIVE-9736.7.patch


 Consider a table partitioned by 2 keys (dt, region). Say a dt partition could 
 have 1 associated regions. Consider that the user does:
 {code:sql}
 ALTER TABLE my_table DROP PARTITION (dt='20150101');
 {code}
 As things stand now, {{StorageBasedAuthProvider}} will make individual 
 {{DistributedFileSystem.listStatus()}} calls for each partition-directory, 
 and authorize each one separately. It'd be faster to batch the calls, and 
 examine multiple FileStatus objects at once.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9736) StorageBasedAuthProvider should batch namenode-calls where possible.

2015-05-05 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14528724#comment-14528724
 ] 

Chris Nauroth commented on HIVE-9736:
-

[~sushanth], thank you for your review and the commit!

 StorageBasedAuthProvider should batch namenode-calls where possible.
 

 Key: HIVE-9736
 URL: https://issues.apache.org/jira/browse/HIVE-9736
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Security
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan
 Fix For: 1.2.0

 Attachments: HIVE-9736.1.patch, HIVE-9736.2.patch, HIVE-9736.3.patch, 
 HIVE-9736.4.patch, HIVE-9736.5.patch, HIVE-9736.6.patch


 Consider a table partitioned by 2 keys (dt, region). Say a dt partition could 
 have 1 associated regions. Consider that the user does:
 {code:sql}
 ALTER TABLE my_table DROP PARTITION (dt='20150101');
 {code}
 As things stand now, {{StorageBasedAuthProvider}} will make individual 
 {{DistributedFileSystem.listStatus()}} calls for each partition-directory, 
 and authorize each one separately. It'd be faster to batch the calls, and 
 examine multiple FileStatus objects at once.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9736) StorageBasedAuthProvider should batch namenode-calls where possible.

2015-05-04 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527362#comment-14527362
 ] 

Chris Nauroth commented on HIVE-9736:
-

Just as a reminder, we were asked to check the build with {{-Phadoop-1}}.  I 
can volunteer to do that, but I think we'll need one more final revision of the 
patch intended to be committed.  I'm +1 (non-binding) for the changes shown in 
the last patch though, so if it's just a rebase, then that wouldn't change.

 StorageBasedAuthProvider should batch namenode-calls where possible.
 

 Key: HIVE-9736
 URL: https://issues.apache.org/jira/browse/HIVE-9736
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Security
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan
 Attachments: HIVE-9736.1.patch, HIVE-9736.2.patch, HIVE-9736.3.patch, 
 HIVE-9736.4.patch, HIVE-9736.5.patch


 Consider a table partitioned by 2 keys (dt, region). Say a dt partition could 
 have 1 associated regions. Consider that the user does:
 {code:sql}
 ALTER TABLE my_table DROP PARTITION (dt='20150101');
 {code}
 As things stand now, {{StorageBasedAuthProvider}} will make individual 
 {{DistributedFileSystem.listStatus()}} calls for each partition-directory, 
 and authorize each one separately. It'd be faster to batch the calls, and 
 examine multiple FileStatus objects at once.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9736) StorageBasedAuthProvider should batch namenode-calls where possible.

2015-05-04 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527506#comment-14527506
 ] 

Chris Nauroth commented on HIVE-9736:
-

I verified with both {{-Phadoop-2}} and {{-Phadoop-1}}.  Thanks again, Mithun!

 StorageBasedAuthProvider should batch namenode-calls where possible.
 

 Key: HIVE-9736
 URL: https://issues.apache.org/jira/browse/HIVE-9736
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Security
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan
 Attachments: HIVE-9736.1.patch, HIVE-9736.2.patch, HIVE-9736.3.patch, 
 HIVE-9736.4.patch, HIVE-9736.5.patch, HIVE-9736.6.patch


 Consider a table partitioned by 2 keys (dt, region). Say a dt partition could 
 have 1 associated regions. Consider that the user does:
 {code:sql}
 ALTER TABLE my_table DROP PARTITION (dt='20150101');
 {code}
 As things stand now, {{StorageBasedAuthProvider}} will make individual 
 {{DistributedFileSystem.listStatus()}} calls for each partition-directory, 
 and authorize each one separately. It'd be faster to batch the calls, and 
 examine multiple FileStatus objects at once.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9736) StorageBasedAuthProvider should batch namenode-calls where possible.

2015-05-04 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527348#comment-14527348
 ] 

Chris Nauroth commented on HIVE-9736:
-

I figured we could make {{DefaultFileAccess#combine}} public, and then 
{{Hadoop23Shims}} could call it.  hive-shims-0.23 already has a dependency on 
hive-shims-common.  However, if there is a detail that I'm missing, then I 
wouldn't intend to hold up the patch over making that change.

+1 (non-binding) from me, and I defer to you on what's best to do with 
{{combine}} right now.  Thank you for the patch, and thank you for responding 
to the code review feedback!

 StorageBasedAuthProvider should batch namenode-calls where possible.
 

 Key: HIVE-9736
 URL: https://issues.apache.org/jira/browse/HIVE-9736
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Security
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan
 Attachments: HIVE-9736.1.patch, HIVE-9736.2.patch, HIVE-9736.3.patch, 
 HIVE-9736.4.patch, HIVE-9736.5.patch


 Consider a table partitioned by 2 keys (dt, region). Say a dt partition could 
 have 1 associated regions. Consider that the user does:
 {code:sql}
 ALTER TABLE my_table DROP PARTITION (dt='20150101');
 {code}
 As things stand now, {{StorageBasedAuthProvider}} will make individual 
 {{DistributedFileSystem.listStatus()}} calls for each partition-directory, 
 and authorize each one separately. It'd be faster to batch the calls, and 
 examine multiple FileStatus objects at once.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10151) insert into A select from B is broken when both A and B are Acid tables and bucketed the same way

2015-05-01 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524506#comment-14524506
 ] 

Chris Nauroth commented on HIVE-10151:
--

This patch introduced a call to {{FileStatus#isFile}}, which is only defined in 
Hadoop 2.x, so Hive could not compile with {{-Phadoop-1}}.  I posted an 
addendum patch on HIVE-10444 to fix it.  If it's better to track it as a new 
jira separate from HIVE-10444, please let me know.  Thanks!

 insert into A select from B is broken when both A and B are Acid tables and 
 bucketed the same way
 -

 Key: HIVE-10151
 URL: https://issues.apache.org/jira/browse/HIVE-10151
 Project: Hive
  Issue Type: Bug
  Components: Query Planning, Transactions
Affects Versions: 1.1.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
 Fix For: 1.2.0, 1.3.0

 Attachments: HIVE-10151.patch


 BucketingSortingReduceSinkOptimizer makes 
 insert into AcidTable select * from otherAcidTable
 use BucketizedHiveInputFormat which bypasses ORC merge logic on read and 
 tries to send bucket files (rather than table dir) down to OrcInputFormat.
 (this is true only if both AcidTable and otherAcidTable are bucketed the same 
 way).  Then ORC dies.
 More specifically:
 {noformat}
 create table acidTbl(a int, b int) clustered by (a) into 2 buckets stored as 
 orc TBLPROPERTIES ('transactional'='true')
 create table acidTblPart(a int, b int) partitioned by (p string) clustered by 
 (a) into 2 buckets stored as orc TBLPROPERTIES ('transactional'='true')
 insert into acidTblPart partition(p=1) (a,b) values(1,2)
 insert into acidTbl(a,b) select a,b from acidTblPart where p = 1
 {noformat}
 results in 
 {noformat}
 2015-04-29 13:57:35,807 ERROR [main]: exec.Task 
 (SessionState.java:printError(956)) - Job Submission failed with exception 
 'java.lang.RuntimeException(serious problem)'
 java.lang.RuntimeException: serious problem
 at 
 org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1021)
 at 
 org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1048)
 at 
 org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat.getSplits(BucketizedHiveInputFormat.java:141)
 at 
 org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:624)
 at 
 org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:616)
 at 
 org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:492)
 at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
 at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
 at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
 at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
 at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
 at 
 org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
 at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
 at 
 org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:430)
 at 
 org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
 at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
 at 
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
 at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1650)
 at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1409)
 at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1192)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
 at 
 org.apache.hadoop.hive.ql.TestTxnCommands2.runStatementOnDriver(TestTxnCommands2.java:225)
 at 
 org.apache.hadoop.hive.ql.TestTxnCommands2.testDeleteIn2(TestTxnCommands2.java:148)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 

[jira] [Updated] (HIVE-10444) HIVE-10223 breaks hadoop-1 build

2015-05-01 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HIVE-10444:
-
Attachment: HIVE-10444.addendum.3.patch

[~prasanth_j], thank you for the commit!

bq. Build is still failing even with this patch

[~xuefuz], that one is a new problem introduced by HIVE-10151 just today after 
the pre-commit run for this patch.  I'm attaching an addendum patch to fix 
that.  If it's better to file a new jira to track the new patch, let me know, 
and I'll do that.

 HIVE-10223 breaks hadoop-1 build
 

 Key: HIVE-10444
 URL: https://issues.apache.org/jira/browse/HIVE-10444
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Prasanth Jayachandran
Assignee: Chris Nauroth
 Fix For: 1.2.0, 1.3.0

 Attachments: HIVE-10444.1.patch, HIVE-10444.2.patch, 
 HIVE-10444.addendum.3.patch


 FileStatus.isFile() and FileStatus.isDirectory() methods added in HIVE-10223 
 are not present in hadoop 1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10444) HIVE-10223 breaks hadoop-1 build

2015-04-30 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522333#comment-14522333
 ] 

Chris Nauroth commented on HIVE-10444:
--

[~apivovarov], thank you for the review.

I reran the failed tests locally, and they all passed.

I also tried running the same tests with {{-Phadoop-1}}, and they failed due to 
a {{NoSuchMethodError}} in an HDFS class.  Looking at the test classpath, I can 
see it's picking up a 2.x version of the minicluster, even though I set 
{{-Phadoop-1}}.  I don't think this is related to the current patch.

 HIVE-10223 breaks hadoop-1 build
 

 Key: HIVE-10444
 URL: https://issues.apache.org/jira/browse/HIVE-10444
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Prasanth Jayachandran
Assignee: Chris Nauroth
 Attachments: HIVE-10444.1.patch, HIVE-10444.2.patch


 FileStatus.isFile() and FileStatus.isDirectory() methods added in HIVE-10223 
 are not present in hadoop 1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10444) HIVE-10223 breaks hadoop-1 build

2015-04-29 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HIVE-10444:
-
Assignee: Chris Nauroth  (was: Gunther Hagleitner)

Sorry for the HIVE-10223 breakage.  I'll pick this up.  [~prasanth_j], thank 
you for filing the bug report.

 HIVE-10223 breaks hadoop-1 build
 

 Key: HIVE-10444
 URL: https://issues.apache.org/jira/browse/HIVE-10444
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Prasanth Jayachandran
Assignee: Chris Nauroth

 FileStatus.isFile() and FileStatus.isDirectory() methods added in HIVE-10223 
 are not present in hadoop 1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10066) Hive on Tez job submission through WebHCat doesn't ship Tez artifacts

2015-04-29 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520152#comment-14520152
 ] 

Chris Nauroth commented on HIVE-10066:
--

FYI, this patch's call to {{FileStatus#isDirectory}} does not work when linking 
against Hadoop 1 using {{-Phadoop-1}}.  I included a fix in my patch for 
HIVE-10444, which reported a similar problem elsewhere in the code.

 Hive on Tez job submission through WebHCat doesn't ship Tez artifacts
 -

 Key: HIVE-10066
 URL: https://issues.apache.org/jira/browse/HIVE-10066
 Project: Hive
  Issue Type: Bug
  Components: Tez, WebHCat
Affects Versions: 1.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
  Labels: TODOC1.2
 Fix For: 1.2.0

 Attachments: HIVE-10066.2.patch, HIVE-10066.3.patch, HIVE-10066.patch


 From [~hitesh]:
 Tez is a client-side only component ( no daemons, etc ) and therefore it is 
 meant to be installed on the gateway box ( or where its client libraries are 
 needed by any other services’ daemons). It does not have any cluster 
 dependencies both in terms of libraries/jars as well as configs. When it runs 
 on a worker node, everything was pre-packaged and made available to the 
 worker node via the distributed cache via the client code. Hence, its 
 client-side configs are also only needed on the same (client) node as where 
 it is installed. The only other install step needed is to have the tez 
 tarball be uploaded to HDFS and the config has an entry “tez.lib.uris” which 
 points to the HDFS path. 
 We need a way to pass client jars and tez-site.xml to the LaunchMapper.
 We should create a general purpose mechanism here which can supply additional 
 artifacts per job type.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10444) HIVE-10223 breaks hadoop-1 build

2015-04-29 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HIVE-10444:
-
Attachment: HIVE-10444.1.patch

I also found one occurrence of the same problem that was not introduced by 
HIVE-10223.  Instead, it was introduced by HIVE-10066.

I think the simplest thing to do is to revert to using {{FileStatus#isDir}}, 
which is present in Hadoop 1.2.1.  It's deprecated in 2.x in favor of 
{{FileStatus#isDirectory}}, but it's still usable.  I'm attaching a patch.

I verified a build locally for both {{-Phadoop-1}} and {{-Phadoop-2}}.
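
For illustration only, here is a hedged sketch of the compatibility-friendly check (the class and method names are hypothetical):

{code:java}
import org.apache.hadoop.fs.FileStatus;

public class StatusCheckSketch {
  // Hypothetical illustration: FileStatus#isDir exists in both Hadoop 1.2.1
  // and 2.x (deprecated in 2.x in favor of isDirectory), so code that must
  // compile against both lines can branch on it instead.
  static boolean isDirectoryCompat(FileStatus status) {
    return status.isDir();
  }

  static boolean isFileCompat(FileStatus status) {
    // In these code paths a non-directory status is treated as a file.
    return !status.isDir();
  }
}
{code}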

 HIVE-10223 breaks hadoop-1 build
 

 Key: HIVE-10444
 URL: https://issues.apache.org/jira/browse/HIVE-10444
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Prasanth Jayachandran
Assignee: Chris Nauroth
 Attachments: HIVE-10444.1.patch


 FileStatus.isFile() and FileStatus.isDirectory() methods added in HIVE-10223 
 are not present in hadoop 1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9736) StorageBasedAuthProvider should batch namenode-calls where possible.

2015-04-29 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520892#comment-14520892
 ] 

Chris Nauroth commented on HIVE-9736:
-

I have one remaining comment in Review Board suggesting a possible reusable 
{{combine}} method for combining {{FsAction}} values instead of duplicating the 
logic.  Aside from that very minor thing, I'm basically +1 (non-binding) for 
the patch.  However, I still couldn't get the consolidated v5 patch to apply to 
master, so I couldn't check a build with {{-Phadoop-1}}.

 StorageBasedAuthProvider should batch namenode-calls where possible.
 

 Key: HIVE-9736
 URL: https://issues.apache.org/jira/browse/HIVE-9736
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Security
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan
 Attachments: HIVE-9736.1.patch, HIVE-9736.2.patch, HIVE-9736.3.patch, 
 HIVE-9736.4.patch, HIVE-9736.5.patch


 Consider a table partitioned by 2 keys (dt, region). Say a dt partition could 
 have 1 associated regions. Consider that the user does:
 {code:sql}
 ALTER TABLE my_table DROP PARTITION (dt='20150101');
 {code}
 As things stand now, {{StorageBasedAuthProvider}} will make individual 
 {{DistributedFileSystem.listStatus()}} calls for each partition-directory, 
 and authorize each one separately. It'd be faster to batch the calls, and 
 examine multiple FileStatus objects at once.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10444) HIVE-10223 breaks hadoop-1 build

2015-04-29 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HIVE-10444:
-
Attachment: HIVE-10444.2.patch

[~apivovarov], thank you for your review.  I'm attaching patch v2.  Here it is 
in Review Board:

https://reviews.apache.org/r/33715/

Patch v1 basically just restored the logic to pre-HIVE-10223 state.  You're 
right that there was a redundant check and an unreachable block though.  There 
is no reason to maintain it this way now, so I made the correction in patch v2.

 HIVE-10223 breaks hadoop-1 build
 

 Key: HIVE-10444
 URL: https://issues.apache.org/jira/browse/HIVE-10444
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Prasanth Jayachandran
Assignee: Chris Nauroth
 Attachments: HIVE-10444.1.patch, HIVE-10444.2.patch


 FileStatus.isFile() and FileStatus.isDirectory() methods added in HIVE-10223 
 are not present in hadoop 1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9736) StorageBasedAuthProvider should batch namenode-calls where possible.

2015-04-28 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518695#comment-14518695
 ] 

Chris Nauroth commented on HIVE-9736:
-

Hi [~mithun].  Thank you for uploading a new patch.

I was unable to apply patch v3 to the master branch.  Does it need to be 
rebased, or should I be working with a different branch?

There was one suggestion I made on Review Board that still isn't implemented.  
In {{Hadoop23Shims#checkFileAccess}}, we can combine the multiple {{actions}} 
by using {{FsAction#or}}, and then call {{accessMethod.invoke}} just once to do 
the check in a single RPC (per file).  Were you planning to make this change, 
or is there a reason you decided not to do it?
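
To make the suggestion concrete, here is a hedged sketch (the shim invokes this reflectively via {{accessMethod.invoke}} for compatibility; the sketch uses the direct API, and the names are hypothetical):

{code:java}
import java.io.IOException;
import java.util.EnumSet;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsAction;

public class CombinedAccessCheckSketch {
  // Hypothetical illustration: fold the requested actions into one FsAction
  // with or(), then issue a single access check per file instead of one RPC
  // per action.  FileSystem#access throws AccessControlException when the
  // combined action is not permitted.
  static void checkAccess(FileSystem fs, Path path, EnumSet<FsAction> actions)
      throws IOException {
    FsAction combined = FsAction.NONE;
    for (FsAction action : actions) {
      combined = combined.or(action);
    }
    fs.access(path, combined);  // single RPC covering all requested actions
  }
}
{code}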

Aside from that, I can see all of my other feedback has been addressed.  Thanks 
again!

 StorageBasedAuthProvider should batch namenode-calls where possible.
 

 Key: HIVE-9736
 URL: https://issues.apache.org/jira/browse/HIVE-9736
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Security
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan
 Attachments: HIVE-9736.1.patch, HIVE-9736.2.patch, HIVE-9736.3.patch


 Consider a table partitioned by 2 keys (dt, region). Say a dt partition could 
 have 1 associated regions. Consider that the user does:
 {code:sql}
 ALTER TABLE my_table DROP PARTITION (dt='20150101');
 {code}
 As things stand now, {{StorageBasedAuthProvider}} will make individual 
 {{DistributedFileSystem.listStatus()}} calls for each partition-directory, 
 and authorize each one separately. It'd be faster to batch the calls, and 
 examine multiple FileStatus objects at once.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9736) StorageBasedAuthProvider should batch namenode-calls where possible.

2015-04-13 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14492966#comment-14492966
 ] 

Chris Nauroth commented on HIVE-9736:
-

Thank you for the rebased patch.  It looks great to me overall.  I've entered a 
few comments in ReviewBoard for your consideration regarding consolidation of 
RPC calls and a few other minor things.

https://reviews.apache.org/r/31615/


 StorageBasedAuthProvider should batch namenode-calls where possible.
 

 Key: HIVE-9736
 URL: https://issues.apache.org/jira/browse/HIVE-9736
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Security
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan
 Attachments: HIVE-9736.1.patch, HIVE-9736.2.patch


 Consider a table partitioned by 2 keys (dt, region). Say a dt partition could 
 have 1 associated regions. Consider that the user does:
 {code:sql}
 ALTER TABLE my_table DROP PARTITION (dt='20150101');
 {code}
 As things stand now, {{StorageBasedAuthProvider}} will make individual 
 {{DistributedFileSystem.listStatus()}} calls for each partition-directory, 
 and authorize each one separately. It'd be faster to batch the calls, and 
 examine multiple FileStatus objects at once.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9182) avoid FileSystem.getAclStatus rpc call for filesystems that don't support acl

2015-04-10 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14490555#comment-14490555
 ] 

Chris Nauroth commented on HIVE-9182:
-

Is {{setFullFileStatus}} always called in situations where source and 
destination are on the same file system?  That looks to be true for the call 
sites I found in the {{DDLTask}}, {{MoveTask}}, and {{Hive}} classes.  If so, 
then the presence of ACLs in the source file implies that ACLs will be 
supported when you make the setfacl call on the destination path.  (ACLs are 
enabled or disabled for the whole HDFS namespace.)  That would mean it's 
feasible to rely on checking {{sourceStatus.getPermission().getAclBit()}} and 
remove all calls to {{isExtendedAclEnabled}}, which relies on inspecting the 
configuration.

Even if you want to continue relying on the configuration, you can still check 
the ACL bit on the source before trying the {{getAclStatus}} call, which is an 
RPC.
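
As a hedged sketch of that ordering (hypothetical names, not the actual {{setFullFileStatus}} code), the ACL-related RPCs only happen when the source status shows the ACL bit:

{code:java}
import java.io.IOException;
import java.util.List;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.AclEntry;

public class CopyPermissionsSketch {
  // Hypothetical sketch: when source and destination are on the same file
  // system, a set ACL bit on the source implies the destination supports ACLs
  // too, so getAclStatus and the ACL update are only attempted in that case;
  // otherwise plain permissions are copied.
  static void copyPermissions(FileSystem fs, FileStatus source, Path dest)
      throws IOException {
    if (source.getPermission().getAclBit()) {
      List<AclEntry> extendedEntries = fs.getAclStatus(source.getPath()).getEntries();
      fs.modifyAclEntries(dest, extendedEntries);
    } else {
      fs.setPermission(dest, source.getPermission());
    }
  }
}
{code}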

If you decide to go ahead and remove this dependency on 
{{dfs.namenode.acls.enabled}} in the configuration, then there are also some 
log messages which mention the configuration property that could be updated.

Thanks for the patch, Abdelrahman!

 avoid FileSystem.getAclStatus rpc call for filesystems that don't support acl
 -

 Key: HIVE-9182
 URL: https://issues.apache.org/jira/browse/HIVE-9182
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Thejas M Nair
Assignee: Abdelrahman Shettia
 Fix For: 1.2.0

 Attachments: HIVE-9182.2.patch, HIVE-9182.3.patch


 File systems such as s3, wasb (azure) don't implement Hadoop FileSystem acl 
 functionality.
 Hadoop23Shims has code that calls getAclStatus on file systems.
 Instead of calling getAclStatus and catching the exception, we can also check 
 FsPermission#getAclBit .
 Additionally, instead of catching all exceptions for calls to getAclStatus 
 and ignoring them, it is better to just catch UnsupportedOperationException.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9736) StorageBasedAuthProvider should batch namenode-calls where possible.

2015-04-08 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14486193#comment-14486193
 ] 

Chris Nauroth commented on HIVE-9736:
-

Hi [~mithun].  Great ideas in this patch!  I'd be happy to help code review 
(non-binding) on a rebased version of the patch.  I'll watch for it.  Thanks!

 StorageBasedAuthProvider should batch namenode-calls where possible.
 

 Key: HIVE-9736
 URL: https://issues.apache.org/jira/browse/HIVE-9736
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Security
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan
 Attachments: HIVE-9736.1.patch


 Consider a table partitioned by 2 keys (dt, region). Say a dt partition could 
 have 1 associated regions. Consider that the user does:
 {code:sql}
 ALTER TABLE my_table DROP PARTITION (dt='20150101');
 {code}
 As things stand now, {{StorageBasedAuthProvider}} will make individual 
 {{DistributedFileSystem.listStatus()}} calls for each partition-directory, 
 and authorize each one separately. It'd be faster to batch the calls, and 
 examine multiple FileStatus objects at once.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10223) Consolidate several redundant FileSystem API calls.

2015-04-07 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483649#comment-14483649
 ] 

Chris Nauroth commented on HIVE-10223:
--

[~hagleitn], thanks for confirming the tests and doing the commit!

 Consolidate several redundant FileSystem API calls.
 ---

 Key: HIVE-10223
 URL: https://issues.apache.org/jira/browse/HIVE-10223
 Project: Hive
  Issue Type: Improvement
Reporter: Chris Nauroth
Assignee: Chris Nauroth
 Fix For: 1.2.0

 Attachments: HIVE-10223.1.patch


 This issue proposes to consolidate several Hive calls to the Hadoop Common 
 {{FileSystem}} API into a fewer number of calls that still accomplish the 
 equivalent work.  {{FileSystem}} API calls typically translate into RPCs to 
 other services like the HDFS NameNode or alternative file system 
 implementations.  Consolidating RPCs will lower latency a bit for Hive code 
 and reduce some load on these external services.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9182) avoid FileSystem.getAclStatus rpc call for filesystems that don't support acl

2015-03-02 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14344206#comment-14344206
 ] 

Chris Nauroth commented on HIVE-9182:
-

Hi [~ashettia] and [~thejas].  Do you think {{setFullFileStatus}} needs to be 
changed too?

 avoid FileSystem.getAclStatus rpc call for filesystems that don't support acl
 -

 Key: HIVE-9182
 URL: https://issues.apache.org/jira/browse/HIVE-9182
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Thejas M Nair
Assignee: Abdelrahman Shettia
 Fix For: 1.2.0

 Attachments: HIVE-9182.1.patch


 File systems such as s3, wasp (azure) don't implement Hadoop FileSystem acl 
 functionality.
 Hadoop23Shims has code that calls getAclStatus on file systems.
 Instead of calling getAclStatus and catching the exception, we can also check 
 FsPermission#getAclBit .
 Additionally, instead of catching all exceptions for calls to getAclStatus 
 and ignoring them, it is better to just catch UnsupportedOperationException.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)