[Impala-ASF-CR] Bump CDP BUILD NUMBER to 1056671
Vihang Karajgaonkar has uploaded this change for review. ( http://gerrit.cloudera.org:8080/13213 Change subject: Bump CDP_BUILD_NUMBER to 1056671 .. Bump CDP_BUILD_NUMBER to 1056671 This change bumps the CDP_BUILD_NUMBER to 1056671 which includes all the Hive and Tez patches required for building against Hive 3. With this change we get rid of the custom builds for Hive and Tez introduced in IMPALA-8369 and switch to more official sources of builds for the minicluster. Testing Done: 1. Built against Hive-3 and Hive-2 using the flag USE_CDP_HIVE 2. Did basic testing from Impala and Beeline for the testing the tez patch 3. Currently running the full-suite of tests to make sure there are no regressions Change-Id: Ic758a15b33e89b6804c12356aac8e3f230e07ae0 --- M bin/bootstrap_toolchain.py M bin/impala-config.sh 2 files changed, 7 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/13213/2 -- To view, visit http://gerrit.cloudera.org:8080/13213 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ic758a15b33e89b6804c12356aac8e3f230e07ae0 Gerrit-Change-Number: 13213 Gerrit-PatchSet: 2 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Todd Lipcon
[Impala-ASF-CR] IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0
Vihang Karajgaonkar has uploaded a new patch set (#11). ( http://gerrit.cloudera.org:8080/13005 ) Change subject: IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0 .. IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0 This change upgrades the hive dependencies of Impala to use Hive 3.1.0 based binaries. Most of the changes in this patch are based off patches provided by Todd (links available in JIRA). Upgrading the dependencies allows us to work with both Hive 3.1.0 and Hive 2.1.0 in the same code line. In order to do this, the patch trims down a lot of unnecessary hive dependencies of the front end code by creating a shaded-deps module. The pom.xml of shaded-deps includes only the files from Hive source which Impala depends for compilation. Additionally, it also uses a custom build of Hive which is based of Hive 3.1.0. This custom build includes patches for HIVE-21596 and HIVE-21586 which are needed by Impala so that it can compile against Hive-3 libraries and be able to talk to HMS-2.x metastore. Once these patches are merged we can get rid of this custom build and rely on more official sources of Hive builds. The patch also changes impala-config.sh so that it always downloads the Hive-3 libraries from the toolchain. The code is always built using Hive-3 jars. However, based on the value of USE_CDP_HIVE, the minicluster is deployed using Hive-3 or Hive-2 binaries. Since Impala implements HiveServer2's TCLIService.thrift interface, it requires us to use the existing mechanism of copying the hive-2/api TCLIService.thrift to hive-3/api. It also adds a few environment variables which point to the metastore's thrift file and the CDH Hive version. Testing: 1. Code compiles and runs against both HMS-3 and HMS-2 2. Ran full-suite of tests using the private jenkins job against HMS-2 3. Running full-tests against HMS-3 will need more work like supporting Tez in the mini-cluster (for dataloading) and HMS transaction support since HMS3 create transactional tables by default. This will be taken up in subsequent change. Notes: 1. Patch uses a custom build of Hive to be deployed in mini-cluster. This build has the fixes for HIVE-21596 2. The Hive 3.1 does not include some of the patches like HIVE-21077. Some of the recent code in Impala depends on that patch to exist in hive. We will need to port HIVE-21077 to Hive 3.1.0 in order to compile against Hive 3.1.0 jars. I will continue working on getting this merged along with HIVE-21586 3. Some of the existing tests rely on the fact the UDFs implement the UDF interface in Hive (UDFLength, UDFHour, UDFYear). These built-in hive functions have been moved to use GenericUDF interface in Hive 3. Impala currently only supports UDFExecutor. In order to have a full compatibility with all the functions in Hive 2.x we should support GenericUDFs too. That would be taken up as a separate patch. 4. The Sentry object ownership tests are flaky. There are race-conditions in that code which get exposed for some by this patch. For example, when a database is created, the object privileges are immediately updated in the catalog cache. If Sentry has not been updated and there is a refresh authorization call in between it can clear the privilege from the cache causing the subsequent statement using that privilege to fail. The patch adds sleep statements in the test to work-around these issues. Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436 --- M CMakeLists.txt M README.md M bin/bootstrap_toolchain.py M bin/impala-config.sh M bin/set-classpath.sh M common/thrift/.gitignore M common/thrift/CMakeLists.txt M fe/CMakeLists.txt M fe/pom.xml A fe/src/compat-hive-2/java/org/apache/impala/compat/HiveShims.java R fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java A fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/main/java/org/apache/impala/analysis/StringLiteral.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java M fe/src/main/java/org/apache/impala/catalog/TableLoader.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java M fe/src/main/java/org/apache/impala/hive/executor/UdfExecutor.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java M fe/src/main/java/org/apache/impala/service/MetadataOp.java A fe/src/main/java/org/apache/impala/util/MetadataFormatUtils.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java M fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java M
[Impala-ASF-CR] IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0
Vihang Karajgaonkar has uploaded a new patch set (#12). ( http://gerrit.cloudera.org:8080/13005 ) Change subject: IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0 .. IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0 This change adds a compatibility shim in fe so that Impala can interoperate with Hive 3.1.0. It moves the existing Metastoreshim class to a compat-hive-2 directory and adds a new Metastoreshim class under compat-hive-3 directory. These shim classes implement method which are different in hive-2 v/s hive-3 and are used by front end code. At the build time, based on the environment variable IMPALA_HIVE_MAJOR_VERSION one of the two shims is added to as source using the fe/pom.xml build plugin. Additionally, in order to reduce the dependencies footprint of Hive in the front end code, this patch also introduces a new module called shaded-deps. This module using shade plugin to include only the source files from hive-exec which are need by the fe code. For hive-2 build path, no changes are done with respect to hive dependencies to minimize the risk of destabilizing the master branch on the default build option of using Hive-2. The different set of dependencies are activated using maven profiles. The activation of each profile is automatic based on the IMPALA_HIVE_MAJOR_VERSION. Testing: 1. Code compiles and runs against both HMS-3 and HMS-2 2. Ran full-suite of tests using the private jenkins job against HMS-2 3. Running full-tests against HMS-3 will need more work like supporting Tez in the mini-cluster (for dataloading) and HMS transaction support since HMS3 create transactional tables by default. THis will be on-going effort and test failures on Hive-3 will be fixed in additional sub-tasks. Notes: 1. Patch uses a custom build of Hive to be deployed in mini-cluster. This build has the fixes for HIVE-21596. This hack will be removed when the patches are available in official CDP Hive builds. 2. Some of the existing tests rely on the fact the UDFs implement the UDF interface in Hive (UDFLength, UDFHour, UDFYear). These built-in hive functions have been moved to use GenericUDF interface in Hive 3. Impala currently only supports UDFExecutor. In order to have a full compatibility with all the functions in Hive 2.x we should support GenericUDFs too. That would be taken up as a separate patch. 3. Sentry dependencies bring a lot of transitive hive dependencies. The patch excludes such dependencies since they create problems while building against Hive-3. Since these hive-2 dependencies are already included when building against hive-2 this should not be a problem. Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436 --- M CMakeLists.txt M README.md M bin/bootstrap_toolchain.py M bin/impala-config.sh M bin/set-classpath.sh M common/thrift/.gitignore M common/thrift/CMakeLists.txt M fe/CMakeLists.txt M fe/pom.xml A fe/src/compat-hive-2/java/org/apache/impala/compat/HiveShim.java A fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java A fe/src/compat-hive-3/java/org/apache/impala/compat/HiveMetadataFormatUtils.java A fe/src/compat-hive-3/java/org/apache/impala/compat/HiveShim.java A fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/main/java/org/apache/impala/analysis/StringLiteral.java M fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java M fe/src/main/java/org/apache/impala/catalog/TableLoader.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java D fe/src/main/java/org/apache/impala/compat/MetastoreShim.java M fe/src/main/java/org/apache/impala/hive/executor/UdfExecutor.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java M fe/src/main/java/org/apache/impala/service/MetadataOp.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java M fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java M fe/src/test/java/org/apache/impala/testutil/EmbeddedMetastoreClientPool.java M impala-parent/pom.xml A shaded-deps/.gitignore A shaded-deps/CMakeLists.txt A shaded-deps/pom.xml M testdata/bin/run-hive-server.sh M tests/custom_cluster/test_permanent_udfs.py 34 files changed, 1,870 insertions(+), 449 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/13005/12 -- To view, visit http://gerrit.cloudera.org:8080/13005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436 Gerrit-Change-Number: 13005 Gerrit-PatchSet: 12 Gerrit-Owner: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/13005 ) Change subject: IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0 .. Patch Set 12: (3 comments) http://gerrit.cloudera.org:8080/#/c/13005/11/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/13005/11/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1073 PS11, Line 1073: .getAddPartitionMessage(event.getMessage()); > line has trailing whitespace Done http://gerrit.cloudera.org:8080/#/c/13005/11/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1169 PS11, Line 1169: try { > line has trailing whitespace Done http://gerrit.cloudera.org:8080/#/c/13005/11/testdata/bin/run-hive-server.sh File testdata/bin/run-hive-server.sh: http://gerrit.cloudera.org:8080/#/c/13005/11/testdata/bin/run-hive-server.sh@66 PS11, Line 66: export HIVE_METASTORE_HADOOP_OPTS="-Xdebug -Xrunjdwp:transport=dt_socket,server=y,\ > line too long (121 > 90) Done -- To view, visit http://gerrit.cloudera.org:8080/13005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436 Gerrit-Change-Number: 13005 Gerrit-PatchSet: 12 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sudhanshu Arora Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 29 Apr 2019 18:03:35 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7973: Add support for fine grained events processing for partition level HMS events.
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/13111 ) Change subject: IMPALA-7973: Add support for fine grained events processing for partition level HMS events. .. Patch Set 3: (5 comments) Patch looks good to me. some nits below and its good to go from my side. http://gerrit.cloudera.org:8080/#/c/13111/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/13111/3//COMMIT_MSG@17 PS3, Line 17: affetced spell-check http://gerrit.cloudera.org:8080/#/c/13111/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/13111/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1081 PS3, Line 1081: org.apache.hadoop.hive.common.FileUtils Can we import this instead of using fully qualified classname? http://gerrit.cloudera.org:8080/#/c/13111/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1153 PS3, Line 1153: infoLog no need to have add_partitions in the log message since infoLog provides that information http://gerrit.cloudera.org:8080/#/c/13111/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1307 PS3, Line 1307: {} partitions dropped from table {} in alter_partition event This statement doesn't make sense to me. Why would there be dropped partitions in alter_partition event? Also, the log doesn't need to have a event type in it since infoLog method takes care of it. http://gerrit.cloudera.org:8080/#/c/13111/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1315 PS3, Line 1315: debugLog no need to say "after a drop_partition event" here since the debugLog prints both the event id and event type -- To view, visit http://gerrit.cloudera.org:8080/13111 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I213401329f3965dd81055197792ccf8a05368af5 Gerrit-Change-Number: 13111 Gerrit-PatchSet: 3 Gerrit-Owner: Anurag Mantripragada Gerrit-Reviewer: Anurag Mantripragada Gerrit-Reviewer: Bharath Krishna Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Mon, 29 Apr 2019 17:31:49 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8419 : Validate event processing related configurations
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/13019 ) Change subject: IMPALA-8419 : Validate event processing related configurations .. Patch Set 13: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/13019 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I94c2783e36287a65122003aa55d8075a806bc606 Gerrit-Change-Number: 13019 Gerrit-PatchSet: 13 Gerrit-Owner: Bharath Krishna Gerrit-Reviewer: Anurag Mantripragada Gerrit-Reviewer: Bharath Krishna Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Mon, 29 Apr 2019 18:49:31 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7973: Add support for fine grained events processing for partition level HMS events.
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/13111 ) Change subject: IMPALA-7973: Add support for fine grained events processing for partition level HMS events. .. Patch Set 4: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/13111/4/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/13111/4/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@56 PS4, Line 56: ClassUtil remove if unused -- To view, visit http://gerrit.cloudera.org:8080/13111 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I213401329f3965dd81055197792ccf8a05368af5 Gerrit-Change-Number: 13111 Gerrit-PatchSet: 4 Gerrit-Owner: Anurag Mantripragada Gerrit-Reviewer: Anurag Mantripragada Gerrit-Reviewer: Bharath Krishna Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Mon, 29 Apr 2019 18:52:00 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/13005 ) Change subject: IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0 .. Patch Set 12: (2 comments) Updated the patch with Joe's review comments. http://gerrit.cloudera.org:8080/#/c/13005/12/bin/impala-config.sh File bin/impala-config.sh: http://gerrit.cloudera.org:8080/#/c/13005/12/bin/impala-config.sh@a1 PS12, Line 1: > Restore this line (RAT checks will fail) Done http://gerrit.cloudera.org:8080/#/c/13005/12/bin/impala-config.sh@513 PS12, Line 513: # The directory in which all the thirdparty CDP components live. : export CDP_COMPONENTS_HOME="$IMPALA_TOOLCHAIN/cdp_components-$CDP_BUILD_NUMBER" > You move this up, so you can remove this. Done -- To view, visit http://gerrit.cloudera.org:8080/13005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436 Gerrit-Change-Number: 13005 Gerrit-PatchSet: 12 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sudhanshu Arora Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 29 Apr 2019 20:30:35 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0
Vihang Karajgaonkar has uploaded a new patch set (#14). ( http://gerrit.cloudera.org:8080/13005 ) Change subject: IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0 .. IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0 This change adds a compatibility shim in fe so that Impala can interoperate with Hive 3.1.0. It moves the existing Metastoreshim class to a compat-hive-2 directory and adds a new Metastoreshim class under compat-hive-3 directory. These shim classes implement method which are different in hive-2 v/s hive-3 and are used by front end code. At the build time, based on the environment variable IMPALA_HIVE_MAJOR_VERSION one of the two shims is added to as source using the fe/pom.xml build plugin. Additionally, in order to reduce the dependencies footprint of Hive in the front end code, this patch also introduces a new module called shaded-deps. This module using shade plugin to include only the source files from hive-exec which are need by the fe code. For hive-2 build path, no changes are done with respect to hive dependencies to minimize the risk of destabilizing the master branch on the default build option of using Hive-2. The different set of dependencies are activated using maven profiles. The activation of each profile is automatic based on the IMPALA_HIVE_MAJOR_VERSION. Testing: 1. Code compiles and runs against both HMS-3 and HMS-2 2. Ran full-suite of tests using the private jenkins job against HMS-2 3. Running full-tests against HMS-3 will need more work like supporting Tez in the mini-cluster (for dataloading) and HMS transaction support since HMS3 create transactional tables by default. THis will be on-going effort and test failures on Hive-3 will be fixed in additional sub-tasks. Notes: 1. Patch uses a custom build of Hive to be deployed in mini-cluster. This build has the fixes for HIVE-21596. This hack will be removed when the patches are available in official CDP Hive builds. 2. Some of the existing tests rely on the fact the UDFs implement the UDF interface in Hive (UDFLength, UDFHour, UDFYear). These built-in hive functions have been moved to use GenericUDF interface in Hive 3. Impala currently only supports UDFExecutor. In order to have a full compatibility with all the functions in Hive 2.x we should support GenericUDFs too. That would be taken up as a separate patch. 3. Sentry dependencies bring a lot of transitive hive dependencies. The patch excludes such dependencies since they create problems while building against Hive-3. Since these hive-2 dependencies are already included when building against hive-2 this should not be a problem. Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436 --- M CMakeLists.txt M README.md M bin/bootstrap_toolchain.py M bin/impala-config.sh M bin/set-classpath.sh M common/thrift/.gitignore M common/thrift/CMakeLists.txt M fe/CMakeLists.txt M fe/pom.xml A fe/src/compat-hive-2/java/org/apache/impala/compat/HiveShim.java A fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java A fe/src/compat-hive-3/java/org/apache/impala/compat/HiveMetadataFormatUtils.java A fe/src/compat-hive-3/java/org/apache/impala/compat/HiveShim.java A fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/main/java/org/apache/impala/analysis/StringLiteral.java M fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java M fe/src/main/java/org/apache/impala/catalog/TableLoader.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java D fe/src/main/java/org/apache/impala/compat/MetastoreShim.java M fe/src/main/java/org/apache/impala/hive/executor/UdfExecutor.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java M fe/src/main/java/org/apache/impala/service/MetadataOp.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java M fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java M fe/src/test/java/org/apache/impala/testutil/EmbeddedMetastoreClientPool.java M impala-parent/pom.xml A shaded-deps/.gitignore A shaded-deps/CMakeLists.txt A shaded-deps/pom.xml M testdata/bin/run-hive-server.sh M tests/custom_cluster/test_permanent_udfs.py 34 files changed, 1,885 insertions(+), 465 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/13005/14 -- To view, visit http://gerrit.cloudera.org:8080/13005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436 Gerrit-Change-Number: 13005 Gerrit-PatchSet: 14 Gerrit-Owner: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/13005 ) Change subject: IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0 .. Patch Set 3: (15 comments) > (1 comment) > > Nice work. > > When I did the transitional work between Hive 1 and Hive 2, I > introduced a variable that switched at build time between the two > worlds. (See a203733fac3e1e37df8abeee39a88d187153a8c5 for the > revert and "git log --grep IMPALA_MINICLUSTER_PROFILE") > > If I'm understanding right, the approach here is to produce a > single "binary" that works for both worlds? Or at run time do the > "original" Hive jars get run? I think both approaches are > plausible; just want to make sure we're clear about it. > > (Is the shading slow? I've seen maven-shade-plugin be very slow...) The original approach was to always build against hive-3, but we switched the approach to use a compatibility shim based approach as you suggested above. Shading is not super slow. It task about 4 sec to build the jar if I remember correctly http://gerrit.cloudera.org:8080/#/c/13005/2/bin/impala-config.sh File bin/impala-config.sh: http://gerrit.cloudera.org:8080/#/c/13005/2/bin/impala-config.sh@199 PS2, Line 199: # When USE_CDP_HIVE is set we use the latest hive version available to deply in minicluster > line too long (91 > 90) Done http://gerrit.cloudera.org:8080/#/c/13005/2/bin/impala-config.sh@203 PS2, Line 203: # TODO(Vihang) we should repackage the tarballs so that the src and binaries are extracted > line too long (92 > 90) This line was removed later in the patch http://gerrit.cloudera.org:8080/#/c/13005/2/bin/impala-config.sh@212 PS2, Line 212: export HIVE_HOME="$IMPALA_TOOLCHAIN/cdh_components-${CDH_BUILD_NUMBER}/hive-${MINICLUSTER_HIVE_VERSION}" > line too long (106 > 90) Done http://gerrit.cloudera.org:8080/#/c/13005/2/bin/impala-config.sh@230 PS2, Line 230: . "$IMPALA_HOME/bin/impala-config-branch.sh" : if [ -f "$IMPALA_HOME/bin/impala-config-local.sh" ]; then : . "$IMPALA_HOME/bin/impala-config-local.sh" : fi > We need to be careful about which variables are assigned before this and wh Good point. Moved all the new variable assignment logic post this block. http://gerrit.cloudera.org:8080/#/c/13005/2/bin/impala-config.sh@546 PS2, Line 546: export HIVE_METASTORE_THRIFT_DIR=$CDP_COMPONENTS_HOME/apache-hive-${IMPALA_HIVE_VERSION}-bin/src/standalone-metastore/src/main/thrift > line too long (133 > 90) This line was moved later in the patch and the line length is under the 90 now. http://gerrit.cloudera.org:8080/#/c/13005/2/fe/src/main/java/org/apache/impala/analysis/StringLiteral.java File fe/src/main/java/org/apache/impala/analysis/StringLiteral.java: http://gerrit.cloudera.org:8080/#/c/13005/2/fe/src/main/java/org/apache/impala/analysis/StringLiteral.java@105 PS2, Line 105: public static String unescapeSQLString(String b) { > Do we have any plans to create a public classes in Hive through which we ca Yes, I think we may be able to do that add that. Will do it as a followup item. http://gerrit.cloudera.org:8080/#/c/13005/3/fe/src/main/java/org/apache/impala/analysis/StringLiteral.java File fe/src/main/java/org/apache/impala/analysis/StringLiteral.java: http://gerrit.cloudera.org:8080/#/c/13005/3/fe/src/main/java/org/apache/impala/analysis/StringLiteral.java@96 PS3, Line 96: /** :* Copied from Apache Hive's BaseSemanticAnalyzer. This method has not changed :* since last several years so hoping that it is fairly stable by now. Sourcing it from :* the Hive's code without copying brings along with it a lot of other unnecessary :* dependencies :* @param b :* @return :*/ > Can you move the parts copied from Hive to a separate file/directory? It wi The copied code was moved into HiveShims class in the compat-3 directory http://gerrit.cloudera.org:8080/#/c/13005/3/fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java File fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java: http://gerrit.cloudera.org:8080/#/c/13005/3/fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java@122 PS3, Line 122: //TODO (Vihang) this pulls in hive-hbase-handler dependency which brings all the : // other stuff we don't need. Perhaps we just need to copy the constants and mark : // them public API in Hive source code : private static final String HBASE_COLUMNS_MAPPING = "hbase.columns.mapping"; : private static final String HBASE_TABLE_DEFAULT_STORAGE_TYPE = "hbase.table.default" : + ".storage.type"; : private static final String HBASE_KEY_COL = ":key"; : private static final String
[Impala-ASF-CR] IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/13005 ) Change subject: IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0 .. Patch Set 12: (11 comments) Addressed review comments http://gerrit.cloudera.org:8080/#/c/13005/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/13005/9//COMMIT_MSG@58 PS9, Line 58: > Is this flakiness specific to the hive 3 config? Or the races were already I checked and confirmed that the notification listener is working is generating the events as expected but there may be more to it. I will dig more. Even if there is a some problem with the event data, these races are still present can show up when Sentry is slow for any reason to update its entries. http://gerrit.cloudera.org:8080/#/c/13005/9//COMMIT_MSG@65 PS9, Line 65: > maybe we shoudl disable these tests when running with hive 3 since we don't So far, I would expect these tests work without any modifications on hive-2 builds (results pending for the last job I triggered). When we turn on cdp for jobs we should re-investigate how to fix these tests. http://gerrit.cloudera.org:8080/#/c/13005/3/bin/impala-config.sh File bin/impala-config.sh: http://gerrit.cloudera.org:8080/#/c/13005/3/bin/impala-config.sh@167 PS3, Line 167: export IMPALA_RANGER_VERSION=1.2.0.6.0.99.0-45 > This number looks like a CDH_BUILD_NUMBER, and is probably from the same na I was not aware of this.. Most of the pending patches are merged into official builds so we may not need this anymore. http://gerrit.cloudera.org:8080/#/c/13005/3/bin/impala-config.sh@199 PS3, Line 199: # TODO(todd) switch to an official build. > line too long (91 > 90) Done http://gerrit.cloudera.org:8080/#/c/13005/3/bin/impala-config.sh@203 PS3, Line 203: # CDH hive version is used to build and deploy in minicluster when USE_CDP_HIVE is > line too long (92 > 90) Done http://gerrit.cloudera.org:8080/#/c/13005/3/bin/impala-config.sh@212 PS3, Line 212: fi > line too long (106 > 90) Done http://gerrit.cloudera.org:8080/#/c/13005/9/bin/set-classpath.sh File bin/set-classpath.sh: http://gerrit.cloudera.org:8080/#/c/13005/9/bin/set-classpath.sh@30 PS9, Line 30: #"$IMPALA_HOME"/shaded-deps/target/impala-shaded-deps-0.1-SNAPSHOT.jar:\ > why is this necessary? shouldn't the shaded-deps dependency also end up in You are right, this is not necessary. removed it. http://gerrit.cloudera.org:8080/#/c/13005/9/fe/src/main/java/org/apache/impala/catalog/MetaStoreClientPool.java File fe/src/main/java/org/apache/impala/catalog/MetaStoreClientPool.java: http://gerrit.cloudera.org:8080/#/c/13005/9/fe/src/main/java/org/apache/impala/catalog/MetaStoreClientPool.java@68 PS9, Line 68: throws MetaException { : return null; : } : }; : : /** > Are these changes actually used right now? I think this stuff ended up bein Yeah. reverted the changes to this file in the latest patch http://gerrit.cloudera.org:8080/#/c/13005/9/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java: http://gerrit.cloudera.org:8080/#/c/13005/9/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@173 PS9, Line 173: private static MetastoreEventsProcessor instance; > Can you explain what's going on with this part of the change? Did Sentry mo Sentry did not move to the JSONMessageFactory (and I believe this may be the reason why the test_ownership.py fails). However, in the shims approach we use the ExtendedJsonFactory for hive-2 builds and in hive-3 build we don't expect Sentry to be there. I will keep investigating while I debug cdp jobs. http://gerrit.cloudera.org:8080/#/c/13005/9/testdata/bin/run-hive-server.sh File testdata/bin/run-hive-server.sh: http://gerrit.cloudera.org:8080/#/c/13005/9/testdata/bin/run-hive-server.sh@66 PS9, Line 66: export HIVE_METASTORE_HADOOP_OPTS="-Xdebug -Xrunjdwp:transport=dt_socket,server=y,\ > probably remove this Done http://gerrit.cloudera.org:8080/#/c/13005/9/testdata/bin/run-hive-server.sh@93 PS9, Line 93: if [ ${ONLY_METASTORE} -eq 0 ]; then > this is in another patch- guess we can rebase this on top of that one to pi Rebased my patch. -- To view, visit http://gerrit.cloudera.org:8080/13005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436 Gerrit-Change-Number: 13005 Gerrit-PatchSet: 12 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sudhanshu Arora Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer:
[Impala-ASF-CR] IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0
Vihang Karajgaonkar has uploaded a new patch set (#16). ( http://gerrit.cloudera.org:8080/13005 ) Change subject: IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0 .. IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0 This change adds a compatibility shim in fe so that Impala can interoperate with Hive 3.1.0. It moves the existing Metastoreshim class to a compat-hive-2 directory and adds a new Metastoreshim class under compat-hive-3 directory. These shim classes implement method which are different in hive-2 v/s hive-3 and are used by front end code. At the build time, based on the environment variable IMPALA_HIVE_MAJOR_VERSION one of the two shims is added to as source using the fe/pom.xml build plugin. Additionally, in order to reduce the dependencies footprint of Hive in the front end code, this patch also introduces a new module called shaded-deps. This module using shade plugin to include only the source files from hive-exec which are need by the fe code. For hive-2 build path, no changes are done with respect to hive dependencies to minimize the risk of destabilizing the master branch on the default build option of using Hive-2. The different set of dependencies are activated using maven profiles. The activation of each profile is automatic based on the IMPALA_HIVE_MAJOR_VERSION. Testing: 1. Code compiles and runs against both HMS-3 and HMS-2 2. Ran full-suite of tests using the private jenkins job against HMS-2 3. Running full-tests against HMS-3 will need more work like supporting Tez in the mini-cluster (for dataloading) and HMS transaction support since HMS3 create transactional tables by default. THis will be on-going effort and test failures on Hive-3 will be fixed in additional sub-tasks. Notes: 1. Patch uses a custom build of Hive to be deployed in mini-cluster. This build has the fixes for HIVE-21596. This hack will be removed when the patches are available in official CDP Hive builds. 2. Some of the existing tests rely on the fact the UDFs implement the UDF interface in Hive (UDFLength, UDFHour, UDFYear). These built-in hive functions have been moved to use GenericUDF interface in Hive 3. Impala currently only supports UDFExecutor. In order to have a full compatibility with all the functions in Hive 2.x we should support GenericUDFs too. That would be taken up as a separate patch. 3. Sentry dependencies bring a lot of transitive hive dependencies. The patch excludes such dependencies since they create problems while building against Hive-3. Since these hive-2 dependencies are already included when building against hive-2 this should not be a problem. Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436 --- M CMakeLists.txt M README.md M bin/bootstrap_toolchain.py M bin/impala-config.sh M common/thrift/.gitignore M common/thrift/CMakeLists.txt M fe/CMakeLists.txt M fe/pom.xml A fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java A fe/src/compat-hive-3/java/org/apache/impala/compat/HiveMetadataFormatUtils.java A fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/main/java/org/apache/impala/analysis/StringLiteral.java M fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java M fe/src/main/java/org/apache/impala/catalog/TableLoader.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java D fe/src/main/java/org/apache/impala/compat/MetastoreShim.java M fe/src/main/java/org/apache/impala/hive/executor/UdfExecutor.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java M fe/src/main/java/org/apache/impala/service/MetadataOp.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java M fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java M fe/src/test/java/org/apache/impala/testutil/EmbeddedMetastoreClientPool.java M impala-parent/pom.xml A shaded-deps/.gitignore A shaded-deps/CMakeLists.txt A shaded-deps/pom.xml M testdata/bin/run-hive-server.sh M tests/custom_cluster/test_permanent_udfs.py 31 files changed, 1,775 insertions(+), 455 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/13005/16 -- To view, visit http://gerrit.cloudera.org:8080/13005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436 Gerrit-Change-Number: 13005 Gerrit-PatchSet: 16 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer:
[Impala-ASF-CR] IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0
Vihang Karajgaonkar has uploaded a new patch set (#18). ( http://gerrit.cloudera.org:8080/13005 ) Change subject: IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0 .. IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0 This change adds a compatibility shim in fe so that Impala can interoperate with Hive 3.1.0. It moves the existing Metastoreshim class to a compat-hive-2 directory and adds a new Metastoreshim class under compat-hive-3 directory. These shim classes implement method which are different in hive-2 v/s hive-3 and are used by front end code. At the build time, based on the environment variable IMPALA_HIVE_MAJOR_VERSION one of the two shims is added to as source using the fe/pom.xml build plugin. Additionally, in order to reduce the dependencies footprint of Hive in the front end code, this patch also introduces a new module called shaded-deps. This module using shade plugin to include only the source files from hive-exec which are need by the fe code. For hive-2 build path, no changes are done with respect to hive dependencies to minimize the risk of destabilizing the master branch on the default build option of using Hive-2. The different set of dependencies are activated using maven profiles. The activation of each profile is automatic based on the IMPALA_HIVE_MAJOR_VERSION. Testing: 1. Code compiles and runs against both HMS-3 and HMS-2 2. Ran full-suite of tests using the private jenkins job against HMS-2 3. Running full-tests against HMS-3 will need more work like supporting Tez in the mini-cluster (for dataloading) and HMS transaction support since HMS3 create transactional tables by default. THis will be on-going effort and test failures on Hive-3 will be fixed in additional sub-tasks. Notes: 1. Patch uses a custom build of Hive to be deployed in mini-cluster. This build has the fixes for HIVE-21596. This hack will be removed when the patches are available in official CDP Hive builds. 2. Some of the existing tests rely on the fact the UDFs implement the UDF interface in Hive (UDFLength, UDFHour, UDFYear). These built-in hive functions have been moved to use GenericUDF interface in Hive 3. Impala currently only supports UDFExecutor. In order to have a full compatibility with all the functions in Hive 2.x we should support GenericUDFs too. That would be taken up as a separate patch. 3. Sentry dependencies bring a lot of transitive hive dependencies. The patch excludes such dependencies since they create problems while building against Hive-3. Since these hive-2 dependencies are already included when building against hive-2 this should not be a problem. Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436 --- M CMakeLists.txt M README.md M bin/bootstrap_toolchain.py M bin/impala-config.sh M common/thrift/.gitignore M common/thrift/CMakeLists.txt M fe/CMakeLists.txt M fe/pom.xml A fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java A fe/src/compat-hive-3/java/org/apache/impala/compat/HiveMetadataFormatUtils.java A fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/main/java/org/apache/impala/analysis/StringLiteral.java M fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java M fe/src/main/java/org/apache/impala/catalog/TableLoader.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java D fe/src/main/java/org/apache/impala/compat/MetastoreShim.java M fe/src/main/java/org/apache/impala/hive/executor/UdfExecutor.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java M fe/src/main/java/org/apache/impala/service/MetadataOp.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java M fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java M fe/src/test/java/org/apache/impala/testutil/EmbeddedMetastoreClientPool.java M impala-parent/pom.xml A shaded-deps/.gitignore A shaded-deps/CMakeLists.txt A shaded-deps/pom.xml M testdata/bin/run-hive-server.sh M tests/custom_cluster/test_permanent_udfs.py 31 files changed, 1,777 insertions(+), 457 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/13005/18 -- To view, visit http://gerrit.cloudera.org:8080/13005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436 Gerrit-Change-Number: 13005 Gerrit-PatchSet: 18 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer:
[Impala-ASF-CR] IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0
Vihang Karajgaonkar has uploaded a new patch set (#19). ( http://gerrit.cloudera.org:8080/13005 ) Change subject: IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0 .. IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0 This change adds a compatibility shim in fe so that Impala can interoperate with Hive 3.1.0. It moves the existing Metastoreshim class to a compat-hive-2 directory and adds a new Metastoreshim class under compat-hive-3 directory. These shim classes implement method which are different in hive-2 v/s hive-3 and are used by front end code. At the build time, based on the environment variable IMPALA_HIVE_MAJOR_VERSION one of the two shims is added to as source using the fe/pom.xml build plugin. Additionally, in order to reduce the dependencies footprint of Hive in the front end code, this patch also introduces a new module called shaded-deps. This module using shade plugin to include only the source files from hive-exec which are need by the fe code. For hive-2 build path, no changes are done with respect to hive dependencies to minimize the risk of destabilizing the master branch on the default build option of using Hive-2. The different set of dependencies are activated using maven profiles. The activation of each profile is automatic based on the IMPALA_HIVE_MAJOR_VERSION. Testing: 1. Code compiles and runs against both HMS-3 and HMS-2 2. Ran full-suite of tests using the private jenkins job against HMS-2 3. Running full-tests against HMS-3 will need more work like supporting Tez in the mini-cluster (for dataloading) and HMS transaction support since HMS3 create transactional tables by default. THis will be on-going effort and test failures on Hive-3 will be fixed in additional sub-tasks. Notes: 1. Patch uses a custom build of Hive to be deployed in mini-cluster. This build has the fixes for HIVE-21596. This hack will be removed when the patches are available in official CDP Hive builds. 2. Some of the existing tests rely on the fact the UDFs implement the UDF interface in Hive (UDFLength, UDFHour, UDFYear). These built-in hive functions have been moved to use GenericUDF interface in Hive 3. Impala currently only supports UDFExecutor. In order to have a full compatibility with all the functions in Hive 2.x we should support GenericUDFs too. That would be taken up as a separate patch. 3. Sentry dependencies bring a lot of transitive hive dependencies. The patch excludes such dependencies since they create problems while building against Hive-3. Since these hive-2 dependencies are already included when building against hive-2 this should not be a problem. Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436 --- M CMakeLists.txt M README.md M bin/bootstrap_toolchain.py M bin/impala-config.sh M common/thrift/.gitignore M common/thrift/CMakeLists.txt M fe/CMakeLists.txt M fe/pom.xml A fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java A fe/src/compat-hive-3/java/org/apache/impala/compat/HiveMetadataFormatUtils.java A fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/main/java/org/apache/impala/analysis/StringLiteral.java M fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java M fe/src/main/java/org/apache/impala/catalog/TableLoader.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java D fe/src/main/java/org/apache/impala/compat/MetastoreShim.java M fe/src/main/java/org/apache/impala/hive/executor/UdfExecutor.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java M fe/src/main/java/org/apache/impala/service/MetadataOp.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java M fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java M fe/src/test/java/org/apache/impala/testutil/EmbeddedMetastoreClientPool.java M impala-parent/pom.xml A shaded-deps/.gitignore A shaded-deps/CMakeLists.txt A shaded-deps/pom.xml M testdata/bin/run-hive-server.sh M tests/custom_cluster/test_permanent_udfs.py 31 files changed, 1,777 insertions(+), 457 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/13005/19 -- To view, visit http://gerrit.cloudera.org:8080/13005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436 Gerrit-Change-Number: 13005 Gerrit-PatchSet: 19 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer:
[Impala-ASF-CR] IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0
Vihang Karajgaonkar has uploaded a new patch set (#20). ( http://gerrit.cloudera.org:8080/13005 ) Change subject: IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0 .. IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0 This change adds a compatibility shim in fe so that Impala can interoperate with Hive 3.1.0. It moves the existing Metastoreshim class to a compat-hive-2 directory and adds a new Metastoreshim class under compat-hive-3 directory. These shim classes implement method which are different in hive-2 v/s hive-3 and are used by front end code. At the build time, based on the environment variable IMPALA_HIVE_MAJOR_VERSION one of the two shims is added to as source using the fe/pom.xml build plugin. Additionally, in order to reduce the dependencies footprint of Hive in the front end code, this patch also introduces a new module called shaded-deps. This module using shade plugin to include only the source files from hive-exec which are need by the fe code. For hive-2 build path, no changes are done with respect to hive dependencies to minimize the risk of destabilizing the master branch on the default build option of using Hive-2. The different set of dependencies are activated using maven profiles. The activation of each profile is automatic based on the IMPALA_HIVE_MAJOR_VERSION. Testing: 1. Code compiles and runs against both HMS-3 and HMS-2 2. Ran full-suite of tests using the private jenkins job against HMS-2 3. Running full-tests against HMS-3 will need more work like supporting Tez in the mini-cluster (for dataloading) and HMS transaction support since HMS3 create transactional tables by default. THis will be on-going effort and test failures on Hive-3 will be fixed in additional sub-tasks. Notes: 1. Patch uses a custom build of Hive to be deployed in mini-cluster. This build has the fixes for HIVE-21596. This hack will be removed when the patches are available in official CDP Hive builds. 2. Some of the existing tests rely on the fact the UDFs implement the UDF interface in Hive (UDFLength, UDFHour, UDFYear). These built-in hive functions have been moved to use GenericUDF interface in Hive 3. Impala currently only supports UDFExecutor. In order to have a full compatibility with all the functions in Hive 2.x we should support GenericUDFs too. That would be taken up as a separate patch. 3. Sentry dependencies bring a lot of transitive hive dependencies. The patch excludes such dependencies since they create problems while building against Hive-3. Since these hive-2 dependencies are already included when building against hive-2 this should not be a problem. Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436 --- M CMakeLists.txt M README.md M bin/bootstrap_toolchain.py M bin/impala-config.sh M common/thrift/.gitignore M common/thrift/CMakeLists.txt M fe/CMakeLists.txt M fe/pom.xml A fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java A fe/src/compat-hive-3/java/org/apache/impala/compat/HiveMetadataFormatUtils.java A fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/main/java/org/apache/impala/analysis/StringLiteral.java M fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java M fe/src/main/java/org/apache/impala/catalog/TableLoader.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java D fe/src/main/java/org/apache/impala/compat/MetastoreShim.java M fe/src/main/java/org/apache/impala/hive/executor/UdfExecutor.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java M fe/src/main/java/org/apache/impala/service/MetadataOp.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java M fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java M fe/src/test/java/org/apache/impala/testutil/EmbeddedMetastoreClientPool.java M impala-parent/pom.xml A shaded-deps/.gitignore A shaded-deps/CMakeLists.txt A shaded-deps/pom.xml M testdata/bin/run-hive-server.sh M tests/custom_cluster/test_permanent_udfs.py 31 files changed, 1,791 insertions(+), 456 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/13005/20 -- To view, visit http://gerrit.cloudera.org:8080/13005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436 Gerrit-Change-Number: 13005 Gerrit-PatchSet: 20 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer:
[Impala-ASF-CR] IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0
Vihang Karajgaonkar has uploaded a new patch set (#21). ( http://gerrit.cloudera.org:8080/13005 ) Change subject: IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0 .. IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0 This change adds a compatibility shim in fe so that Impala can interoperate with Hive 3.1.0. It moves the existing Metastoreshim class to a compat-hive-2 directory and adds a new Metastoreshim class under compat-hive-3 directory. These shim classes implement method which are different in hive-2 v/s hive-3 and are used by front end code. At the build time, based on the environment variable IMPALA_HIVE_MAJOR_VERSION one of the two shims is added to as source using the fe/pom.xml build plugin. Additionally, in order to reduce the dependencies footprint of Hive in the front end code, this patch also introduces a new module called shaded-deps. This module using shade plugin to include only the source files from hive-exec which are need by the fe code. For hive-2 build path, no changes are done with respect to hive dependencies to minimize the risk of destabilizing the master branch on the default build option of using Hive-2. The different set of dependencies are activated using maven profiles. The activation of each profile is automatic based on the IMPALA_HIVE_MAJOR_VERSION. Testing: 1. Code compiles and runs against both HMS-3 and HMS-2 2. Ran full-suite of tests using the private jenkins job against HMS-2 3. Running full-tests against HMS-3 will need more work like supporting Tez in the mini-cluster (for dataloading) and HMS transaction support since HMS3 create transactional tables by default. THis will be on-going effort and test failures on Hive-3 will be fixed in additional sub-tasks. Notes: 1. Patch uses a custom build of Hive to be deployed in mini-cluster. This build has the fixes for HIVE-21596. This hack will be removed when the patches are available in official CDP Hive builds. 2. Some of the existing tests rely on the fact the UDFs implement the UDF interface in Hive (UDFLength, UDFHour, UDFYear). These built-in hive functions have been moved to use GenericUDF interface in Hive 3. Impala currently only supports UDFExecutor. In order to have a full compatibility with all the functions in Hive 2.x we should support GenericUDFs too. That would be taken up as a separate patch. 3. Sentry dependencies bring a lot of transitive hive dependencies. The patch excludes such dependencies since they create problems while building against Hive-3. Since these hive-2 dependencies are already included when building against hive-2 this should not be a problem. Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436 --- M CMakeLists.txt M README.md M bin/bootstrap_toolchain.py M bin/impala-config.sh M common/thrift/.gitignore M common/thrift/CMakeLists.txt M fe/CMakeLists.txt M fe/pom.xml A fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java A fe/src/compat-hive-3/java/org/apache/impala/compat/HiveMetadataFormatUtils.java A fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/main/java/org/apache/impala/analysis/StringLiteral.java M fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java M fe/src/main/java/org/apache/impala/catalog/TableLoader.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java D fe/src/main/java/org/apache/impala/compat/MetastoreShim.java M fe/src/main/java/org/apache/impala/hive/executor/UdfExecutor.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java M fe/src/main/java/org/apache/impala/service/MetadataOp.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java M fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java M fe/src/test/java/org/apache/impala/testutil/EmbeddedMetastoreClientPool.java M impala-parent/pom.xml A shaded-deps/.gitignore A shaded-deps/CMakeLists.txt A shaded-deps/pom.xml M testdata/bin/run-hive-server.sh M tests/custom_cluster/test_permanent_udfs.py 31 files changed, 1,813 insertions(+), 464 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/13005/21 -- To view, visit http://gerrit.cloudera.org:8080/13005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436 Gerrit-Change-Number: 13005 Gerrit-PatchSet: 21 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer:
[Impala-ASF-CR] IMPALA-8369 (part 4): Hive 3: fixes for functional dataset loading
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/13251 ) Change subject: IMPALA-8369 (part 4): Hive 3: fixes for functional dataset loading .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/13251/1/testdata/bin/load-dependent-tables.sql File testdata/bin/load-dependent-tables.sql: http://gerrit.cloudera.org:8080/#/c/13251/1/testdata/bin/load-dependent-tables.sql@a115 PS1, Line 115: Some of the test rely on the fact that this table exists. Perhaps we should also ignore/modify such tests if we are running against hive-3. Running git grep "hive_index_tbl" shows that this is used in CatalogObjectToFromThriftTest, CatalogTest and FrontendTest -- To view, visit http://gerrit.cloudera.org:8080/13251 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic34930dc064da3136dde4e01a011d14db6a74ecd Gerrit-Change-Number: 13251 Gerrit-PatchSet: 1 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Tue, 07 May 2019 00:34:52 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8369 : Fix for tests failing with incompatible column changes
Vihang Karajgaonkar has uploaded this change for review. ( http://gerrit.cloudera.org:8080/13254 Change subject: IMPALA-8369 : Fix for tests failing with incompatible column changes .. IMPALA-8369 : Fix for tests failing with incompatible column changes In Hive-3 the configuration for allowing users to make incompatible column type changes was disabled by default. In Hive-2 this was allowed. Some of the tests like data_errors/test_data_errors.py and metadata/test_compute_stats.py make changes to column types which are disallowed by HMS-3 by default. This change adds a configuration option in hive-site.xml to allow making incompatible changes to column types so that we can run the existing tests with HMS-3. Also, in HMS-3 there are certain new event types (OPEN_TXN, COMMIT_TXN, etc) which may not have dbname set. This breaks the assumption in the code in EventProcessor which expects dbName_ to be not null at all times. This patch also makes changes in the EventProcessor so that such Ignored events do not fail precondition checks during event processing. Change-Id: I488121f21d9b35d33dd003b2670bc0bbe1fee4b6 --- M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/test/resources/hive-site.xml.py 2 files changed, 8 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/13254/2 -- To view, visit http://gerrit.cloudera.org:8080/13254 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I488121f21d9b35d33dd003b2670bc0bbe1fee4b6 Gerrit-Change-Number: 13254 Gerrit-PatchSet: 2 Gerrit-Owner: Vihang Karajgaonkar
[Impala-ASF-CR] Hive 3: fix test permanent udfs.py for Hive 3 support
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/13236 ) Change subject: Hive 3: fix test_permanent_udfs.py for Hive 3 support .. Patch Set 2: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/13236 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7f50845c7d4769d8843cad87988498e165902169 Gerrit-Change-Number: 13236 Gerrit-PatchSet: 2 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sudhanshu Arora Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yongzhi Chen Gerrit-Comment-Date: Mon, 06 May 2019 19:55:44 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8369 : Fix for tests failing with incompatible column changes
Vihang Karajgaonkar has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/13254 ) Change subject: IMPALA-8369 : Fix for tests failing with incompatible column changes .. IMPALA-8369 : Fix for tests failing with incompatible column changes In Hive-3 the configuration for allowing users to make incompatible column type changes was disabled by default. In Hive-2 this was allowed. Some of the tests like data_errors/test_data_errors.py and metadata/test_compute_stats.py make changes to column types which are disallowed by HMS-3 by default. This change adds a configuration option in hive-site.xml to allow making incompatible changes to column types so that we can run the existing tests with HMS-3. Also, in HMS-3 there are certain new event types (OPEN_TXN, COMMIT_TXN, etc) which may not have dbname set. This breaks the assumption in the code in EventProcessor which expects dbName_ to be not null at all times. This patch also makes changes in the EventProcessor so that such Ignored events do not fail precondition checks during event processing. Change-Id: I488121f21d9b35d33dd003b2670bc0bbe1fee4b6 --- M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/test/resources/hive-site.xml.py 2 files changed, 10 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/13254/3 -- To view, visit http://gerrit.cloudera.org:8080/13254 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I488121f21d9b35d33dd003b2670bc0bbe1fee4b6 Gerrit-Change-Number: 13254 Gerrit-PatchSet: 3 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon
[Impala-ASF-CR] IMPALA-8369 : Fix for tests failing with incompatible column changes
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/13254 ) Change subject: IMPALA-8369 : Fix for tests failing with incompatible column changes .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/13254/2/fe/src/test/resources/hive-site.xml.py File fe/src/test/resources/hive-site.xml.py: http://gerrit.cloudera.org:8080/#/c/13254/2/fe/src/test/resources/hive-site.xml.py@85 PS2, Line 85: > flake8: E501 line too long (92 > 90 characters) Done -- To view, visit http://gerrit.cloudera.org:8080/13254 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I488121f21d9b35d33dd003b2670bc0bbe1fee4b6 Gerrit-Change-Number: 13254 Gerrit-PatchSet: 3 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Tue, 07 May 2019 18:06:41 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8369 [Test fixes] More test fixes when running against Hive-3
Vihang Karajgaonkar has uploaded this change for review. ( http://gerrit.cloudera.org:8080/13276 Change subject: IMPALA-8369 [Test fixes] More test fixes when running against Hive-3 .. IMPALA-8369 [Test fixes] More test fixes when running against Hive-3 This test fixes CatalogTest, FrontendTest, CatalogObjectTofromThriftTest by breaking some tests into the ones which are not expected to work on Hive-3 and then skipping it. It does this by adding a util method in TestUtils which returns if the environment variable IMPALA_HIVE_MAJOR_VERSION is >= 3. If this condition is true, it skips certain tests which use hive_idx_tbl (not supported in data-load against Hive-3). If it is less than 3 the tests are not skipped so we keep the test coverage on Hive-2 setups. Also, fixes the TestCaseLoaderTest which instantiates a embedded HMS instance. This requires some configuration changes for the embedded standalone mode as well as adding datanucleus JDO as a test dependency. Additionally, this patch also fixes test_show_create_table which was failing on Hive-3 setups due to the additional parameter bucketing_version available from Hive-3. Testing Done: 1. Ran the tests when mini-cluster is deployed with USE_CDP_HIVE=true and made sure that the tests work (or are skipped as expected) 2. Ran the same tests with USE_CDP_HIVE=false to make sure they still work against HMS-2 Change-Id: If05f74efc481e2b0d26a9c4f6e58cef38605d72c --- M fe/pom.xml M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java M fe/src/test/java/org/apache/impala/service/FrontendTest.java M fe/src/test/java/org/apache/impala/testutil/EmbeddedMetastoreClientPool.java M fe/src/test/java/org/apache/impala/testutil/TestUtils.java M tests/metadata/test_show_create_table.py 7 files changed, 69 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/76/13276/1 -- To view, visit http://gerrit.cloudera.org:8080/13276 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: If05f74efc481e2b0d26a9c4f6e58cef38605d72c Gerrit-Change-Number: 13276 Gerrit-PatchSet: 1 Gerrit-Owner: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-8419 : Fetch metastore configuration values to detect misconfigured setups
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/13019 ) Change subject: IMPALA-8419 : Fetch metastore configuration values to detect misconfigured setups .. Patch Set 4: (4 comments) http://gerrit.cloudera.org:8080/#/c/13019/4/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java: http://gerrit.cloudera.org:8080/#/c/13019/4/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@222 PS4, Line 222: FIRE_EVENTS_FOR_DML We probably also need to make sure that the parameters filter config does not filter out keys like impala.events.catalogVersion, impala.events.catalogServiceId and impala.disableHmsSync http://gerrit.cloudera.org:8080/#/c/13019/4/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@315 PS4, Line 315: try { > Do you mean to use try-with-resource in the above method which calls using I see. May be you can then create another method called getMetastoreConfig(String key, String defaultval) here. This method can implement it using try-with-resource as I suggested above. In the test, you can then use Mockito.spy to return the dummy values when this method is called. Something like MetastoreStoreEventsProcessor spyEventsProcessor = Mockito.spy(eventsProcessor_); doReturn("testValue").when(spyProcessor.getMetastoreConfig()); http://gerrit.cloudera.org:8080/#/c/13019/4/fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java File fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java: http://gerrit.cloudera.org:8080/#/c/13019/4/fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java@123 PS4, Line 123: public static String getMetastoreConfigValue( > I feel this is useful too when users can just compare the return value with A more common pattern (and generic too) is to let the consumers decide what should be the default value if the configuration is not present. For example, for a boolean configuration users can choose to do getConfig(key, "false) whereas for a String configuration they can do getConfig(key, ""); If you avoid hard-coding the default value assumption it make it more usable for all the consumers. http://gerrit.cloudera.org:8080/#/c/13019/4/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java File fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java: http://gerrit.cloudera.org:8080/#/c/13019/4/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@245 PS4, Line 245: toggleBooleanValueString code assumes that config values will always be booleans. Its probably easier to just pass some dummy. Also since there are only a few configurations, it probably easier to just have multiple statements of when(getConfig(key1)).thenReturn(val1); when(getConfig(key1)).thenReturn(val1); when(getConfig(key1)).thenReturn(val1); and get rid of the loop this way there is no need to have the cleanup logic too. All this can be done by creating a simple util method which takes in validateConfig(key, mockValue, shouldSucceed). You can test both the positive and negative cases using this util method. -- To view, visit http://gerrit.cloudera.org:8080/13019 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I94c2783e36287a65122003aa55d8075a806bc606 Gerrit-Change-Number: 13019 Gerrit-PatchSet: 4 Gerrit-Owner: Bharath Krishna Gerrit-Reviewer: Anurag Mantripragada Gerrit-Reviewer: Bharath Krishna Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Thu, 18 Apr 2019 21:48:19 + Gerrit-HasComments: Yes
[Impala-ASF-CR] Fix redundant downloads of hive source tarball
Vihang Karajgaonkar has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/13219 ) Change subject: Fix redundant downloads of hive source tarball .. Fix redundant downloads of hive source tarball Since CDP_BUILD_NUMBER was bumped to 1056671 the name of the hive source tarball changed. Not only the tar ball name was changed, the file it gets extracted to is also different from the tar file itself. Due to this the bootstrap_toolchain.py fails to check if the downloaded hive source component already exists and it downloads again unnecessarily. This patch improves bootstrap_toolchain.py to take non-standard tarfiles which extracts to a different directory name compared to the tar file. Testing done: 1. Removed the local toolchain and ran the script couple of times to make sure that it downloads the hive tar ball only once. Change-Id: Ifd04a1a367a0cc4aa0a2b490a45fbc93a862c83a --- M bin/bootstrap_toolchain.py 1 file changed, 16 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/13219/6 -- To view, visit http://gerrit.cloudera.org:8080/13219 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifd04a1a367a0cc4aa0a2b490a45fbc93a862c83a Gerrit-Change-Number: 13219 Gerrit-PatchSet: 6 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] Fix redundant downloads of hive source tarball
Vihang Karajgaonkar has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/13219 ) Change subject: Fix redundant downloads of hive source tarball .. Fix redundant downloads of hive source tarball Since CDP_BUILD_NUMBER was bumped to 1056671 the name of the hive source tarball changed. Not only the tar ball name was changed, the file it gets extracted to is also different from the tar file itself. Due to this the bootstrap_toolchain.py fails to check if the downloaded hive source component already exists and it downloads again unnecessarily. This patch improves bootstrap_toolchain.py to take non-standard tarfiles which extracts to a different directory name compared to the tar file. Testing done: 1. Removed the local toolchain and ran the script couple of times to make sure that it downloads the hive tar ball only once. Change-Id: Ifd04a1a367a0cc4aa0a2b490a45fbc93a862c83a --- M bin/bootstrap_toolchain.py 1 file changed, 16 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/13219/5 -- To view, visit http://gerrit.cloudera.org:8080/13219 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifd04a1a367a0cc4aa0a2b490a45fbc93a862c83a Gerrit-Change-Number: 13219 Gerrit-PatchSet: 5 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] Fix redundant downloads of hive source tarball
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/13219 ) Change subject: Fix redundant downloads of hive source tarball .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/13219/4/bin/bootstrap_toolchain.py File bin/bootstrap_toolchain.py: http://gerrit.cloudera.org:8080/#/c/13219/4/bin/bootstrap_toolchain.py@121 PS4, Line 121: component package directory" > You can do something like this Ah, I see. Done. Thanks! -- To view, visit http://gerrit.cloudera.org:8080/13219 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifd04a1a367a0cc4aa0a2b490a45fbc93a862c83a Gerrit-Change-Number: 13219 Gerrit-PatchSet: 4 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Fri, 03 May 2019 20:58:30 + Gerrit-HasComments: Yes
[Impala-ASF-CR] Bump CDP BUILD NUMBER to 1056671
Vihang Karajgaonkar has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/13213 ) Change subject: Bump CDP_BUILD_NUMBER to 1056671 .. Bump CDP_BUILD_NUMBER to 1056671 This change bumps the CDP_BUILD_NUMBER to 1056671 which includes all the Hive and Tez patches required for building against Hive 3. With this change we get rid of the custom builds for Hive and Tez introduced in IMPALA-8369 and switch to more official sources of builds for the minicluster. Notes: 1. The tarball names and the directory to which they extract to changed from the previous CDP_BUILD_NUMBER. Due to this we need to change the bootstrap_toolchain and impala-config.sh so that the Hive environment variables are set correctly. Testing Done: 1. Built against Hive-3 and Hive-2 using the flag USE_CDP_HIVE 2. Did basic testing from Impala and Beeline for the testing the tez patch 3. Currently running the full-suite of tests to make sure there are no regressions Change-Id: Ic758a15b33e89b6804c12356aac8e3f230e07ae0 --- M bin/bootstrap_toolchain.py M bin/impala-config.sh 2 files changed, 7 insertions(+), 18 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/13213/6 -- To view, visit http://gerrit.cloudera.org:8080/13213 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic758a15b33e89b6804c12356aac8e3f230e07ae0 Gerrit-Change-Number: 13213 Gerrit-PatchSet: 6 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Todd Lipcon
[Impala-ASF-CR] Fix redundant downloads of hive source tarball
Vihang Karajgaonkar has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/13219 ) Change subject: Fix redundant downloads of hive source tarball .. Fix redundant downloads of hive source tarball Since CDP_BUILD_NUMBER was bumped to 1056671 the name of the hive source tarball changed. Not only the tar ball name was changed, the file it gets extracted to is also different from the tar file itself. Due to this the bootstrap_toolchain.py fails to check if the downloaded components already contains a extracted tar ball of hive source. Due to this every time the script executes it downloads the hive source tar again unnecessarily. This patch improves bootstrap_toolchain.py to take non-standard tarfiles which extracts to a different directory name compared to the tar file. Testing done: 1. Removed the local toolchain and ran the script couple of times to make sure that it downloads the hive tar ball only once. Change-Id: Ifd04a1a367a0cc4aa0a2b490a45fbc93a862c83a --- M bin/bootstrap_toolchain.py 1 file changed, 10 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/13219/2 -- To view, visit http://gerrit.cloudera.org:8080/13219 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifd04a1a367a0cc4aa0a2b490a45fbc93a862c83a Gerrit-Change-Number: 13219 Gerrit-PatchSet: 2 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Fredy Wijaya
[Impala-ASF-CR] IMPALA-8448: [DOCS] ALTER DATABASE event supported in metadata sync
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/13199 ) Change subject: IMPALA-8448: [DOCS] ALTER DATABASE event supported in metadata sync .. Patch Set 3: Code-Review+1 Thanks for the changes. Looks good to me. -- To view, visit http://gerrit.cloudera.org:8080/13199 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1016c27d3f12cef71a09a895ab42fd15a54aeee1 Gerrit-Change-Number: 13199 Gerrit-PatchSet: 3 Gerrit-Owner: Alex Rodoni Gerrit-Reviewer: Alex Rodoni Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Thu, 02 May 2019 19:29:43 + Gerrit-HasComments: No
[Impala-ASF-CR] Fix redundant downloads of hive source tarball
Vihang Karajgaonkar has uploaded this change for review. ( http://gerrit.cloudera.org:8080/13219 Change subject: Fix redundant downloads of hive source tarball .. Fix redundant downloads of hive source tarball Since CDP_BUILD_NUMBER was bumped to 1056671 the name of the hive source tarball changed. Not only the tar ball name was changed, the file it gets extracted to is also different from the tar file itself. Due to this the bootstrap_toolchain.py fails to check if the downloaded components already contains a extracted tar ball of hive source. Due to this every time the script is executes it downloads the hive source tar again unnecessarily. This patch improves bootstrap_toolchain.py to take non-standard tarfiles which extracts to a different directory name compared to the tar file. Testing done: 1. Removed the local toolchain and ran the script couple of times to make sure that it downloads the hive tar ball only once. Change-Id: Ifd04a1a367a0cc4aa0a2b490a45fbc93a862c83a --- M bin/bootstrap_toolchain.py 1 file changed, 10 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/13219/1 -- To view, visit http://gerrit.cloudera.org:8080/13219 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ifd04a1a367a0cc4aa0a2b490a45fbc93a862c83a Gerrit-Change-Number: 13219 Gerrit-PatchSet: 1 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Fredy Wijaya
[Impala-ASF-CR] Fix redundant downloads of hive source tarball
Vihang Karajgaonkar has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/13219 ) Change subject: Fix redundant downloads of hive source tarball .. Fix redundant downloads of hive source tarball Since CDP_BUILD_NUMBER was bumped to 1056671 the name of the hive source tarball changed. Not only the tar ball name was changed, the file it gets extracted to is also different from the tar file itself. Due to this the bootstrap_toolchain.py fails to check if the downloaded components already contains a extracted tar ball of hive source. Due to this every time the script executes it downloads the hive source tar again unnecessarily. This patch improves bootstrap_toolchain.py to take non-standard tarfiles which extracts to a different directory name compared to the tar file. Testing done: 1. Removed the local toolchain and ran the script couple of times to make sure that it downloads the hive tar ball only once. Change-Id: Ifd04a1a367a0cc4aa0a2b490a45fbc93a862c83a --- M bin/bootstrap_toolchain.py 1 file changed, 11 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/13219/3 -- To view, visit http://gerrit.cloudera.org:8080/13219 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifd04a1a367a0cc4aa0a2b490a45fbc93a862c83a Gerrit-Change-Number: 13219 Gerrit-PatchSet: 3 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] Hive 3: switch to Tez-on-YARN execution
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/13224 ) Change subject: Hive 3: switch to Tez-on-YARN execution .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/13224/1/bin/bootstrap_toolchain.py File bin/bootstrap_toolchain.py: http://gerrit.cloudera.org:8080/#/c/13224/1/bin/bootstrap_toolchain.py@571 PS1, Line 571: use_cdp_hive looks like we can get rid of this line since it is already initialized above. http://gerrit.cloudera.org:8080/#/c/13224/1/testdata/cluster/admin File testdata/cluster/admin: http://gerrit.cloudera.org:8080/#/c/13224/1/testdata/cluster/admin@311 PS1, Line 311: for would be great if you could add a comment here explain why this is being done. -- To view, visit http://gerrit.cloudera.org:8080/13224 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If96064f271582b2790a3cfb3d135f3834d46c41d Gerrit-Change-Number: 13224 Gerrit-PatchSet: 1 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sudhanshu Arora Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yongzhi Chen Gerrit-Comment-Date: Fri, 03 May 2019 18:01:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] Fix redundant downloads of hive source tarball
Vihang Karajgaonkar has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/13219 ) Change subject: Fix redundant downloads of hive source tarball .. Fix redundant downloads of hive source tarball Since CDP_BUILD_NUMBER was bumped to 1056671 the name of the hive source tarball changed. Not only the tar ball name was changed, the file it gets extracted to is also different from the tar file itself. Due to this the bootstrap_toolchain.py fails to check if the downloaded components already contains a extracted tar ball of hive source. Due to this every time the script executes it downloads the hive source tar again unnecessarily. This patch improves bootstrap_toolchain.py to take non-standard tarfiles which extracts to a different directory name compared to the tar file. Testing done: 1. Removed the local toolchain and ran the script couple of times to make sure that it downloads the hive tar ball only once. Change-Id: Ifd04a1a367a0cc4aa0a2b490a45fbc93a862c83a --- M bin/bootstrap_toolchain.py 1 file changed, 18 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/13219/4 -- To view, visit http://gerrit.cloudera.org:8080/13219 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifd04a1a367a0cc4aa0a2b490a45fbc93a862c83a Gerrit-Change-Number: 13219 Gerrit-PatchSet: 4 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] Hive 3: switch to Tez-on-YARN execution
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/13224 ) Change subject: Hive 3: switch to Tez-on-YARN execution .. Patch Set 2: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/13224 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If96064f271582b2790a3cfb3d135f3834d46c41d Gerrit-Change-Number: 13224 Gerrit-PatchSet: 2 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sudhanshu Arora Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yongzhi Chen Gerrit-Comment-Date: Fri, 03 May 2019 18:29:14 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8448: [DOCS] ALTER DATABASE event supported in metadata sync
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/13199 ) Change subject: IMPALA-8448: [DOCS] ALTER DATABASE event supported in metadata sync .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/13199/2/docs/topics/impala_metadata.xml File docs/topics/impala_metadata.xml: http://gerrit.cloudera.org:8080/#/c/13199/2/docs/topics/impala_metadata.xml@169 PS2, Line 169: : Change the description of the database : : : : Change the comment on the database : comment and description are one and the same. I think you are missing changing the owner of the database here http://gerrit.cloudera.org:8080/#/c/13199/2/docs/topics/impala_metadata.xml@178 PS2, Line 178: Change I think it is worth mentioning here that changing the default location of the database does not move the tables of that database to the new location. Only the new tables which are created subsequently use the default location of the database in case it is not provided in the create table statement. -- To view, visit http://gerrit.cloudera.org:8080/13199 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1016c27d3f12cef71a09a895ab42fd15a54aeee1 Gerrit-Change-Number: 13199 Gerrit-PatchSet: 2 Gerrit-Owner: Alex Rodoni Gerrit-Reviewer: Alex Rodoni Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Thu, 02 May 2019 01:52:43 + Gerrit-HasComments: Yes
[Impala-ASF-CR] Bump CDP BUILD NUMBER to 1056671
Vihang Karajgaonkar has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/13213 ) Change subject: Bump CDP_BUILD_NUMBER to 1056671 .. Bump CDP_BUILD_NUMBER to 1056671 This change bumps the CDP_BUILD_NUMBER to 1056671 which includes all the Hive and Tez patches required for building against Hive 3. With this change we get rid of the custom builds for Hive and Tez introduced in IMPALA-8369 and switch to more official sources of builds for the minicluster. Notes: 1. The tarball names and the directory to which they extract to changed from the previous CDP_BUILD_NUMBER. Due to this we need to change the bootstrap_toolchain and impala-config.sh so that the Hive environment variables are set correctly. Testing Done: 1. Built against Hive-3 and Hive-2 using the flag USE_CDP_HIVE 2. Did basic testing from Impala and Beeline for the testing the tez patch 3. Currently running the full-suite of tests to make sure there are no regressions Change-Id: Ic758a15b33e89b6804c12356aac8e3f230e07ae0 --- M bin/bootstrap_toolchain.py M bin/impala-config.sh 2 files changed, 7 insertions(+), 18 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/13213/4 -- To view, visit http://gerrit.cloudera.org:8080/13213 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic758a15b33e89b6804c12356aac8e3f230e07ae0 Gerrit-Change-Number: 13213 Gerrit-PatchSet: 4 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Todd Lipcon
[Impala-ASF-CR] Bump CDP BUILD NUMBER to 1056671
Vihang Karajgaonkar has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/13213 ) Change subject: Bump CDP_BUILD_NUMBER to 1056671 .. Bump CDP_BUILD_NUMBER to 1056671 This change bumps the CDP_BUILD_NUMBER to 1056671 which includes all the Hive and Tez patches required for building against Hive 3. With this change we get rid of the custom builds for Hive and Tez introduced in IMPALA-8369 and switch to more official sources of builds for the minicluster. Testing Done: 1. Built against Hive-3 and Hive-2 using the flag USE_CDP_HIVE 2. Did basic testing from Impala and Beeline for the testing the tez patch 3. Currently running the full-suite of tests to make sure there are no regressions Change-Id: Ic758a15b33e89b6804c12356aac8e3f230e07ae0 --- M bin/bootstrap_toolchain.py M bin/impala-config.sh 2 files changed, 7 insertions(+), 18 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/13213/3 -- To view, visit http://gerrit.cloudera.org:8080/13213 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic758a15b33e89b6804c12356aac8e3f230e07ae0 Gerrit-Change-Number: 13213 Gerrit-PatchSet: 3 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Todd Lipcon
[Impala-ASF-CR] IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0
Vihang Karajgaonkar has uploaded a new patch set (#10). ( http://gerrit.cloudera.org:8080/13005 ) Change subject: IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0 .. IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0 This change upgrades the hive dependencies of Impala to use Hive 3.1.0 based binaries. Most of the changes in this patch are based off patches provided by Todd (links available in JIRA). Upgrading the dependencies allows us to work with both Hive 3.1.0 and Hive 2.1.0 in the same code line. In order to do this, the patch trims down a lot of unnecessary hive dependencies of the front end code by creating a shaded-deps module. The pom.xml of shaded-deps includes only the files from Hive source which Impala depends for compilation. Additionally, it also uses a custom build of Hive which is based of Hive 3.1.0. This custom build includes patches for HIVE-21596 and HIVE-21586 which are needed by Impala so that it can compile against Hive-3 libraries and be able to talk to HMS-2.x metastore. Once these patches are merged we can get rid of this custom build and rely on more official sources of Hive builds. The patch also changes impala-config.sh so that it always downloads the Hive-3 libraries from the toolchain. The code is always built using Hive-3 jars. However, based on the value of USE_CDP_HIVE, the minicluster is deployed using Hive-3 or Hive-2 binaries. Since Impala implements HiveServer2's TCLIService.thrift interface, it requires us to use the existing mechanism of copying the hive-2/api TCLIService.thrift to hive-3/api. It also adds a few environment variables which point to the metastore's thrift file and the CDH Hive version. Testing: 1. Code compiles and runs against both HMS-3 and HMS-2 2. Ran full-suite of tests using the private jenkins job against HMS-2 3. Running full-tests against HMS-3 will need more work like supporting Tez in the mini-cluster (for dataloading) and HMS transaction support since HMS3 create transactional tables by default. This will be taken up in subsequent change. Notes: 1. Patch uses a custom build of Hive to be deployed in mini-cluster. This build has the fixes for HIVE-21596 2. The Hive 3.1 does not include some of the patches like HIVE-21077. Some of the recent code in Impala depends on that patch to exist in hive. We will need to port HIVE-21077 to Hive 3.1.0 in order to compile against Hive 3.1.0 jars. I will continue working on getting this merged along with HIVE-21586 3. Some of the existing tests rely on the fact the UDFs implement the UDF interface in Hive (UDFLength, UDFHour, UDFYear). These built-in hive functions have been moved to use GenericUDF interface in Hive 3. Impala currently only supports UDFExecutor. In order to have a full compatibility with all the functions in Hive 2.x we should support GenericUDFs too. That would be taken up as a separate patch. 4. The Sentry object ownership tests are flaky. There are race-conditions in that code which get exposed for some by this patch. For example, when a database is created, the object privileges are immediately updated in the catalog cache. If Sentry has not been updated and there is a refresh authorization call in between it can clear the privilege from the cache causing the subsequent statement using that privilege to fail. The patch adds sleep statements in the test to work-around these issues. Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436 --- M CMakeLists.txt M README.md M bin/bootstrap_toolchain.py M bin/impala-config.sh M bin/set-classpath.sh M common/thrift/.gitignore M common/thrift/CMakeLists.txt M fe/CMakeLists.txt M fe/pom.xml M fe/src/main/java/org/apache/impala/analysis/StringLiteral.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java M fe/src/main/java/org/apache/impala/catalog/MetaStoreClientPool.java M fe/src/main/java/org/apache/impala/catalog/TableLoader.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java M fe/src/main/java/org/apache/impala/compat/MetastoreShim.java M fe/src/main/java/org/apache/impala/hive/executor/UdfExecutor.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java M fe/src/main/java/org/apache/impala/service/MetadataOp.java A fe/src/main/java/org/apache/impala/util/MetadataFormatUtils.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java M fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java M fe/src/test/java/org/apache/impala/testutil/EmbeddedMetastoreClientPool.java M impala-parent/pom.xml A shaded-deps/.gitignore A
[Impala-ASF-CR] Fix redundant downloads of hive source tarball
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/13219 ) Change subject: Fix redundant downloads of hive source tarball .. Patch Set 4: (3 comments) http://gerrit.cloudera.org:8080/#/c/13219/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/13219/4//COMMIT_MSG@13 PS4, Line 13: Due to : this > nit: repeated use of "due to this", we can just use "and" Done http://gerrit.cloudera.org:8080/#/c/13219/4/bin/bootstrap_toolchain.py File bin/bootstrap_toolchain.py: http://gerrit.cloudera.org:8080/#/c/13219/4/bin/bootstrap_toolchain.py@121 PS4, Line 121: component package directory" > nit: fix indentation Not sure how to fix this without adding more spaces in the exception message. http://gerrit.cloudera.org:8080/#/c/13219/4/bin/bootstrap_toolchain.py@122 PS4, Line 122: if pkg_directory is not None: : self.pkg_directory = "{0}/{1}".format(cdp_components_home, pkg_directory) : else: : self.pkg_directory = "{0}/{1}".format(cdp_components_home, basename) > It's more Pythonic to do this: Done -- To view, visit http://gerrit.cloudera.org:8080/13219 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifd04a1a367a0cc4aa0a2b490a45fbc93a862c83a Gerrit-Change-Number: 13219 Gerrit-PatchSet: 4 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Fri, 03 May 2019 20:37:58 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8438: Store WriteId and valid writeId list for table and partitio
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/13215 ) Change subject: IMPALA-8438: Store WriteId and valid writeId list for table and partitio .. Patch Set 5: (5 comments) http://gerrit.cloudera.org:8080/#/c/13215/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/13215/3//COMMIT_MSG@7 PS3, Line 7: partitio nit, spell-check http://gerrit.cloudera.org:8080/#/c/13215/3//COMMIT_MSG@10 PS3, Line 10: fucntions nit, spell-check http://gerrit.cloudera.org:8080/#/c/13215/3/common/thrift/CatalogObjects.thrift File common/thrift/CatalogObjects.thrift: http://gerrit.cloudera.org:8080/#/c/13215/3/common/thrift/CatalogObjects.thrift@483 PS3, Line 483: 4: optional string valid_write_ids Instead of storing this in string format, do you think it would be better to parse the string and store these as individual fields? If we do that, can we get rid of storing the table_name since it is already available in THdfsTable? http://gerrit.cloudera.org:8080/#/c/13215/3/fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java File fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java: http://gerrit.cloudera.org:8080/#/c/13215/3/fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java@237 PS3, Line 237: getHighestWriteId Do you think its better to throw UnsupportedOperationException in these methods? http://gerrit.cloudera.org:8080/#/c/13215/3/fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java@258 PS3, Line 258: public static long getMajorVersion() { > can we rename this to getMajorVersion or something since it's not the full +1 -- To view, visit http://gerrit.cloudera.org:8080/13215 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6edbd64424edf0ba88af110ab8b958a1966b8b54 Gerrit-Change-Number: 13215 Gerrit-PatchSet: 5 Gerrit-Owner: Yongzhi Chen Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sudhanshu Arora Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yongzhi Chen Gerrit-Comment-Date: Fri, 03 May 2019 20:43:41 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8369: Fixing some core tests in Hive environment
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/13283 ) Change subject: IMPALA-8369: Fixing some core tests in Hive environment .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/13283/3/tests/common/impala_test_suite.py File tests/common/impala_test_suite.py: http://gerrit.cloudera.org:8080/#/c/13283/3/tests/common/impala_test_suite.py@738 PS3, Line 738: get_partition_names is this a change of behavior in hive client in version 3? If yes, is it breaking backwards compatibility? http://gerrit.cloudera.org:8080/#/c/13283/3/tests/common/skip.py File tests/common/skip.py: http://gerrit.cloudera.org:8080/#/c/13283/3/tests/common/skip.py@216 PS3, Line 216: == should this be >= 3 -- To view, visit http://gerrit.cloudera.org:8080/13283 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I45d9b9312c6c77f436ab020ae68c15f3c7c737de Gerrit-Change-Number: 13283 Gerrit-PatchSet: 3 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Wed, 08 May 2019 19:25:27 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8663 : FileMetadataLoader should skip hidden and tmp directories
Vihang Karajgaonkar has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/13665 ) Change subject: IMPALA-8663 : FileMetadataLoader should skip hidden and tmp directories .. IMPALA-8663 : FileMetadataLoader should skip hidden and tmp directories The FileMetadataLoader is used to load the file information in when the table is loaded. By default, it lists all the files in the table/partition directory. Currently, it only skips the filenames which are invalid (hidden files and ones starting with "_" etc). However, it does not skip the directories which are temporary or hidden. In case of Hive when data is inserted into a table, it creates a temporary staging directory which is a hidden directory under the table location. When the insert in hive is completed, such staging directories are removed. But if there is a refresh called during that time, FileMetadataLoader will add the files in the staging directory as well. Not only this could cause temporary invalid results but it causes table to go in a bad state when these temporary directories are removed. The only work-around in such a case to issue a refresh on the table again. This patch adds logic in the filemetadataloader to ignore such temporary staging directories. Unfortunately, hadoop does not provide a API which can recursively list files in a directory and skip certain directories. This patch adds a new FilterIterator which wraps around existing listFiles, listStatus and RecursingIterator to skip the hidden directories from the listing result. Also, the existing code to recover partitions implements its own recursion logic which includes path validation. This already skips such hidden directories since they do not conform to the partition spec. The patch does a minor modification to this method by directly calling the listStatusIterator instead of going through FileSystemUtil#listStatus whiche uses the filtering remote iterator now. Testing: 1. Added a new tests as well as modified existing ones which were related to cover interesting cases. 2. Ran concurrent inserts from Hive while issuing refresh in a loop on Impala side. Earlier this would cause the table to go into a bad state. Now, it works fine for the staging directories. It still runs into a FileNotFoundException from the impalad when there are insert overwrite statements in Hive Change-Id: I2c4a22908304fe9e377d77d6c18d401c3f3294aa Reviewed-on: http://gerrit.cloudera.org:8080/13665 Tested-by: Impala Public Jenkins Reviewed-by: Vihang Karajgaonkar --- M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java M fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java A fe/src/test/java/org/apache/impala/common/FileSystemUtilTest.java M fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java M tests/metadata/test_recursive_listing.py 6 files changed, 236 insertions(+), 9 deletions(-) Approvals: Impala Public Jenkins: Verified Vihang Karajgaonkar: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/13665 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I2c4a22908304fe9e377d77d6c18d401c3f3294aa Gerrit-Change-Number: 13665 Gerrit-PatchSet: 14 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-8663 : FileMetadataLoader should skip hidden and tmp directories
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/13665 ) Change subject: IMPALA-8663 : FileMetadataLoader should skip hidden and tmp directories .. Patch Set 13: Code-Review+2 Carrying forward Todd's +2 from earlier -- To view, visit http://gerrit.cloudera.org:8080/13665 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2c4a22908304fe9e377d77d6c18d401c3f3294aa Gerrit-Change-Number: 13665 Gerrit-PatchSet: 13 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Tue, 09 Jul 2019 15:31:25 + Gerrit-HasComments: No
[Impala-ASF-CR] Bump CDP BUILD NUMBER to 1235229
Vihang Karajgaonkar has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/13799 ) Change subject: Bump CDP_BUILD_NUMBER to 1235229 .. Bump CDP_BUILD_NUMBER to 1235229 The newer CDP build includes fix for HIVE-21932 which is required to fix test failures on CDP builds. Also, this patch fixes two test failures on CDP. 1 Fixes the MetastoreEventsProcessorTest on CDP caused due to the serialization difference of the event messages between CDH and CDP builds. 2. Fixes the Ranger audit test failure caused due to the missing cluster name paramter from the Audit events from Ranger. This is needed a change in the ranger-hive-audit.xml due to RANGER-2458 Testing: 1. Build against CDP and CDH builds 2. Run the previously failing tests on CDP which were caused by HIVE-21932 3. Run full tests with CDP and CDH builds Change-Id: I6d0fa856fa6c7cb1f4c9ee0e4250cfa1e8a14595 --- M bin/impala-config.sh M fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/test/java/org/apache/impala/authorization/AuthorizationTestBase.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java M fe/src/test/resources/ranger-hive-audit.xml 6 files changed, 37 insertions(+), 19 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/13799/3 -- To view, visit http://gerrit.cloudera.org:8080/13799 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6d0fa856fa6c7cb1f4c9ee0e4250cfa1e8a14595 Gerrit-Change-Number: 13799 Gerrit-PatchSet: 3 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-8519: [DOCS] Doc the limitation in insert events from SparkSQL
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/13777 ) Change subject: IMPALA-8519: [DOCS] Doc the limitation in insert events from SparkSQL .. Patch Set 6: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/13777 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I36cfc7e592ed2588a8c1f8375033d60492b27a4f Gerrit-Change-Number: 13777 Gerrit-PatchSet: 6 Gerrit-Owner: Alex Rodoni Gerrit-Reviewer: Alex Rodoni Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Wed, 03 Jul 2019 18:39:29 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8519: [DOCS] Doc the limitation in insert events from SparkSQL
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/13777 ) Change subject: IMPALA-8519: [DOCS] Doc the limitation in insert events from SparkSQL .. Patch Set 5: (4 comments) http://gerrit.cloudera.org:8080/#/c/13777/5/docs/topics/impala_metadata.xml File docs/topics/impala_metadata.xml: http://gerrit.cloudera.org:8080/#/c/13777/5/docs/topics/impala_metadata.xml@137 PS5, Line 137: table In this case, it actually refreshes the partition not the table http://gerrit.cloudera.org:8080/#/c/13777/5/docs/topics/impala_metadata.xml@216 PS5, Line 216: operation Using APIs instead of operation is better in my opinion. http://gerrit.cloudera.org:8080/#/c/13777/5/docs/topics/impala_metadata.xml@218 PS5, Line 218: /tmp/spark_unpart I think it would be easier to understand if the example provides a realistic table location like /user/hive/warehouse/spark_etl.db/customers/date=01012019 http://gerrit.cloudera.org:8080/#/c/13777/5/docs/topics/impala_metadata.xml@258 PS5, Line 258: hive.metastore.dml.events We should modify this text to say that this configuration should also be set to true in the hive-site.xml used by the spark applications (typically /etc/hive/conf/hive-site.xml) so that INSERT events are generated when the spark application inserts data into existing tables and partitions. -- To view, visit http://gerrit.cloudera.org:8080/13777 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I36cfc7e592ed2588a8c1f8375033d60492b27a4f Gerrit-Change-Number: 13777 Gerrit-PatchSet: 5 Gerrit-Owner: Alex Rodoni Gerrit-Reviewer: Alex Rodoni Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Wed, 03 Jul 2019 17:36:39 + Gerrit-HasComments: Yes
[Impala-ASF-CR] [WIP] Bump CDP BUILD NUMBER to 1235229
Vihang Karajgaonkar has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/13799 ) Change subject: [WIP] Bump CDP_BUILD_NUMBER to 1235229 .. [WIP] Bump CDP_BUILD_NUMBER to 1235229 The newer CDP build includes fix for HIVE-21932 which is required to fix test failures on CDP builds. Also, fixed the MetastoreEventsProcessorTest on CDP caused due to the serialization difference of the event messages between CDH and CDP builds. [WIP] Testing: 1. Build against CDP and CDH builds 2. Run the previously failing tests on CDP which were caused by HIVE-21932 3. Run full tests with CDP and CDH builds Change-Id: I6d0fa856fa6c7cb1f4c9ee0e4250cfa1e8a14595 --- M bin/impala-config.sh M fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java 4 files changed, 34 insertions(+), 15 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/13799/2 -- To view, visit http://gerrit.cloudera.org:8080/13799 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6d0fa856fa6c7cb1f4c9ee0e4250cfa1e8a14595 Gerrit-Change-Number: 13799 Gerrit-PatchSet: 2 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-8663 : FileMetadataLoader should skip hidden and tmp directories
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/13665 ) Change subject: IMPALA-8663 : FileMetadataLoader should skip hidden and tmp directories .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/13665/8/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java: http://gerrit.cloudera.org:8080/#/c/13665/8/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@545 PS8, Line 545: all underlying files (except which are :* in the ignored directories) > hrm, were the semantics already broken then? See the unit test AcidUtilsTes yeah, it looks like its broken currently for this case. I will file a JIRA. The AcidUtils test works because it works off the List not the ones which are actually returned by the FileMetadataLoader. Specifically, if the table is transactional we cannot rely on using listFiles API since it will skip empty directories (Or we should implement a version of it for own purposes). Not sure how to handle the S3 recursive listFiles case though. Created https://issues.apache.org/jira/browse/IMPALA-8739 as suggested -- To view, visit http://gerrit.cloudera.org:8080/13665 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2c4a22908304fe9e377d77d6c18d401c3f3294aa Gerrit-Change-Number: 13665 Gerrit-PatchSet: 8 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Wed, 03 Jul 2019 20:45:32 + Gerrit-HasComments: Yes
[Impala-ASF-CR] [WIP] Bump CDP BUILD NUMBER to 1235229
Vihang Karajgaonkar has uploaded this change for review. ( http://gerrit.cloudera.org:8080/13799 Change subject: [WIP] Bump CDP_BUILD_NUMBER to 1235229 .. [WIP] Bump CDP_BUILD_NUMBER to 1235229 The newer CDP build includes fix for HIVE-21932 which is required to fix test failures on CDP builds. Also, fixed the MetastoreEventsProcessorTest on CDP caused due to the serialization difference of the event messages between CDH and CDP builds. [WIP] Testing: 1. Build against CDP and CDH builds 2. Run the previously failing tests on CDP which were caused by HIVE-21932 3. Run full tests with CDP and CDH builds Change-Id: I6d0fa856fa6c7cb1f4c9ee0e4250cfa1e8a14595 --- M bin/impala-config.sh M fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java 4 files changed, 33 insertions(+), 15 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/13799/1 -- To view, visit http://gerrit.cloudera.org:8080/13799 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I6d0fa856fa6c7cb1f4c9ee0e4250cfa1e8a14595 Gerrit-Change-Number: 13799 Gerrit-PatchSet: 1 Gerrit-Owner: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-8734: Reload table schema on TBLPROPERTIES change
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/13785 ) Change subject: IMPALA-8734: Reload table schema on TBLPROPERTIES change .. Patch Set 4: Code-Review+2 patch looks good to me. -- To view, visit http://gerrit.cloudera.org:8080/13785 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a43a962c2a456f3ddc078b2924f551fccb5c2ad Gerrit-Change-Number: 13785 Gerrit-PatchSet: 4 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: Anurag Mantripragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Tue, 02 Jul 2019 20:11:19 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8663 : FileMetadataLoader should skip hidden and tmp directories
Vihang Karajgaonkar has uploaded a new patch set (#12). ( http://gerrit.cloudera.org:8080/13665 ) Change subject: IMPALA-8663 : FileMetadataLoader should skip hidden and tmp directories .. IMPALA-8663 : FileMetadataLoader should skip hidden and tmp directories The FileMetadataLoader is used to load the file information in when the table is loaded. By default, it lists all the files in the table/partition directory. Currently, it only skips the filenames which are invalid (hidden files and ones starting with "_" etc). However, it does not skip the directories which are temporary or hidden. In case of Hive when data is inserted into a table, it creates a temporary staging directory which is a hidden directory under the table location. When the insert in hive is completed, such staging directories are removed. But if there is a refresh called during that time, FileMetadataLoader will add the files in the staging directory as well. Not only this could cause temporary invalid results but it causes table to go in a bad state when these temporary directories are removed. The only work-around in such a case to issue a refresh on the table again. This patch adds logic in the filemetadataloader to ignore such temporary staging directories. Unfortunately, hadoop does not provide a API which can recursively list files in a directory and skip certain directories. This patch adds a new FilterIterator which wraps around existing listFiles, listStatus and RecursingIterator to skip the hidden directories from the listing result. Also, the existing code to recover partitions implements its own recursion logic which includes path validation. This already skips such hidden directories since they do not conform to the partition spec. The patch does a minor modification to this method by directly calling the listStatusIterator instead of going through FileSystemUtil#listStatus whiche uses the filtering remote iterator now. Testing: 1. Added a new tests as well as modified existing ones which were related to cover interesting cases. 2. Ran concurrent inserts from Hive while issuing refresh in a loop on Impala side. Earlier this would cause the table to go into a bad state. Now, it works fine for the staging directories. It still runs into a FileNotFoundException from the impalad when there are insert overwrite statements in Hive Change-Id: I2c4a22908304fe9e377d77d6c18d401c3f3294aa --- M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java M fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java A fe/src/test/java/org/apache/impala/common/FileSystemUtilTest.java M fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java M tests/metadata/test_recursive_listing.py 6 files changed, 236 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/13665/12 -- To view, visit http://gerrit.cloudera.org:8080/13665 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2c4a22908304fe9e377d77d6c18d401c3f3294aa Gerrit-Change-Number: 13665 Gerrit-PatchSet: 12 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-7973: Add support for fine grained updates at partition level.
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/13111 ) Change subject: IMPALA-7973: Add support for fine grained updates at partition level. .. Patch Set 2: (4 comments) http://gerrit.cloudera.org:8080/#/c/13111/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/13111/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1106 PS2, Line 1106: addedPartitions_ Would be good if you can log a useful information about the event like number of partitions, isReplace, tablename etc. http://gerrit.cloudera.org:8080/#/c/13111/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1107 PS2, Line 1107: Map partSpec = new HashMap<>(); : List fsList = : msTbl_.getPartitionKeys(); : List partVals = partition.getValues(); : Preconditions.checkNotNull(partVals); : Preconditions.checkState(fsList.size() == partVals.size()); : for (int i = 0; i < fsList.size(); i++) { : partSpec.put(fsList.get(i).getName(), partVals.get(i)); : } We can avoid one unnecessary conversion from Partition -> partSpecMap -> TPartitionKeyValue if we change the signature of the refreshPartitionIfExists to take List here. http://gerrit.cloudera.org:8080/#/c/13111/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1194 PS2, Line 1194: Preconditions.checkNotNull(partitionAfter_); : Map partSpec = new HashMap<>(); : List fsList = : msTbl_.getPartitionKeys(); : List partVals = partitionAfter_.getValues(); : Preconditions.checkNotNull(partVals); : Preconditions.checkState(fsList.size() == partVals.size()); : for (int i = 0; i < fsList.size(); i++) { : partSpec.put(fsList.get(i).getName(), partVals.get(i)); : } same suggestion as above. Also, instead of duplicating this logic, create a util method in the base class (or a static method) http://gerrit.cloudera.org:8080/#/c/13111/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1266 PS2, Line 1266: droppedPartitions_ add a null check as well -- To view, visit http://gerrit.cloudera.org:8080/13111 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I213401329f3965dd81055197792ccf8a05368af5 Gerrit-Change-Number: 13111 Gerrit-PatchSet: 2 Gerrit-Owner: Anurag Mantripragada Gerrit-Reviewer: Bharath Krishna Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Thu, 25 Apr 2019 02:14:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0
Vihang Karajgaonkar has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/13005 ) Change subject: IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0 .. IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0 This change upgrades the hive dependencies of Impala to use Hive 3.1.0 based binaries. Most of the changes in this patch are based off patches provided by Todd (links available in JIRA). Upgrading the dependencies allows us to work with both Hive 3.1.0 and Hive 2.1.0 in the same code line. In order to do this, the patch trims down a lot of unnecessary hive dependencies of the front end code by creating a shaded-deps module. The pom.xml of shaded-deps includes only the files from Hive source which Impala depends for compilation. Additionally, it also uses a custom build of Hive which is based of Hive 3.1.0. This custom build includes patches for HIVE-21596 and HIVE-21586 which are needed by Impala so that it can compile against Hive-3 libraries and be able to talk to HMS-2.x metastore. Once these patches are merged we can get rid of this custom build and rely on more official sources of Hive builds. The patch also changes impala-config.sh so that it always downloads the Hive-3 libraries from the toolchain. The code is always built using Hive-3 jars. However, based on the value of USE_CDP_HIVE, the minicluster is deployed using Hive-3 or Hive-2 binaries. Since Impala implements HiveServer2's TCLIService.thrift interface, it requires us to use the existing mechanism of copying the hive-2/api TCLIService.thrift to hive-3/api. It also adds a few environment variables which point to the metastore's thrift file and the CDH Hive version. Testing: 1. Code compiles and runs against both HMS-3 and HMS-2 2. Ran full-suite of tests using the private jenkins job against HMS-2 3. Running full-tests against HMS-3 will need more work like supporting Tez in the mini-cluster (for dataloading) and HMS transaction support since HMS3 create transactional tables by default. Notes: 1. Patch uses a custom build of Hive to be deployed in mini-cluster. This build has the fixes for HIVE-21596 2. The Hive 3.1 does not include some of the patches like HIVE-21077. Some of the recent code in Impala depends on that patch to exist in hive. We will need to port HIVE-21077 to Hive 3.1.0 in order to compile against Hive 3.1.0 jars. I will continue working on getting this merged along with HIVE-21586 Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436 --- M CMakeLists.txt M README.md M bin/bootstrap_toolchain.py M bin/impala-config.sh M bin/set-classpath.sh M common/thrift/.gitignore M common/thrift/CMakeLists.txt M fe/CMakeLists.txt M fe/pom.xml M fe/src/main/java/org/apache/impala/analysis/StringLiteral.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java M fe/src/main/java/org/apache/impala/catalog/MetaStoreClientPool.java M fe/src/main/java/org/apache/impala/catalog/TableLoader.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java M fe/src/main/java/org/apache/impala/compat/MetastoreShim.java M fe/src/main/java/org/apache/impala/hive/executor/UdfExecutor.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java M fe/src/main/java/org/apache/impala/service/MetadataOp.java A fe/src/main/java/org/apache/impala/util/MetadataFormatUtils.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java M fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java M fe/src/test/java/org/apache/impala/testutil/EmbeddedMetastoreClientPool.java M impala-parent/pom.xml A shaded-deps/.gitignore A shaded-deps/CMakeLists.txt A shaded-deps/pom.xml M testdata/bin/run-hive-server.sh M tests/authorization/test_owner_privileges.py M tests/common/sentry_cache_test_suite.py M tests/custom_cluster/test_permanent_udfs.py 34 files changed, 1,208 insertions(+), 294 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/13005/7 -- To view, visit http://gerrit.cloudera.org:8080/13005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436 Gerrit-Change-Number: 13005 Gerrit-PatchSet: 7 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sudhanshu Arora Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Zoltan
[Impala-ASF-CR] IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0
Vihang Karajgaonkar has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/13005 ) Change subject: IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0 .. IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0 This change upgrades the hive dependencies of Impala to use Hive 3.1.0 based binaries. Most of the changes in this patch are based off patches provided by Todd (links available in JIRA). Upgrading the dependencies allows us to work with both Hive 3.1.0 and Hive 2.1.0 in the same code line. In order to do this, the patch trims down a lot of unnecessary hive dependencies of the front end code by creating a shaded-deps module. The pom.xml of shaded-deps includes only the files from Hive source which Impala depends for compilation. Additionally, it also uses a custom build of Hive which is based of Hive 3.1.0. This custom build includes patches for HIVE-21596 and HIVE-21586 which are needed by Impala so that it can compile against Hive-3 libraries and be able to talk to HMS-2.x metastore. Once these patches are merged we can get rid of this custom build and rely on more official sources of Hive builds. The patch also changes impala-config.sh so that it always downloads the Hive-3 libraries from the toolchain. The code is always built using Hive-3 jars. However, based on the value of USE_CDP_HIVE, the minicluster is deployed using Hive-3 or Hive-2 binaries. Since Impala implements HiveServer2's TCLIService.thrift interface, it requires us to use the existing mechanism of copying the hive-2/api TCLIService.thrift to hive-3/api. It also adds a few environment variables which point to the metastore's thrift file and the CDH Hive version. Testing: 1. Code compiles and runs against both HMS-3 and HMS-2 2. Ran full-suite of tests using the private jenkins job against HMS-2 3. Running full-tests against HMS-3 will need more work like supporting Tez in the mini-cluster (for dataloading) and HMS transaction support since HMS3 create transactional tables by default. This will be taken up in subsequent change. Notes: 1. Patch uses a custom build of Hive to be deployed in mini-cluster. This build has the fixes for HIVE-21596 2. The Hive 3.1 does not include some of the patches like HIVE-21077. Some of the recent code in Impala depends on that patch to exist in hive. We will need to port HIVE-21077 to Hive 3.1.0 in order to compile against Hive 3.1.0 jars. I will continue working on getting this merged along with HIVE-21586 3. Some of the existing tests rely on the fact the UDFs implement the UDF interface in Hive (UDFLength, UDFHour, UDFYear). These built-in hive functions have been moved to use GenericUDF interface in Hive 3. Impala currently only supports UDFExecutor. In order to have a full compatibility with all the functions in Hive 2.x we should support GenericUDFs too. That would be taken up as a separate patch. 4. The Sentry object ownership tests are flaky. There are race-conditions in that code which get exposed for some by this patch. For example, when a database is created, the object privileges are immediately updated in the catalog cache. If Sentry has not been updated and there is a refresh authorization call in between it can clear the privilege from the cache causing the subsequent statement using that privilege to fail. The patch adds sleep statements in the test to work-around these issues. Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436 --- M CMakeLists.txt M README.md M bin/bootstrap_toolchain.py M bin/impala-config.sh M bin/set-classpath.sh M common/thrift/.gitignore M common/thrift/CMakeLists.txt M fe/CMakeLists.txt M fe/pom.xml M fe/src/main/java/org/apache/impala/analysis/StringLiteral.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java M fe/src/main/java/org/apache/impala/catalog/MetaStoreClientPool.java M fe/src/main/java/org/apache/impala/catalog/TableLoader.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java M fe/src/main/java/org/apache/impala/compat/MetastoreShim.java M fe/src/main/java/org/apache/impala/hive/executor/UdfExecutor.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java M fe/src/main/java/org/apache/impala/service/MetadataOp.java A fe/src/main/java/org/apache/impala/util/MetadataFormatUtils.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java M fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java M fe/src/test/java/org/apache/impala/testutil/EmbeddedMetastoreClientPool.java M impala-parent/pom.xml A shaded-deps/.gitignore A
[Impala-ASF-CR] IMPALA-8419 : Validate event processing related configurations
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/13019 ) Change subject: IMPALA-8419 : Validate event processing related configurations .. Patch Set 7: (12 comments) Overall looks good. Few comments below and then good to go from my side. http://gerrit.cloudera.org:8080/#/c/13019/7/fe/src/main/java/org/apache/impala/catalog/events/EventProcessorValidation.java File fe/src/main/java/org/apache/impala/catalog/events/EventProcessorValidation.java: http://gerrit.cloudera.org:8080/#/c/13019/7/fe/src/main/java/org/apache/impala/catalog/events/EventProcessorValidation.java@42 PS7, Line 42: EventProcessorValidation a better name could be EventProcessorConfigValidator http://gerrit.cloudera.org:8080/#/c/13019/7/fe/src/main/java/org/apache/impala/catalog/events/EventProcessorValidation.java@42 PS7, Line 42: interface general convention in Impala is not to use package-private classes (although I think its a good idea to have them) For consistency, please change these to public. Same with the members of this class. Try to keep the members private unless its not possible. http://gerrit.cloudera.org:8080/#/c/13019/7/fe/src/main/java/org/apache/impala/catalog/events/EventProcessorValidation.java@59 PS7, Line 59: getExpectedValue method doc http://gerrit.cloudera.org:8080/#/c/13019/7/fe/src/main/java/org/apache/impala/catalog/events/EventProcessorValidation.java@74 PS7, Line 74: String private static final http://gerrit.cloudera.org:8080/#/c/13019/7/fe/src/main/java/org/apache/impala/catalog/events/EventProcessorValidation.java@115 PS7, Line 115: MetastoreEventParameters nit, formatting is off http://gerrit.cloudera.org:8080/#/c/13019/7/fe/src/main/java/org/apache/impala/catalog/events/EventProcessorValidation.java@120 PS7, Line 120: filtered s/filtered/filtered out/ http://gerrit.cloudera.org:8080/#/c/13019/7/fe/src/main/java/org/apache/impala/catalog/events/EventProcessorValidation.java@157 PS7, Line 157: Found would be good to provided the expected value here as well http://gerrit.cloudera.org:8080/#/c/13019/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/13019/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@68 PS7, Line 68: MetastoreEventParameters using parameters and properties interchangebly is confusing. may be rename this to MetastoreEventPropertyKey http://gerrit.cloudera.org:8080/#/c/13019/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@70 PS7, Line 70: CATALOG_VERSION_PROP_KEY _PROP_KEY is redundant if you name the enum as MetastoreEventPropertyKey. SAme for others. http://gerrit.cloudera.org:8080/#/c/13019/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java: http://gerrit.cloudera.org:8080/#/c/13019/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@253 PS7, Line 253: Runtime error.. redundant to have this text here. can be removed. http://gerrit.cloudera.org:8080/#/c/13019/7/fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java File fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java: http://gerrit.cloudera.org:8080/#/c/13019/7/fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java@88 PS7, Line 88: DEFAULT_METASTORE_CONFIG_VALUE I am not convinced of the value of having this. Why can't we just use empty string which could be private to the config validator? http://gerrit.cloudera.org:8080/#/c/13019/7/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java File fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java: http://gerrit.cloudera.org:8080/#/c/13019/7/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@336 PS7, Line 336: impala add a test case for ^impala as well since it matches the disableHmsSync paramter -- To view, visit http://gerrit.cloudera.org:8080/13019 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I94c2783e36287a65122003aa55d8075a806bc606 Gerrit-Change-Number: 13019 Gerrit-PatchSet: 7 Gerrit-Owner: Bharath Krishna Gerrit-Reviewer: Anurag Mantripragada Gerrit-Reviewer: Bharath Krishna Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Thu, 25 Apr 2019 20:23:32 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/13005 ) Change subject: IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0 .. Patch Set 8: > (3 comments) > > Before more work goes into this, this is changing the Hive major > version number and there should be a discussion on dev@ about this > approach and how it fits into Impala releases. Just sent a note the dev list. Thanks for the suggestion. -- To view, visit http://gerrit.cloudera.org:8080/13005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436 Gerrit-Change-Number: 13005 Gerrit-PatchSet: 8 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sudhanshu Arora Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 25 Apr 2019 22:02:29 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0
Vihang Karajgaonkar has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/13005 ) Change subject: IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0 .. IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0 This change upgrades the hive dependencies of Impala to use Hive 3.1.0 based binaries. Most of the changes in this patch are based off patches provided by Todd (links available in JIRA). Upgrading the dependencies allows us to work with both Hive 3.1.0 and Hive 2.1.0 in the same code line. In order to do this, the patch trims down a lot of unnecessary hive dependencies of the front end code by creating a shaded-deps module. The pom.xml of shaded-deps includes only the files from Hive source which Impala depends for compilation. Additionally, it also uses a custom build of Hive which is based of Hive 3.1.0. This custom build includes patches for HIVE-21596 and HIVE-21586 which are needed by Impala so that it can compile against Hive-3 libraries and be able to talk to HMS-2.x metastore. Once these patches are merged we can get rid of this custom build and rely on more official sources of Hive builds. The patch also changes impala-config.sh so that it always downloads the Hive-3 libraries from the toolchain. The code is always built using Hive-3 jars. However, based on the value of USE_CDP_HIVE, the minicluster is deployed using Hive-3 or Hive-2 binaries. Since Impala implements HiveServer2's TCLIService.thrift interface, it requires us to use the existing mechanism of copying the hive-2/api TCLIService.thrift to hive-3/api. It also adds a few environment variables which point to the metastore's thrift file and the CDH Hive version. Testing: 1. Code compiles and runs against both HMS-3 and HMS-2 2. Ran full-suite of tests using the private jenkins job against HMS-2 3. Running full-tests against HMS-3 will need more work like supporting Tez in the mini-cluster (for dataloading) and HMS transaction support since HMS3 create transactional tables by default. Notes: 1. Patch uses a custom build of Hive to be deployed in mini-cluster. This build has the fixes for HIVE-21596 2. The Hive 3.1 does not include some of the patches like HIVE-21077. Some of the recent code in Impala depends on that patch to exist in hive. We will need to port HIVE-21077 to Hive 3.1.0 in order to compile against Hive 3.1.0 jars. I will continue working on getting this merged along with HIVE-21586 Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436 --- M CMakeLists.txt M README.md M bin/bootstrap_toolchain.py M bin/impala-config.sh M bin/set-classpath.sh M common/thrift/.gitignore M common/thrift/CMakeLists.txt M fe/CMakeLists.txt M fe/pom.xml M fe/src/main/java/org/apache/impala/analysis/StringLiteral.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java M fe/src/main/java/org/apache/impala/catalog/MetaStoreClientPool.java M fe/src/main/java/org/apache/impala/catalog/TableLoader.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java M fe/src/main/java/org/apache/impala/compat/MetastoreShim.java M fe/src/main/java/org/apache/impala/hive/executor/UdfExecutor.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java M fe/src/main/java/org/apache/impala/service/MetadataOp.java A fe/src/main/java/org/apache/impala/util/MetadataFormatUtils.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java M fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java M fe/src/test/java/org/apache/impala/testutil/EmbeddedMetastoreClientPool.java M impala-parent/pom.xml A shaded-deps/.gitignore A shaded-deps/CMakeLists.txt A shaded-deps/dependency-reduced-pom.xml A shaded-deps/pom.xml M testdata/bin/run-hive-server.sh M tests/authorization/test_owner_privileges.py M tests/common/sentry_cache_test_suite.py M tests/custom_cluster/test_permanent_udfs.py 35 files changed, 1,275 insertions(+), 292 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/13005/6 -- To view, visit http://gerrit.cloudera.org:8080/13005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436 Gerrit-Change-Number: 13005 Gerrit-PatchSet: 6 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sudhanshu Arora Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang
[Impala-ASF-CR] Configure Hive 3's HS2 to execute queries using Tez local mode
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/12931 ) Change subject: Configure Hive 3's HS2 to execute queries using Tez local mode .. Patch Set 5: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/12931 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I76e47fbd1d6ff5103d81a8de430d5465dba284cd Gerrit-Change-Number: 12931 Gerrit-PatchSet: 5 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Thu, 25 Apr 2019 17:34:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0
Vihang Karajgaonkar has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/13005 ) Change subject: IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0 .. IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0 This change upgrades the hive dependencies of Impala to use Hive 3.1.0 based binaries. Most of the changes in this patch are based off patches provided by Todd (links available in JIRA). Upgrading the dependencies allows us to work with both Hive 3.1.0 and Hive 2.1.0 in the same code line. In order to do this, the patch trims down a lot of unnecessary hive dependencies of the front end code by creating a shaded-deps module. The pom.xml of shaded-deps includes only the files from Hive source which Impala depends for compilation. Additionally, it also uses a custom build of Hive which is based of Hive 3.1.0. This custom build includes patches for HIVE-21596 and HIVE-21586 which are needed by Impala so that it can compile against Hive-3 libraries and be able to talk to HMS-2.x metastore. Once these patches are merged we can get rid of this custom build and rely on more official sources of Hive builds. The patch also changes impala-config.sh so that it always downloads the Hive-3 libraries from the toolchain. The code is always built using Hive-3 jars. However, based on the value of USE_CDP_HIVE, the minicluster is deployed using Hive-3 or Hive-2 binaries. Since Impala implements HiveServer2's TCLIService.thrift interface, it requires us to use the existing mechanism of copying the hive-2/api TCLIService.thrift to hive-3/api. It also adds a few environment variables which point to the metastore's thrift file and the CDH Hive version. Testing: 1. Code compiles and runs against both HMS-3 and HMS-2 2. Ran full-suite of tests using the private jenkins job against HMS-2 3. Running full-tests against HMS-3 will need more work like supporting Tez in the mini-cluster (for dataloading) and HMS transaction support since HMS3 create transactional tables by default. Notes: 1. Patch uses a custom build of Hive to be deployed in mini-cluster. This build has the fixes for HIVE-21596 2. The Hive 3.1 does not include some of the patches like HIVE-21077. Some of the recent code in Impala depends on that patch to exist in hive. We will need to port HIVE-21077 to Hive 3.1.0 in order to compile against Hive 3.1.0 jars. I will continue working on getting this merged along with HIVE-21586 Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436 --- M CMakeLists.txt M README.md M bin/bootstrap_toolchain.py M bin/impala-config.sh M bin/set-classpath.sh M common/thrift/.gitignore M common/thrift/CMakeLists.txt M fe/CMakeLists.txt M fe/pom.xml M fe/src/main/java/org/apache/impala/analysis/StringLiteral.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java M fe/src/main/java/org/apache/impala/catalog/MetaStoreClientPool.java M fe/src/main/java/org/apache/impala/catalog/TableLoader.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java M fe/src/main/java/org/apache/impala/compat/MetastoreShim.java M fe/src/main/java/org/apache/impala/hive/executor/UdfExecutor.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java M fe/src/main/java/org/apache/impala/service/MetadataOp.java A fe/src/main/java/org/apache/impala/util/MetadataFormatUtils.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java M fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java M fe/src/test/java/org/apache/impala/testutil/EmbeddedMetastoreClientPool.java M impala-parent/pom.xml A shaded-deps/.gitignore A shaded-deps/CMakeLists.txt A shaded-deps/dependency-reduced-pom.xml A shaded-deps/pom.xml M testdata/bin/run-hive-server.sh M tests/custom_cluster/test_permanent_udfs.py 33 files changed, 1,224 insertions(+), 258 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/13005/5 -- To view, visit http://gerrit.cloudera.org:8080/13005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436 Gerrit-Change-Number: 13005 Gerrit-PatchSet: 5 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sudhanshu Arora Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] [acid] Disallow any operation on full acid table.
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/13047 ) Change subject: [acid] Disallow any operation on full acid table. .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/13047/2/fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java File fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java: http://gerrit.cloudera.org:8080/#/c/13047/2/fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java@80 PS2, Line 80: transactional Can we change these strings to constants? transactional, transactional_properties, insert_only http://gerrit.cloudera.org:8080/#/c/13047/2/fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java@84 PS2, Line 84: tableIsTransactional != null && tableIsTransactional.equalsIgnoreCase("true") nit, probably cleaner if we replace this by Boolean.parseBoolean(tableIsTransactional) -- To view, visit http://gerrit.cloudera.org:8080/13047 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifb92e5b691bf192980f2cc7d68c491bb1e8455ac Gerrit-Change-Number: 13047 Gerrit-PatchSet: 2 Gerrit-Owner: Sudhanshu Arora Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Tue, 23 Apr 2019 19:42:48 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0
Vihang Karajgaonkar has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/13005 ) Change subject: IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0 .. IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0 This change upgrades the hive dependencies of Impala to use Hive 3.1.0 based binaries. Most of the changes in this patch are based off patches provided by Todd (links available in JIRA). Upgrading the dependencies allows us to work with both Hive 3.1.0 and Hive 2.1.0 in the same code line. In order to do this, the patch trims down a lot of unnecessary hive dependencies of the front end code by creating a shaded-deps module. The pom.xml of shaded-deps includes only the files from Hive source which Impala depends for compilation. Additionally, it also uses a custom build of Hive which is based of Hive 3.1.0. This custom build includes patches for HIVE-21596 and HIVE-21586 which are needed by Impala so that it can compile against Hive-3 libraries and be able to talk to HMS-2.x metastore. Once these patches are merged we can get rid of this custom build and rely on more official sources of Hive builds. The patch also changes impala-config.sh so that it always downloads the Hive-3 libraries from the toolchain. The code is always built using Hive-3 jars. However, based on the value of USE_CDP_HIVE, the minicluster is deployed using Hive-3 or Hive-2 binaries. Since Impala implements HiveServer2's TCLIService.thrift interface, it requires us to use the existing mechanism of copying the hive-2/api TCLIService.thrift to hive-3/api. It also adds a few environment variables which point to the metastore's thrift file and the CDH Hive version. Testing: 1. Code compiles and runs against both HMS-3 and HMS-2 2. Ran full-suite of tests using the private jenkins job against HMS-2 3. Running full-tests against HMS-3 will need more work like supporting Tez in the mini-cluster (for dataloading) and HMS transaction support since HMS3 create transactional tables by default. This will be taken up in subsequent change. Notes: 1. Patch uses a custom build of Hive to be deployed in mini-cluster. This build has the fixes for HIVE-21596 2. The Hive 3.1 does not include some of the patches like HIVE-21077. Some of the recent code in Impala depends on that patch to exist in hive. We will need to port HIVE-21077 to Hive 3.1.0 in order to compile against Hive 3.1.0 jars. I will continue working on getting this merged along with HIVE-21586 3. Some of the existing tests rely on the fact the UDFs implement the UDF interface in Hive (UDFLength, UDFHour, UDFYear). These built-in hive functions have been moved to use GenericUDF interface in Hive 3. Impala currently only supports UDFExecutor. In order to have a full compatibility with all the functions in Hive 2.x we should support GenericUDFs too. That would be taken up as a separate patch. 4. The Sentry object ownership tests are flaky. There are race-conditions in that code which get exposed for some by this patch. For example, when a database is created, the object privileges are immediately updated in the catalog cache. If Sentry has not been updated and there is a refresh authorization call in between it can clear the privilege from the cache causing the subsequent statement using that privilege to fail. The patch adds sleep statements in the test to work-around these issues. Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436 --- M CMakeLists.txt M README.md M bin/bootstrap_toolchain.py M bin/impala-config.sh M bin/set-classpath.sh M common/thrift/.gitignore M common/thrift/CMakeLists.txt M fe/CMakeLists.txt M fe/pom.xml M fe/src/main/java/org/apache/impala/analysis/StringLiteral.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java M fe/src/main/java/org/apache/impala/catalog/MetaStoreClientPool.java M fe/src/main/java/org/apache/impala/catalog/TableLoader.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java M fe/src/main/java/org/apache/impala/compat/MetastoreShim.java M fe/src/main/java/org/apache/impala/hive/executor/UdfExecutor.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java M fe/src/main/java/org/apache/impala/service/MetadataOp.java A fe/src/main/java/org/apache/impala/util/MetadataFormatUtils.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java M fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java M fe/src/test/java/org/apache/impala/testutil/EmbeddedMetastoreClientPool.java M impala-parent/pom.xml A shaded-deps/.gitignore A
[Impala-ASF-CR] IMPALA-6663 Expose current DDL metrics on WebUI
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/13806 ) Change subject: IMPALA-6663 Expose current DDL metrics on WebUI .. Patch Set 7: (2 comments) > (4 comments) > > I think this in general is a very useful addition. Can you also > specify the motivation of why we need to track the total number of > DDLs instead of tracking them on a per table level. Certain > metadata operations like invalidate metadata as indeed global and > cannot be tracked at table. However, I think it useful to track how > many refresh or invalidates were called on a given table and > finding top-n such tables. Similarly for DMLs, instead of a getting > a total count of INSERTS, would it be more useful to get top-N > tables which are inserted into? Took a second pass and realized that you are indeed tracking currently running DDL ops. The question in such a case is how do we determine if seeing 3 resetMetadata operations on 3 small tables is better or worse than 1 resetMetadata operation on a large table. Can you please comment on how should the end user interpret these values to determine the load on the catalog? http://gerrit.cloudera.org:8080/#/c/13806/7/fe/src/main/java/org/apache/impala/catalog/monitor/CatalogOpRequestCounter.java File fe/src/main/java/org/apache/impala/catalog/monitor/CatalogOpRequestCounter.java: http://gerrit.cloudera.org:8080/#/c/13806/7/fe/src/main/java/org/apache/impala/catalog/monitor/CatalogOpRequestCounter.java@70 PS7, Line 70: decrementDdlTypeCounter > Is decrement ever needed? same for the other counters? you may disregard this comment since I thought you were only tracking the global count of operations instead of running ones. http://gerrit.cloudera.org:8080/#/c/13806/7/www/catalog.tmpl File www/catalog.tmpl: http://gerrit.cloudera.org:8080/#/c/13806/7/www/catalog.tmpl@202 PS7, Line 202: Running > Is this user-visible text? Running seems to suggest these operations are cu I think I misunderstood the patch a little bit. Looks like we are indeed tracking the running operations. You may discard this comment. -- To view, visit http://gerrit.cloudera.org:8080/13806 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0ed76f134bad6d3b3d4dce132365a53a01e9512a Gerrit-Change-Number: 13806 Gerrit-PatchSet: 7 Gerrit-Owner: Tamas Mate Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Tue, 13 Aug 2019 17:32:30 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6663 Expose current DDL metrics on WebUI
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/13806 ) Change subject: IMPALA-6663 Expose current DDL metrics on WebUI .. Patch Set 7: (4 comments) I think this in general is a very useful addition. Can you also specify the motivation of why we need to track the total number of DDLs instead of tracking them on a per table level. Certain metadata operations like invalidate metadata as indeed global and cannot be tracked at table. However, I think it useful to track how many refresh or invalidates were called on a given table and finding top-n such tables. Similarly for DMLs, instead of a getting a total count of INSERTS, would it be more useful to get top-N tables which are inserted into? http://gerrit.cloudera.org:8080/#/c/13806/7/fe/src/main/java/org/apache/impala/catalog/monitor/CatalogOpRequestCounter.java File fe/src/main/java/org/apache/impala/catalog/monitor/CatalogOpRequestCounter.java: http://gerrit.cloudera.org:8080/#/c/13806/7/fe/src/main/java/org/apache/impala/catalog/monitor/CatalogOpRequestCounter.java@41 PS7, Line 41: private AtomicLong resetMetadataCounter_; : : private AtomicLong dmlOperationCounter_; : : private AtomicLong syncDdlOperationCounter_; Can you add a one line comment on each of these counters as to what do they represent? Also, would be great if you could specify what operations constitute a DML in this context. http://gerrit.cloudera.org:8080/#/c/13806/7/fe/src/main/java/org/apache/impala/catalog/monitor/CatalogOpRequestCounter.java@70 PS7, Line 70: decrementDdlTypeCounter Is decrement ever needed? same for the other counters? http://gerrit.cloudera.org:8080/#/c/13806/7/fe/src/main/java/org/apache/impala/catalog/monitor/CatalogOpRequestCounter.java@109 PS7, Line 109: getResetMetadataCounter By returning the AtomicLong object this class is leaking the private field and it is subject to modification by other classes. A more common pattern is to return its value (return type long). http://gerrit.cloudera.org:8080/#/c/13806/7/www/catalog.tmpl File www/catalog.tmpl: http://gerrit.cloudera.org:8080/#/c/13806/7/www/catalog.tmpl@202 PS7, Line 202: Running Is this user-visible text? Running seems to suggest these operations are currently being run. -- To view, visit http://gerrit.cloudera.org:8080/13806 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0ed76f134bad6d3b3d4dce132365a53a01e9512a Gerrit-Change-Number: 13806 Gerrit-PatchSet: 7 Gerrit-Owner: Tamas Mate Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Tue, 13 Aug 2019 17:24:38 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8766: Undo hadoop-cloud-storage + HWX Nexus
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14024 ) Change subject: IMPALA-8766: Undo hadoop-cloud-storage + HWX Nexus .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/14024 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I79a0c2575fc50bbc3b393c150c0bce22258ea1bd Gerrit-Change-Number: 14024 Gerrit-PatchSet: 3 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Mon, 12 Aug 2019 17:14:48 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8627: [WIP] Enable catalog-v2 in tests
Vihang Karajgaonkar has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/13933 ) Change subject: IMPALA-8627: [WIP] Enable catalog-v2 in tests .. IMPALA-8627: [WIP] Enable catalog-v2 in tests This patch enables catalog-v2 by default in all the tests. Testing is in progress. Change-Id: Iddbde666de2b780c0e40df716a9dfe54524e092d --- M docker/catalogd/Dockerfile M docker/impalad_coord_exec/Dockerfile M docker/impalad_coordinator/Dockerfile M tests/query_test/test_observability.py 4 files changed, 39 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/13933/3 -- To view, visit http://gerrit.cloudera.org:8080/13933 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iddbde666de2b780c0e40df716a9dfe54524e092d Gerrit-Change-Number: 13933 Gerrit-PatchSet: 3 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-8661 : Add randomized tests to stress MetastoreEventsProcessor
Vihang Karajgaonkar has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/13932 ) Change subject: IMPALA-8661 : Add randomized tests to stress MetastoreEventsProcessor .. IMPALA-8661 : Add randomized tests to stress MetastoreEventsProcessor This change adds a new stress test for MetastoreEventsProcessor. This test randomly executes hive queries to generate a lot of events. The event processor is invoked at random intervals so that a variable batch of events is processed everytime. After each batch is processed, the test checks the status of events processor. By default, on CDH builds the test is configured to run with 8 concurrent Hive clients and each of the client runs 100 random Hive queries. These defaults can be overridden by passing system properties using maven command arguments "-DnumClients" and "-DnumQueriesPerClients". Additionally, the test also creates impala clients which keep issuing refresh table commands on the test databases to make sure that eventProcessor is doing some real work rather than invalidating/refreshing tables which are already incomplete. This test is added as a junit test and uses Hive JDBC to issue the sqls. This is much faster than the end-to-end python test which issues each hive query in a separate beeline sessions which re-establishes the connection every time. The test already found a bug which is caused when a Hive issues a alter table add if not exists partition" query and the partition is not really added since it is preexisting. In such a case the ADD_PARTITION events is still generated but with a empty list of partitions in the events. Such events are now ignored. Notes: 1. Ran the test with defaults. It generates about 2100 events and runs for close to 15 min. This can be changed to a lower value if we see significant increased delay in the test job runtimes. 3. On CDP builds the concurrent hive queries run very slow due to container provisioning time on the minicluster. I have left this as a TODO to investigate. The test runs in single threaded mode with increased number of queries when running against Hive-3 Change-Id: I8c85b83efd4f56b5ae0e8d1dc6a2ee2feb6721ce --- M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java A fe/src/test/java/org/apache/impala/catalog/events/EventsProcessorStressTest.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java A fe/src/test/java/org/apache/impala/testutil/HiveJdbcClientPool.java M fe/src/test/java/org/apache/impala/testutil/ImpalaJdbcClient.java M fe/src/test/java/org/apache/impala/testutil/TestUtils.java A fe/src/test/java/org/apache/impala/util/RandomHiveQueryRunner.java 8 files changed, 1,544 insertions(+), 34 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/13932/4 -- To view, visit http://gerrit.cloudera.org:8080/13932 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8c85b83efd4f56b5ae0e8d1dc6a2ee2feb6721ce Gerrit-Change-Number: 13932 Gerrit-PatchSet: 4 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Anurag Mantripragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-8661 : Add randomized tests to stress MetastoreEventsProcessor
Hello Bharath Vissapragada, Quanlong Huang, Anurag Mantripragada, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/13932 to look at the new patch set (#5). Change subject: IMPALA-8661 : Add randomized tests to stress MetastoreEventsProcessor .. IMPALA-8661 : Add randomized tests to stress MetastoreEventsProcessor This change adds a new stress test for MetastoreEventsProcessor. This test randomly executes hive queries to generate a lot of events. The event processor is invoked at random intervals so that a variable batch of events is processed everytime. After each batch is processed, the test checks the status of events processor. By default, on CDH builds the test is configured to run with 8 concurrent Hive clients and each of the client runs 100 random Hive queries. These defaults can be overridden by passing system properties using maven command arguments "-DnumClients" and "-DnumQueriesPerClients". Additionally, the test also creates impala clients which keep issuing refresh table commands on the test databases to make sure that eventProcessor is doing some real work rather than invalidating/refreshing tables which are already incomplete. This test is added as a junit test and uses Hive JDBC to issue the sqls. This is much faster than the end-to-end python test which issues each hive query in a separate beeline sessions which re-establishes the connection every time. The test already found a bug which is caused when a Hive issues a alter table add if not exists partition" query and the partition is not really added since it is preexisting. In such a case the ADD_PARTITION events is still generated but with a empty list of partitions in the events. Such events are now ignored. Notes: 1. Ran the test with defaults. It generates about 2100 events and runs for close to 15 min. This can be changed to a lower value if we see significant increased delay in the test job runtimes. 3. On CDP builds the concurrent hive queries run very slow due to container provisioning time on the minicluster. I have left this as a TODO to investigate. The test runs in single threaded mode with increased number of queries when running against Hive-3 Change-Id: I8c85b83efd4f56b5ae0e8d1dc6a2ee2feb6721ce --- M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java A fe/src/test/java/org/apache/impala/catalog/events/EventsProcessorStressTest.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java A fe/src/test/java/org/apache/impala/testutil/HiveJdbcClientPool.java M fe/src/test/java/org/apache/impala/testutil/ImpalaJdbcClient.java M fe/src/test/java/org/apache/impala/testutil/TestUtils.java A fe/src/test/java/org/apache/impala/util/RandomHiveQueryRunner.java 8 files changed, 1,578 insertions(+), 34 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/13932/5 -- To view, visit http://gerrit.cloudera.org:8080/13932 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8c85b83efd4f56b5ae0e8d1dc6a2ee2feb6721ce Gerrit-Change-Number: 13932 Gerrit-PatchSet: 5 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Anurag Mantripragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-8661 : Add randomized tests to stress MetastoreEventsProcessor
Vihang Karajgaonkar has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/13932 ) Change subject: IMPALA-8661 : Add randomized tests to stress MetastoreEventsProcessor .. IMPALA-8661 : Add randomized tests to stress MetastoreEventsProcessor This change adds a new stress test for MetastoreEventsProcessor. This test randomly executes hive queries to generate a lot of events. The event processor is invoked at random intervals so that a variable batch of events is processed everytime. After each batch is processed, the test checks the status of events processor. By default, on CDH builds the test is configured to run with 8 concurrent Hive clients and each of the client runs 100 random Hive queries. These defaults can be overridden by passing system properties using maven command arguments "-DnumClients" and "-DnumQueriesPerClients". Additionally, the test also creates impala clients which keep issuing refresh table commands on the test databases to make sure that eventProcessor is doing some real work rather than invalidating/refreshing tables which are already incomplete. This test is added as a junit test and uses Hive JDBC to issue the sqls. This is much faster than the end-to-end python test which issues each hive query in a separate beeline sessions which re-establishes the connection every time. The test already found a bug which is caused when a Hive issues a alter table add if not exists partition" query and the partition is not really added since it is preexisting. In such a case the ADD_PARTITION events is still generated but with a empty list of partitions in the events. Such events are now ignored. Notes: 1. Ran the test with defaults. It generates about 2100 events and runs for close to 15 min. This can be changed to a lower value if we see significant increased delay in the test job runtimes. 3. On CDP builds the concurrent hive queries run very slow due to container provisioning time on the minicluster. I have left this as a TODO to investigate. The test runs in single threaded mode with increased number of queries when running against Hive-3 Change-Id: I8c85b83efd4f56b5ae0e8d1dc6a2ee2feb6721ce --- M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java A fe/src/test/java/org/apache/impala/catalog/events/EventsProcessorStressTest.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java A fe/src/test/java/org/apache/impala/catalog/events/RandomHiveQueryRunner.java A fe/src/test/java/org/apache/impala/testutil/HiveJdbcClientPool.java M fe/src/test/java/org/apache/impala/testutil/ImpalaJdbcClient.java M fe/src/test/java/org/apache/impala/testutil/TestUtils.java 8 files changed, 1,516 insertions(+), 34 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/13932/3 -- To view, visit http://gerrit.cloudera.org:8080/13932 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8c85b83efd4f56b5ae0e8d1dc6a2ee2feb6721ce Gerrit-Change-Number: 13932 Gerrit-PatchSet: 3 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Anurag Mantripragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-8627: [WIP] Enable catalog-v2 in tests
Vihang Karajgaonkar has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/13933 ) Change subject: IMPALA-8627: [WIP] Enable catalog-v2 in tests .. IMPALA-8627: [WIP] Enable catalog-v2 in tests This patch enables catalog-v2 by default in all the tests. Also, fixes the test_observability which fails on catalog-v2 since the profile emits different metadata load events. Change-Id: Iddbde666de2b780c0e40df716a9dfe54524e092d --- M docker/catalogd/Dockerfile M docker/impalad_coord_exec/Dockerfile M docker/impalad_coordinator/Dockerfile M tests/query_test/test_observability.py 4 files changed, 41 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/13933/4 -- To view, visit http://gerrit.cloudera.org:8080/13933 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iddbde666de2b780c0e40df716a9dfe54524e092d Gerrit-Change-Number: 13933 Gerrit-PatchSet: 4 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-8627: Enable catalog-v2 in tests
Vihang Karajgaonkar has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/13933 ) Change subject: IMPALA-8627: Enable catalog-v2 in tests .. IMPALA-8627: Enable catalog-v2 in tests This patch enables catalog-v2 by default in all the tests. Also, fixes the test_observability which fails on catalog-v2 since the profile emits different metadata load events. Change-Id: Iddbde666de2b780c0e40df716a9dfe54524e092d --- M docker/catalogd/Dockerfile M docker/impalad_coord_exec/Dockerfile M docker/impalad_coordinator/Dockerfile M tests/query_test/test_observability.py 4 files changed, 41 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/13933/5 -- To view, visit http://gerrit.cloudera.org:8080/13933 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iddbde666de2b780c0e40df716a9dfe54524e092d Gerrit-Change-Number: 13933 Gerrit-PatchSet: 5 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-8627: Enable catalog-v2 in tests
Vihang Karajgaonkar has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/13933 ) Change subject: IMPALA-8627: Enable catalog-v2 in tests .. IMPALA-8627: Enable catalog-v2 in tests This patch enables catalog-v2 by default in all the tests. Test fixes: 1. Modified test_observability which fails on catalog-v2 since the profile emits different metadata load events. The test now looks for the right events on the profile depending on whether catalogv2 is enabled or not. 2. Fixes a bug in TableName.java constructor which allows non-lowercased table and database names. This causes problems at the local catalog cache which expects the tablenames to be always in lowercase. More details on this failure are available in IMPALA-8627. 3. Fixes the JdbcTest which checks for existence of table comment in the getTables metadata jdbc call. In catalog-v2 since the columns are not requested, LocalTable is not loaded and hence the test needs to be modified to check if catalog-v2 is enabled. Change-Id: Iddbde666de2b780c0e40df716a9dfe54524e092d --- M docker/catalogd/Dockerfile M docker/impalad_coord_exec/Dockerfile M docker/impalad_coordinator/Dockerfile M fe/src/main/java/org/apache/impala/catalog/local/LocalDb.java M fe/src/test/java/org/apache/impala/service/JdbcTest.java M fe/src/test/java/org/apache/impala/testutil/TestUtils.java M testdata/workloads/functional-query/queries/QueryTest/describe-db.test M tests/common/environ.py M tests/hs2/hs2_test_suite.py M tests/hs2/test_hs2.py M tests/metadata/test_hms_integration.py M tests/metadata/test_metadata_query_statements.py M tests/metadata/test_refresh_partition.py M tests/query_test/test_observability.py 14 files changed, 136 insertions(+), 61 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/13933/9 -- To view, visit http://gerrit.cloudera.org:8080/13933 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iddbde666de2b780c0e40df716a9dfe54524e092d Gerrit-Change-Number: 13933 Gerrit-PatchSet: 9 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-8841: Try to fix Tez related dataload flakiness
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14081 ) Change subject: IMPALA-8841: Try to fix Tez related dataload flakiness .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/14081/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/14081/2//COMMIT_MSG@10 PS2, Line 10: initializing Tez resources in parallel my observation is that the tez resources are allocated per session and since every hive query which is run during dataload happens in a separate beeline session, it may still need to initialize the resources. Right? http://gerrit.cloudera.org:8080/#/c/14081/2/testdata/bin/create-load-data.sh File testdata/bin/create-load-data.sh: http://gerrit.cloudera.org:8080/#/c/14081/2/testdata/bin/create-load-data.sh@620 PS2, Line 620: before before typo -- To view, visit http://gerrit.cloudera.org:8080/14081 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id21d57483fe7a4f72f450fb71f8f53b3c1ef6327 Gerrit-Change-Number: 14081 Gerrit-PatchSet: 2 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Anonymous Coward (521) Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Fri, 16 Aug 2019 16:35:19 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8793: Implement TRUNCATE for insert-only ACID tables
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14071 ) Change subject: IMPALA-8793: Implement TRUNCATE for insert-only ACID tables .. Patch Set 2: (5 comments) http://gerrit.cloudera.org:8080/#/c/14071/2/fe/src/main/java/org/apache/impala/catalog/Transaction.java File fe/src/main/java/org/apache/impala/catalog/Transaction.java: http://gerrit.cloudera.org:8080/#/c/14071/2/fe/src/main/java/org/apache/impala/catalog/Transaction.java@17 PS2, Line 17: Transaction Does this class need to be thread-safe? I can see there are possible races with respect to the transactionId_ being used as the flag to track if this transaction is open or close. http://gerrit.cloudera.org:8080/#/c/14071/2/fe/src/main/java/org/apache/impala/catalog/Transaction.java@20 PS2, Line 20: transactionId_ can we make this final? I see that you are using this to keep track of whether the transaction is open or close. May be you can use a AtomicBoolean to keep track of that separately so that commit() can be made thread-safe. http://gerrit.cloudera.org:8080/#/c/14071/2/fe/src/main/java/org/apache/impala/catalog/Transaction.java@38 PS2, Line 38: checkState(transactionId_ > 0) Does this need to throw an exception? What is the behavior from HMS if a client calls commit on a already committed transaction? http://gerrit.cloudera.org:8080/#/c/14071/2/fe/src/main/java/org/apache/impala/catalog/Transaction.java@46 PS2, Line 46: transactionId_ nit, can we change this to if (transactionId_ < 0) return; makes it easier to read in my opinion. http://gerrit.cloudera.org:8080/#/c/14071/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/14071/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1821 PS2, Line 1821: COLUMN_STATS_ACCURATE Its unclear why we are clearing this property. Can you add a comment explaining the same? -- To view, visit http://gerrit.cloudera.org:8080/14071 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic749b7f27da157e1c0ebf9b7e9b6ee09afad122a Gerrit-Change-Number: 14071 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Dinesh Garg (430) Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yongzhi Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 16 Aug 2019 17:22:42 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8841: Try to fix Tez related dataload flakiness
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14081 ) Change subject: IMPALA-8841: Try to fix Tez related dataload flakiness .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/14081 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id21d57483fe7a4f72f450fb71f8f53b3c1ef6327 Gerrit-Change-Number: 14081 Gerrit-PatchSet: 3 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Anonymous Coward (521) Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Fri, 16 Aug 2019 18:45:19 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8847: Fix test event processing.py on Hive-3
Hello Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/14082 to look at the new patch set (#2). Change subject: IMPALA-8847: Fix test_event_processing.py on Hive-3 .. IMPALA-8847: Fix test_event_processing.py on Hive-3 The test_event_processing fails on Hive-3 due to a error in the syntax of the hive query for transactional tables. It was missed unfortunately when IMPALA-8847 was merged. Testing done: Ran the test on both Hive-2 and Hive-3 and it passes now Change-Id: I6507540bfbc0d131a061865ed9ed94792ccfa758 --- M tests/custom_cluster/test_event_processing.py 1 file changed, 1 insertion(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/14082/2 -- To view, visit http://gerrit.cloudera.org:8080/14082 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6507540bfbc0d131a061865ed9ed94792ccfa758 Gerrit-Change-Number: 14082 Gerrit-PatchSet: 2 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] Fix test event processing.py on Hive-3
Vihang Karajgaonkar has uploaded this change for review. ( http://gerrit.cloudera.org:8080/14082 Change subject: Fix test_event_processing.py on Hive-3 .. Fix test_event_processing.py on Hive-3 The test_event_processing fails on Hive-3 due to a error in the syntax of the hive query for transactional tables. It was missed unfortunately when IMPALA-8847 was merged Testing done: Ran the test on both Hive-2 and Hive-3 and it passes now Change-Id: I6507540bfbc0d131a061865ed9ed94792ccfa758 --- M tests/custom_cluster/test_event_processing.py 1 file changed, 1 insertion(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/14082/1 -- To view, visit http://gerrit.cloudera.org:8080/14082 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I6507540bfbc0d131a061865ed9ed94792ccfa758 Gerrit-Change-Number: 14082 Gerrit-PatchSet: 1 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Csaba Ringhofer
[Impala-ASF-CR] IMPALA-8847: Ignore add partition events with empty partition list
Vihang Karajgaonkar has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/14049 ) Change subject: IMPALA-8847: Ignore add partition events with empty partition list .. IMPALA-8847: Ignore add partition events with empty partition list Certain Hive queries like "alter table add if not exists partition ()" generate a add_partition event even if the partition did not really exists. Such events have a empty partition list in the event message which trips on the Precondition check in the AddPartitionEvent. This causes event processor to go into error state. The only way to recover is to issue invalidate metadata in such a case. The patch adds logic to ignore such events. Testing: 1. Added a test case which reproduces the issue. The test case works after the patch is applied. Change-Id: I877ce6233934e7090cd18e497f748bc6479838cb Reviewed-on: http://gerrit.cloudera.org:8080/14049 Tested-by: Impala Public Jenkins Reviewed-by: Vihang Karajgaonkar --- M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M tests/custom_cluster/test_event_processing.py A tests/util/event_processor_utils.py 3 files changed, 162 insertions(+), 10 deletions(-) Approvals: Impala Public Jenkins: Verified Vihang Karajgaonkar: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/14049 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I877ce6233934e7090cd18e497f748bc6479838cb Gerrit-Change-Number: 14049 Gerrit-PatchSet: 6 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Anurag Mantripragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-8847: Ignore add partition events with empty partition list
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14049 ) Change subject: IMPALA-8847: Ignore add partition events with empty partition list .. Patch Set 5: Code-Review+2 Carrying forward Anurag’s +1. I suspect the previous test failure is caused due to a race condition in the test itself. I will investigate it separately. This change is pretty simple and its scope is limited to event processor. I am going to go ahead and merge the change by self-reviewing in addition to Anurag’s +1 -- To view, visit http://gerrit.cloudera.org:8080/14049 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I877ce6233934e7090cd18e497f748bc6479838cb Gerrit-Change-Number: 14049 Gerrit-PatchSet: 5 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Anurag Mantripragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Thu, 15 Aug 2019 07:00:39 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8862: Don't ship jetty and ant
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14068 ) Change subject: IMPALA-8862: Don't ship jetty and ant .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/14068 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I42c50da957e3704c919b391ecd03c26ed851bdaf Gerrit-Change-Number: 14068 Gerrit-PatchSet: 4 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Thu, 15 Aug 2019 16:48:13 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8842 part 1: (Hive3) Use 'engine' field in HMS stat API
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14043 ) Change subject: IMPALA-8842 part 1: (Hive3) Use 'engine' field in HMS stat API .. Patch Set 7: (2 comments) http://gerrit.cloudera.org:8080/#/c/14043/7//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/14043/7//COMMIT_MSG@14 PS7, Line 14: This change is Step 2 in a series of steps to coordinate the : introduction of HMS API changes to Hive3 and Impala. Step 4 is also : Impala related, it will be covered in IMPALA-8842 part 2. : : The steps are as follows: : : 1. Change in Hive3. : We push new APIs so Impala can use them. New APIs will simply call old : existing methods so there should not be any change of functionality : there. Since there were many incompatible changes, new APIs are tagged : method_name_V2. : : 2. Change in Impala. : Push changes to use new methods *V2. : : 3. Change in Hive3. : Push patch with complete functionality. *V2 methods contains the new : logic. The old existing methods are not used anymore by Impala at this : point, hence they can be removed. For every method_name_V2, I will : create a corresponding method method_name that calls the former one. : : 4. Change in Impala : Replace *V2 calls by *. : : 5. Change in Hive3. : Remove *V2 methods from API I am not sure why we need this in the commit message. The commit message should give details what the patch does and why. The coordination of patches between Hive and Impala can be done out-of-band and provides no value by recording the steps in the git commit message. http://gerrit.cloudera.org:8080/#/c/14043/7/shaded-deps/pom.xml File shaded-deps/pom.xml: http://gerrit.cloudera.org:8080/#/c/14043/7/shaded-deps/pom.xml@43 PS7, Line 43: : : org.apache.hive.shims : hive-shims-0.20 : : Is there a particular reason why this needs to be excluded for this patch? -- To view, visit http://gerrit.cloudera.org:8080/14043 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9a73f5eeac8e84d63b22aaed5dfbcd8ea39f0af4 Gerrit-Change-Number: 14043 Gerrit-PatchSet: 7 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Thu, 15 Aug 2019 17:20:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8842 part 1: (Hive3) Use 'engine' field in HMS stat API
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14043 ) Change subject: IMPALA-8842 part 1: (Hive3) Use 'engine' field in HMS stat API .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/14043/7//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/14043/7//COMMIT_MSG@14 PS7, Line 14: This change is Step 2 in a series of steps to coordinate the : introduction of HMS API changes to Hive3 and Impala. Step 4 is also : Impala related, it will be covered in IMPALA-8842 part 2. : : The steps are as follows: : : 1. Change in Hive3. : We push new APIs so Impala can use them. New APIs will simply call old : existing methods so there should not be any change of functionality : there. Since there were many incompatible changes, new APIs are tagged : method_name_V2. : : 2. Change in Impala. : Push changes to use new methods *V2. : : 3. Change in Hive3. : Push patch with complete functionality. *V2 methods contains the new : logic. The old existing methods are not used anymore by Impala at this : point, hence they can be removed. For every method_name_V2, I will : create a corresponding method method_name that calls the former one. : : 4. Change in Impala : Replace *V2 calls by *. : : 5. Change in Hive3. : Remove *V2 methods from API > I also wanted to explain why there's no change in functionality yet. I am okay with this. This should not block the commit. -- To view, visit http://gerrit.cloudera.org:8080/14043 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9a73f5eeac8e84d63b22aaed5dfbcd8ea39f0af4 Gerrit-Change-Number: 14043 Gerrit-PatchSet: 7 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Thu, 15 Aug 2019 18:24:08 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8661 : Add randomized tests to stress MetastoreEventsProcessor
Vihang Karajgaonkar has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/13932 ) Change subject: IMPALA-8661 : Add randomized tests to stress MetastoreEventsProcessor .. IMPALA-8661 : Add randomized tests to stress MetastoreEventsProcessor This change adds a new stress test for MetastoreEventsProcessor. This test randomly executes hive queries to generate a lot of events. The event processor is invoked at random intervals so that a variable batch of events is processed everytime. After each batch is processed, the test checks the status of events processor. By default, on CDH builds the test is configured to run with 4 concurrent Hive clients and each of the client runs 50 random Hive queries. These defaults can be overridden by passing system properties using maven command arguments "-DnumClients" and "-DnumQueriesPerClients". Additionally, the test also creates impala clients which keep issuing refresh table commands on the test databases to make sure that eventProcessor is doing some real work rather than invalidating/refreshing tables which are already incomplete. This test is added as a junit test and uses Hive JDBC to issue the sqls. This is much faster than the end-to-end python test which issues each hive query in a separate beeline sessions which re-establishes the connection every time. Notes: 1. Ran the test with defaults. It generates about 500 events and runs for close to 4.5 min. This can be changed to a lower value if we see significant increased delay in the test job runtimes. 3. On CDP builds the concurrent hive queries run very slow due to container provisioning time on the minicluster. I have left this as a TODO to investigate. The test runs in single threaded mode with increased number of queries when running against Hive-3 Change-Id: I8c85b83efd4f56b5ae0e8d1dc6a2ee2feb6721ce --- M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java A fe/src/test/java/org/apache/impala/catalog/events/EventsProcessorStressTest.java A fe/src/test/java/org/apache/impala/testutil/HiveJdbcClientPool.java M fe/src/test/java/org/apache/impala/testutil/ImpalaJdbcClient.java M fe/src/test/java/org/apache/impala/testutil/TestUtils.java A fe/src/test/java/org/apache/impala/util/RandomHiveQueryRunner.java 6 files changed, 1,590 insertions(+), 24 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/13932/8 -- To view, visit http://gerrit.cloudera.org:8080/13932 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8c85b83efd4f56b5ae0e8d1dc6a2ee2feb6721ce Gerrit-Change-Number: 13932 Gerrit-PatchSet: 8 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Anurag Mantripragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-8661 : Add randomized tests to stress MetastoreEventsProcessor
Vihang Karajgaonkar has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/13932 ) Change subject: IMPALA-8661 : Add randomized tests to stress MetastoreEventsProcessor .. IMPALA-8661 : Add randomized tests to stress MetastoreEventsProcessor This change adds a new stress test for MetastoreEventsProcessor. This test randomly executes hive queries to generate a lot of events. The event processor is invoked at random intervals so that a variable batch of events is processed everytime. After each batch is processed, the test checks the status of events processor. By default, on CDH builds the test is configured to run with 8 concurrent Hive clients and each of the client runs 100 random Hive queries. These defaults can be overridden by passing system properties using maven command arguments "-DnumClients" and "-DnumQueriesPerClients". Additionally, the test also creates impala clients which keep issuing refresh table commands on the test databases to make sure that eventProcessor is doing some real work rather than invalidating/refreshing tables which are already incomplete. This test is added as a junit test and uses Hive JDBC to issue the sqls. This is much faster than the end-to-end python test which issues each hive query in a separate beeline sessions which re-establishes the connection every time. Notes: 1. Ran the test with defaults. It generates about 2100 events and runs for close to 15 min. This can be changed to a lower value if we see significant increased delay in the test job runtimes. 3. On CDP builds the concurrent hive queries run very slow due to container provisioning time on the minicluster. I have left this as a TODO to investigate. The test runs in single threaded mode with increased number of queries when running against Hive-3 Change-Id: I8c85b83efd4f56b5ae0e8d1dc6a2ee2feb6721ce --- M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java A fe/src/test/java/org/apache/impala/catalog/events/EventsProcessorStressTest.java A fe/src/test/java/org/apache/impala/testutil/HiveJdbcClientPool.java M fe/src/test/java/org/apache/impala/testutil/ImpalaJdbcClient.java M fe/src/test/java/org/apache/impala/testutil/TestUtils.java A fe/src/test/java/org/apache/impala/util/RandomHiveQueryRunner.java 6 files changed, 1,590 insertions(+), 24 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/13932/7 -- To view, visit http://gerrit.cloudera.org:8080/13932 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8c85b83efd4f56b5ae0e8d1dc6a2ee2feb6721ce Gerrit-Change-Number: 13932 Gerrit-PatchSet: 7 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Anurag Mantripragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-8661 : Add randomized tests to stress MetastoreEventsProcessor
Vihang Karajgaonkar has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/13932 ) Change subject: IMPALA-8661 : Add randomized tests to stress MetastoreEventsProcessor .. IMPALA-8661 : Add randomized tests to stress MetastoreEventsProcessor This change adds a new stress test for MetastoreEventsProcessor. This test randomly executes hive queries to generate a lot of events. The event processor is invoked at random intervals so that a variable batch of events is processed everytime. After each batch is processed, the test checks the status of events processor. By default, on CDH builds the test is configured to run with 4 concurrent Hive clients and each of the client runs 50 random Hive queries. These defaults can be overridden by passing system properties using maven command arguments "-DnumClients" and "-DnumQueriesPerClients". Additionally, the test also creates impala clients which keep issuing refresh table commands on the test databases to make sure that eventProcessor is doing some real work rather than invalidating/refreshing tables which are already incomplete. This test is added as a junit test and uses Hive JDBC to issue the sqls. This is much faster than the end-to-end python test which issues each hive query in a separate beeline sessions which re-establishes the connection every time. Notes: 1. Ran the test with defaults. It generates about 500 events and runs for close to 4.5 min. This can be changed to a lower value if we see significant increased delay in the test job runtimes. 3. On CDP builds the concurrent hive queries run very slow due to container provisioning time on the minicluster. I have left this as a TODO to investigate. The test runs in single threaded mode with increased number of queries when running against Hive-3 Change-Id: I8c85b83efd4f56b5ae0e8d1dc6a2ee2feb6721ce --- M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java A fe/src/test/java/org/apache/impala/catalog/events/EventsProcessorStressTest.java A fe/src/test/java/org/apache/impala/testutil/HiveJdbcClientPool.java M fe/src/test/java/org/apache/impala/testutil/ImpalaJdbcClient.java M fe/src/test/java/org/apache/impala/testutil/TestUtils.java A fe/src/test/java/org/apache/impala/util/RandomHiveQueryRunner.java 6 files changed, 1,598 insertions(+), 24 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/13932/9 -- To view, visit http://gerrit.cloudera.org:8080/13932 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8c85b83efd4f56b5ae0e8d1dc6a2ee2feb6721ce Gerrit-Change-Number: 13932 Gerrit-PatchSet: 9 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Anurag Mantripragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-8847: Ignore add partition events with empty partition list
Vihang Karajgaonkar has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/14049 ) Change subject: IMPALA-8847: Ignore add partition events with empty partition list .. IMPALA-8847: Ignore add partition events with empty partition list Certain Hive queries like "alter table add if not exists partition ()" generate a add_partition event even if the partition did not really exists. Such events have a empty partition list in the event message which trips on the Precondition check in the AddPartitionEvent. This causes event processor to go into error state. The only way to recover is to issue invalidate metadata in such a case. The patch adds logic to ignore such events. Testing: 1. Added a test case which reproduces the issue. The test case works after the patch is applied. Change-Id: I877ce6233934e7090cd18e497f748bc6479838cb --- M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M tests/custom_cluster/test_event_processing.py A tests/util/event_processor_utils.py 3 files changed, 162 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/14049/4 -- To view, visit http://gerrit.cloudera.org:8080/14049 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I877ce6233934e7090cd18e497f748bc6479838cb Gerrit-Change-Number: 14049 Gerrit-PatchSet: 4 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Anurag Mantripragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-8847: Ignore add partition events with empty partition list
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14049 ) Change subject: IMPALA-8847: Ignore add partition events with empty partition list .. Patch Set 3: (7 comments) http://gerrit.cloudera.org:8080/#/c/14049/3/tests/custom_cluster/test_event_processing.py File tests/custom_cluster/test_event_processing.py: http://gerrit.cloudera.org:8080/#/c/14049/3/tests/custom_cluster/test_event_processing.py@181 PS3, Line 181: def wait_for_insert_event_processing(self, previous_event_id): > I like that you moved out EventProcessorUtils. However, this file heavily u yes, I agree. I am planning to clean it up as part of IMPALA-8795. This method had some custom logic in place where it was waiting for 2 events so didn't want to touch that to keep the scope minimum. http://gerrit.cloudera.org:8080/#/c/14049/3/tests/custom_cluster/test_event_processing.py@200 PS3, Line 200: get_event_processor_metrics > Same as my above comment, this function can be removed here and modify this I am planning to clean up all the event polling tests together in IMPALA-8795 http://gerrit.cloudera.org:8080/#/c/14049/3/tests/custom_cluster/test_event_processing.py@215 PS3, Line 215: get_last_synced_event_id > Same as previous comment. I am planning to clean up all the event polling tests together in IMPALA-8795 http://gerrit.cloudera.org:8080/#/c/14049/3/tests/util/event_processor_utils.py File tests/util/event_processor_utils.py: http://gerrit.cloudera.org:8080/#/c/14049/3/tests/util/event_processor_utils.py@27 PS3, Line 27: class EventProcessorUtils(object): > flake8: E302 expected 2 blank lines, found 1 Done http://gerrit.cloudera.org:8080/#/c/14049/3/tests/util/event_processor_utils.py@30 PS3, Line 30: D > flake8: E303 too many blank lines (2) Done http://gerrit.cloudera.org:8080/#/c/14049/3/tests/util/event_processor_utils.py@50 PS3, Line 50: . > flake8: E131 continuation line unaligned for hanging indent Done http://gerrit.cloudera.org:8080/#/c/14049/3/tests/util/event_processor_utils.py@56 PS3, Line 56: @ > flake8: E303 too many blank lines (2) Done -- To view, visit http://gerrit.cloudera.org:8080/14049 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I877ce6233934e7090cd18e497f748bc6479838cb Gerrit-Change-Number: 14049 Gerrit-PatchSet: 3 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Anurag Mantripragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Mon, 12 Aug 2019 21:55:44 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8766: Undo hadoop-cloud-storage + HWX Nexus
Vihang Karajgaonkar has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/14024 ) Change subject: IMPALA-8766: Undo hadoop-cloud-storage + HWX Nexus .. IMPALA-8766: Undo hadoop-cloud-storage + HWX Nexus Previous commits for IMPALA-8766 attempted to use hadoop-cloud-storage to satisfy Impala's cloud dependencies (e.g. hadoop-aws, hadoop-azure, etc). On builds with USE_CDP_HIVE=true, this adds Knox gateway-cloud-bindings. However, the entry for hadoop-cloud-storage artifact in the impala.cdp.repo maven repository introduces dependencies that are external to that repository. This requires the HWX Nexus repository to resolve those dangling dependencies. Unfortunately, HWX Nexus ages out old jars, including the ones we need. This stops using hadoop-cloud-storage, and instead adds a direct dependency to Knox for USE_CDP_HIVE=true. It disables the HWX Nexus repository and leaves a tombstone explaining why. Testing: - Deleted my .m2 directory and rebuilt Impala with USE_CDP_HIVE=true - Verified the CLASSPATH still contains the right jars on USE_CDP_HIVE=true Change-Id: I79a0c2575fc50bbc3b393c150c0bce22258ea1bd Reviewed-on: http://gerrit.cloudera.org:8080/14024 Tested-by: Impala Public Jenkins Reviewed-by: Vihang Karajgaonkar --- M bin/impala-config.sh M fe/pom.xml M impala-parent/pom.xml 3 files changed, 48 insertions(+), 49 deletions(-) Approvals: Impala Public Jenkins: Verified Vihang Karajgaonkar: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/14024 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I79a0c2575fc50bbc3b393c150c0bce22258ea1bd Gerrit-Change-Number: 14024 Gerrit-PatchSet: 4 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-8847: Ignore add partition events with empty partition list
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14049 ) Change subject: IMPALA-8847: Ignore add partition events with empty partition list .. Patch Set 2: (8 comments) http://gerrit.cloudera.org:8080/#/c/14049/2/tests/custom_cluster/test_event_processing.py File tests/custom_cluster/test_event_processing.py: http://gerrit.cloudera.org:8080/#/c/14049/2/tests/custom_cluster/test_event_processing.py@156 PS2, Line 156: > line has trailing whitespace Done http://gerrit.cloudera.org:8080/#/c/14049/2/tests/util/event_processor_utils.py File tests/util/event_processor_utils.py: http://gerrit.cloudera.org:8080/#/c/14049/2/tests/util/event_processor_utils.py@22 PS2, Line 22: import logging > flake8: F401 'logging' imported but unused Done http://gerrit.cloudera.org:8080/#/c/14049/2/tests/util/event_processor_utils.py@28 PS2, Line 28: class EventProcessorUtils(object): > flake8: E302 expected 2 blank lines, found 1 Done http://gerrit.cloudera.org:8080/#/c/14049/2/tests/util/event_processor_utils.py@31 PS2, Line 31: @ > flake8: E301 expected 1 blank line, found 0 Done http://gerrit.cloudera.org:8080/#/c/14049/2/tests/util/event_processor_utils.py@47 PS2, Line 47: T > flake8: F821 undefined name 'TimeoutError' Done http://gerrit.cloudera.org:8080/#/c/14049/2/tests/util/event_processor_utils.py@49 PS2, Line 49: . > flake8: E131 continuation line unaligned for hanging indent Done http://gerrit.cloudera.org:8080/#/c/14049/2/tests/util/event_processor_utils.py@53 PS2, Line 53: > line has trailing whitespace Done http://gerrit.cloudera.org:8080/#/c/14049/2/tests/util/event_processor_utils.py@62 PS2, Line 62: > line has trailing whitespace Done -- To view, visit http://gerrit.cloudera.org:8080/14049 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I877ce6233934e7090cd18e497f748bc6479838cb Gerrit-Change-Number: 14049 Gerrit-PatchSet: 2 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Anurag Mantripragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Mon, 12 Aug 2019 21:10:49 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8847: Ignore add partition events with empty partition list
Vihang Karajgaonkar has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/14049 ) Change subject: IMPALA-8847: Ignore add partition events with empty partition list .. IMPALA-8847: Ignore add partition events with empty partition list Certain Hive queries like "alter table add if not exists partition ()" generate a add_partition event even if the partition did not really exists. Such events have a empty partition list in the event message which trips on the Precondition check in the AddPartitionEvent. This causes event processor to go into error state. The only way to recover is to issue invalidate metadata in such a case. The patch adds logic to ignore such events. Testing: 1. Added a test case which reproduces the issue. The test case works after the patch is applied. Change-Id: I877ce6233934e7090cd18e497f748bc6479838cb --- M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M tests/custom_cluster/test_event_processing.py A tests/util/event_processor_utils.py 3 files changed, 163 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/14049/3 -- To view, visit http://gerrit.cloudera.org:8080/14049 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I877ce6233934e7090cd18e497f748bc6479838cb Gerrit-Change-Number: 14049 Gerrit-PatchSet: 3 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Anurag Mantripragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-8847: Ignore add partition events with empty partition list
Vihang Karajgaonkar has uploaded this change for review. ( http://gerrit.cloudera.org:8080/14049 Change subject: IMPALA-8847: Ignore add partition events with empty partition list .. IMPALA-8847: Ignore add partition events with empty partition list Certain Hive queries like "alter table add if not exists partition ()" generate a add_partition event even if the partition did not really exists. Such events have a empty partition list in the event message which trips on the Precondition check in the AddPartitionEvent. This causes event processor to go into error state. The only way to recover is to issue invalidate metadata in such a case. The patch adds logic to ignore such events. Testing: 1. Added a test case which reproduces the issue. The test case works after the patch is applied. Change-Id: I877ce6233934e7090cd18e497f748bc6479838cb --- M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M tests/custom_cluster/test_event_processing.py A tests/util/event_processor_utils.py 3 files changed, 162 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/14049/2 -- To view, visit http://gerrit.cloudera.org:8080/14049 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I877ce6233934e7090cd18e497f748bc6479838cb Gerrit-Change-Number: 14049 Gerrit-PatchSet: 2 Gerrit-Owner: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-8795 : Enable event polling by default in tests
Vihang Karajgaonkar has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/13922 ) Change subject: IMPALA-8795 : Enable event polling by default in tests .. IMPALA-8795 : Enable event polling by default in tests Most of the fixes for event processing are committed. This change enables the feature by default for all the tests so that eventually we can turn it on out of the box. Testing done [WIP]: 1. Ran all the tests with the patch and confirmed that there are no test failures 2. Ran all the tests with CDP build. Change-Id: I7279349d4900e24fbcf558f290549496844ce138 --- M docker/catalogd/Dockerfile M tests/common/environ.py M tests/common/impala_test_suite.py M tests/custom_cluster/test_event_processing.py M tests/custom_cluster/test_hive_parquet_codec_interop.py M tests/metadata/test_hms_integration.py M tests/metadata/test_metadata_query_statements.py M tests/metadata/test_refresh_partition.py M tests/util/event_processor_utils.py 9 files changed, 155 insertions(+), 167 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/13922/8 -- To view, visit http://gerrit.cloudera.org:8080/13922 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7279349d4900e24fbcf558f290549496844ce138 Gerrit-Change-Number: 13922 Gerrit-PatchSet: 8 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-8847: Fix test event processing.py on Hive-3
Vihang Karajgaonkar has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/14082 ) Change subject: IMPALA-8847: Fix test_event_processing.py on Hive-3 .. IMPALA-8847: Fix test_event_processing.py on Hive-3 The test_event_processing fails on Hive-3 due to a error in the syntax of the hive query for transactional tables. It was missed unfortunately when IMPALA-8847 was merged. Testing done: Ran the test on both Hive-2 and Hive-3 and it passes now Change-Id: I6507540bfbc0d131a061865ed9ed94792ccfa758 Reviewed-on: http://gerrit.cloudera.org:8080/14082 Tested-by: Impala Public Jenkins Reviewed-by: Vihang Karajgaonkar --- M tests/custom_cluster/test_event_processing.py 1 file changed, 1 insertion(+), 1 deletion(-) Approvals: Impala Public Jenkins: Verified Vihang Karajgaonkar: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/14082 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I6507540bfbc0d131a061865ed9ed94792ccfa758 Gerrit-Change-Number: 14082 Gerrit-PatchSet: 3 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-8661 : Add randomized tests to stress MetastoreEventsProcessor
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/13932 ) Change subject: IMPALA-8661 : Add randomized tests to stress MetastoreEventsProcessor .. Patch Set 9: Code-Review+2 carrying forward Bharath's +2 from earlier. -- To view, visit http://gerrit.cloudera.org:8080/13932 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8c85b83efd4f56b5ae0e8d1dc6a2ee2feb6721ce Gerrit-Change-Number: 13932 Gerrit-PatchSet: 9 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Anurag Mantripragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Fri, 16 Aug 2019 22:14:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8661 : Add randomized tests to stress MetastoreEventsProcessor
Vihang Karajgaonkar has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/13932 ) Change subject: IMPALA-8661 : Add randomized tests to stress MetastoreEventsProcessor .. IMPALA-8661 : Add randomized tests to stress MetastoreEventsProcessor This change adds a new stress test for MetastoreEventsProcessor. This test randomly executes hive queries to generate a lot of events. The event processor is invoked at random intervals so that a variable batch of events is processed everytime. After each batch is processed, the test checks the status of events processor. By default, on CDH builds the test is configured to run with 4 concurrent Hive clients and each of the client runs 50 random Hive queries. These defaults can be overridden by passing system properties using maven command arguments "-DnumClients" and "-DnumQueriesPerClients". Additionally, the test also creates impala clients which keep issuing refresh table commands on the test databases to make sure that eventProcessor is doing some real work rather than invalidating/refreshing tables which are already incomplete. This test is added as a junit test and uses Hive JDBC to issue the sqls. This is much faster than the end-to-end python test which issues each hive query in a separate beeline sessions which re-establishes the connection every time. Notes: 1. Ran the test with defaults. It generates about 500 events and runs for close to 4.5 min. This can be changed to a lower value if we see significant increased delay in the test job runtimes. 3. On CDP builds the concurrent hive queries run very slow due to container provisioning time on the minicluster. I have left this as a TODO to investigate. The test runs in single threaded mode with increased number of queries when running against Hive-3 Change-Id: I8c85b83efd4f56b5ae0e8d1dc6a2ee2feb6721ce Reviewed-on: http://gerrit.cloudera.org:8080/13932 Tested-by: Impala Public Jenkins Reviewed-by: Vihang Karajgaonkar --- M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java A fe/src/test/java/org/apache/impala/catalog/events/EventsProcessorStressTest.java A fe/src/test/java/org/apache/impala/testutil/HiveJdbcClientPool.java M fe/src/test/java/org/apache/impala/testutil/ImpalaJdbcClient.java M fe/src/test/java/org/apache/impala/testutil/TestUtils.java A fe/src/test/java/org/apache/impala/util/RandomHiveQueryRunner.java 6 files changed, 1,598 insertions(+), 24 deletions(-) Approvals: Impala Public Jenkins: Verified Vihang Karajgaonkar: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/13932 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I8c85b83efd4f56b5ae0e8d1dc6a2ee2feb6721ce Gerrit-Change-Number: 13932 Gerrit-PatchSet: 10 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Anurag Mantripragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-8847: Fix test event processing.py on Hive-3
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14082 ) Change subject: IMPALA-8847: Fix test_event_processing.py on Hive-3 .. Patch Set 2: Code-Review+2 Carrying forward Csaba's +2 from before -- To view, visit http://gerrit.cloudera.org:8080/14082 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6507540bfbc0d131a061865ed9ed94792ccfa758 Gerrit-Change-Number: 14082 Gerrit-PatchSet: 2 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Fri, 16 Aug 2019 22:16:09 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8880: Disable EventsProcessorStressTest
Vihang Karajgaonkar has uploaded this change for review. ( http://gerrit.cloudera.org:8080/14117 Change subject: IMPALA-8880: Disable EventsProcessorStressTest .. IMPALA-8880: Disable EventsProcessorStressTest EventsProcessorStressTest is flaky. The root-cause of this is MAPREDUCE-6441 which is unavailable in the component builds we use in the minicluster. The test will be renabled once we port MAPREDUCE-6441 to Hive-3 and Hive-2 minicluster Testing: 1. Ran the test on hive-2 and confirmed its being ignored Change-Id: I0be247e01d5047dbd74702fb43d4d038d8ccf758 --- M fe/src/test/java/org/apache/impala/catalog/events/EventsProcessorStressTest.java 1 file changed, 2 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/14117/2 -- To view, visit http://gerrit.cloudera.org:8080/14117 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I0be247e01d5047dbd74702fb43d4d038d8ccf758 Gerrit-Change-Number: 14117 Gerrit-PatchSet: 2 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] WIP IMPALA-8851 : Do not throw authorization exception in drop if exists queries
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14121 ) Change subject: WIP IMPALA-8851 : Do not throw authorization exception in drop if exists queries .. Patch Set 3: (8 comments) Mostly looks good. Have some questions and left some suggestions below. http://gerrit.cloudera.org:8080/#/c/14121/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/14121/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2654 PS3, Line 2654: public boolean tableExists(TableName tblName) { Missing java doc for the two new methods http://gerrit.cloudera.org:8080/#/c/14121/3/fe/src/main/java/org/apache/impala/analysis/DropDbStmt.java File fe/src/main/java/org/apache/impala/analysis/DropDbStmt.java: http://gerrit.cloudera.org:8080/#/c/14121/3/fe/src/main/java/org/apache/impala/analysis/DropDbStmt.java@77 PS3, Line 77: dbName_ What happens when we register a privilege for a db which doesn't exist? The initial thought was that user needs to have ANY privilege on server to list databases http://gerrit.cloudera.org:8080/#/c/14121/3/fe/src/main/java/org/apache/impala/analysis/DropFunctionStmt.java File fe/src/main/java/org/apache/impala/analysis/DropFunctionStmt.java: http://gerrit.cloudera.org:8080/#/c/14121/3/fe/src/main/java/org/apache/impala/analysis/DropFunctionStmt.java@86 PS3, Line 86: registerFnPriv Is this registering the priv twice if the function exists and if exists clause it not provided? http://gerrit.cloudera.org:8080/#/c/14121/3/fe/src/main/java/org/apache/impala/analysis/DropFunctionStmt.java@97 PS3, Line 97: !hasSignature() && db != null && db.getFunctions( : desc_.functionName()).isEmpty() Can we make this section cleaner by creating a method which checks for existance of the function Eg. boolean fnExists = doesFunctionExist(db, Reference errMsg); if (!fnExists && !ifExists_) { // fucntion does not exist and if exists is not provided // throw new AnalysisException(errMsg.get()); } else if(fnExists && ifExists) { registerFnPriv(analyzer, Privilege.DROP); } else if (!fnExists && ifExists) { registerFnPriv(analyzer, Privilege.ANY); } http://gerrit.cloudera.org:8080/#/c/14121/3/fe/src/main/java/org/apache/impala/analysis/DropFunctionStmt.java@119 PS3, Line 119: void this method can be private. http://gerrit.cloudera.org:8080/#/c/14121/3/fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java File fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java: http://gerrit.cloudera.org:8080/#/c/14121/3/fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java@116 PS3, Line 116: onTable(dbName_, getTbl()) What is the behavior when the table doesn't exist and a privilege request is registered on that object? http://gerrit.cloudera.org:8080/#/c/14121/3/tests/authorization/test_owner_privileges.py File tests/authorization/test_owner_privileges.py: http://gerrit.cloudera.org:8080/#/c/14121/3/tests/authorization/test_owner_privileges.py@149 PS3, Line 149: test_db is this argument needed? http://gerrit.cloudera.org:8080/#/c/14121/3/tests/authorization/test_owner_privileges.py@182 PS3, Line 182: _execute_drop_if_exists Do you think we need a additional test which exercises the drop if exists from a user which has the required permissions and one which was not a owner of the object previously? -- To view, visit http://gerrit.cloudera.org:8080/14121 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iba068935e5da92d71e16e2321afdb8e7b781086a Gerrit-Change-Number: 14121 Gerrit-PatchSet: 3 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Mon, 26 Aug 2019 18:36:08 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8579: Ignore trivial alter table/partition events.
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14145 ) Change subject: IMPALA-8579: Ignore trivial alter table/partition events. .. Patch Set 1: (7 comments) http://gerrit.cloudera.org:8080/#/c/14145/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/14145/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@944 PS1, Line 944: isTrivialAlterTableEvent may be a more boring name like canBeSkipped()? :) http://gerrit.cloudera.org:8080/#/c/14145/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@945 PS1, Line 945: it is a trivial alter event this log could be more descriptive since most likely a reader of that statement would not understand why this event was ignored. May be something like, "Skipping this event since it only alters some table properties which can be ignored". http://gerrit.cloudera.org:8080/#/c/14145/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1023 PS1, Line 1023: !tableAfter_.equals(tableBefore_) Do we need this check? If not, you can change this to a static util method which takes in two maps for the beforeProperties and afterProperties http://gerrit.cloudera.org:8080/#/c/14145/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1030 PS1, Line 1030: tableAfter_.getParameters().put(property, : tableBefore_.getParameters().get(property)); I think we should not change the state of the table object from the event. You could either create a copy and operate on it or use a try .. finally block to restore the original values of the table properties. This may be an issue today but we can run into weird bugs in the future if we decide to make use of these objects to update the catalog state directly in the future. http://gerrit.cloudera.org:8080/#/c/14145/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1368 PS1, Line 1368: getPropertiesToIgnore This list could be a static final Immutable list since we don't expect it to change from event type http://gerrit.cloudera.org:8080/#/c/14145/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1569 PS1, Line 1569: !partitionAfter_.equals(partitionBefore_) Do we need this check? http://gerrit.cloudera.org:8080/#/c/14145/1/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java File fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java: http://gerrit.cloudera.org:8080/#/c/14145/1/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@160 PS1, Line 160: alterPropertiesToIgnore_ Another reason to use a static final constant which can be reused. Such implementations are buggy since the main code could change while this test keeps passing. -- To view, visit http://gerrit.cloudera.org:8080/14145 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I01a59d5170accc014f76f14eb526d96ddcf61f76 Gerrit-Change-Number: 14145 Gerrit-PatchSet: 1 Gerrit-Owner: Anurag Mantripragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Tue, 27 Aug 2019 03:11:34 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8851: Do not throw authorization exception in drop if exists queries
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14121 ) Change subject: IMPALA-8851: Do not throw authorization exception in drop if exists queries .. Patch Set 7: (2 comments) http://gerrit.cloudera.org:8080/#/c/14121/7/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/14121/7/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2659 PS7, Line 2659: getFqTableName I would have preferred a precondition check here instead of assuming what the caller wants since based on the result of this call, authorization decisions are being made. http://gerrit.cloudera.org:8080/#/c/14121/6/fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java File fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java: http://gerrit.cloudera.org:8080/#/c/14121/6/fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java@117 PS6, Line 117: tableName_ > I created a fix inside tableExists(). Sorry for bugging you on this. Adding the check in the tableExists won't be right if dbName_ here is not default. For instance if dbName_ = foo and tblName_.tblName = testTbl then this call will look up default.testTbl Also, does the case matter. I think it is important to make sure that this call is doing the right thing otherwise a non-privileged user can drop the table. -- To view, visit http://gerrit.cloudera.org:8080/14121 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iba068935e5da92d71e16e2321afdb8e7b781086a Gerrit-Change-Number: 14121 Gerrit-PatchSet: 7 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Tue, 27 Aug 2019 22:33:22 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8851: Do not throw authorization exception in drop if exists queries
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14121 ) Change subject: IMPALA-8851: Do not throw authorization exception in drop if exists queries .. Patch Set 6: (3 comments) http://gerrit.cloudera.org:8080/#/c/14121/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/14121/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2659 PS6, Line 2659: tblName I was trying out your patch locally and I found a bug in this code which can lead to a privilege escalation problem. If the tblName is not fully qualified in this method this will always return false. Can you add a Precondition check to make sure that tblname is fully qualified here? Not sure why it was not caught in the tests. http://gerrit.cloudera.org:8080/#/c/14121/6/fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java File fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java: http://gerrit.cloudera.org:8080/#/c/14121/6/fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java@115 PS6, Line 115: dbName_, getTbl() > I think this logic is faulty and will fail in the following scenario. Synced up with Csaba offline. If the user2 has privileges on t2 within the db the show tables command skips listing of t1. Hence a ANY privilege on db is not a suitable representative to know whether user has permissions to see the existance of t1. In such a case, the current approach makes sense to me. Thanks for clarification. http://gerrit.cloudera.org:8080/#/c/14121/6/fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java@117 PS6, Line 117: tableName_ I noticed that the tableName_ here is not fully qualified for some reason. We should pass TableName(dbName, getTbl()) here. -- To view, visit http://gerrit.cloudera.org:8080/14121 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iba068935e5da92d71e16e2321afdb8e7b781086a Gerrit-Change-Number: 14121 Gerrit-PatchSet: 6 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Tue, 27 Aug 2019 21:10:43 + Gerrit-HasComments: Yes