[Impala-ASF-CR] IMPALA-7608: Estimate row count from file size when no stats available
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12974 ) Change subject: IMPALA-7608: Estimate row count from file size when no stats available .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2876/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12974 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic414121c8df0d5222e4aeea096b5365beb04568a Gerrit-Change-Number: 12974 Gerrit-PatchSet: 4 Gerrit-Owner: Fang-Yu Rao Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 24 Apr 2019 05:09:06 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1856: Missing datatypes from JDBC DBMD.getTypeInfo() call.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13094 ) Change subject: IMPALA-1856: Missing datatypes from JDBC DBMD.getTypeInfo() call. .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2875/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/13094 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icdccde7c32e52ed1b0c7b13a22171e8bcd7f1f2d Gerrit-Change-Number: 13094 Gerrit-PatchSet: 1 Gerrit-Owner: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 24 Apr 2019 04:46:56 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7608: Estimate row count from file size when no stats available
Fang-Yu Rao has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12974 Change subject: IMPALA-7608: Estimate row count from file size when no stats available .. IMPALA-7608: Estimate row count from file size when no stats available Added the feature that computes an estimated number of rows in the current hdfs table if the statistics for the cardinality of the current hdfs table is not available. Also added an additional flag to revert the change in case of regression. Testing: (i) In CardinalityTest.java, replaced the original statement "verifyCardinality("SELECT a FROM functional.tinytable", -1);" with "verifyCardinality("SELECT a FROM functional.tinytable", 1);". (ii) In CardinalityTest.java, added a new test "testBasicsWithoutStatsWithNumRowsEstDisabled()" to ensure that the returned cardinality is still -1 when this feature is disabled. (iii) Corrected the corresponding expected results of PlannerTest. (iv) Corrected the corresponding expected result of QueryTest. Change-Id: Ic414121c8df0d5222e4aeea096b5365beb04568a --- M common/thrift/ImpalaInternalService.thrift M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java M testdata/workloads/functional-planner/queries/PlannerTest/default-join-distr-mode-shuffle.test M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test M testdata/workloads/functional-planner/queries/PlannerTest/joins.test M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test M testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-query/queries/QueryTest/explain-level2.test 13 files changed, 414 insertions(+), 352 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/12974/4 -- To view, visit http://gerrit.cloudera.org:8080/12974 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ic414121c8df0d5222e4aeea096b5365beb04568a Gerrit-Change-Number: 12974 Gerrit-PatchSet: 4 Gerrit-Owner: Fang-Yu Rao Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] Make infra/python compatible with both Python 2 & 3
Akshesh Doshi has posted comments on this change. ( http://gerrit.cloudera.org:8080/13070 ) Change subject: Make infra/python compatible with both Python 2 & 3 .. Patch Set 2: Thanks for reviewing this Tim. I agree with your comments and will incorporate them (probably till tomorrow). -- To view, visit http://gerrit.cloudera.org:8080/13070 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If4285a021bb581f88425daa52ef8a3f844017d82 Gerrit-Change-Number: 13070 Gerrit-PatchSet: 2 Gerrit-Owner: Akshesh Doshi Gerrit-Reviewer: Akshesh Doshi Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 24 Apr 2019 04:23:55 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1856: Missing datatypes from JDBC DBMD.getTypeInfo() call.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13094 ) Change subject: IMPALA-1856: Missing datatypes from JDBC DBMD.getTypeInfo() call. .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/13094/1/tests/hs2/test_hs2.py File tests/hs2/test_hs2.py: http://gerrit.cloudera.org:8080/#/c/13094/1/tests/hs2/test_hs2.py@542 PS1, Line 542: ; flake8: E703 statement ends with a semicolon http://gerrit.cloudera.org:8080/#/c/13094/1/tests/hs2/test_hs2.py@553 PS1, Line 553: ] flake8: E501 line too long (91 > 90 characters) -- To view, visit http://gerrit.cloudera.org:8080/13094 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icdccde7c32e52ed1b0c7b13a22171e8bcd7f1f2d Gerrit-Change-Number: 13094 Gerrit-PatchSet: 1 Gerrit-Owner: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 24 Apr 2019 04:05:26 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1856: Missing datatypes from JDBC DBMD.getTypeInfo() call.
Abhishek Rawat has uploaded this change for review. ( http://gerrit.cloudera.org:8080/13094 Change subject: IMPALA-1856: Missing datatypes from JDBC DBMD.getTypeInfo() call. .. IMPALA-1856: Missing datatypes from JDBC DBMD.getTypeInfo() call. Updated MetadataOp::createGetTypeInfoResults to include all supported and externally visible data types, including complex types (which were missing along with some other Primitive types). Added a new function Type::isInternalType() to identify internal types such as NULL_TYPE, FIXED_UDA_INTERMEDIATE. These types are not exposed through getTypeInfo function. Testing: - Updated FrontedTest.java to ensure that the result set from MetadataOp::getTypeInfo contains all supported and externally visible types - Added new E2E test (test_get_type_info) in tests/hs2/test_hs2.py. The new test validates that the HS2 GetTypeInfo() RPC returns supported and externally visible types. Change-Id: Icdccde7c32e52ed1b0c7b13a22171e8bcd7f1f2d --- M fe/src/main/java/org/apache/impala/catalog/ScalarType.java M fe/src/main/java/org/apache/impala/catalog/Type.java M fe/src/main/java/org/apache/impala/service/MetadataOp.java M fe/src/test/java/org/apache/impala/service/FrontendTest.java M tests/hs2/test_hs2.py 5 files changed, 144 insertions(+), 35 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/13094/1 -- To view, visit http://gerrit.cloudera.org:8080/13094 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Icdccde7c32e52ed1b0c7b13a22171e8bcd7f1f2d Gerrit-Change-Number: 13094 Gerrit-PatchSet: 1 Gerrit-Owner: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] Fix critique-gerrit-review to ignore shell/ext-py
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13093 ) Change subject: Fix critique-gerrit-review to ignore shell/ext-py .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/13093 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I76f954a8985fba2f04be34499d8c8723facd0e27 Gerrit-Change-Number: 13093 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Wed, 24 Apr 2019 03:52:12 + Gerrit-HasComments: No
[Impala-ASF-CR] Fix critique-gerrit-review to ignore shell/ext-py
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/13093 ) Change subject: Fix critique-gerrit-review to ignore shell/ext-py .. Fix critique-gerrit-review to ignore shell/ext-py See https://gerrit.cloudera.org/#/c/12884/ for an example of the bot misbehaving Change-Id: I76f954a8985fba2f04be34499d8c8723facd0e27 Reviewed-on: http://gerrit.cloudera.org:8080/13093 Reviewed-by: Fredy Wijaya Tested-by: Impala Public Jenkins --- M bin/jenkins/critique-gerrit-review.py 1 file changed, 1 insertion(+), 1 deletion(-) Approvals: Fredy Wijaya: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/13093 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I76f954a8985fba2f04be34499d8c8723facd0e27 Gerrit-Change-Number: 13093 Gerrit-PatchSet: 2 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-7290: part 1: clean up shell tests
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13083 ) Change subject: IMPALA-7290: part 1: clean up shell tests .. Patch Set 7: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/13083 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibe5ab7f4817e690b7d3be08d71f8f14364b84412 Gerrit-Change-Number: 13083 Gerrit-PatchSet: 7 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 24 Apr 2019 03:36:50 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8444: Fix performance regression when building privilege name
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13095 ) Change subject: IMPALA-8444: Fix performance regression when building privilege name .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2873/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/13095 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I942d9b55f07c8972f69e532567d9b7d80fceb6e5 Gerrit-Change-Number: 13095 Gerrit-PatchSet: 2 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 24 Apr 2019 02:18:25 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13005 ) Change subject: IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0 .. Patch Set 5: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/2874/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/13005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436 Gerrit-Change-Number: 13005 Gerrit-PatchSet: 5 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sudhanshu Arora Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 24 Apr 2019 02:17:47 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8444: Fix performance regression when building privilege name
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13095 ) Change subject: IMPALA-8444: Fix performance regression when building privilege name .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2872/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/13095 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I942d9b55f07c8972f69e532567d9b7d80fceb6e5 Gerrit-Change-Number: 13095 Gerrit-PatchSet: 1 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 24 Apr 2019 02:13:45 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13005 ) Change subject: IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0 .. Patch Set 5: (7 comments) http://gerrit.cloudera.org:8080/#/c/13005/5/bin/impala-config.sh File bin/impala-config.sh: http://gerrit.cloudera.org:8080/#/c/13005/5/bin/impala-config.sh@196 PS5, Line 196: # When USE_CDP_HIVE is set we use the latest hive version available to deply in minicluster line too long (91 > 90) http://gerrit.cloudera.org:8080/#/c/13005/5/bin/impala-config.sh@200 PS5, Line 200: # TODO(Vihang) we should repackage the tarballs so that the src and binaries are extracted line too long (92 > 90) http://gerrit.cloudera.org:8080/#/c/13005/5/bin/impala-config.sh@209 PS5, Line 209: export HIVE_HOME="$IMPALA_TOOLCHAIN/cdh_components-${CDH_BUILD_NUMBER}/hive-${MINICLUSTER_HIVE_VERSION}" line too long (106 > 90) http://gerrit.cloudera.org:8080/#/c/13005/5/bin/impala-config.sh@546 PS5, Line 546: export HIVE_METASTORE_THRIFT_DIR=$CDP_COMPONENTS_HOME/apache-hive-${IMPALA_HIVE_VERSION}-src/standalone-metastore/src/main/thrift line too long (129 > 90) http://gerrit.cloudera.org:8080/#/c/13005/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/13005/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1073 PS5, Line 1073: line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/13005/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1169 PS5, Line 1169: line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/13005/5/testdata/bin/run-hive-server.sh File testdata/bin/run-hive-server.sh: http://gerrit.cloudera.org:8080/#/c/13005/5/testdata/bin/run-hive-server.sh@66 PS5, Line 66: export HIVE_METASTORE_HADOOP_OPTS="-verbose:class -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=30010" line too long (121 > 90) -- To view, visit http://gerrit.cloudera.org:8080/13005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436 Gerrit-Change-Number: 13005 Gerrit-PatchSet: 5 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sudhanshu Arora Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 24 Apr 2019 02:07:16 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0
Vihang Karajgaonkar has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/13005 ) Change subject: IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0 .. IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0 This change upgrades the hive dependencies of Impala to use Hive 3.1.0 based binaries. Most of the changes in this patch are based off patches provided by Todd (links available in JIRA). Upgrading the dependencies allows us to work with both Hive 3.1.0 and Hive 2.1.0 in the same code line. In order to do this, the patch trims down a lot of unnecessary hive dependencies of the front end code by creating a shaded-deps module. The pom.xml of shaded-deps includes only the files from Hive source which Impala depends for compilation. Additionally, it also uses a custom build of Hive which is based of Hive 3.1.0. This custom build includes patches for HIVE-21596 and HIVE-21586 which are needed by Impala so that it can compile against Hive-3 libraries and be able to talk to HMS-2.x metastore. Once these patches are merged we can get rid of this custom build and rely on more official sources of Hive builds. The patch also changes impala-config.sh so that it always downloads the Hive-3 libraries from the toolchain. The code is always built using Hive-3 jars. However, based on the value of USE_CDP_HIVE, the minicluster is deployed using Hive-3 or Hive-2 binaries. Since Impala implements HiveServer2's TCLIService.thrift interface, it requires us to use the existing mechanism of copying the hive-2/api TCLIService.thrift to hive-3/api. It also adds a few environment variables which point to the metastore's thrift file and the CDH Hive version. Testing: 1. Code compiles and runs against both HMS-3 and HMS-2 2. Ran full-suite of tests using the private jenkins job against HMS-2 3. Running full-tests against HMS-3 will need more work like supporting Tez in the mini-cluster (for dataloading) and HMS transaction support since HMS3 create transactional tables by default. Notes: 1. Patch uses a custom build of Hive to be deployed in mini-cluster. This build has the fixes for HIVE-21596 2. The Hive 3.1 does not include some of the patches like HIVE-21077. Some of the recent code in Impala depends on that patch to exist in hive. We will need to port HIVE-21077 to Hive 3.1.0 in order to compile against Hive 3.1.0 jars. I will continue working on getting this merged along with HIVE-21586 Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436 --- M CMakeLists.txt M README.md M bin/bootstrap_toolchain.py M bin/impala-config.sh M bin/set-classpath.sh M common/thrift/.gitignore M common/thrift/CMakeLists.txt M fe/CMakeLists.txt M fe/pom.xml M fe/src/main/java/org/apache/impala/analysis/StringLiteral.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java M fe/src/main/java/org/apache/impala/catalog/MetaStoreClientPool.java M fe/src/main/java/org/apache/impala/catalog/TableLoader.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java M fe/src/main/java/org/apache/impala/compat/MetastoreShim.java M fe/src/main/java/org/apache/impala/hive/executor/UdfExecutor.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java M fe/src/main/java/org/apache/impala/service/MetadataOp.java A fe/src/main/java/org/apache/impala/util/MetadataFormatUtils.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java M fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java M fe/src/test/java/org/apache/impala/testutil/EmbeddedMetastoreClientPool.java M impala-parent/pom.xml A shaded-deps/.gitignore A shaded-deps/CMakeLists.txt A shaded-deps/dependency-reduced-pom.xml A shaded-deps/pom.xml M testdata/bin/run-hive-server.sh M tests/custom_cluster/test_permanent_udfs.py 33 files changed, 1,224 insertions(+), 258 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/13005/5 -- To view, visit http://gerrit.cloudera.org:8080/13005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436 Gerrit-Change-Number: 13005 Gerrit-PatchSet: 5 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sudhanshu Arora Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-7957: Fix slot equivalences may be enforced multiple times
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13051 ) Change subject: IMPALA-7957: Fix slot equivalences may be enforced multiple times .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2871/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/13051 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ida2d5d8149b217e18ebae61e136848162503653e Gerrit-Change-Number: 13051 Gerrit-PatchSet: 3 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 24 Apr 2019 01:56:54 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8444: Fix performance regression when building privilege name
Fredy Wijaya has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/13095 ) Change subject: IMPALA-8444: Fix performance regression when building privilege name .. IMPALA-8444: Fix performance regression when building privilege name This patch fixes the performance regression when building privilege name by rewriting PrincipalPrivilege.buildPrivilegeName() with a simple string concatentation instead of using a list that gets converted into a string. Below is the result of running a benchmark using JMH comparing the old and new implementations: Result "org.apache.impala.PrivilegeNameBenchmark.fast": ≈ 10⁻³ ms/op Result "org.apache.impala.PrivilegeNameBenchmark.slow": 0.001 ±(99.9%) 0.001 ms/op [Average] (min, avg, max) = (0.001, 0.001, 0.001), stdev = 0.001 CI (99.9%): [0.001, 0.001] (assumes normal distribution) BenchmarkMode Cnt ScoreError Units PrivilegeNameBenchmark.fast avgt 25 ≈ 10⁻³ ms/op PrivilegeNameBenchmark.slow avgt 25 0.001 ± 0.001 ms/op This patch also updates SentryAuthorizationPolicy.listPrivileges() to reuse the privilege names that have already been built instead of building them again. While fixing this, I found a bug where Principal stores the PrincipalPrivilege in case insensitive way. This is true for all privilege scopes, except URI. This patch fixes the issue by storing URI privilege in a case sensitive manner. This patch removes the synchronized keyword in SentryAuthorizationPolicy.listPrivileges() in order to not cause the operation to run in serial in a highly concurrent workload. This has a trade off of stale authorization metadata, which may be acceptable to get a better performance. Testing: - Ran all FE tests - Ran all E2E authorization tests - Added E2E test for URI privilege bug Change-Id: I942d9b55f07c8972f69e532567d9b7d80fceb6e5 --- M fe/src/main/java/org/apache/impala/authorization/AuthorizationPolicy.java M fe/src/main/java/org/apache/impala/authorization/sentry/SentryAuthorizationPolicy.java M fe/src/main/java/org/apache/impala/catalog/CatalogObjectCache.java M fe/src/main/java/org/apache/impala/catalog/Principal.java M fe/src/main/java/org/apache/impala/catalog/PrincipalPrivilege.java M tests/authorization/test_grant_revoke.py 6 files changed, 121 insertions(+), 78 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/13095/2 -- To view, visit http://gerrit.cloudera.org:8080/13095 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I942d9b55f07c8972f69e532567d9b7d80fceb6e5 Gerrit-Change-Number: 13095 Gerrit-PatchSet: 2 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPMALA-8444: Fix performance regression when building privilege name
Fredy Wijaya has uploaded this change for review. ( http://gerrit.cloudera.org:8080/13095 Change subject: IMPMALA-8444: Fix performance regression when building privilege name .. IMPMALA-8444: Fix performance regression when building privilege name This patch fixes the performance regression when building privilege name by rewriting PrincipalPrivilege.buildPrivilegeName() with a simple string concatentation instead of using a list that gets converted into a string. Below is the result of running a benchmark using JMH comparing the old and new implementations: Result "org.apache.impala.PrivilegeNameBenchmark.fast": ≈ 10⁻³ ms/op Result "org.apache.impala.PrivilegeNameBenchmark.slow": 0.001 ±(99.9%) 0.001 ms/op [Average] (min, avg, max) = (0.001, 0.001, 0.001), stdev = 0.001 CI (99.9%): [0.001, 0.001] (assumes normal distribution) BenchmarkMode Cnt ScoreError Units PrivilegeNameBenchmark.fast avgt 25 ≈ 10⁻³ ms/op PrivilegeNameBenchmark.slow avgt 25 0.001 ± 0.001 ms/op This patch also updates SentryAuthorizationPolicy.listPrivileges() to reuse the privilege names that have already been built instead of building them again. While fixing this, I found a bug where Principal stores the PrincipalPrivilege in case insensitive way. This is true for all privilege scopes, except URI. This patch fixes the issue by storing URI privilege in a case sensitive manner. This patch removes the synchronized keyword in SentryAuthorizationPolicy.listPrivileges() in order to not cause the operation to run in serial in a highly concurrent workload. This has a trade off of stale authorization metadata, which may be acceptable to get a better performance. Testing: - Ran all FE tests - Ran all E2E authorization tests - Added E2E test for URI privilege bug Change-Id: I942d9b55f07c8972f69e532567d9b7d80fceb6e5 --- M fe/src/main/java/org/apache/impala/authorization/AuthorizationPolicy.java M fe/src/main/java/org/apache/impala/authorization/sentry/SentryAuthorizationPolicy.java M fe/src/main/java/org/apache/impala/catalog/CatalogObjectCache.java M fe/src/main/java/org/apache/impala/catalog/Principal.java M fe/src/main/java/org/apache/impala/catalog/PrincipalPrivilege.java M tests/authorization/test_grant_revoke.py 6 files changed, 121 insertions(+), 78 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/13095/1 -- To view, visit http://gerrit.cloudera.org:8080/13095 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I942d9b55f07c8972f69e532567d9b7d80fceb6e5 Gerrit-Change-Number: 13095 Gerrit-PatchSet: 1 Gerrit-Owner: Fredy Wijaya
[Impala-ASF-CR] IMPALA-7957: Fix slot equivalences may be enforced multiple times
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/13051 ) Change subject: IMPALA-7957: Fix slot equivalences may be enforced multiple times .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/13051/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java: http://gerrit.cloudera.org:8080/#/c/13051/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1131 PS2, Line 1131: we : // need to make sure that equivalences are not enforced multiple times Actually, I got the idea from this comment and found that it should be enforced in other places too. Hope my provement is correct. http://gerrit.cloudera.org:8080/#/c/13051/2/testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test File testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test: http://gerrit.cloudera.org:8080/#/c/13051/2/testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test@1760 PS2, Line 1760: > Sure Done -- To view, visit http://gerrit.cloudera.org:8080/13051 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ida2d5d8149b217e18ebae61e136848162503653e Gerrit-Change-Number: 13051 Gerrit-PatchSet: 2 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 24 Apr 2019 01:10:49 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7957: Fix slot equivalences may be enforced multiple times
Hello Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/13051 to look at the new patch set (#3). Change subject: IMPALA-7957: Fix slot equivalences may be enforced multiple times .. IMPALA-7957: Fix slot equivalences may be enforced multiple times Predicates can be divided into three types according to the way they are generated: 1) origin predicates that come from the query 2) auxiliary equal predicates generated for equivalence between a label(alias) and its real expression 3) inferred predicates that inferred from the slot equivalences graph The slot equivalences graph (valueTransferGraph in Analyzer) is generated by the first two kinds of predicates. Analyzer will create equivalence predicates for a PlanNode based on the unassigned predicates and the valueTransferGraph. However, the current implementation can't avoid creating inferred predicates that are duplicated with previously created inferred predicates if they have been assigned before. Duplicated inferred predicates are either redundant or wrong. Say, if we create predicate p1: s1 = s2 for the current PlanNode and p1 duplicates with a previously inferred predicate p0: s1 = s2 (same as s2 = s1), we can prove that p1 is redundant or wrong: 1) p0 must have been assigned. Otherwise, p0 will be in the unassigned conjuncts list and p1 won't be created. 2) p0 must have been assigned to an offspring node of the current PlanNode since we create the PlanNodes in a depth first manner. 3) The origin predicates that infer to p0 have been assigned to an offspring node too. Then, rows that should be rejected have been filtered out either by p0 or the origin predicates that infer to p0. What's worse, assigning p1 on top of the origin predicates may wrongly reject rows. Hence, p1 is either redundant or wrong. In this patch, we check the existence of previously inferred equivalence predicates before creating a predicate. Also add some useful TRACE level logs. Tests: * Add tests for UNIONs in inline-view.test * Run all tests locally in CORE exploration strategy Change-Id: Ida2d5d8149b217e18ebae61e136848162503653e --- M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/main/java/org/apache/impala/analysis/SlotRef.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test 5 files changed, 365 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/13051/3 -- To view, visit http://gerrit.cloudera.org:8080/13051 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ida2d5d8149b217e18ebae61e136848162503653e Gerrit-Change-Number: 13051 Gerrit-PatchSet: 3 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] Remove references to the $IMPALA HOME/thirdparty directory
Fredy Wijaya has posted comments on this change. ( http://gerrit.cloudera.org:8080/13092 ) Change subject: Remove references to the $IMPALA_HOME/thirdparty directory .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/13092 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2edfd499febb5a25fdcf59b5183eccf192a08be0 Gerrit-Change-Number: 13092 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 23 Apr 2019 23:39:34 + Gerrit-HasComments: No
[Impala-ASF-CR] PREVIEW: IMPALA-7290: part 2: Add HS2 support to Impala shell
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12884 ) Change subject: PREVIEW: IMPALA-7290: part 2: Add HS2 support to Impala shell .. Patch Set 12: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2868/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12884 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6d5cc83d545aacc659523f29b1d6feed672e2a12 Gerrit-Change-Number: 12884 Gerrit-PatchSet: 12 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 23 Apr 2019 23:14:10 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8341: Data cache for remote reads
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12987 ) Change subject: IMPALA-8341: Data cache for remote reads .. Patch Set 4: (3 comments) I don't know that I'll have time to do a fully detailed review, and it looks like we may already have enough eyes on it. The design and APIs look great - it does seem like we want to get it checked in and exercised more soon. http://gerrit.cloudera.org:8080/#/c/12987/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12987/4//COMMIT_MSG@61 PS4, Line 61: Testing done: a new BE test was added; core test with cache enabled. Can we add a custom cluster test to sanity check that it works end to end. Myabe this is where we should sanity check metrics too. http://gerrit.cloudera.org:8080/#/c/12987/4//COMMIT_MSG@62 PS4, Line 62: Do we want to consider enabling this by default for the dockerised tests as a next step? http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.h File be/src/runtime/io/data-cache.h: http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.h@88 PS4, Line 88: /// - bound number of backing files per partition by consolidating the content of very > Should we just delete old files if we have reached a configurable high numb Yeah this scenario is a bit concerning for me still since it's conceivable that the workloads we test with might have different properties to real ones. E.g. I can imagine a workload might end up with some set of files that are queries frequently enough to stay resident in the cache indefinitely and keep very old files alive. Not sure if Lars' idea is easier than compaction but it is appealing in some ways. -- To view, visit http://gerrit.cloudera.org:8080/12987 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc Gerrit-Change-Number: 12987 Gerrit-PatchSet: 4 Gerrit-Owner: Michael Ho Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 23 Apr 2019 23:38:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] Fix critique-gerrit-review to ignore shell/ext-py
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13093 ) Change subject: Fix critique-gerrit-review to ignore shell/ext-py .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2870/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/13093 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I76f954a8985fba2f04be34499d8c8723facd0e27 Gerrit-Change-Number: 13093 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Tue, 23 Apr 2019 23:24:14 + Gerrit-HasComments: No
[Impala-ASF-CR] Remove references to the $IMPALA HOME/thirdparty directory
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13092 ) Change subject: Remove references to the $IMPALA_HOME/thirdparty directory .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2869/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/13092 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2edfd499febb5a25fdcf59b5183eccf192a08be0 Gerrit-Change-Number: 13092 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 23 Apr 2019 23:18:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/12939 ) Change subject: IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/12939/4/testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test File testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test: http://gerrit.cloudera.org:8080/#/c/12939/4/testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test@1705 PS4, Line 1705: # IMPALA-8386: test coverage for ORDER BY/LIMIT > To be clear, i didn't think there was a bug in your code, just wanted to ma Yes, I understand that. To flush out bugs, just start an exhaustive test before running the GVO: https://jenkins.impala.io/job/pre-review-test/346/ -- To view, visit http://gerrit.cloudera.org:8080/12939 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia87aa9db2de85f0716e4854a88727aad593773fa Gerrit-Change-Number: 12939 Gerrit-PatchSet: 5 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 23 Apr 2019 23:16:24 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7290: part 1: clean up shell tests
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13083 ) Change subject: IMPALA-7290: part 1: clean up shell tests .. Patch Set 7: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2867/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/13083 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibe5ab7f4817e690b7d3be08d71f8f14364b84412 Gerrit-Change-Number: 13083 Gerrit-PatchSet: 7 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 23 Apr 2019 23:14:08 + Gerrit-HasComments: No
[Impala-ASF-CR] [acid] Predicate to test if dir must be included.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13091 ) Change subject: [acid] Predicate to test if dir must be included. .. Patch Set 2: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/2865/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/13091 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If0e88281d277127c9499d37b95fbba55dcc7761c Gerrit-Change-Number: 13091 Gerrit-PatchSet: 2 Gerrit-Owner: Sudhanshu Arora Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yongzhi Chen Gerrit-Comment-Date: Tue, 23 Apr 2019 22:56:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5843: Use page index in Parquet files to skip pages
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12065 ) Change subject: IMPALA-5843: Use page index in Parquet files to skip pages .. Patch Set 17: Code-Review+1 LGTM once Lars has a final look. -- To view, visit http://gerrit.cloudera.org:8080/12065 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0cc99f129f2048dbafbe7f5a51d1ea3a5005731a Gerrit-Change-Number: 12065 Gerrit-PatchSet: 17 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Pooja Nilangekar Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 23 Apr 2019 23:03:22 + Gerrit-HasComments: No
[Impala-ASF-CR] PREVIEW: IMPALA-7290: part 2: Add HS2 support to Impala shell
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12884 ) Change subject: PREVIEW: IMPALA-7290: part 2: Add HS2 support to Impala shell .. Patch Set 11: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2866/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12884 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6d5cc83d545aacc659523f29b1d6feed672e2a12 Gerrit-Change-Number: 12884 Gerrit-PatchSet: 11 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 23 Apr 2019 23:02:48 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7290: part 1: clean up shell tests
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13083 ) Change subject: IMPALA-7290: part 1: clean up shell tests .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2864/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/13083 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibe5ab7f4817e690b7d3be08d71f8f14364b84412 Gerrit-Change-Number: 13083 Gerrit-PatchSet: 6 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 23 Apr 2019 22:50:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2990: timeout unresponsive queries in coordinator
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12299 ) Change subject: IMPALA-2990: timeout unresponsive queries in coordinator .. Patch Set 8: (8 comments) http://gerrit.cloudera.org:8080/#/c/12299/8//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12299/8//COMMIT_MSG@34 PS8, Line 34: 'REPORT_EXEC_STATUS_SEND:FAIL@0.1|REPORT_EXEC_STATUS_RECV:FAIL@0.1' Stale ? http://gerrit.cloudera.org:8080/#/c/12299/8/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: http://gerrit.cloudera.org:8080/#/c/12299/8/be/src/runtime/query-state.cc@400 PS8, Line 400: FLAGS_backend_client_rpc_timeout_ms Can you please leave a TODO on rethinking the timeout period for RPCs in general. As we convert more of the RPCs from Thrift To KRPC, this flag's role has changed. Historically, this is used for setting socket timeout with Thrift RPC so that an Impala thread will pop out of the Thrift RPC stack and check for cancellation. http://gerrit.cloudera.org:8080/#/c/12299/8/be/src/runtime/query-state.cc@414 PS8, Line 414: spend spent http://gerrit.cloudera.org:8080/#/c/12299/8/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/12299/8/be/src/service/impala-server.cc@217 PS8, Line 217: "reporting is disabled."); and only the final report is sent. http://gerrit.cloudera.org:8080/#/c/12299/8/be/src/service/impala-server.cc@218 PS8, Line 218: 180 To avoid regression, let's be more conservative and set it to be at least 300s, if not larger. http://gerrit.cloudera.org:8080/#/c/12299/8/be/src/service/impala-server.cc@221 PS8, Line 221: The percent padding that a " : "coordinator should wait to recieve a report after --status_report_max_retry_s " : "before deciding that a backend is unresponsive and the query should be cancelled. May be clearer to include the formula you have in the commit message. http://gerrit.cloudera.org:8080/#/c/12299/8/be/src/service/impala-server.cc@421 PS8, Line 421: } May wanna check if --status_report_cancellation_padding is set to something sane. http://gerrit.cloudera.org:8080/#/c/12299/8/be/src/service/impala-server.cc@2248 PS8, Line 2248: * (1 + FLAGS_status_report_cancellation_padding / 100.0); DCHECK_GT(max_lag_ms, 0); -- To view, visit http://gerrit.cloudera.org:8080/12299 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I196c8c6a5633b1960e2c3a3884777be9b3824987 Gerrit-Change-Number: 12299 Gerrit-PatchSet: 8 Gerrit-Owner: Thomas Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 23 Apr 2019 22:34:25 + Gerrit-HasComments: Yes
[Impala-ASF-CR] Fix critique-gerrit-review to ignore shell/ext-py
Fredy Wijaya has posted comments on this change. ( http://gerrit.cloudera.org:8080/13093 ) Change subject: Fix critique-gerrit-review to ignore shell/ext-py .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/13093 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I76f954a8985fba2f04be34499d8c8723facd0e27 Gerrit-Change-Number: 13093 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Tue, 23 Apr 2019 22:41:19 + Gerrit-HasComments: No
[Impala-ASF-CR] Fix critique-gerrit-review to ignore shell/ext-py
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13093 ) Change subject: Fix critique-gerrit-review to ignore shell/ext-py .. Patch Set 1: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/4055/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/13093 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I76f954a8985fba2f04be34499d8c8723facd0e27 Gerrit-Change-Number: 13093 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 23 Apr 2019 22:35:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7290: part 1: clean up shell tests
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/13083 to look at the new patch set (#7). Change subject: IMPALA-7290: part 1: clean up shell tests .. IMPALA-7290: part 1: clean up shell tests This sets up the tests to be extensible to test shell in both beeswax and HS2 modes. Testing: * Add test dimension containing only beeswax in preparation for HS2 dimension. * Factor out hardcoded ports. * Add tests for formatting of all types and NULL values. * Added testing for floating point output formatting, which does change as a result of switching to server-side vs client-side formatting. * Use unique_database for tests that create tables. Change-Id: Ibe5ab7f4817e690b7d3be08d71f8f14364b84412 --- M bin/load-data.py M testdata/bin/create-load-data.sh M testdata/bin/create-tpcds-testcase-files.sh M tests/beeswax/impala_beeswax.py M tests/common/impala_test_suite.py M tests/common/test_dimensions.py M tests/custom_cluster/test_client_ssl.py M tests/custom_cluster/test_shell_interactive.py M tests/custom_cluster/test_shell_interactive_reconnect.py M tests/shell/test_shell_commandline.py M tests/shell/test_shell_interactive.py M tests/shell/util.py 12 files changed, 759 insertions(+), 599 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/13083/7 -- To view, visit http://gerrit.cloudera.org:8080/13083 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibe5ab7f4817e690b7d3be08d71f8f14364b84412 Gerrit-Change-Number: 13083 Gerrit-PatchSet: 7 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] Fix critique-gerrit-review to ignore shell/ext-py
Tim Armstrong has uploaded this change for review. ( http://gerrit.cloudera.org:8080/13093 Change subject: Fix critique-gerrit-review to ignore shell/ext-py .. Fix critique-gerrit-review to ignore shell/ext-py See https://gerrit.cloudera.org/#/c/12884/ for an example of the bot misbehaving Change-Id: I76f954a8985fba2f04be34499d8c8723facd0e27 --- M bin/jenkins/critique-gerrit-review.py 1 file changed, 1 insertion(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/13093/1 -- To view, visit http://gerrit.cloudera.org:8080/13093 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I76f954a8985fba2f04be34499d8c8723facd0e27 Gerrit-Change-Number: 13093 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong
[Impala-ASF-CR] Remove references to the $IMPALA HOME/thirdparty directory
Joe McDonnell has uploaded this change for review. ( http://gerrit.cloudera.org:8080/13092 Change subject: Remove references to the $IMPALA_HOME/thirdparty directory .. Remove references to the $IMPALA_HOME/thirdparty directory The $IMPALA_HOME/thirdparty directory is a remnant from before Impala was an Apache project. It is obsolete and unused, so this removes code that references this directory. Testing: - Ran core tests Change-Id: I2edfd499febb5a25fdcf59b5183eccf192a08be0 --- M README.md M bin/impala-config.sh M cmake_modules/FindAvro.cmake M cmake_modules/FindGFlags.cmake M cmake_modules/FindGLog.cmake M cmake_modules/FindGTest.cmake M cmake_modules/FindLdap.cmake M cmake_modules/FindLz4.cmake M cmake_modules/FindOrc.cmake M cmake_modules/FindPProf.cmake M cmake_modules/FindRapidJson.cmake M cmake_modules/FindRe2.cmake M cmake_modules/FindSasl.cmake M cmake_modules/FindSnappy.cmake 14 files changed, 27 insertions(+), 95 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/13092/1 -- To view, visit http://gerrit.cloudera.org:8080/13092 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I2edfd499febb5a25fdcf59b5183eccf192a08be0 Gerrit-Change-Number: 13092 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnell
[native-toolchain-CR] Add zstd to toolchain
Tim Armstrong has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/13079 ) Change subject: Add zstd to toolchain .. Add zstd to toolchain This adds the latest release of zstd for the toolchain. Testing: Built locally, confirmed that binary utilities work. Built on all supported platforms. Change-Id: I17f1489c7f3cc5c1585b5472f4b6e910ee10d204 --- M buildall.sh A source/zstd/build.sh 2 files changed, 40 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/native-toolchain refs/changes/79/13079/4 -- To view, visit http://gerrit.cloudera.org:8080/13079 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I17f1489c7f3cc5c1585b5472f4b6e910ee10d204 Gerrit-Change-Number: 13079 Gerrit-PatchSet: 4 Gerrit-Owner: Tim Armstrong
[Impala-ASF-CR] PREVIEW: IMPALA-7290: part 2: Add HS2 support to Impala shell
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12884 ) Change subject: PREVIEW: IMPALA-7290: part 2: Add HS2 support to Impala shell .. Patch Set 12: (184 comments) http://gerrit.cloudera.org:8080/#/c/12884/12/shell/ext-py/bitarray-0.9.0/bitarray/__init__.py File shell/ext-py/bitarray-0.9.0/bitarray/__init__.py: http://gerrit.cloudera.org:8080/#/c/12884/12/shell/ext-py/bitarray-0.9.0/bitarray/__init__.py@11 PS12, Line 11: from bitarray._bitarray import _bitarray, bitdiff, bits2bytes, _sysinfo flake8: F401 'bitarray._bitarray._sysinfo' imported but unused http://gerrit.cloudera.org:8080/#/c/12884/12/shell/ext-py/bitarray-0.9.0/bitarray/__init__.py@11 PS12, Line 11: from bitarray._bitarray import _bitarray, bitdiff, bits2bytes, _sysinfo flake8: F401 'bitarray._bitarray.bits2bytes' imported but unused http://gerrit.cloudera.org:8080/#/c/12884/12/shell/ext-py/bitarray-0.9.0/bitarray/__init__.py@11 PS12, Line 11: from bitarray._bitarray import _bitarray, bitdiff, bits2bytes, _sysinfo flake8: F401 'bitarray._bitarray.bitdiff' imported but unused http://gerrit.cloudera.org:8080/#/c/12884/12/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py File shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py: http://gerrit.cloudera.org:8080/#/c/12884/12/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@21 PS12, Line 21: from bitarray import bitarray, bitdiff, bits2bytes, __version__ flake8: E402 module level import not at top of file http://gerrit.cloudera.org:8080/#/c/12884/12/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@84 PS12, Line 84: flake8: E221 multiple spaces before operator http://gerrit.cloudera.org:8080/#/c/12884/12/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@85 PS12, Line 85: flake8: E221 multiple spaces before operator http://gerrit.cloudera.org:8080/#/c/12884/12/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@127 PS12, Line 127: class TestsModuleFunctions(unittest.TestCase, Util): flake8: E302 expected 2 blank lines, found 1 http://gerrit.cloudera.org:8080/#/c/12884/12/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@167 PS12, Line 167: + flake8: E226 missing whitespace around arithmetic operator http://gerrit.cloudera.org:8080/#/c/12884/12/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@167 PS12, Line 167: + flake8: E226 missing whitespace around arithmetic operator http://gerrit.cloudera.org:8080/#/c/12884/12/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@167 PS12, Line 167: - flake8: E226 missing whitespace around arithmetic operator http://gerrit.cloudera.org:8080/#/c/12884/12/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@167 PS12, Line 167: - flake8: E226 missing whitespace around arithmetic operator http://gerrit.cloudera.org:8080/#/c/12884/12/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@168 PS12, Line 168: - flake8: E226 missing whitespace around arithmetic operator http://gerrit.cloudera.org:8080/#/c/12884/12/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@168 PS12, Line 168: - flake8: E226 missing whitespace around arithmetic operator http://gerrit.cloudera.org:8080/#/c/12884/12/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@176 PS12, Line 176: class CreateObjectTests(unittest.TestCase, Util): flake8: E302 expected 2 blank lines, found 1 http://gerrit.cloudera.org:8080/#/c/12884/12/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@241 PS12, Line 241: : flake8: E231 missing whitespace after ':' http://gerrit.cloudera.org:8080/#/c/12884/12/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@282 PS12, Line 282: flake8: E261 at least two spaces before inline comment http://gerrit.cloudera.org:8080/#/c/12884/12/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@309 PS12, Line 309: d flake8: E303 too many blank lines (2) http://gerrit.cloudera.org:8080/#/c/12884/12/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@314 PS12, Line 314: d flake8: E303 too many blank lines (2) http://gerrit.cloudera.org:8080/#/c/12884/12/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@321 PS12, Line 321: + flake8: E226 missing whitespace around arithmetic operator http://gerrit.cloudera.org:8080/#/c/12884/12/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@331 PS12, Line 331: class ToObjectsTests(unittest.TestCase, Util): flake8: E302 expected 2 blank lines, found 1 http://gerrit.cloudera.org:8080/#/c/12884/12/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@360 PS12, Line 360: class MetaDataTests(unittest.TestCase): flake8: E302 expected 2 blank lines, found 1 http://gerrit.cloudera.org:8080/#/c/12884/12/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@396 PS12, Line 396: d flake8: E303 too many blank lines (2) http://gerrit.cloudera.org:8080/#/c/12884/12/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@404 PS12, Line 404: d flake8: E303 too
[Impala-ASF-CR] IMPALA-7290: part 1: clean up shell tests
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13083 ) Change subject: IMPALA-7290: part 1: clean up shell tests .. Patch Set 7: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/4054/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/13083 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibe5ab7f4817e690b7d3be08d71f8f14364b84412 Gerrit-Change-Number: 13083 Gerrit-PatchSet: 7 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 23 Apr 2019 22:28:56 + Gerrit-HasComments: No
[Impala-ASF-CR] PREVIEW: IMPALA-7290: part 2: Add HS2 support to Impala shell
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12884 to look at the new patch set (#12). Change subject: PREVIEW: IMPALA-7290: part 2: Add HS2 support to Impala shell .. PREVIEW: IMPALA-7290: part 2: Add HS2 support to Impala shell Beeswax is still supported via --protocol=beeswax but will be deprecated. Differences are abstracted into ImpalaClient subclasses. This support requires Impala-specific extensions to the HS2 interface, similar to the existing extensions to Beeswax. Thus the HS2 shell is only forwards-compatible with newer Impala versions. I considered trying to gracefully degrade when the new extensions weren't present, but it didn't seem to be worth the ongoing testing effort. Here are the changes required to make it work: * Switch to TBinaryProtocolAccelerated to avoid perf regression. The HS2 protocol requires decoding more primitive values (because its not a string-per-row), which was slow with the pure python implementation of TBinaryProtocol. * Added bitarray module to efficiently unpack null indicators * Minimise invasiveness of changes by transposing and stringifying the columnar results into rows in impala_client.py. The transposition needs to happen before display anyway. * Add PingImpalaHS2Service() to get back version string and webserver address. * Add CloseImpalaOperation() extension to return DML row counts. * Add is_closed member to query handles to avoid shell independently tracking whether the query handle was closed or not. * Include query status in HS2 log to match beeswax. * HS2 GetLog() command now includes query status error message for consistency with beeswax. * "set"/"set all" uses the client requests options, not the session default. This captures the effective value of TIMEZONE, which was previously missing. * "set all" on the server side returns REMOVED query options - the shell needs to know these so it can correctly ignore them. * Clean up self.orig_cmd/self.last_leading comment argument passing to avoid implicit parameter passing through multiple function calls. * Clean up argument handling in shell tests to consistently pass around lists of arguments instead of strings that are subject to shell tokenisation rules. Testing: * Shell tests can run with both protocols * Add tests for formatting of all types and NULL values * Added testing for floating point output formatting, which does change as a result of switching to server-side vs client-side formatting. TODO: * exclude ext-py from flake8 * Add DATE converage * Verify that shell tests are going through right interface by disabling interface Performance: Baseline from beeswax shell for large extract is as follows: $ time impala-shell.sh -B -q 'select * from tpch_parquet.orders' > /dev/null real0m6.708s user0m5.132s sys 0m0.204s After this change it is somewhat slower, but we generally don't consider bulk extract performance through the shell to be perf-critical: real0m7.625s user0m6.436s sys 0m0.256s Change-Id: I6d5cc83d545aacc659523f29b1d6feed672e2a12 --- M LICENSE.txt M be/src/runtime/dml-exec-state.cc M be/src/runtime/dml-exec-state.h M be/src/service/client-request-state.cc M be/src/service/client-request-state.h M be/src/service/impala-beeswax-server.cc M be/src/service/impala-hs2-server.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h M bin/jenkins/critique-gerrit-review.py M bin/rat_exclude_files.txt M common/thrift/ImpalaService.thrift M infra/python/deps/compiled-requirements.txt A shell/ext-py/bitarray-0.9.0/PKG-INFO A shell/ext-py/bitarray-0.9.0/bitarray/__init__.py A shell/ext-py/bitarray-0.9.0/bitarray/_bitarray.c A shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py A shell/ext-py/bitarray-0.9.0/setup.py M shell/impala_client.py M shell/impala_shell.py M shell/impala_shell_config_defaults.py M shell/make_shell_tarball.sh M shell/option_parser.py M shell/thrift_sasl.py M tests/custom_cluster/test_shell_interactive.py M tests/custom_cluster/test_shell_interactive_reconnect.py M tests/shell/test_shell_commandline.py M tests/shell/test_shell_interactive.py M tests/shell/util.py 29 files changed, 7,214 insertions(+), 448 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/84/12884/12 -- To view, visit http://gerrit.cloudera.org:8080/12884 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6d5cc83d545aacc659523f29b1d6feed672e2a12 Gerrit-Change-Number: 12884 Gerrit-PatchSet: 12 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] [acid] Predicate to test if dir must be included.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13091 ) Change subject: [acid] Predicate to test if dir must be included. .. Patch Set 2: (8 comments) http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java File fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java: http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java@123 PS2, Line 123: fileStatuses = FileSystemUtil.listStatus(fs, partDir_, recursive_, pathPredicate_); line too long (91 > 90) http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@563 PS2, Line 563: FileMetadataLoader loader = new FileMetadataLoader(e.getKey(), /*recursive=*/isRecursive, line too long (95 > 90) http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@607 PS2, Line 607:* TODO:Sudhanshu: Move this to interface and then use this implementation in BaseTableRef line too long (92 > 90) http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java: http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@567 PS2, Line 567: RecursingIterator(FileSystem fs, Path startPath, Predicate predicate) throws IOException { line too long (102 > 90) http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/test/java/org/apache/impala/common/RecursingIteratorTest.java File fe/src/test/java/org/apache/impala/common/RecursingIteratorTest.java: http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/test/java/org/apache/impala/common/RecursingIteratorTest.java@129 PS2, Line 129: when(fs.listStatusIterator(new Path("/user/hive/warehouse/test/delta_023_023_"))). line too long (98 > 90) http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/test/java/org/apache/impala/common/RecursingIteratorTest.java@131 PS2, Line 131: when(fs.listStatusIterator(new Path("/user/hive/warehouse/test/delta_024_024_"))). line too long (98 > 90) http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/test/java/org/apache/impala/common/RecursingIteratorTest.java@138 PS2, Line 138: new FileSystemUtil.RecursingIterator(fs, new Path("/user/hive/warehouse/test"), x->true); line too long (101 > 90) http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/test/java/org/apache/impala/common/RecursingIteratorTest.java@153 PS2, Line 153: new FileSystemUtil.RecursingIterator(fs, new Path("/user/hive/warehouse/test"), line too long (91 > 90) -- To view, visit http://gerrit.cloudera.org:8080/13091 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If0e88281d277127c9499d37b95fbba55dcc7761c Gerrit-Change-Number: 13091 Gerrit-PatchSet: 2 Gerrit-Owner: Sudhanshu Arora Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yongzhi Chen Gerrit-Comment-Date: Tue, 23 Apr 2019 22:13:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] PREVIEW: IMPALA-7290: part 2: Add HS2 support to Impala shell
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12884 ) Change subject: PREVIEW: IMPALA-7290: part 2: Add HS2 support to Impala shell .. Patch Set 11: (184 comments) http://gerrit.cloudera.org:8080/#/c/12884/11/shell/ext-py/bitarray-0.9.0/bitarray/__init__.py File shell/ext-py/bitarray-0.9.0/bitarray/__init__.py: http://gerrit.cloudera.org:8080/#/c/12884/11/shell/ext-py/bitarray-0.9.0/bitarray/__init__.py@11 PS11, Line 11: from bitarray._bitarray import _bitarray, bitdiff, bits2bytes, _sysinfo flake8: F401 'bitarray._bitarray._sysinfo' imported but unused http://gerrit.cloudera.org:8080/#/c/12884/11/shell/ext-py/bitarray-0.9.0/bitarray/__init__.py@11 PS11, Line 11: from bitarray._bitarray import _bitarray, bitdiff, bits2bytes, _sysinfo flake8: F401 'bitarray._bitarray.bits2bytes' imported but unused http://gerrit.cloudera.org:8080/#/c/12884/11/shell/ext-py/bitarray-0.9.0/bitarray/__init__.py@11 PS11, Line 11: from bitarray._bitarray import _bitarray, bitdiff, bits2bytes, _sysinfo flake8: F401 'bitarray._bitarray.bitdiff' imported but unused http://gerrit.cloudera.org:8080/#/c/12884/11/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py File shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py: http://gerrit.cloudera.org:8080/#/c/12884/11/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@21 PS11, Line 21: from bitarray import bitarray, bitdiff, bits2bytes, __version__ flake8: E402 module level import not at top of file http://gerrit.cloudera.org:8080/#/c/12884/11/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@84 PS11, Line 84: flake8: E221 multiple spaces before operator http://gerrit.cloudera.org:8080/#/c/12884/11/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@85 PS11, Line 85: flake8: E221 multiple spaces before operator http://gerrit.cloudera.org:8080/#/c/12884/11/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@127 PS11, Line 127: class TestsModuleFunctions(unittest.TestCase, Util): flake8: E302 expected 2 blank lines, found 1 http://gerrit.cloudera.org:8080/#/c/12884/11/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@167 PS11, Line 167: + flake8: E226 missing whitespace around arithmetic operator http://gerrit.cloudera.org:8080/#/c/12884/11/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@167 PS11, Line 167: + flake8: E226 missing whitespace around arithmetic operator http://gerrit.cloudera.org:8080/#/c/12884/11/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@167 PS11, Line 167: - flake8: E226 missing whitespace around arithmetic operator http://gerrit.cloudera.org:8080/#/c/12884/11/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@167 PS11, Line 167: - flake8: E226 missing whitespace around arithmetic operator http://gerrit.cloudera.org:8080/#/c/12884/11/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@168 PS11, Line 168: - flake8: E226 missing whitespace around arithmetic operator http://gerrit.cloudera.org:8080/#/c/12884/11/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@168 PS11, Line 168: - flake8: E226 missing whitespace around arithmetic operator http://gerrit.cloudera.org:8080/#/c/12884/11/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@176 PS11, Line 176: class CreateObjectTests(unittest.TestCase, Util): flake8: E302 expected 2 blank lines, found 1 http://gerrit.cloudera.org:8080/#/c/12884/11/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@241 PS11, Line 241: : flake8: E231 missing whitespace after ':' http://gerrit.cloudera.org:8080/#/c/12884/11/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@282 PS11, Line 282: flake8: E261 at least two spaces before inline comment http://gerrit.cloudera.org:8080/#/c/12884/11/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@309 PS11, Line 309: d flake8: E303 too many blank lines (2) http://gerrit.cloudera.org:8080/#/c/12884/11/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@314 PS11, Line 314: d flake8: E303 too many blank lines (2) http://gerrit.cloudera.org:8080/#/c/12884/11/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@321 PS11, Line 321: + flake8: E226 missing whitespace around arithmetic operator http://gerrit.cloudera.org:8080/#/c/12884/11/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@331 PS11, Line 331: class ToObjectsTests(unittest.TestCase, Util): flake8: E302 expected 2 blank lines, found 1 http://gerrit.cloudera.org:8080/#/c/12884/11/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@360 PS11, Line 360: class MetaDataTests(unittest.TestCase): flake8: E302 expected 2 blank lines, found 1 http://gerrit.cloudera.org:8080/#/c/12884/11/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@396 PS11, Line 396: d flake8: E303 too many blank lines (2) http://gerrit.cloudera.org:8080/#/c/12884/11/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@404 PS11, Line 404: d flake8: E303 too
[Impala-ASF-CR] PREVIEW: IMPALA-7290: part 2: Add HS2 support to Impala shell
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12884 to look at the new patch set (#11). Change subject: PREVIEW: IMPALA-7290: part 2: Add HS2 support to Impala shell .. PREVIEW: IMPALA-7290: part 2: Add HS2 support to Impala shell Beeswax is still supported via --protocol=beeswax but will be deprecated. Differences are abstracted into ImpalaClient subclasses. This support requires Impala-specific extensions to the HS2 interface, similar to the existing extensions to Beeswax. Thus the HS2 shell is only forwards-compatible with newer Impala versions. I considered trying to gracefully degrade when the new extensions weren't present, but it didn't seem to be worth the ongoing testing effort. Here are the changes required to make it work: * Switch to TBinaryProtocolAccelerated to avoid perf regression. The HS2 protocol requires decoding more primitive values (because its not a string-per-row), which was slow with the pure python implementation of TBinaryProtocol. * Added bitarray module to efficiently unpack null indicators * Minimise invasiveness of changes by transposing and stringifying the columnar results into rows in impala_client.py. The transposition needs to happen before display anyway. * Add PingImpalaHS2Service() to get back version string and webserver address. * Add CloseImpalaOperation() extension to return DML row counts. * Add is_closed member to query handles to avoid shell independently tracking whether the query handle was closed or not. * Include query status in HS2 log to match beeswax. * HS2 GetLog() command now includes query status error message for consistency with beeswax. * "set"/"set all" uses the client requests options, not the session default. This captures the effective value of TIMEZONE, which was previously missing. * "set all" on the server side returns REMOVED query options - the shell needs to know these so it can correctly ignore them. * Clean up self.orig_cmd/self.last_leading comment argument passing to avoid implicit parameter passing through multiple function calls. * Clean up argument handling in shell tests to consistently pass around lists of arguments instead of strings that are subject to shell tokenisation rules. Testing: * Shell tests can run with both protocols * Add tests for formatting of all types and NULL values * Added testing for floating point output formatting, which does change as a result of switching to server-side vs client-side formatting. TODO: * exclude ext-py from flake8 * Add DATE converage * Verify that shell tests are going through right interface by disabling interface Performance: Baseline from beeswax shell for large extract is as follows: $ time impala-shell.sh -B -q 'select * from tpch_parquet.orders' > /dev/null real0m6.708s user0m5.132s sys 0m0.204s After this change it is somewhat slower, but we generally don't consider bulk extract performance through the shell to be perf-critical: real0m7.625s user0m6.436s sys 0m0.256s Change-Id: I6d5cc83d545aacc659523f29b1d6feed672e2a12 --- M LICENSE.txt M be/src/runtime/dml-exec-state.cc M be/src/runtime/dml-exec-state.h M be/src/service/client-request-state.cc M be/src/service/client-request-state.h M be/src/service/impala-beeswax-server.cc M be/src/service/impala-hs2-server.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h M bin/jenkins/critique-gerrit-review.py M bin/rat_exclude_files.txt M common/thrift/ImpalaService.thrift M infra/python/deps/compiled-requirements.txt A shell/ext-py/bitarray-0.9.0/PKG-INFO A shell/ext-py/bitarray-0.9.0/bitarray/__init__.py A shell/ext-py/bitarray-0.9.0/bitarray/_bitarray.c A shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py A shell/ext-py/bitarray-0.9.0/setup.py M shell/impala_client.py M shell/impala_shell.py M shell/impala_shell_config_defaults.py M shell/make_shell_tarball.sh M shell/option_parser.py M shell/thrift_sasl.py M tests/custom_cluster/test_shell_interactive.py M tests/custom_cluster/test_shell_interactive_reconnect.py M tests/shell/test_shell_commandline.py M tests/shell/test_shell_interactive.py M tests/shell/util.py 29 files changed, 7,212 insertions(+), 445 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/84/12884/11 -- To view, visit http://gerrit.cloudera.org:8080/12884 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6d5cc83d545aacc659523f29b1d6feed672e2a12 Gerrit-Change-Number: 12884 Gerrit-PatchSet: 11 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-7290: part 1: clean up shell tests
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13083 ) Change subject: IMPALA-7290: part 1: clean up shell tests .. Patch Set 6: (18 comments) http://gerrit.cloudera.org:8080/#/c/13083/6/tests/beeswax/impala_beeswax.py File tests/beeswax/impala_beeswax.py: http://gerrit.cloudera.org:8080/#/c/13083/6/tests/beeswax/impala_beeswax.py@116 PS6, Line 116: flake8: E261 at least two spaces before inline comment http://gerrit.cloudera.org:8080/#/c/13083/6/tests/custom_cluster/test_client_ssl.py File tests/custom_cluster/test_client_ssl.py: http://gerrit.cloudera.org:8080/#/c/13083/6/tests/custom_cluster/test_client_ssl.py@221 PS6, Line 221: s flake8: E501 line too long (104 > 90 characters) http://gerrit.cloudera.org:8080/#/c/13083/6/tests/shell/test_shell_commandline.py File tests/shell/test_shell_commandline.py: http://gerrit.cloudera.org:8080/#/c/13083/6/tests/shell/test_shell_commandline.py@33 PS6, Line 33: from util import get_impalad_host_port, get_impalad_host_port, get_shell_cmd flake8: F811 redefinition of unused 'get_impalad_host_port' from line 33 http://gerrit.cloudera.org:8080/#/c/13083/6/tests/shell/test_shell_commandline.py@33 PS6, Line 33: from util import get_impalad_host_port, get_impalad_host_port, get_shell_cmd flake8: F401 'util.get_shell_cmd' imported but unused http://gerrit.cloudera.org:8080/#/c/13083/6/tests/shell/test_shell_commandline.py@204 PS6, Line 204: flake8: E203 whitespace before ',' http://gerrit.cloudera.org:8080/#/c/13083/6/tests/shell/test_shell_commandline.py@217 PS6, Line 217: flake8: E203 whitespace before ',' http://gerrit.cloudera.org:8080/#/c/13083/6/tests/shell/test_shell_commandline.py@519 PS6, Line 519: F flake8: E501 line too long (96 > 90 characters) http://gerrit.cloudera.org:8080/#/c/13083/6/tests/shell/test_shell_commandline.py@642 PS6, Line 642: ' flake8: E131 continuation line unaligned for hanging indent http://gerrit.cloudera.org:8080/#/c/13083/6/tests/shell/test_shell_commandline.py@872 PS6, Line 872: u flake8: E501 line too long (92 > 90 characters) http://gerrit.cloudera.org:8080/#/c/13083/6/tests/shell/test_shell_interactive.py File tests/shell/test_shell_interactive.py: http://gerrit.cloudera.org:8080/#/c/13083/6/tests/shell/test_shell_interactive.py@41 PS6, Line 41: from util import (assert_var_substitution, ImpalaShell, get_impalad_host_port, flake8: F401 'util.get_impalad_host_port' imported but unused http://gerrit.cloudera.org:8080/#/c/13083/6/tests/shell/test_shell_interactive.py@252 PS6, Line 252: flake8: E203 whitespace before ',' http://gerrit.cloudera.org:8080/#/c/13083/6/tests/shell/test_shell_interactive.py@451 PS6, Line 451: ) flake8: E501 line too long (91 > 90 characters) http://gerrit.cloudera.org:8080/#/c/13083/6/tests/shell/test_shell_interactive.py@463 PS6, Line 463: r flake8: E501 line too long (92 > 90 characters) http://gerrit.cloudera.org:8080/#/c/13083/6/tests/shell/util.py File tests/shell/util.py: http://gerrit.cloudera.org:8080/#/c/13083/6/tests/shell/util.py@116 PS6, Line 116: def get_impalad_host_port(vector): flake8: E302 expected 2 blank lines, found 1 http://gerrit.cloudera.org:8080/#/c/13083/6/tests/shell/util.py@122 PS6, Line 122: def get_impalad_port(vector): flake8: E302 expected 2 blank lines, found 1 http://gerrit.cloudera.org:8080/#/c/13083/6/tests/shell/util.py@126 PS6, Line 126: def get_shell_cmd(vector): flake8: E302 expected 2 blank lines, found 1 http://gerrit.cloudera.org:8080/#/c/13083/6/tests/shell/util.py@129 PS6, Line 129: h flake8: F841 local variable 'host_port' is assigned to but never used http://gerrit.cloudera.org:8080/#/c/13083/6/tests/shell/util.py@133 PS6, Line 133: def get_open_sessions_metric(vector): flake8: E302 expected 2 blank lines, found 1 -- To view, visit http://gerrit.cloudera.org:8080/13083 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibe5ab7f4817e690b7d3be08d71f8f14364b84412 Gerrit-Change-Number: 13083 Gerrit-PatchSet: 6 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 23 Apr 2019 22:06:54 + Gerrit-HasComments: Yes
[Impala-ASF-CR] [acid] Predicate to test if dir must be included.
Sudhanshu Arora has uploaded this change for review. ( http://gerrit.cloudera.org:8080/13091 Change subject: [acid] Predicate to test if dir must be included. .. [acid] Predicate to test if dir must be included. The predicate is currently not set anywhere but the plan is to modify HdfsTable and use the validWriteIdList there in the predicate implementation and remove the directories that should not be used. Testing Done: - mvn test -Dtest=RecursingIteratorTest Added RecursingIteratorTest. Could have also used FileMetadataLoaderTest but oh! well Change-Id: If0e88281d277127c9499d37b95fbba55dcc7761c --- M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java M fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java M fe/src/test/java/org/apache/impala/catalog/HdfsPartitionTest.java A fe/src/test/java/org/apache/impala/common/RecursingIteratorTest.java A fe/src/test/java/org/apache/impala/planner/ExplainTest.java 8 files changed, 418 insertions(+), 20 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/13091/2 -- To view, visit http://gerrit.cloudera.org:8080/13091 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: If0e88281d277127c9499d37b95fbba55dcc7761c Gerrit-Change-Number: 13091 Gerrit-PatchSet: 2 Gerrit-Owner: Sudhanshu Arora Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Yongzhi Chen
[Impala-ASF-CR] IMPALA-7290: part 1: clean up shell tests
Tim Armstrong has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/13083 ) Change subject: IMPALA-7290: part 1: clean up shell tests .. IMPALA-7290: part 1: clean up shell tests This sets up the tests to be extensible to test shell in both beeswax and HS2 modes. Testing: * Add test dimension containing only beeswax in preparation for HS2 dimension. * Factor out hardcoded ports. * Add tests for formatting of all types and NULL values. * Added testing for floating point output formatting, which does change as a result of switching to server-side vs client-side formatting. * Use unique_database for tests that create tables. Change-Id: Ibe5ab7f4817e690b7d3be08d71f8f14364b84412 --- M bin/load-data.py M testdata/bin/create-load-data.sh M testdata/bin/create-tpcds-testcase-files.sh M tests/beeswax/impala_beeswax.py M tests/common/impala_test_suite.py M tests/common/test_dimensions.py M tests/custom_cluster/test_client_ssl.py M tests/custom_cluster/test_shell_interactive.py M tests/custom_cluster/test_shell_interactive_reconnect.py M tests/shell/test_shell_commandline.py M tests/shell/test_shell_interactive.py M tests/shell/util.py 12 files changed, 751 insertions(+), 599 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/13083/6 -- To view, visit http://gerrit.cloudera.org:8080/13083 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibe5ab7f4817e690b7d3be08d71f8f14364b84412 Gerrit-Change-Number: 13083 Gerrit-PatchSet: 6 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-7290: part 1: clean up shell tests
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13083 ) Change subject: IMPALA-7290: part 1: clean up shell tests .. Patch Set 5: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/4053/ -- To view, visit http://gerrit.cloudera.org:8080/13083 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibe5ab7f4817e690b7d3be08d71f8f14364b84412 Gerrit-Change-Number: 13083 Gerrit-PatchSet: 5 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 23 Apr 2019 21:58:44 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12939 ) Change subject: IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity .. Patch Set 5: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/12939/4/testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test File testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test: http://gerrit.cloudera.org:8080/#/c/12939/4/testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test@1705 PS4, Line 1705: # IMPALA-8386: test coverage for ORDER BY/LIMIT > The substitutions in other places are for ResultExprs which are from the or To be clear, i didn't think there was a bug in your code, just wanted to make sure we had some coverage of "interesting" cases. -- To view, visit http://gerrit.cloudera.org:8080/12939 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia87aa9db2de85f0716e4854a88727aad593773fa Gerrit-Change-Number: 12939 Gerrit-PatchSet: 5 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 23 Apr 2019 21:50:24 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8341: Data cache for remote reads
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/12987 ) Change subject: IMPALA-8341: Data cache for remote reads .. Patch Set 4: (31 comments) Mostly looks good to me. Please feel free to defer some of the comments to a subsequent change to make faster progress with this one. http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/exec/hdfs-scan-node-base.cc File be/src/exec/hdfs-scan-node-base.cc: http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/exec/hdfs-scan-node-base.cc@365 PS4, Line 365: "DataCacheHitCount", TUnit::UNIT); Do we have tests for these to make sure they show up in the profiles and work as expected? http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/runtime/io/data-cache-test.cc File be/src/runtime/io/data-cache-test.cc: http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/runtime/io/data-cache-test.cc@330 PS1, Line 330: for (int i = 0; i < NUM_THREADS; ++i) { > Prefer to keep this test as stand-alone to make development easier. wfm http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache-test.cc File be/src/runtime/io/data-cache-test.cc: http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache-test.cc@138 PS4, Line 138: const char* fname_ = "foobar"; Move to the other test constants (TEMP_BUFFER_SIZE etc)? http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache-test.cc@226 PS4, Line 226: const int64_t cache_size = 4 * 1024 * 1024; nit: This could now be 4 * FLAGS_data_cache_file_max_size ? http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache-test.cc@349 PS4, Line 349: This second part Can they be to separate tests? http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache-test.cc@410 PS4, Line 410: beyone nit: typo http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.h File be/src/runtime/io/data-cache.h: http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.h@83 PS4, Line 83: concurrency of one thread That's controlled by --data_cache_write_concurrency, right? Mention here? http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.h@87 PS4, Line 87: /// Future work: Do we plan to look into picking partitions on faster disks with higher probability? http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.h@88 PS4, Line 88: /// - bound number of backing files per partition by consolidating the content of very Should we just delete old files if we have reached a configurable high number, 1000 or so, and take the resulting cache hits instead of compacting? http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.h@145 PS4, Line 145: nit: formatting http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.h@197 PS4, Line 197: /// to be called in a single threaded environment after all IO threads exited. Should we also delete the files when we close them? There's a distinction in the implementation and it's not clear to me why it's there. http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.h@203 PS4, Line 203: const kudu::Slice& key Should we pass const CacheKey& here and convert it in the implementation? http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.h@224 PS4, Line 224: VerifyFilesSizes nit: VerifyFileSizes http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.cc File be/src/runtime/io/data-cache.cc: http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.cc@72 PS4, Line 72: const char* DataCache::Partition::CACHE_FILE_PREFIX = "impala-cache-file-"; static const char*? http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.cc@75 PS4, Line 75: struct DataCache::CacheFile { Should this be a class, given it has a c'tor and d'tor? http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.cc@173 PS4, Line 173: make_unique(path, fd); : cache_files_.emplace_back(std::move I think you can merge these two lines, which also reduces the risk that someone accidentally uses the now empty cache_file in a future change. http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.cc@187 PS4, Line 187: vector nit: missing include, but we might generally omit this one. http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.cc@186 PS4, Line 186: // Delete all existing backing files left over from previous runs. : vector entries; : RETURN_IF_ERROR(FileSystemUtil::Directory::GetEntryNames(path_, , 0, : FileSystemUtil::Directory::EntryType::DIR_ENTRY_REG)); : for (const string& entry : entries) { : if (entry.find_first_of(CACHE_FILE_PREFIX) == 0) { : const string file_path = JoinPathSegments(path_,
[Impala-ASF-CR] [acid] Disallow any operation on full acid table.
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/13047 ) Change subject: [acid] Disallow any operation on full acid table. .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/13047/2/fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java File fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java: http://gerrit.cloudera.org:8080/#/c/13047/2/fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java@80 PS2, Line 80: transactional Can we change these strings to constants? transactional, transactional_properties, insert_only http://gerrit.cloudera.org:8080/#/c/13047/2/fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java@84 PS2, Line 84: tableIsTransactional != null && tableIsTransactional.equalsIgnoreCase("true") nit, probably cleaner if we replace this by Boolean.parseBoolean(tableIsTransactional) -- To view, visit http://gerrit.cloudera.org:8080/13047 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifb92e5b691bf192980f2cc7d68c491bb1e8455ac Gerrit-Change-Number: 13047 Gerrit-PatchSet: 2 Gerrit-Owner: Sudhanshu Arora Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Tue, 23 Apr 2019 19:42:48 + Gerrit-HasComments: Yes
[Impala-ASF-CR] [acid] Disallow any operation on full acid table.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13047 ) Change subject: [acid] Disallow any operation on full acid table. .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2863/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/13047 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifb92e5b691bf192980f2cc7d68c491bb1e8455ac Gerrit-Change-Number: 13047 Gerrit-PatchSet: 2 Gerrit-Owner: Sudhanshu Arora Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Tue, 23 Apr 2019 20:03:22 + Gerrit-HasComments: No
[Impala-ASF-CR] [acid] Disallow any operation on full acid table.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/13047 ) Change subject: [acid] Disallow any operation on full acid table. .. Patch Set 2: Can you file a JIRA and include it in the commit message? This is a significant enough change that we'd want to have an associated JIRA. It's fine if it's an umbrella JIRA that is associated with multiple commits or whatever the most convenient way to organise it for you is. -- To view, visit http://gerrit.cloudera.org:8080/13047 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifb92e5b691bf192980f2cc7d68c491bb1e8455ac Gerrit-Change-Number: 13047 Gerrit-PatchSet: 2 Gerrit-Owner: Sudhanshu Arora Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Tue, 23 Apr 2019 20:02:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4356,IMPALA-7331: codegen all ScalarExprs
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12797 ) Change subject: IMPALA-4356,IMPALA-7331: codegen all ScalarExprs .. Patch Set 11: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2862/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12797 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I839d7a3a2f5e1309c33a1f66013ef11628c5dc11 Gerrit-Change-Number: 12797 Gerrit-PatchSet: 11 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 23 Apr 2019 19:25:40 + Gerrit-HasComments: No
[Impala-ASF-CR] [acid] Disallow any operation on full acid table.
Sudhanshu Arora has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/13047 ) Change subject: [acid] Disallow any operation on full acid table. .. [acid] Disallow any operation on full acid table. Added a simple check to disallow any operation on fully acid table. Testing Done: - I still need to figure out how to add a test case for this. - Executed the following sql on mini-cluster >CREATE TABLE tm (a int, b int) TBLPROPERTIES >('transactional'='true', >'transactional_properties'='insert_only'); >select * from tm; >CREATE TABLE tm2 (a int, b int) stored as orc TBLPROPERTIES >('transactional'='true'); >Select * from tm2; The above select failed with Analysis exception. Change-Id: Ifb92e5b691bf192980f2cc7d68c491bb1e8455ac --- M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java 1 file changed, 21 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/13047/2 -- To view, visit http://gerrit.cloudera.org:8080/13047 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifb92e5b691bf192980f2cc7d68c491bb1e8455ac Gerrit-Change-Number: 13047 Gerrit-PatchSet: 2 Gerrit-Owner: Sudhanshu Arora Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-4356,IMPALA-7331: codegen all ScalarExprs
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12797 ) Change subject: IMPALA-4356,IMPALA-7331: codegen all ScalarExprs .. Patch Set 10: (12 comments) http://gerrit.cloudera.org:8080/#/c/12797/10/be/src/exprs/scalar-expr.h File be/src/exprs/scalar-expr.h: http://gerrit.cloudera.org:8080/#/c/12797/10/be/src/exprs/scalar-expr.h@87 PS10, Line 87: Get*Val() dispatches to either : /// a codegen'd function pointer or to an interpreted implementation Get*ValInterpreted() : /// These interpreted functions must be overridden by subclasses of ScalarExpr for every : /// type that they may return. : /// : /// The two main usage patterns for ScalarExpr are: : /// * The codegen'd expressions are called from other codegen'd functions, e.g. from a : /// codegen'd join implementation : /// * Get*Val() is called on the root of each expression subtree by interpreted code. : /// We can optimize for the second usage pattern by filling in the codegen'd function : /// pointer (codegend_compute_fn_) in root of each ScalarExpr tree. Individual callsites : /// can disable this optimisation if it's not needed. Expr subtrees can be evaluated : /// (e.g. by ScalarExprEvaluator::GetConstValue()) but may fail back to a slower : /// interpreted implementation. > These sets of comments seem to fit better after line 107-114. Reading this Done http://gerrit.cloudera.org:8080/#/c/12797/10/be/src/exprs/scalar-expr.h@170 PS10, Line 170: bool is_codegen_entry_point, > I wonder how much measurable benefit we get from this. IMHO, this can be an I thought about this because it would have made my life easier - that change was the harder part of the patch. I added some explanation to the commit for my decision to couple the changes. I wasn't comfortable separating them because, at minimum, adding all those extra codegen'd function pointers will regress codegen time a bit, and I think it will regress it significantly for some exprs because of the blow-up in number of entry points. It might also make the code generated worth because of changes in inlining decisions. E.g. before this change a codegen'd ConstantExpr would be an internal function with only have a single IR caller, so will be aggressively inlined. But afterwards it's external so may not be inlined. E.g. if you have "x = 1", then previously there was one function pointer populated because there's only one ScalarFnCall node. But if we add support for function pointers in every node without being selective about it, then we would have 3 entry points. http://gerrit.cloudera.org:8080/#/c/12797/10/be/src/exprs/scalar-expr.h@288 PS10, Line 288: If 'is_entry_point' is true, this indicates that Get*Val() : /// may be called directly from interpreted code and that we should generate an entry : /// point into the codegen'd code. > Please see comments in scalar-expr.cc. 'is_entry_point' may not need to be Ack http://gerrit.cloudera.org:8080/#/c/12797/10/be/src/exprs/scalar-expr.h@349 PS10, Line 349: /// The vast majority of exprs support interpretation, so default to true. > May want to point out that expressions which aren't interpretable should ov I think I can make these protected so that the concepts are internal to this class hierarchy and there isn't any additional surface area. I think it doesn't add complexity within the ScalarExpr subsystem since the concepts were already present implicitly. http://gerrit.cloudera.org:8080/#/c/12797/10/be/src/exprs/scalar-expr.h@375 PS10, Line 375: codegend_compute_fn_ = nullptr; > Please document that this is left as null if this scalar expression is not I added some explanation of when it is populated to avoid the double negative. http://gerrit.cloudera.org:8080/#/c/12797/10/be/src/exprs/scalar-expr.cc File be/src/exprs/scalar-expr.cc: http://gerrit.cloudera.org:8080/#/c/12797/10/be/src/exprs/scalar-expr.cc@284 PS10, Line 284: bool is_entry_point > May be I didn't follow all the details correctly but it seems to me that all > these plumbing for passing this flag around is only used in > GetCodegendComputeFn(). Yeah that's ultimately what happens, the challenge is that GetCodegendComputeFn() can be called multiple times from different places, and the callers have the information about whether it should be an entry point. > In theory, only the top level expression (i.e. the root of the scalar > expression tree) may require exposing itself with a function pointer after > codegen (i.e. it's an entry point). This assumption is false - GetCodegendComputeFnWrapper() generates a codegen'd function that calls the interpreted path of its child. We need to generate entry points there to avoid a perf cliff
[Impala-ASF-CR] IMPALA-7290: part 1: clean up shell tests
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13083 ) Change subject: IMPALA-7290: part 1: clean up shell tests .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2861/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/13083 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibe5ab7f4817e690b7d3be08d71f8f14364b84412 Gerrit-Change-Number: 13083 Gerrit-PatchSet: 5 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 23 Apr 2019 18:07:46 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4356,IMPALA-7331: codegen all ScalarExprs
Hello Michael Ho, Thomas Marshall, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12797 to look at the new patch set (#11). Change subject: IMPALA-4356,IMPALA-7331: codegen all ScalarExprs .. IMPALA-4356,IMPALA-7331: codegen all ScalarExprs Based on initial draft patch by Pooja Nilangekar. Codegen'd expressions can be executed in two ways - either by being called directly from a fully codegend function, or from interpreted code via a function pointer (previously ScalarFnCall::scalar_fn_wrapper_). This change moves the function pointer from ScalarFnCall to its base class ScalarExpr, so the full expr tree can be codegen'd, not just the ScalarFnCall subtrees. The key refactoring and improvements are: * ScalarExpr::Get*Val() switches between interpreted and the codegen'd function pointer code paths in an inline function, avoiding a virtual function call to ScalarFnCal::Get*Val(). * Boilerplate logic is moved to ScalarExpr::GetCodegendComputeFn(), which calls a virtual function GetCodegenComputeFnImpl(). * ScalarFnCall's logic for deciding whether to interpret or codegen is better abstracted and exposed to ScalarExpr as IsInterpretable() and ShouldCodegen() methods. * The ScalarExpr::codegend_compute_fn_ function pointer is only populated for expressions that are "codegen entry points". These include the roots of expr trees and non-root expressions where the parent expression calls Get*Val() from the pseudo-codegend GetCodegendComputeFnWrapper(). * ScalarFnCall is always initialised for interpreted execution. Otherwise the function pointer is needed for non-root expressions, e.g. to support ScalarExprEvaluator::GetConstantVal(). * Latent bugs/gaps for codegen of CollectionVal are fixed. CollectionVal is modified to use the StringVal memory layout to allow code sharing with StringVal. These fixes allowed simplification of IsNotEmptyPredicate codegen (from IMPALA-7657). I chose to tackle two problems in one change - adding support for generating codegen'd function pointers for all ScalarExprs, and adding the "entry point" concept - to avoid a blow-up in the number of codegen'd entry points that could lead to longer codegen times and/or worse code because of inlining changes. IMPALA-7331 (CHAR codegen support functions) is also fixed because it was simpler to enable CHAR codegen within ScalarExpr than to carry forward the exiting CHAR workarounds from ScalarFnCall. The CHAR-specific codegen support required in the scalar expr subsystem is very limited. StringVal intermediates are used everywhere. Only SlotRef actually operates on the different tuple layout, and the required codegen support for SlotRef already exists for UDA intermediates anyway. Testing: * Ran exhaustive tests. Perf: * Ran a basic insert benchmark, which went from 10.1s to 7.6s create table foo stored as parquet as select case when l_orderkey % 2 = 0 then 'aaa' else 'bbb' end from tpch30_parquet.lineitem; * Ran a basic CHAR expr test: set num_nodes=1; set mt_dop=1; select count(*) from lineitem where cast(l_linestatus as CHAR(2)) = 'O ' and cast(l_returnflag as CHAR(2)) = 'N ' The time spent in the scan went from 520ms to 220ms. * Added perf regression test to tpcds-insert, similar to the manual benchmark. * Ran single-node TPC-H with large and small scale factors, to estimate impact on execution perf and query startup time, respectively. +--+---+-++++ | Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | +--+---+-++++ | TPCH(30) | parquet / none / none | 6.84| -0.18% | 4.49 | -0.31% | +--+---+-++++ +--+--+---++-++---++---++-++ | Workload | Query| File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval | +--+--+---++-++---++---++-++ | TPCH(30) | TPCH-Q20 | parquet / none / none | 2.58 | 2.47| +4.18% | 1.29% | 0.88%| 5 | +4.12% | 2.31| 5.81 | | TPCH(30) | TPCH-Q17 | parquet / none / none | 4.81 | 4.61| +4.33% | 2.18% | 2.15%| 5 | +3.91% | 1.73| 3.09 | | TPCH(30) | TPCH-Q21 | parquet / none / none | 26.45 | 26.16 | +1.09% | 0.37% | 0.50%| 5 | +1.36% | 2.02| 3.94 | | TPCH(30) | TPCH-Q9
[Impala-ASF-CR] IMPALA-8446: Create a unit test for Admission Controller.
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/13078 ) Change subject: IMPALA-8446: Create a unit test for Admission Controller. .. Patch Set 2: (4 comments) looks good, just a few nits http://gerrit.cloudera.org:8080/#/c/13078/2/be/src/scheduling/admission-controller-test.cc File be/src/scheduling/admission-controller-test.cc: http://gerrit.cloudera.org:8080/#/c/13078/2/be/src/scheduling/admission-controller-test.cc@31 PS2, Line 31: CURRENT_EXECUTABLE_PATH when does the SASL lib use this? http://gerrit.cloudera.org:8080/#/c/13078/2/be/src/scheduling/admission-controller-test.cc@139 PS2, Line 139: // Test that the pool configurations can be read correctly. : CheckPoolConfig(request_pool_service, "non-existent queue", 5, -1, 3, true); : CheckPoolConfig(request_pool_service, QUEUE_A, 1, 10L * 1024L * 1024L, 50, true); : CheckPoolConfig(request_pool_service, QUEUE_B, 5, -1, 60, true); : CheckPoolConfig(request_pool_service, QUEUE_C, 10, 128L * 1024L * 1024L, 3, true); this can be a separate test. http://gerrit.cloudera.org:8080/#/c/13078/2/be/src/scheduling/admission-controller-test.cc@182 PS2, Line 182: ASSERT_EQ(3, admission_controller.host_mem_reserved_.size()); : ASSERT_EQ(6000, admission_controller.host_mem_reserved_[HOST_1]); : ASSERT_EQ(5000, admission_controller.host_mem_reserved_[HOST_2]); nit: should we add the same set of checks before the call to UpdatePoolStats to make sure its empty? or will that be an overkill? http://gerrit.cloudera.org:8080/#/c/13078/2/be/src/scheduling/admission-controller-test.cc@188 PS2, Line 188: lock_guard lock(admission_controller.admission_ctrl_lock_); actually other variables like host_mem_reserved_ and the function calls like CanAdmitRequest are all protected by this lock. Since this is a single thread test and all calls are synchronous, maybe we can just ignore holding the lock OR hold it everytime we use something that should be protected by it. -- To view, visit http://gerrit.cloudera.org:8080/13078 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8a840590b868f2df1a06f3f397b7b0fc2b29462c Gerrit-Change-Number: 13078 Gerrit-PatchSet: 2 Gerrit-Owner: Andrew Sherman Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 23 Apr 2019 18:21:52 + Gerrit-HasComments: Yes
[Impala-ASF-CR] PREVIEW: IMPALA-7290: part 2: Add HS2 support to Impala shell
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12884 ) Change subject: PREVIEW: IMPALA-7290: part 2: Add HS2 support to Impala shell .. Patch Set 10: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2860/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12884 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6d5cc83d545aacc659523f29b1d6feed672e2a12 Gerrit-Change-Number: 12884 Gerrit-PatchSet: 10 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 23 Apr 2019 18:03:45 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/13005 ) Change subject: IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0 .. Patch Set 3: (1 comment) Nice work. When I did the transitional work between Hive 1 and Hive 2, I introduced a variable that switched at build time between the two worlds. (See a203733fac3e1e37df8abeee39a88d187153a8c5 for the revert and "git log --grep IMPALA_MINICLUSTER_PROFILE") If I'm understanding right, the approach here is to produce a single "binary" that works for both worlds? Or at run time do the "original" Hive jars get run? I think both approaches are plausible; just want to make sure we're clear about it. (Is the shading slow? I've seen maven-shade-plugin be very slow...) http://gerrit.cloudera.org:8080/#/c/13005/3/bin/impala-config.sh File bin/impala-config.sh: http://gerrit.cloudera.org:8080/#/c/13005/3/bin/impala-config.sh@167 PS3, Line 167: export CDP_BUILD_NUMBER=994 This number looks like a CDH_BUILD_NUMBER, and is probably from the same name space, but looks like something you hand-wrote. You can ask the same oracle that CDH_BUILD_NUMBER uses for your own number which is guaranteed not to conflict in the future. -- To view, visit http://gerrit.cloudera.org:8080/13005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436 Gerrit-Change-Number: 13005 Gerrit-PatchSet: 3 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sudhanshu Arora Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 23 Apr 2019 17:10:25 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7290: part 1: clean up shell tests
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13083 ) Change subject: IMPALA-7290: part 1: clean up shell tests .. Patch Set 5: (18 comments) http://gerrit.cloudera.org:8080/#/c/13083/5/tests/beeswax/impala_beeswax.py File tests/beeswax/impala_beeswax.py: http://gerrit.cloudera.org:8080/#/c/13083/5/tests/beeswax/impala_beeswax.py@116 PS5, Line 116: flake8: E261 at least two spaces before inline comment http://gerrit.cloudera.org:8080/#/c/13083/5/tests/custom_cluster/test_client_ssl.py File tests/custom_cluster/test_client_ssl.py: http://gerrit.cloudera.org:8080/#/c/13083/5/tests/custom_cluster/test_client_ssl.py@221 PS5, Line 221: s flake8: E501 line too long (104 > 90 characters) http://gerrit.cloudera.org:8080/#/c/13083/5/tests/shell/test_shell_commandline.py File tests/shell/test_shell_commandline.py: http://gerrit.cloudera.org:8080/#/c/13083/5/tests/shell/test_shell_commandline.py@33 PS5, Line 33: from util import get_impalad_host_port, get_impalad_host_port, get_shell_cmd flake8: F811 redefinition of unused 'get_impalad_host_port' from line 33 http://gerrit.cloudera.org:8080/#/c/13083/5/tests/shell/test_shell_commandline.py@33 PS5, Line 33: from util import get_impalad_host_port, get_impalad_host_port, get_shell_cmd flake8: F401 'util.get_shell_cmd' imported but unused http://gerrit.cloudera.org:8080/#/c/13083/5/tests/shell/test_shell_commandline.py@204 PS5, Line 204: flake8: E203 whitespace before ',' http://gerrit.cloudera.org:8080/#/c/13083/5/tests/shell/test_shell_commandline.py@217 PS5, Line 217: flake8: E203 whitespace before ',' http://gerrit.cloudera.org:8080/#/c/13083/5/tests/shell/test_shell_commandline.py@519 PS5, Line 519: F flake8: E501 line too long (96 > 90 characters) http://gerrit.cloudera.org:8080/#/c/13083/5/tests/shell/test_shell_commandline.py@642 PS5, Line 642: ' flake8: E131 continuation line unaligned for hanging indent http://gerrit.cloudera.org:8080/#/c/13083/5/tests/shell/test_shell_commandline.py@872 PS5, Line 872: u flake8: E501 line too long (92 > 90 characters) http://gerrit.cloudera.org:8080/#/c/13083/5/tests/shell/test_shell_interactive.py File tests/shell/test_shell_interactive.py: http://gerrit.cloudera.org:8080/#/c/13083/5/tests/shell/test_shell_interactive.py@41 PS5, Line 41: from util import (assert_var_substitution, ImpalaShell, get_impalad_host_port, flake8: F401 'util.get_impalad_host_port' imported but unused http://gerrit.cloudera.org:8080/#/c/13083/5/tests/shell/test_shell_interactive.py@252 PS5, Line 252: flake8: E203 whitespace before ',' http://gerrit.cloudera.org:8080/#/c/13083/5/tests/shell/test_shell_interactive.py@451 PS5, Line 451: ) flake8: E501 line too long (91 > 90 characters) http://gerrit.cloudera.org:8080/#/c/13083/5/tests/shell/test_shell_interactive.py@463 PS5, Line 463: r flake8: E501 line too long (92 > 90 characters) http://gerrit.cloudera.org:8080/#/c/13083/5/tests/shell/util.py File tests/shell/util.py: http://gerrit.cloudera.org:8080/#/c/13083/5/tests/shell/util.py@116 PS5, Line 116: def get_impalad_host_port(vector): flake8: E302 expected 2 blank lines, found 1 http://gerrit.cloudera.org:8080/#/c/13083/5/tests/shell/util.py@122 PS5, Line 122: def get_impalad_port(vector): flake8: E302 expected 2 blank lines, found 1 http://gerrit.cloudera.org:8080/#/c/13083/5/tests/shell/util.py@126 PS5, Line 126: def get_shell_cmd(vector): flake8: E302 expected 2 blank lines, found 1 http://gerrit.cloudera.org:8080/#/c/13083/5/tests/shell/util.py@129 PS5, Line 129: h flake8: F841 local variable 'host_port' is assigned to but never used http://gerrit.cloudera.org:8080/#/c/13083/5/tests/shell/util.py@133 PS5, Line 133: def get_open_sessions_metric(vector): flake8: E302 expected 2 blank lines, found 1 -- To view, visit http://gerrit.cloudera.org:8080/13083 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibe5ab7f4817e690b7d3be08d71f8f14364b84412 Gerrit-Change-Number: 13083 Gerrit-PatchSet: 5 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 23 Apr 2019 17:23:16 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7290: part 1: clean up shell tests
Tim Armstrong has uploaded this change for review. ( http://gerrit.cloudera.org:8080/13083 Change subject: IMPALA-7290: part 1: clean up shell tests .. IMPALA-7290: part 1: clean up shell tests This sets up the tests to be more extended to test shell in both beeswax and HS2 modes. Testing: * Add test dimension containing only beeswax in preparation for HS2 dimension. * Factor out hardcoded ports * Add tests for formatting of all types and NULL values * Added testing for floating point output formatting, which does change as a result of switching to server-side vs client-side formatting. * Use unique_database for tests that create tables Change-Id: Ibe5ab7f4817e690b7d3be08d71f8f14364b84412 --- M bin/load-data.py M testdata/bin/create-load-data.sh M testdata/bin/create-tpcds-testcase-files.sh M tests/beeswax/impala_beeswax.py M tests/common/impala_test_suite.py M tests/common/test_dimensions.py M tests/custom_cluster/test_client_ssl.py M tests/custom_cluster/test_shell_interactive.py M tests/custom_cluster/test_shell_interactive_reconnect.py M tests/shell/test_shell_commandline.py M tests/shell/test_shell_interactive.py M tests/shell/util.py 12 files changed, 751 insertions(+), 599 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/13083/5 -- To view, visit http://gerrit.cloudera.org:8080/13083 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ibe5ab7f4817e690b7d3be08d71f8f14364b84412 Gerrit-Change-Number: 13083 Gerrit-PatchSet: 5 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] PREVIEW: IMPALA-7290: part 2: Add HS2 support to Impala shell
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12884 ) Change subject: PREVIEW: IMPALA-7290: part 2: Add HS2 support to Impala shell .. Patch Set 10: (211 comments) http://gerrit.cloudera.org:8080/#/c/12884/10/shell/ext-py/bitarray-0.9.0/bitarray/__init__.py File shell/ext-py/bitarray-0.9.0/bitarray/__init__.py: http://gerrit.cloudera.org:8080/#/c/12884/10/shell/ext-py/bitarray-0.9.0/bitarray/__init__.py@11 PS10, Line 11: from bitarray._bitarray import _bitarray, bitdiff, bits2bytes, _sysinfo flake8: F401 'bitarray._bitarray._sysinfo' imported but unused http://gerrit.cloudera.org:8080/#/c/12884/10/shell/ext-py/bitarray-0.9.0/bitarray/__init__.py@11 PS10, Line 11: from bitarray._bitarray import _bitarray, bitdiff, bits2bytes, _sysinfo flake8: F401 'bitarray._bitarray.bits2bytes' imported but unused http://gerrit.cloudera.org:8080/#/c/12884/10/shell/ext-py/bitarray-0.9.0/bitarray/__init__.py@11 PS10, Line 11: from bitarray._bitarray import _bitarray, bitdiff, bits2bytes, _sysinfo flake8: F401 'bitarray._bitarray.bitdiff' imported but unused http://gerrit.cloudera.org:8080/#/c/12884/10/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py File shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py: http://gerrit.cloudera.org:8080/#/c/12884/10/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@21 PS10, Line 21: from bitarray import bitarray, bitdiff, bits2bytes, __version__ flake8: E402 module level import not at top of file http://gerrit.cloudera.org:8080/#/c/12884/10/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@84 PS10, Line 84: flake8: E221 multiple spaces before operator http://gerrit.cloudera.org:8080/#/c/12884/10/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@85 PS10, Line 85: flake8: E221 multiple spaces before operator http://gerrit.cloudera.org:8080/#/c/12884/10/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@127 PS10, Line 127: class TestsModuleFunctions(unittest.TestCase, Util): flake8: E302 expected 2 blank lines, found 1 http://gerrit.cloudera.org:8080/#/c/12884/10/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@167 PS10, Line 167: + flake8: E226 missing whitespace around arithmetic operator http://gerrit.cloudera.org:8080/#/c/12884/10/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@167 PS10, Line 167: + flake8: E226 missing whitespace around arithmetic operator http://gerrit.cloudera.org:8080/#/c/12884/10/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@167 PS10, Line 167: - flake8: E226 missing whitespace around arithmetic operator http://gerrit.cloudera.org:8080/#/c/12884/10/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@167 PS10, Line 167: - flake8: E226 missing whitespace around arithmetic operator http://gerrit.cloudera.org:8080/#/c/12884/10/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@168 PS10, Line 168: - flake8: E226 missing whitespace around arithmetic operator http://gerrit.cloudera.org:8080/#/c/12884/10/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@168 PS10, Line 168: - flake8: E226 missing whitespace around arithmetic operator http://gerrit.cloudera.org:8080/#/c/12884/10/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@176 PS10, Line 176: class CreateObjectTests(unittest.TestCase, Util): flake8: E302 expected 2 blank lines, found 1 http://gerrit.cloudera.org:8080/#/c/12884/10/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@241 PS10, Line 241: : flake8: E231 missing whitespace after ':' http://gerrit.cloudera.org:8080/#/c/12884/10/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@282 PS10, Line 282: flake8: E261 at least two spaces before inline comment http://gerrit.cloudera.org:8080/#/c/12884/10/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@309 PS10, Line 309: d flake8: E303 too many blank lines (2) http://gerrit.cloudera.org:8080/#/c/12884/10/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@314 PS10, Line 314: d flake8: E303 too many blank lines (2) http://gerrit.cloudera.org:8080/#/c/12884/10/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@321 PS10, Line 321: + flake8: E226 missing whitespace around arithmetic operator http://gerrit.cloudera.org:8080/#/c/12884/10/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@331 PS10, Line 331: class ToObjectsTests(unittest.TestCase, Util): flake8: E302 expected 2 blank lines, found 1 http://gerrit.cloudera.org:8080/#/c/12884/10/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@360 PS10, Line 360: class MetaDataTests(unittest.TestCase): flake8: E302 expected 2 blank lines, found 1 http://gerrit.cloudera.org:8080/#/c/12884/10/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@396 PS10, Line 396: d flake8: E303 too many blank lines (2) http://gerrit.cloudera.org:8080/#/c/12884/10/shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py@404 PS10, Line 404: d flake8: E303 too
[Impala-ASF-CR] PREVIEW: IMPALA-7290: part 2: Add HS2 support to Impala shell
Tim Armstrong has uploaded a new patch set (#10). ( http://gerrit.cloudera.org:8080/12884 ) Change subject: PREVIEW: IMPALA-7290: part 2: Add HS2 support to Impala shell .. PREVIEW: IMPALA-7290: part 2: Add HS2 support to Impala shell Beeswax is still supported via --protocol=beeswax but will be deprecated. Differences are abstracted into ImpalaClient subclasses. This support requires Impala-specific extensions to the HS2 interface, similar to the existing extensions to Beeswax. Thus the HS2 shell is only forwards-compatible with newer Impala versions. I considered trying to gracefully degrade when the new extensions weren't present, but it didn't seem to be worth the ongoing testing effort. Here are the changes required to make it work: * Switch to TBinaryProtocolAccelerated to avoid perf regression. The HS2 protocol requires decoding more primitive values (because its not a string-per-row), which was slow with the pure python implementation of TBinaryProtocol. * Added bitarray module to efficiently unpack null indicators * Minimise invasiveness of changes by transposing and stringifying the columnar results into rows in impala_client.py. The transposition needs to happen before display anyway. * Add PingImpalaHS2Service() to get back version string and webserver address. * Add CloseImpalaOperation() extension to return DML row counts. * Add is_closed member to query handles to avoid shell independently tracking whether the query handle was closed or not. * Include query status in HS2 log to match beeswax. * HS2 GetLog() command now includes query status error message for consistency with beeswax. * "set"/"set all" uses the client requests options, not the session default. This captures the effective value of TIMEZONE, which was previously missing. * "set all" on the server side returns REMOVED query options - the shell needs to know these so it can correctly ignore them. * Clean up self.orig_cmd/self.last_leading comment argument passing to avoid implicit parameter passing through multiple function calls. * Clean up argument handling in shell tests to consistently pass around lists of arguments instead of strings that are subject to shell tokenisation rules. Testing: * Shell tests can run with both protocols * Add tests for formatting of all types and NULL values * Added testing for floating point output formatting, which does change as a result of switching to server-side vs client-side formatting. TODO: * Add DATE converage * Verify that shell tests are going through right interface by disabling interface Performance: Baseline from beeswax shell for large extract is as follows: $ time impala-shell.sh -B -q 'select * from tpch_parquet.orders' > /dev/null real0m6.708s user0m5.132s sys 0m0.204s After this change it is somewhat slower, but we generally don't consider bulk extract performance through the shell to be perf-critical: real0m7.625s user0m6.436s sys 0m0.256s Change-Id: I6d5cc83d545aacc659523f29b1d6feed672e2a12 --- M LICENSE.txt M be/src/runtime/dml-exec-state.cc M be/src/runtime/dml-exec-state.h M be/src/service/client-request-state.cc M be/src/service/client-request-state.h M be/src/service/impala-beeswax-server.cc M be/src/service/impala-hs2-server.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h M bin/rat_exclude_files.txt M common/thrift/ImpalaService.thrift M infra/python/deps/compiled-requirements.txt A shell/ext-py/bitarray-0.9.0/PKG-INFO A shell/ext-py/bitarray-0.9.0/bitarray/__init__.py A shell/ext-py/bitarray-0.9.0/bitarray/_bitarray.c A shell/ext-py/bitarray-0.9.0/bitarray/test_bitarray.py A shell/ext-py/bitarray-0.9.0/setup.py M shell/impala_client.py M shell/impala_shell.py M shell/impala_shell_config_defaults.py M shell/make_shell_tarball.sh M shell/option_parser.py M shell/thrift_sasl.py M tests/custom_cluster/test_shell_interactive.py M tests/custom_cluster/test_shell_interactive_reconnect.py M tests/shell/test_shell_commandline.py M tests/shell/test_shell_interactive.py M tests/shell/util.py 28 files changed, 7,217 insertions(+), 442 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/84/12884/10 -- To view, visit http://gerrit.cloudera.org:8080/12884 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6d5cc83d545aacc659523f29b1d6feed672e2a12 Gerrit-Change-Number: 12884 Gerrit-PatchSet: 10 Gerrit-Owner: Tim Armstrong
[Impala-ASF-CR] IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/13005 ) Change subject: IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0 .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/13005/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java: http://gerrit.cloudera.org:8080/#/c/13005/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@172 PS3, Line 172: messageFactory nit: maybe messageDeserializer would be a better name now -- To view, visit http://gerrit.cloudera.org:8080/13005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436 Gerrit-Change-Number: 13005 Gerrit-PatchSet: 3 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sudhanshu Arora Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 23 Apr 2019 16:46:42 + Gerrit-HasComments: Yes
[native-toolchain-CR] Add missing patch for orc 1.5.5
Lars Volker has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/13082 ) Change subject: Add missing patch for orc 1.5.5 .. Add missing patch for orc 1.5.5 This change adds a patch that I had forgotten to add to this review: https://gerrit.cloudera.org/#/c/12998/ I tested this locally by building the toolchain. Change-Id: Iff84c883c50efdc6b438f5428e947ff0d0960ff8 Reviewed-on: http://gerrit.cloudera.org:8080/13082 Reviewed-by: Tim Armstrong Tested-by: Lars Volker --- A source/orc/orc-1.5.5-patches/0001-ORC-396-also-look-for-LZ4-libs-in-lib64-subdir.patch 1 file changed, 25 insertions(+), 0 deletions(-) Approvals: Tim Armstrong: Looks good to me, approved Lars Volker: Verified -- To view, visit http://gerrit.cloudera.org:8080/13082 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Iff84c883c50efdc6b438f5428e947ff0d0960ff8 Gerrit-Change-Number: 13082 Gerrit-PatchSet: 2 Gerrit-Owner: Lars Volker Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong
[native-toolchain-CR] Add missing patch for orc 1.5.5
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/13082 ) Change subject: Add missing patch for orc 1.5.5 .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/13082 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iff84c883c50efdc6b438f5428e947ff0d0960ff8 Gerrit-Change-Number: 13082 Gerrit-PatchSet: 1 Gerrit-Owner: Lars Volker Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 23 Apr 2019 14:59:25 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12939 ) Change subject: IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2859/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12939 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia87aa9db2de85f0716e4854a88727aad593773fa Gerrit-Change-Number: 12939 Gerrit-PatchSet: 5 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 23 Apr 2019 14:57:22 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity
Hello Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12939 to look at the new patch set (#5). Change subject: IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity .. IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity When generating single node plans for inline views, Impala will create some equivalence conjuncts based on slot equivalences. However, these conjuncts may finally be substituted to identity (e.g. a = a) which may incorrectly reject rows with nulls. We already have some logic to remove this kind of conjuncts but the existing checks have exceptions. For example, consider the following tables and a query: table Atable Btable C +--+ +--++ +--+--+ | a_id | | b_id | amount | | a_id | b_id | +--+ +--++ +--+--+ | 1| | 1| 10 | | 1| 1| | 2| | 1| 20 | | 2| 2| +--+ | 2| NULL | +--+--+ +--++ select * from (select t2.a_id, t2.amount1, t2.amount2 from a left outer join ( select c.a_id, amount as amount1, amount as amount2 from b join c on b.b_id = c.b_id ) t2 on a.a_id = t2.a_id ) t1; They query has 11 slots. The valueTransferGraph (slot equivalences) has 3 strongly connected components: * {slot0 (b.b_id), slot1 (c.b_id)} * {slot2 (c.a_id), slot4 (t2.a_id), slot8 (t1.a_id)} * {slot3 (b.amount), slot5 (t2.amount1), slot6 (t2.amount2), slot9 (t1.amount1), slot10 (t1.amount2)} In SingleNodePlanner#migrateConjunctsToInlineView, when dealing with inline view t1, a predicate "t1.amount1 = t1.amount2" will first be created by Analyzer#createEquivConjuncts, then be substituted using the smap_ of the inline view and become "t2.amount1 = t2.amount2". It can still pass the IdentityPredicate check. However, the substituted one will finally be resolved to "amount = amount" and be assigned to the left outer join node. So nulls are incorrectly rejected. Actually, when checking IdentityPredicates, we need to check the final resolved version of them using base table slots (baseTblSmap_). So the predicate "t1.amount1 = t1.amount2" will be resolved to "amount = amount" and won't pass the IdentityPredicate check. Tests: * Add plan tests in PlannerTest/inline-view.test * Run all tests locally in CORE exploration strategy Change-Id: Ia87aa9db2de85f0716e4854a88727aad593773fa --- M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test 6 files changed, 410 insertions(+), 20 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/12939/5 -- To view, visit http://gerrit.cloudera.org:8080/12939 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia87aa9db2de85f0716e4854a88727aad593773fa Gerrit-Change-Number: 12939 Gerrit-PatchSet: 5 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/12939 ) Change subject: IMPALA-8386: Fix incorrect equivalence conjuncts not treated as identity .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/12939/4/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java: http://gerrit.cloudera.org:8080/#/c/12939/4/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1190 PS4, Line 1190: Expr expr = e.trySubstitute(inlineViewRef.getBaseTblSmap(), analyzer, false); > Yeah if we could add such a precondition at an appropriate place it would b I add a check after initializing the baseTblSmap_ of inline views. http://gerrit.cloudera.org:8080/#/c/12939/4/testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test File testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test: http://gerrit.cloudera.org:8080/#/c/12939/4/testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test@1705 PS4, Line 1705: > Sure. Let me find out these kind of cases to reproduce the bug. The substitutions in other places are for ResultExprs which are from the original query. They're not eq predicates that created by the Analyzer. So I think no problems there. I still add some tests to cover these cases but they can't reproduce the bug. -- To view, visit http://gerrit.cloudera.org:8080/12939 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia87aa9db2de85f0716e4854a88727aad593773fa Gerrit-Change-Number: 12939 Gerrit-PatchSet: 4 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 23 Apr 2019 14:23:33 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5843: Use page index in Parquet files to skip pages
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12065 ) Change subject: IMPALA-5843: Use page index in Parquet files to skip pages .. Patch Set 17: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2858/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12065 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0cc99f129f2048dbafbe7f5a51d1ea3a5005731a Gerrit-Change-Number: 12065 Gerrit-PatchSet: 17 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Pooja Nilangekar Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 23 Apr 2019 14:02:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5843: Use page index in Parquet files to skip pages
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12065 ) Change subject: IMPALA-5843: Use page index in Parquet files to skip pages .. Patch Set 16: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2857/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12065 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0cc99f129f2048dbafbe7f5a51d1ea3a5005731a Gerrit-Change-Number: 12065 Gerrit-PatchSet: 16 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Pooja Nilangekar Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 23 Apr 2019 13:57:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. IMPALA-7368: Add initial support for DATE type DATE values describe a particular year/month/day in the form -MM-dd. For example: DATE '2019-02-15'. DATE values do not have a time of day component. The range of values supported for the DATE type is -01-01 to -12-31. This initial DATE type support covers TEXT and HBASE fileformats only. 'DateValue' is used as the internal type to represent DATE values. The changes are as follows: - Support for DATE literal syntax. - Explicit casting between DATE and other types (note that invalid casts will fail with an error just like invalid DECIMAL_V2 casts, while failed casts to other types do no lead to warning or error): - from STRING to DATE. The string value must be formatted as -MM-dd HH:mm:ss.S. The date component is mandatory, the time component is optional. If the time component is present, it will be truncated silently. - from DATE to STRING. The resulting string value is formatted as -MM-dd. - from TIMESTAMP to DATE. The source timestamp's time of day component is ignored. - from DATE to TIMESTAMP. The target timestamp's time of day component is set to 00:00:00. - Implicit casting between DATE and other types: - from STRING to DATE if the source string value is used in a context where a DATE value is expected. - from DATE to TIMESTAMP if the source date value is used in a context where a TIMESTAMP value is expected. - Since STRING -> DATE, STRING -> TIMESTAMP and DATE -> TIMESTAMP implicit conversions are now all possible, the existing function overload resolution logic is not adequate anymore. For example, it resolves the if(false, '2011-01-01', DATE '1499-02-02') function call to the if(BOOLEAN, TIMESTAMP, TIMESTAMP) version of the overloaded function, instead of the if(BOOLEAN, DATE, DATE) version. This is clearly wrong, so the function overload resolution logic had to be changed to resolve function calls to the best-fit overloaded function definition if there are multiple applicable candidates. An overloaded function definition is an applicable candidate for a function call if each actual parameter in the function call either matches the corresponding formal parameter's type (without casting) or is implicitly castable to that type. When looking for the best-fit applicable candidate, a parameter match score (i.e. the number of actual parameters in the function call that match their corresponding formal parameter's type without casting) is calculated and the applicable candidate with the highest parameter match score is chosen. There's one more issue that the new resolution logic has to address: if two applicable candidates have the same parameter match score and the only difference between the two is that the first one requires a STRING -> TIMESTAMP implicit cast for some of its parameters while the second one requires a STRING -> DATE implicit cast for the same parameters then the first candidate has to be chosen not to break backward compatibility. E.g: year('2019-02-15') function call must resolve to year(TIMESTAMP) instead of year(DATE). Note, that year(DATE) is not implemented yet, so this is not an issue at the moment but it will be in the future. When the resolution algorithm considers overloaded function definitions, first it orders them lexicographically by the types in their parameter lists. To ensure the backward compatible behavior Primitivetype.DATE enum value has to come after PrimitiveType.TIMESTAMP. - Codegen infrastructure changes for expression evaluation. - 'IS [NOT] NULL' and '[NOT] IN' predicates. - Common comparison operators (including the 'BETWEEN' operator). - Infrastructure changes for built-in functions. - Some built-in functions: conditional, aggregate, analytical and math functions. - C++ UDF/UDA support. - Support partitioning and grouping by DATE. - Beeswax, HiveServer2 support. These items are tightly coupled and it makes sense to implement them in one change-set. Testing: - A new partitioned TEXT table 'functional.date_tbl' (and the corresponding HBASE table 'functional_hbase.date_tbl') was introduced for DATE-related tests. - BE and FE tests were extended to cover DATE type. - E2E tests: - since DATE type is supported for TEXT and HBASE fileformats only, most DATE tests were implemented separately in tests/query_test/test_date_queries.py. Note, that this change-set is not a complete DATE type implementation, but it lays the foundation for future work: - Add date support to the random query generator. - Implement a complete set of built-in functions. - Add Parquet support. - Add
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 24: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 24 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 23 Apr 2019 13:33:56 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5843: Use page index in Parquet files to skip pages
Hello Michael Ho, Lars Volker, Pooja Nilangekar, Tim Armstrong, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12065 to look at the new patch set (#17). Change subject: IMPALA-5843: Use page index in Parquet files to skip pages .. IMPALA-5843: Use page index in Parquet files to skip pages This commit implements page filtering based on the Parquet page index. The read and evaluation of the page index is done by the HdfsParquetScanner. At first, we determine the row ranges we are interested in, and based on the row ranges we determine the candidate pages for each column that we are reading. We still issue one ScanRange per column chunk, but we specify sub-ranges that store the candidate pages, i.e. we don't read the whole column chunk, but only fractions of it. Pages are not aligned across column chunks, i.e. page #2 of column A might store completely different rows than page #2 of column B. It means we need to implement some kind of row-skipping logic when we read the data pages. This logic is implemented in BaseScalarColumnReader and ScalarColumnReader. Collection column readers know nothing about page filtering. Page filtering can be turned off by setting the query option 'read_parquet_page_index' to false. Testing: * added some unit tests for the row range and page selection logic * generated various Parquet files with Parquet-MR * enabled Page index writing and wrote selective queries against tables written by Impala. Current tests are likely to use page filtering transparently. Performance: * Measured locally, observed 3x to 20x speedup for selective queries. The speedup was proportional to the IO operations need to be done. * The TPCH benchmark didn't show a significant performance change. It is not a suprise since the data is not being sorted in any useful way. So the main goal was to not introduce perf regression. TODO: * measure performance for remote reads Change-Id: I0cc99f129f2048dbafbe7f5a51d1ea3a5005731a --- M be/src/common/global-flags.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/parquet/CMakeLists.txt M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/parquet/hdfs-parquet-scanner.h M be/src/exec/parquet/parquet-column-readers.cc M be/src/exec/parquet/parquet-column-readers.h M be/src/exec/parquet/parquet-column-stats.cc M be/src/exec/parquet/parquet-column-stats.h A be/src/exec/parquet/parquet-common-test.cc M be/src/exec/parquet/parquet-common.cc M be/src/exec/parquet/parquet-common.h M be/src/exec/parquet/parquet-level-decoder.h A be/src/exec/parquet/parquet-page-index-test.cc A be/src/exec/parquet/parquet-page-index.cc A be/src/exec/parquet/parquet-page-index.h M be/src/exprs/literal.cc M be/src/runtime/scoped-buffer.h M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M testdata/data/README A testdata/data/alltypes_tiny_pages.parquet A testdata/data/alltypes_tiny_pages_plain.parquet A testdata/data/decimals_1_10.parquet A testdata/data/double_nested_decimals.parquet A testdata/data/nested_decimals.parquet A testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-page-index.test A testdata/workloads/functional-query/queries/QueryTest/parquet-page-index-alltypes-tiny-pages-plain.test A testdata/workloads/functional-query/queries/QueryTest/parquet-page-index-alltypes-tiny-pages.test A testdata/workloads/functional-query/queries/QueryTest/parquet-page-index-large.test A testdata/workloads/functional-query/queries/QueryTest/parquet-page-index.test M testdata/workloads/functional-query/queries/QueryTest/stats-extrapolation.test M tests/query_test/test_parquet_stats.py 36 files changed, 3,376 insertions(+), 96 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/12065/17 -- To view, visit http://gerrit.cloudera.org:8080/12065 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0cc99f129f2048dbafbe7f5a51d1ea3a5005731a Gerrit-Change-Number: 12065 Gerrit-PatchSet: 17 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Pooja Nilangekar Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-5843: Use page index in Parquet files to skip pages
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/12065 ) Change subject: IMPALA-5843: Use page index in Parquet files to skip pages .. Patch Set 15: (3 comments) http://gerrit.cloudera.org:8080/#/c/12065/15/be/src/exec/parquet/hdfs-parquet-scanner.cc File be/src/exec/parquet/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/12065/15/be/src/exec/parquet/hdfs-parquet-scanner.cc@1170 PS15, Line 1170: // Scalar readers that read collections can be ahead of other readers by 1. > :( Invented BaseScalarColumnReader::LastProcessedRow(), with that I could simplify this check back to the original complexity. http://gerrit.cloudera.org:8080/#/c/12065/15/be/src/exec/parquet/parquet-column-readers.h File be/src/exec/parquet/parquet-column-readers.h: http://gerrit.cloudera.org:8080/#/c/12065/15/be/src/exec/parquet/parquet-column-readers.h@533 PS15, Line 533: isn't anymore > pedantry: aren't any more Done http://gerrit.cloudera.org:8080/#/c/12065/7/be/src/exec/parquet/parquet-column-readers.cc File be/src/exec/parquet/parquet-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/12065/7/be/src/exec/parquet/parquet-column-readers.cc@512 PS7, Line 512: if (IsLastCandidateRange()) { > Yeah exactly, the bit-packed runs in Parquet's RLE encoding can have spurio Since then I moved these conditions to ConsumedCurrentCandidateRange() and removed 'num_buffered_values != 0' because it prevented jumping to the next page. It wouldn't produce wrong result sets, but it caused problems in HdfsParquetScanner::CheckPageFiltering() because the 'current rows' could get out-of-sync for a moment. I guard against invalid peeking by moving the num_buffered_values check just before the PeekLevel() == 0 condition. -- To view, visit http://gerrit.cloudera.org:8080/12065 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0cc99f129f2048dbafbe7f5a51d1ea3a5005731a Gerrit-Change-Number: 12065 Gerrit-PatchSet: 15 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Pooja Nilangekar Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 23 Apr 2019 13:16:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5843: Use page index in Parquet files to skip pages
Hello Michael Ho, Lars Volker, Pooja Nilangekar, Tim Armstrong, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12065 to look at the new patch set (#16). Change subject: IMPALA-5843: Use page index in Parquet files to skip pages .. IMPALA-5843: Use page index in Parquet files to skip pages This commit implements page filtering based on the Parquet page index. The read and evaluation of the page index is done by the HdfsParquetScanner. At first, we determine the row ranges we are interested in, and based on the row ranges we determine the candidate pages for each column that we are reading. We still issue one ScanRange per column chunk, but we specify sub-ranges that store the candidate pages, i.e. we don't read the whole column chunk, but only fractions of it. Pages are not aligned across column chunks, i.e. page #2 of column A might store completely different rows than page #2 of column B. It means we need to implement some kind of row-skipping logic when we read the data pages. This logic is implemented in BaseScalarColumnReader and ScalarColumnReader. Collection column readers know nothing about page filtering. Page filtering can be turned off by setting the query option 'read_parquet_page_index' to false. Testing: * added some unit tests for the row range and page selection logic * generated various Parquet files with Parquet-MR * enabled Page index writing and wrote selective queries against tables written by Impala. Current tests are likely to use page filtering transparently. Performance: * Measured locally, observed 3x to 20x speedup for selective queries. The speedup was proportional to the IO operations need to be done. * The TPCH benchmark didn't show a significant performance change. It is not a suprise since the data is not being sorted in any useful way. So the main goal was to not introduce perf regression. TODO: * measure performance for remote reads Change-Id: I0cc99f129f2048dbafbe7f5a51d1ea3a5005731a --- M be/src/common/global-flags.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/parquet/CMakeLists.txt M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/parquet/hdfs-parquet-scanner.h M be/src/exec/parquet/parquet-column-readers.cc M be/src/exec/parquet/parquet-column-readers.h M be/src/exec/parquet/parquet-column-stats.cc M be/src/exec/parquet/parquet-column-stats.h A be/src/exec/parquet/parquet-common-test.cc M be/src/exec/parquet/parquet-common.cc M be/src/exec/parquet/parquet-common.h M be/src/exec/parquet/parquet-level-decoder.h A be/src/exec/parquet/parquet-page-index-test.cc A be/src/exec/parquet/parquet-page-index.cc A be/src/exec/parquet/parquet-page-index.h M be/src/exprs/literal.cc M be/src/runtime/scoped-buffer.h M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M testdata/data/README A testdata/data/alltypes_tiny_pages.parquet A testdata/data/alltypes_tiny_pages_plain.parquet A testdata/data/decimals_1_10.parquet A testdata/data/double_nested_decimals.parquet A testdata/data/nested_decimals.parquet A testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-page-index.test A testdata/workloads/functional-query/queries/QueryTest/parquet-page-index-alltypes-tiny-pages-plain.test A testdata/workloads/functional-query/queries/QueryTest/parquet-page-index-alltypes-tiny-pages.test A testdata/workloads/functional-query/queries/QueryTest/parquet-page-index-large.test A testdata/workloads/functional-query/queries/QueryTest/parquet-page-index.test M testdata/workloads/functional-query/queries/QueryTest/stats-extrapolation.test M tests/query_test/test_parquet_stats.py 36 files changed, 3,392 insertions(+), 96 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/12065/16 -- To view, visit http://gerrit.cloudera.org:8080/12065 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0cc99f129f2048dbafbe7f5a51d1ea3a5005731a Gerrit-Change-Number: 12065 Gerrit-PatchSet: 16 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Pooja Nilangekar Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-5843: Use page index in Parquet files to skip pages
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/12065 ) Change subject: IMPALA-5843: Use page index in Parquet files to skip pages .. Patch Set 15: (1 comment) http://gerrit.cloudera.org:8080/#/c/12065/14//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12065/14//COMMIT_MSG@30 PS14, Line 30: Testing > Do you want to open a Jira for that? Created IMPALA-8441 -- To view, visit http://gerrit.cloudera.org:8080/12065 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0cc99f129f2048dbafbe7f5a51d1ea3a5005731a Gerrit-Change-Number: 12065 Gerrit-PatchSet: 15 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Pooja Nilangekar Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 23 Apr 2019 12:54:07 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 24: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/4052/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 24 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 23 Apr 2019 08:16:41 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12481 ) Change subject: IMPALA-7368: Add initial support for DATE type .. Patch Set 24: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/12481 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f Gerrit-Change-Number: 12481 Gerrit-PatchSet: 24 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 23 Apr 2019 08:16:40 + Gerrit-HasComments: No
[native-toolchain-CR] Add missing patch for orc 1.5.5
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/13082 ) Change subject: Add missing patch for orc 1.5.5 .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/13082 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iff84c883c50efdc6b438f5428e947ff0d0960ff8 Gerrit-Change-Number: 13082 Gerrit-PatchSet: 1 Gerrit-Owner: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 23 Apr 2019 06:10:09 + Gerrit-HasComments: No