[Impala-ASF-CR] IMPALA-9838: Switch to GCC 7.5.0
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16045 ) Change subject: IMPALA-9838: Switch to GCC 7.5.0 .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6235/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16045 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia0beb2b618ba669c9699f8dbc0c52d1203d004e4 Gerrit-Change-Number: 16045 Gerrit-PatchSet: 4 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Comment-Date: Mon, 08 Jun 2020 04:20:25 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9709: Remove Impala-lzo from the development environment
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15814 ) Change subject: IMPALA-9709: Remove Impala-lzo from the development environment .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6234/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15814 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3a4f12247d8872b7e14c9feb4b2c58cfd60d4c0e Gerrit-Change-Number: 15814 Gerrit-PatchSet: 5 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Comment-Date: Mon, 08 Jun 2020 03:48:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9838: Switch to GCC 7.5.0
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16045

to look at the new patch set (#4).

Change subject: IMPALA-9838: Switch to GCC 7.5.0
..

IMPALA-9838: Switch to GCC 7.5.0

This upgrades GCC and libstdc++ to version 7.5.0. There have been ABI
changes since 4.9.2, so this means that the native-toolchain produced
with the new compiler is not interoperable with one produced by the old
compiler. To allow that transition, IMPALA_TOOLCHAIN_PACKAGES_HOME is
now a subdirectory of IMPALA_TOOLCHAIN
(toolchain-packages-gcc${IMPALA_GCC_VERSION}) to distinguish it from
the old packages.

Some Python packages in the impala-python virtualenv are compiled using
the toolchain GCC and now use the new ABI. This leads to two changes:
1. When constructing the LD_LIBRARY_PATH for impala-python, we include
   the GCC libstdc++ libraries. Otherwise, certain Python packages that
   use C++ fail on older OSes like Centos 7. This fixes IMPALA-9804.
2. Since developers work on various branches, this changes the
   virtualenv's directory location to a directory with the GCC version
   in the name. This allows the virtualenv built with GCC 7 to coexist
   with the current virtualenv built with GCC 4.9.2. The location for
   the old virtualenv is ${IMPALA_HOME}/infra/python/env. The new
   location is ${IMPALA_HOME}/infra/python/env-gcc${IMPALA_GCC_VERSION}.
   This required updating several impala-python scripts.

There are various odds-and-ends related to the transition:
1. Due to the small string optimization, the size of std::string
   changed, which means that various data structures also changed in
   size. This required updating some static asserts.
2. There is a bug in clang-tidy that reports a use-after-free for some
   code using std::shared_ptr. Clang is not modeling the shared_ptr
   correctly, so it is a false-positive. As a workaround, this disables
   the clang-analyzer-cplusplus.NewDelete diagnostic.
3. Various small compilation fixes (includes, etc).

Performance testing:
 - Ran single-node performance tests on TPC-H for the following
   configurations:
    - TPC-H Parquet scale 30 with normal configurations
    - TPC-H Parquet scale 30 with codegen disabled
    - TPC-H Kudu scale 10
   None found any significant regressions. Full results are posted on
   the JIRA.
 - Ran single-node performance tests on targeted-perf scale 10.
   No significant regressions.
 - The size of binaries (impalad, etc) is slightly smaller with the new
   GCC:
   GCC 4.9.2 release impalad binary: 545664
   GCC 7.5.0 release impalad binary: 539900
 - Compilation in DEBUG mode is roughly 15-25% faster

Functional testing:
 - Ran core jobs, exhaustive release jobs, UBSAN

Change-Id: Ia0beb2b618ba669c9699f8dbc0c52d1203d004e4
---
M .clang-tidy
M be/src/runtime/sorter-internal.h
M be/src/runtime/sorter.cc
M be/src/runtime/thread-resource-mgr.cc
M be/src/util/container-util.h
M bin/impala-config.sh
M bin/impala-flake8
M bin/impala-gcovr
M bin/impala-ipython
M bin/impala-pip
M bin/impala-py.test
M bin/impala-python
M bin/impala-python-common.sh
M bin/impala-shell.sh
M bin/set-pythonpath.sh
M infra/python/bootstrap_virtualenv.py
M tests/comparison/ORACLE.txt
17 files changed, 36 insertions(+), 27 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/16045/4
--
To view, visit http://gerrit.cloudera.org:8080/16045
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia0beb2b618ba669c9699f8dbc0c52d1203d004e4
Gerrit-Change-Number: 16045
Gerrit-PatchSet: 4
Gerrit-Owner: Joe McDonnell
Gerrit-Reviewer: Impala Public Jenkins
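The GCC-versioned directory layout described in the IMPALA-9838 commit message can be illustrated with a short Python sketch. This is only an illustration of the two path patterns quoted in the message: the helper names (toolchain_packages_home, virtualenv_dir) and the example paths are hypothetical, and the real logic lives in the shell and Python scripts listed in the change.

```python
import posixpath


def toolchain_packages_home(impala_toolchain, gcc_version):
    """Return the GCC-specific packages directory under IMPALA_TOOLCHAIN."""
    return posixpath.join(impala_toolchain, "toolchain-packages-gcc" + gcc_version)


def virtualenv_dir(impala_home, gcc_version=None):
    """Return the impala-python virtualenv directory.

    Without a GCC version this is the old location (infra/python/env);
    with one, the version is appended so that both directories can coexist.
    """
    name = "env" if gcc_version is None else "env-gcc" + gcc_version
    return posixpath.join(impala_home, "infra", "python", name)


# Hypothetical example values, not taken from a real checkout.
old_env = virtualenv_dir("/home/dev/Impala")
new_env = virtualenv_dir("/home/dev/Impala", "7.5.0")
packages = toolchain_packages_home("/home/dev/Impala/toolchain", "7.5.0")
print(old_env)
print(new_env)
print(packages)
```

Because the GCC version is part of the directory name, switching between a branch on GCC 4.9.2 and one on GCC 7.5.0 selects a different directory rather than overwriting the existing one.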
[Impala-ASF-CR] IMPALA-9838: Switch to GCC 7.5.0
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/16045 ) Change subject: IMPALA-9838: Switch to GCC 7.5.0 .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/16045/3/bin/impala-config.sh File bin/impala-config.sh: http://gerrit.cloudera.org:8080/#/c/16045/3/bin/impala-config.sh@217 PS3, Line 217: export IMPALA_TOOLCHAIN_PACKAGES_HOME=${IMPALA_TOOLCHAIN}/toolchain-packages-gcc${IMPALA_GCC_VERSION} > line too long (101 > 90) Done http://gerrit.cloudera.org:8080/#/c/16045/3/infra/python/bootstrap_virtualenv.py File infra/python/bootstrap_virtualenv.py: http://gerrit.cloudera.org:8080/#/c/16045/3/infra/python/bootstrap_virtualenv.py@398 PS3, Line 398: > flake8: E201 whitespace after '[' Done -- To view, visit http://gerrit.cloudera.org:8080/16045 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia0beb2b618ba669c9699f8dbc0c52d1203d004e4 Gerrit-Change-Number: 16045 Gerrit-PatchSet: 3 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Comment-Date: Mon, 08 Jun 2020 03:35:10 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9709: Remove Impala-lzo from the development environment
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/15814 ) Change subject: IMPALA-9709: Remove Impala-lzo from the development environment .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/15814/4/tests/metadata/test_partition_metadata.py File tests/metadata/test_partition_metadata.py: http://gerrit.cloudera.org:8080/#/c/15814/4/tests/metadata/test_partition_metadata.py@215 PS4, Line 215: F > flake8: E122 continuation line missing indentation or outdented Done -- To view, visit http://gerrit.cloudera.org:8080/15814 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3a4f12247d8872b7e14c9feb4b2c58cfd60d4c0e Gerrit-Change-Number: 15814 Gerrit-PatchSet: 4 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Comment-Date: Mon, 08 Jun 2020 03:35:06 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9709: Remove Impala-lzo from the development environment
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15814

to look at the new patch set (#5).

Change subject: IMPALA-9709: Remove Impala-lzo from the development environment
..

IMPALA-9709: Remove Impala-lzo from the development environment

This removes Impala-lzo from the Impala development environment.
Impala-lzo is not built as part of the Impala build. LZO is no longer
loaded as a plugin. LZO tables are not loaded during dataload, and LZO
is no longer tested.

This removes some obsolete scan APIs that were only used by Impala-lzo.
With this commit, Impala-lzo would require code changes to build
against Impala.

The plugin infrastructure is not removed, and this leaves some LZO
support code in place. If someone were to decide to revive Impala-lzo,
they would still be able to load it as a plugin and get the same
functionality as before. This plugin support may be removed later.

Testing:
 - Dryrun of GVO
 - Modified TestPartitionMetadataUncompressedTextOnly's
   test_unsupported_text_compression() to add LZO case

Change-Id: I3a4f12247d8872b7e14c9feb4b2c58cfd60d4c0e
---
M CMakeLists.txt M be/src/exec/hdfs-plugin-text-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M bin/bootstrap_system.sh M bin/clean.sh M bin/impala-config.sh M bin/set-ld-library-path.sh M bin/start-impala-cluster.py M buildall.sh M docker/entrypoint.sh M docker/impala_base/Dockerfile M docker/test-with-docker.py M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java D testdata/bad_text_lzo/bad_text.lzo D testdata/bad_text_lzo/bad_text.lzo.index M testdata/bin/create-load-data.sh M testdata/bin/generate-schema-statements.py M
testdata/bin/generate-test-vectors.py M testdata/bin/load_nested.py D testdata/bin/lzo_indexer.sh M testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.py M testdata/cluster/node_templates/common/etc/hadoop/conf/yarn-site.xml.py M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M testdata/workloads/functional-planner/queries/PlannerTest/joins-hdfs-num-rows-est-enabled.test M testdata/workloads/functional-planner/queries/PlannerTest/joins.test M testdata/workloads/functional-query/functional-query_dimensions.csv M testdata/workloads/functional-query/functional-query_exhaustive.csv M testdata/workloads/functional-query/queries/DataErrorsTest/hdfs-scan-node-errors.test D testdata/workloads/functional-query/queries/QueryTest/disable-lzo-plugin.test M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test M testdata/workloads/functional-query/queries/QueryTest/unsupported-compression-partitions.test M testdata/workloads/perf-regression/perf-regression_dimensions.csv M testdata/workloads/perf-regression/perf-regression_exhaustive.csv M testdata/workloads/perf-regression/perf-regression_pairwise.csv M testdata/workloads/targeted-perf/targeted-perf_dimensions.csv M testdata/workloads/targeted-perf/targeted-perf_exhaustive.csv M testdata/workloads/targeted-perf/targeted-perf_pairwise.csv M testdata/workloads/targeted-stress/targeted-stress_dimensions.csv M testdata/workloads/targeted-stress/targeted-stress_exhaustive.csv M testdata/workloads/targeted-stress/targeted-stress_pairwise.csv M testdata/workloads/tpcds-unmodified/tpcds-unmodified_dimensions.csv M testdata/workloads/tpcds-unmodified/tpcds-unmodified_exhaustive.csv M testdata/workloads/tpcds-unmodified/tpcds-unmodified_pairwise.csv M testdata/workloads/tpcds/tpcds_dimensions.csv M testdata/workloads/tpcds/tpcds_exhaustive.csv M testdata/workloads/tpcds/tpcds_pairwise.csv M 
testdata/workloads/tpch/tpch_dimensions.csv M testdata/workloads/tpch/tpch_exhaustive.csv M testdata/workloads/tpch/tpch_pairwise.csv M tests/common/test_dimensions.py M tests/custom_cluster/test_hive_text_codec_interop.py D tests/custom_cluster/test_scanner_plugin.py M tests/metadata/test_metadata_query_statements.py M tests/metadata/test_partition_metadata.py M tests/query_test/test_compressed_formats.py M tests/query_test/test_scanners_fuzz.py 62 files changed, 85 insertions(+), 333 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/15814/5 -- To view, visit http://gerrit.cloudera.org:8080/15814 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3a4f12247d8872b7e14c9feb4b2c58cfd60d4c0e Gerrit-Change-Number: 15814 Gerrit-PatchSet: 5 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-9838: Switch to GCC 7.5.0
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16045 ) Change subject: IMPALA-9838: Switch to GCC 7.5.0 .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6233/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16045 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia0beb2b618ba669c9699f8dbc0c52d1203d004e4 Gerrit-Change-Number: 16045 Gerrit-PatchSet: 3 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 08 Jun 2020 03:33:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9709: Remove Impala-lzo from the development environment
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15814 ) Change subject: IMPALA-9709: Remove Impala-lzo from the development environment .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6232/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15814 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3a4f12247d8872b7e14c9feb4b2c58cfd60d4c0e Gerrit-Change-Number: 15814 Gerrit-PatchSet: 4 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 08 Jun 2020 03:32:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9838: Switch to GCC 7.5.0
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16045 ) Change subject: IMPALA-9838: Switch to GCC 7.5.0 .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/16045/3/bin/impala-config.sh File bin/impala-config.sh: http://gerrit.cloudera.org:8080/#/c/16045/3/bin/impala-config.sh@217 PS3, Line 217: export IMPALA_TOOLCHAIN_PACKAGES_HOME=${IMPALA_TOOLCHAIN}/toolchain-packages-gcc${IMPALA_GCC_VERSION} line too long (101 > 90) http://gerrit.cloudera.org:8080/#/c/16045/3/infra/python/bootstrap_virtualenv.py File infra/python/bootstrap_virtualenv.py: http://gerrit.cloudera.org:8080/#/c/16045/3/infra/python/bootstrap_virtualenv.py@398 PS3, Line 398: flake8: E201 whitespace after '[' -- To view, visit http://gerrit.cloudera.org:8080/16045 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia0beb2b618ba669c9699f8dbc0c52d1203d004e4 Gerrit-Change-Number: 16045 Gerrit-PatchSet: 3 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 08 Jun 2020 02:48:30 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9838: Switch to GCC 7.5.0
Joe McDonnell has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16045 )

Change subject: IMPALA-9838: Switch to GCC 7.5.0
..

IMPALA-9838: Switch to GCC 7.5.0

This upgrades GCC and libstdc++ to version 7.5.0. There have been ABI
changes since 4.9.2, so this means that the native-toolchain produced
with the new compiler is not interoperable with one produced by the old
compiler. To allow that transition, IMPALA_TOOLCHAIN_PACKAGES_HOME is
now a subdirectory of IMPALA_TOOLCHAIN
(toolchain-packages-gcc${IMPALA_GCC_VERSION}) to distinguish it from
the old packages.

Some Python packages in the impala-python virtualenv are compiled using
the toolchain GCC and now use the new ABI. This leads to two changes:
1. When constructing the LD_LIBRARY_PATH for impala-python, we include
   the GCC libstdc++ libraries. Otherwise, certain Python packages that
   use C++ fail on older OSes like Centos 7. This fixes IMPALA-9804.
2. Since developers work on various branches, this changes the
   virtualenv's directory location to a directory with the GCC version
   in the name. This allows the virtualenv built with GCC 7 to coexist
   with the current virtualenv built with GCC 4.9.2. The location for
   the old virtualenv is ${IMPALA_HOME}/infra/python/env. The new
   location is ${IMPALA_HOME}/infra/python/env-gcc${IMPALA_GCC_VERSION}.
   This required updating several impala-python scripts.

There are various odds-and-ends related to the transition:
1. Due to the small string optimization, the size of std::string
   changed, which means that various data structures also changed in
   size. This required updating some static asserts.
2. There is a bug in clang-tidy that reports a use-after-free for some
   code using std::shared_ptr. Clang is not modeling the shared_ptr
   correctly, so it is a false-positive. As a workaround, this disables
   the clang-analyzer-cplusplus.NewDelete diagnostic.
3. Various small compilation fixes (includes, etc).

Performance testing:
 - Ran single-node performance tests on TPC-H for the following
   configurations:
    - TPC-H Parquet scale 30 with normal configurations
    - TPC-H Parquet scale 30 with codegen disabled
    - TPC-H Kudu scale 10
   None found any significant regressions. Full results are posted on
   the JIRA.
 - Ran single-node performance tests on targeted-perf scale 10.
   No significant regressions.
 - The size of binaries (impalad, etc) is slightly smaller with the new
   GCC:
   GCC 4.9.2 release impalad binary: 545664
   GCC 7.5.0 release impalad binary: 539900
 - Compilation in DEBUG mode is roughly 15-25% faster

Functional testing:
 - Ran core jobs, exhaustive release jobs, UBSAN

Change-Id: Ia0beb2b618ba669c9699f8dbc0c52d1203d004e4
---
M .clang-tidy
M be/src/runtime/sorter-internal.h
M be/src/runtime/sorter.cc
M be/src/runtime/thread-resource-mgr.cc
M be/src/util/container-util.h
M bin/impala-config.sh
M bin/impala-flake8
M bin/impala-gcovr
M bin/impala-ipython
M bin/impala-pip
M bin/impala-py.test
M bin/impala-python
M bin/impala-python-common.sh
M bin/impala-shell.sh
M bin/set-pythonpath.sh
M infra/python/bootstrap_virtualenv.py
M tests/comparison/ORACLE.txt
17 files changed, 35 insertions(+), 27 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/16045/3
--
To view, visit http://gerrit.cloudera.org:8080/16045
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia0beb2b618ba669c9699f8dbc0c52d1203d004e4
Gerrit-Change-Number: 16045
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell
Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-9709: Remove Impala-lzo from the development environment
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15814 ) Change subject: IMPALA-9709: Remove Impala-lzo from the development environment .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/15814/4/tests/metadata/test_partition_metadata.py File tests/metadata/test_partition_metadata.py: http://gerrit.cloudera.org:8080/#/c/15814/4/tests/metadata/test_partition_metadata.py@215 PS4, Line 215: F flake8: E122 continuation line missing indentation or outdented -- To view, visit http://gerrit.cloudera.org:8080/15814 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3a4f12247d8872b7e14c9feb4b2c58cfd60d4c0e Gerrit-Change-Number: 15814 Gerrit-PatchSet: 4 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 08 Jun 2020 02:48:00 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9709: Remove Impala-lzo from the development environment
Joe McDonnell has uploaded this change for review. ( http://gerrit.cloudera.org:8080/15814 )

Change subject: IMPALA-9709: Remove Impala-lzo from the development environment
..

IMPALA-9709: Remove Impala-lzo from the development environment

This removes Impala-lzo from the Impala development environment.
Impala-lzo is not built as part of the Impala build. LZO is no longer
loaded as a plugin. LZO tables are not loaded during dataload, and LZO
is no longer tested.

This removes some obsolete scan APIs that were only used by Impala-lzo.
With this commit, Impala-lzo would require code changes to build
against Impala.

The plugin infrastructure is not removed, and this leaves some LZO
support code in place. If someone were to decide to revive Impala-lzo,
they would still be able to load it as a plugin and get the same
functionality as before. This plugin support may be removed later.

Testing:
 - Dryrun of GVO
 - Modified TestPartitionMetadataUncompressedTextOnly's
   test_unsupported_text_compression() to add LZO case

Change-Id: I3a4f12247d8872b7e14c9feb4b2c58cfd60d4c0e
---
M CMakeLists.txt M be/src/exec/hdfs-plugin-text-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M bin/bootstrap_system.sh M bin/clean.sh M bin/impala-config.sh M bin/set-ld-library-path.sh M bin/start-impala-cluster.py M buildall.sh M docker/entrypoint.sh M docker/impala_base/Dockerfile M docker/test-with-docker.py M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java D testdata/bad_text_lzo/bad_text.lzo D testdata/bad_text_lzo/bad_text.lzo.index M testdata/bin/create-load-data.sh M testdata/bin/generate-schema-statements.py M testdata/bin/generate-test-vectors.py M testdata/bin/load_nested.py D
testdata/bin/lzo_indexer.sh M testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.py M testdata/cluster/node_templates/common/etc/hadoop/conf/yarn-site.xml.py M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M testdata/workloads/functional-planner/queries/PlannerTest/joins-hdfs-num-rows-est-enabled.test M testdata/workloads/functional-planner/queries/PlannerTest/joins.test M testdata/workloads/functional-query/functional-query_dimensions.csv M testdata/workloads/functional-query/functional-query_exhaustive.csv M testdata/workloads/functional-query/queries/DataErrorsTest/hdfs-scan-node-errors.test D testdata/workloads/functional-query/queries/QueryTest/disable-lzo-plugin.test M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test M testdata/workloads/functional-query/queries/QueryTest/unsupported-compression-partitions.test M testdata/workloads/perf-regression/perf-regression_dimensions.csv M testdata/workloads/perf-regression/perf-regression_exhaustive.csv M testdata/workloads/perf-regression/perf-regression_pairwise.csv M testdata/workloads/targeted-perf/targeted-perf_dimensions.csv M testdata/workloads/targeted-perf/targeted-perf_exhaustive.csv M testdata/workloads/targeted-perf/targeted-perf_pairwise.csv M testdata/workloads/targeted-stress/targeted-stress_dimensions.csv M testdata/workloads/targeted-stress/targeted-stress_exhaustive.csv M testdata/workloads/targeted-stress/targeted-stress_pairwise.csv M testdata/workloads/tpcds-unmodified/tpcds-unmodified_dimensions.csv M testdata/workloads/tpcds-unmodified/tpcds-unmodified_exhaustive.csv M testdata/workloads/tpcds-unmodified/tpcds-unmodified_pairwise.csv M testdata/workloads/tpcds/tpcds_dimensions.csv M testdata/workloads/tpcds/tpcds_exhaustive.csv M testdata/workloads/tpcds/tpcds_pairwise.csv M testdata/workloads/tpch/tpch_dimensions.csv M testdata/workloads/tpch/tpch_exhaustive.csv M 
testdata/workloads/tpch/tpch_pairwise.csv M tests/common/test_dimensions.py M tests/custom_cluster/test_hive_text_codec_interop.py D tests/custom_cluster/test_scanner_plugin.py M tests/metadata/test_metadata_query_statements.py M tests/metadata/test_partition_metadata.py M tests/query_test/test_compressed_formats.py M tests/query_test/test_scanners_fuzz.py 62 files changed, 85 insertions(+), 333 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/15814/4 -- To view, visit http://gerrit.cloudera.org:8080/15814 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I3a4f12247d8872b7e14c9feb4b2c58cfd60d4c0e Gerrit-Change-Number: 15814 Gerrit-PatchSet: 4 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. Patch Set 11: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6231/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 11 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 08 Jun 2020 02:27:44 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Hello David Rorke, Tim Armstrong, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15963

to look at the new patch set (#11).

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..

IMPALA-6692: Trigger sort node run before hitting memory limit.

The sort node works by adding row batches to a sort run. Once all
batches have been added to the current unsorted run, or the memory
limit is hit, the sorter immediately starts sorting the run. In the
latter case, the sorter spills the sorted run to disk after the sort
completes, creates a new unsorted run object, and continues adding the
next row batches, and so on. This algorithm tries to fit as many rows
as possible into memory before it starts sorting. However, for a
partitioned sort with a large number of row batches, fitting too many
rows into memory makes the sort slow and blocks the sort node for a
long time before it can release some memory and continue accepting the
next row batch from the exchange node. One slow sort node can block the
exchange node from sending row batches to other sort nodes that are
free.

This patch starts the sort earlier, without waiting for the memory
limit to be hit, by capping each intermediate quicksort run at a lower
memory limit set by the query option 'sort_run_bytes_limit'. If the
total used reservation of the quicksort exceeds sort_run_bytes_limit,
the current unsorted_run_ is wrapped up, sorted, and then spilled, so
that the next sort run overlaps with the spill of the previous sort
run.

To reduce regressions in cases where the total input of the sort node
might fit entirely in available memory, sort_run_bytes_limit is not
enforced for the first sort run. However, the first run is still
limited by sort_run_bytes_limit if the planner estimates indicate that
a spill is inevitable.

We also add a new summary counter, 'AddBatchTime', that summarizes how
much time is spent in Sorter::AddBatch. The max of 'AddBatchTime'
indicates the longest time spent in Sorter::AddBatch, presumably busy
doing an intermediate sort.

Testing:
- Added new e2e test TestQueryFullSort::test_multiple_sort_run_bytes_limits
- Ran core tests
- Ran data loading of the 3 largest TPC-DS fact tables at 300GB scale
  into a real cluster using 5 backends and a 4GB mem_limit.
  sort_run_bytes_limit is varied between unspecified (not limited) and
  512 MB. The performance results are summarized in the following
  table.

+---------------+---------+--------------+-----------------------+-----------------------+
| Insert table  | #Rows   | Avg          | no limit              | 512 MB limit          |
|               |         | SortDataSize +--------+--------------+--------+--------------+
|               |         | per Node     | Query  | Max          | Query  | Max          |
|               |         |              | Time   | AddBatchTime | Time   | AddBatchTime |
+---------------+---------+--------------+--------+--------------+--------+--------------+
| store_sales   | 864.00M | 15.29 GB     | 30m18s | 53s311ms     | 20m    | 5s634ms      |
+---------------+---------+--------------+--------+--------------+--------+--------------+
| catalog_sales | 431.97M | 11.34 GB     | 23m24s | 31s212ms     | 15m27s | 3s603ms      |
+---------------+---------+--------------+--------+--------------+--------+--------------+
| web_sales     | 216.01M | 5.67 GB      | 8m16s  | 29s250ms     | 6m41s  | 3s856ms      |
+---------------+---------+--------------+--------+--------------+--------+--------------+

Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
---
M be/src/exec/partial-sort-node.cc
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M tests/query_test/test_sort.py
11 files changed, 141 insertions(+), 10 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/15963/11
--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 11
Gerrit-Owner: Riza Suminto
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: David Rorke
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Riza Suminto
Gerrit-Reviewer: Tim Armstrong
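The run-capping behaviour described in the IMPALA-6692 commit message can be sketched as a toy Python model. It is not the Sorter implementation: the function name split_into_runs, the spill_expected flag (standing in for the planner estimate), and the unit-sized batches are all assumptions made for illustration. A run is closed once its accumulated size reaches the cap in effect, and the lower cap is skipped for the first run unless a spill is expected.

```python
def split_into_runs(batch_sizes, mem_limit, sort_run_bytes_limit=None,
                    spill_expected=False):
    """Group batch sizes into sort runs (toy model, arbitrary size units).

    The cap is mem_limit, or the smaller of mem_limit and
    sort_run_bytes_limit when that option is set. The first run ignores
    sort_run_bytes_limit unless a spill is expected up front.
    """
    runs, current, used = [], [], 0
    for size in batch_sizes:
        current.append(size)
        used += size
        is_first_run = len(runs) == 0
        cap = mem_limit
        if sort_run_bytes_limit is not None and (spill_expected or not is_first_run):
            cap = min(mem_limit, sort_run_bytes_limit)
        if used >= cap:
            # Wrap up the current run: it would be sorted and then spilled.
            runs.append(current)
            current, used = [], 0
    if current:
        runs.append(current)
    return runs


batches = [1] * 12  # twelve batches of one size unit each
no_limit = [len(r) for r in split_into_runs(batches, mem_limit=8)]
capped = [len(r) for r in split_into_runs(batches, 8, sort_run_bytes_limit=2)]
capped_early = [len(r) for r in split_into_runs(batches, 8, 2, spill_expected=True)]
print(no_limit, capped, capped_early)
```

In this model the capped configurations produce more, smaller runs after the first one, which is what lets the sort node hand memory back and accept the next batch sooner.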
[Impala-ASF-CR] IMPALA-9341: Set delegateAdmin to false for REVOKE without GRANT OPTION
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16046 ) Change subject: IMPALA-9341: Set delegateAdmin to false for REVOKE without GRANT OPTION .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6230/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16046 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I19ff45a5a30293e9c6cf35b22ea4aa5cb10355c9 Gerrit-Change-Number: 16046 Gerrit-PatchSet: 1 Gerrit-Owner: Fang-Yu Rao Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Sun, 07 Jun 2020 22:53:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9341: Set delegateAdmin to false for REVOKE without GRANT OPTION
Fang-Yu Rao has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16046 )

Change subject: IMPALA-9341: Set delegateAdmin to false for REVOKE without GRANT OPTION
..

IMPALA-9341: Set delegateAdmin to false for REVOKE without GRANT OPTION

When executing a GRANT or REVOKE statement with Ranger being the
authorization provider, Impala has to prepare a GrantRevokeRequest to
allow Ranger to add/delete the corresponding RangerPolicy or modify the
existing RangerPolicyItems in the related RangerPolicy. One of the
fields that has to be set in a GrantRevokeRequest is delegateAdmin,
which dictates whether the grantee is allowed to transfer the privilege
on the resource to other principals. Specifically, the delegateAdmin
field in the updated RangerPolicyItem corresponding to the grantee is
set to the value of delegateAdmin in the GrantRevokeRequest prepared by
Impala.

Before this patch, when executing a REVOKE statement without the GRANT
OPTION, Impala would set delegateAdmin in the GrantRevokeRequest to
true. This is fine if the privilege to be revoked is the only privilege
that was previously granted to the grantee. However, when the privilege
to be revoked was not granted and there is a RangerPolicyItem for
another privilege on the same resource, the grantee actually obtains
the permission to transfer the non-matching privilege afterwards. The
root cause of this issue is that the privileges on the same resource
share the same delegateAdmin field in the corresponding
RangerPolicyItem, a current limitation of Ranger. As a workaround, we
set delegateAdmin in the GrantRevokeRequest to false for a REVOKE
statement without the GRANT OPTION.

This workaround has a limitation. When the grantee was already
permitted to transfer the non-matching privilege, setting delegateAdmin
to false in the GrantRevokeRequest deprives the grantee of a permission
that should not have been revoked. This is inconvenient for both the
administrator and the grantee, since the permission to transfer the
non-matching privilege has to be restored afterwards if necessary.

An alternative approach is for Impala to always check the current
delegateAdmin value when performing a REVOKE statement without the
GRANT OPTION. Specifically, we could resolve this problem by 1)
checking whether there exists a RangerPolicyItem for the same resource
and grantee whose delegateAdmin field is set to true and 2) setting the
delegateAdmin field in the GrantRevokeRequest accordingly. This
alternative, however, requires additional logic to iterate over the
RangerPolicy objects for the resource in the query, slowing down query
execution. Therefore, we choose the approach proposed in this patch
over the less efficient alternative.

Testing:
 - Revised a test case in test_ranger.py to reflect the behavior change
   of Impala when a REVOKE statement without the GRANT OPTION is
   executed.
 - Verified that this patch passed the exhaustive tests in the DEBUG
   build.

Change-Id: I19ff45a5a30293e9c6cf35b22ea4aa5cb10355c9 --- M fe/src/main/java/org/apache/impala/authorization/ranger/RangerCatalogdAuthorizationManager.java M tests/authorization/test_ranger.py 2 files changed, 7 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/16046/1 -- To view, visit http://gerrit.cloudera.org:8080/16046 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I19ff45a5a30293e9c6cf35b22ea4aa5cb10355c9 Gerrit-Change-Number: 16046 Gerrit-PatchSet: 1 Gerrit-Owner: Fang-Yu Rao Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Quanlong Huang
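The shared-flag problem described in the IMPALA-9341 commit message can be illustrated with a toy Python model. This is not the Ranger API: the PolicyItem class, the apply_revoke function, and the privilege names are hypothetical. The only behaviour modelled is what the message states, namely that all privileges in a policy item share one delegateAdmin flag and that the flag is overwritten with the value carried by the request.

```python
from dataclasses import dataclass, field


@dataclass
class PolicyItem:
    """Toy stand-in for a policy item: one flag shared by all accesses."""
    grantee: str
    accesses: set = field(default_factory=set)
    delegate_admin: bool = False


def apply_revoke(item, privilege, delegate_admin):
    """Remove `privilege` and copy the request's delegate_admin onto the item."""
    item.accesses.discard(privilege)
    item.delegate_admin = delegate_admin
    return item


# The grantee holds only SELECT, without the right to pass it on.
# Revoking INSERT (never granted) with delegate_admin=True, as before the
# patch, silently lets the grantee transfer SELECT.
before_patch = apply_revoke(PolicyItem("alice", {"select"}, False), "insert", True)

# With the workaround, the request carries delegate_admin=False.
after_patch = apply_revoke(PolicyItem("alice", {"select"}, False), "insert", False)

# Limitation: a grantee who was allowed to transfer SELECT loses that right.
limitation = apply_revoke(PolicyItem("alice", {"select"}, True), "insert", False)

print(before_patch.delegate_admin, after_patch.delegate_admin,
      limitation.delegate_admin)
```

The three cases correspond to the unintended escalation before the patch, the fixed behaviour, and the acknowledged limitation of the workaround.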