[Impala-ASF-CR] IMPALA-8005: Randomize partitioning exchanges.
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/15497 )

Change subject: IMPALA-8005: Randomize partitioning exchanges.
..

Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15497/3/be/src/runtime/data-stream-test.cc
File be/src/runtime/data-stream-test.cc:

http://gerrit.cloudera.org:8080/#/c/15497/3/be/src/runtime/data-stream-test.cc@742
PS3, Line 742: ASSERT_EQ(result, true);
> Agreed, we should have a more deterministic way to test this. Any thoughts?
It depends on what the core purpose of the test is. I am wondering if it is really necessary to check whether the receiver is different for a particular row across 2 queries, since this depends on the result of (hash_value % num_receivers). Since num_receivers is constant across the 2 queries, what if we simplified the test to only check whether the hash value is different for a row on the sender side? Given a random seed, the hash value for some number (say a BIGINT number) should with high likelihood be different across 2 queries. In the rare case that it is not, you could just do the check in a loop 5 times. I am open to other suggestions.

--
To view, visit http://gerrit.cloudera.org:8080/15497
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1936e6cc3e8d66420a5a9301f49221ca38f3e468
Gerrit-Change-Number: 15497
Gerrit-PatchSet: 4
Gerrit-Owner: Anurag Mantripragada
Gerrit-Reviewer: Aman Sinha
Gerrit-Reviewer: Anurag Mantripragada
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell
Gerrit-Comment-Date: Sun, 22 Mar 2020 03:33:30 +
Gerrit-HasComments: Yes
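The simplification proposed in the review comment above (compare sender-side hash values under two random seeds and retry a few times) could be sketched roughly as follows. This is an illustration in Python, not the actual C++ test: `seeded_hash` uses BLAKE2b as a stand-in for Impala's hash function, and both function names are hypothetical.

```python
import hashlib
import os
import struct


def seeded_hash(value: int, seed: int) -> int:
    """Hash a signed 64-bit value with an unsigned 64-bit seed.

    BLAKE2b is only a stand-in here; it is not the hash Impala uses.
    """
    data = struct.pack("<qQ", value, seed)
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")


def hashes_differ_across_queries(value: int, attempts: int = 5) -> bool:
    """Return True if two randomly seeded 'queries' hash `value` differently.

    Retries up to `attempts` times to absorb the rare collision, as the
    review comment suggests.
    """
    for _ in range(attempts):
        seed_a = int.from_bytes(os.urandom(8), "little")
        seed_b = int.from_bytes(os.urandom(8), "little")
        if seeded_hash(value, seed_a) != seeded_hash(value, seed_b):
            return True
    return False
```

Comparing 64-bit hash values rather than `hash_value % num_receivers` is what makes a false failure unlikely: with only 4 receivers, two different hashes would still map to the same receiver about a quarter of the time.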
[Impala-ASF-CR] IMPALA-9451: Fix test_hive_text_codec_interop.py failure in CDP build
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15520 ) Change subject: IMPALA-9451: Fix test_hive_text_codec_interop.py failure in CDP build .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5554/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15520 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ief8e583aae82f548754f41e07efac5d7bca4b930 Gerrit-Change-Number: 15520 Gerrit-PatchSet: 4 Gerrit-Owner: Xiaomeng Zhang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Xiaomeng Zhang Gerrit-Comment-Date: Sun, 22 Mar 2020 02:34:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9451: Fix test_hive_text_codec_interop.py failure in CDP build
Xiaomeng Zhang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/15520 )

Change subject: IMPALA-9451: Fix test_hive_text_codec_interop.py failure in CDP build
..

IMPALA-9451: Fix test_hive_text_codec_interop.py failure in CDP build

In the CDP build we use Hive 3, which has a bug, HIVE-22371 (CTAS puts files in the wrong place). It causes the newly added test to fail because CTAS creates an empty table. Work around this by explicitly creating an external table when the Hive version is >= 3.

Tested: Ran this test in the newest CDP build using the job impala-private-basic-parameterized.

Change-Id: Ief8e583aae82f548754f41e07efac5d7bca4b930
---
M tests/custom_cluster/test_hive_text_codec_interop.py
1 file changed, 12 insertions(+), 2 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/15520/4

--
To view, visit http://gerrit.cloudera.org:8080/15520
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ief8e583aae82f548754f41e07efac5d7bca4b930
Gerrit-Change-Number: 15520
Gerrit-PatchSet: 4
Gerrit-Owner: Xiaomeng Zhang
Gerrit-Reviewer: Joe McDonnell
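A version-gated workaround of the kind this commit message describes might look like the sketch below. Everything in it is an assumption made for illustration: the environment variable name, the `build_copy_statements` helper, and the SQL text are hypothetical and are not taken from the patch.

```python
import os

# Assumption: the Hive major version is exposed through an environment
# variable; the variable name and the default of 2 are illustrative.
HIVE_MAJOR_VERSION = int(os.environ.get("IMPALA_HIVE_MAJOR_VERSION", "2"))


def build_copy_statements(src, dst, location,
                          hive_major_version=HIVE_MAJOR_VERSION):
    """Return the Hive SQL statements used to copy table `src` into `dst`."""
    if hive_major_version >= 3:
        # Sidestep CTAS (HIVE-22371): create the external table explicitly,
        # then populate it with a separate INSERT.
        return [
            "create external table {0} like {1} location '{2}'".format(
                dst, src, location),
            "insert into {0} select * from {1}".format(dst, src),
        ]
    # Older Hive versions can keep using a single CTAS statement.
    return ["create table {0} as select * from {1}".format(dst, src)]
```

Passing the version as a parameter keeps the branch easy to exercise in a unit test without changing the environment.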
[Impala-ASF-CR] IMPALA-9451: Fix test_hive_text_codec_interop.py failure in CDP build
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15520 ) Change subject: IMPALA-9451: Fix test_hive_text_codec_interop.py failure in CDP build .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/15520/4/tests/custom_cluster/test_hive_text_codec_interop.py File tests/custom_cluster/test_hive_text_codec_interop.py: http://gerrit.cloudera.org:8080/#/c/15520/4/tests/custom_cluster/test_hive_text_codec_interop.py@75 PS4, Line 75: H flake8: F821 undefined name 'HIVE_MAJOR_VERSION' -- To view, visit http://gerrit.cloudera.org:8080/15520 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ief8e583aae82f548754f41e07efac5d7bca4b930 Gerrit-Change-Number: 15520 Gerrit-PatchSet: 4 Gerrit-Owner: Xiaomeng Zhang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Xiaomeng Zhang Gerrit-Comment-Date: Sun, 22 Mar 2020 01:49:43 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9451: Fix test_hive_text_codec_interop.py failure in CDP build
Xiaomeng Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/15520 ) Change subject: IMPALA-9451: Fix test_hive_text_codec_interop.py failure in CDP build .. Patch Set 4: The jenkins job with test pass https://master-02.jenkins.cloudera.com/view/Impala/view/Private/job/impala-private-basic-parameterized/189/ -- To view, visit http://gerrit.cloudera.org:8080/15520 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ief8e583aae82f548754f41e07efac5d7bca4b930 Gerrit-Change-Number: 15520 Gerrit-PatchSet: 4 Gerrit-Owner: Xiaomeng Zhang Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Xiaomeng Zhang Gerrit-Comment-Date: Sun, 22 Mar 2020 01:49:24 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8005: Randomize partitioning exchanges.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15497 ) Change subject: IMPALA-8005: Randomize partitioning exchanges. .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5553/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15497 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1936e6cc3e8d66420a5a9301f49221ca38f3e468 Gerrit-Change-Number: 15497 Gerrit-PatchSet: 4 Gerrit-Owner: Anurag Mantripragada Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Anurag Mantripragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Comment-Date: Sun, 22 Mar 2020 00:42:24 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8005: Randomize partitioning exchanges.
Anurag Mantripragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/15497 )

Change subject: IMPALA-8005: Randomize partitioning exchanges.
..

Patch Set 4:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/15497/3/be/src/runtime/data-stream-test.cc
File be/src/runtime/data-stream-test.cc:

http://gerrit.cloudera.org:8080/#/c/15497/3/be/src/runtime/data-stream-test.cc@514
PS3, Line 514: JoinReceivers();
> nit: add a newline between this and next method
Done

http://gerrit.cloudera.org:8080/#/c/15497/3/be/src/runtime/data-stream-test.cc@725
PS3, Line 725: int num_receivers = 4;
> Seems like this code block can be consolidated with the identical block on
Done.

http://gerrit.cloudera.org:8080/#/c/15497/3/be/src/runtime/data-stream-test.cc@742
PS3, Line 742: ASSERT_EQ(result, true);
> Is the non-match guaranteed though ? Given that there are only 4 receivers,
Agreed, we should have a more deterministic way to test this. Any thoughts?

http://gerrit.cloudera.org:8080/#/c/15497/3/be/src/runtime/krpc-data-stream-sender.cc
File be/src/runtime/krpc-data-stream-sender.cc:

http://gerrit.cloudera.org:8080/#/c/15497/3/be/src/runtime/krpc-data-stream-sender.cc@88
PS3, Line 88: exchange_hash_seed_ =
> Suggest keeping the constant factor as a static constexpr in the header fil
Done

--
To view, visit http://gerrit.cloudera.org:8080/15497
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1936e6cc3e8d66420a5a9301f49221ca38f3e468
Gerrit-Change-Number: 15497
Gerrit-PatchSet: 4
Gerrit-Owner: Anurag Mantripragada
Gerrit-Reviewer: Aman Sinha
Gerrit-Reviewer: Anurag Mantripragada
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell
Gerrit-Comment-Date: Sat, 21 Mar 2020 23:55:36 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8005: Randomize partitioning exchanges.
Anurag Mantripragada has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/15497 )

Change subject: IMPALA-8005: Randomize partitioning exchanges.
..

IMPALA-8005: Randomize partitioning exchanges.

Currently, we use the same hash seed for partitioning exchanges at the sender. For a table with a skewed distribution in the shuffling keys, multiple queries using the same shuffling keys for exchanges will end up hashing to the same destination fragments running on a particular host, potentially overloading that host.

This patch seeds the hash with the query id. This ensures that partitioning exchanges do not always hash to the same destination for the same shuffling keys.

Testing:
Added a test to data-stream-test to verify that the data values at the destination are different for different queries.

Change-Id: I1936e6cc3e8d66420a5a9301f49221ca38f3e468
---
M be/src/runtime/data-stream-test.cc
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
3 files changed, 94 insertions(+), 8 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/15497/4

--
To view, visit http://gerrit.cloudera.org:8080/15497
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1936e6cc3e8d66420a5a9301f49221ca38f3e468
Gerrit-Change-Number: 15497
Gerrit-PatchSet: 4
Gerrit-Owner: Anurag Mantripragada
Gerrit-Reviewer: Aman Sinha
Gerrit-Reviewer: Anurag Mantripragada
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell
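The effect described in this commit message can be illustrated with a small simulation. It is only a sketch: the way the seed is folded out of the query id, the BLAKE2b stand-in hash, the fan-out of 4, and the synthetic query ids are all assumptions, and the patch itself is C++ and may derive the seed differently.

```python
import hashlib
import struct
from collections import Counter

NUM_RECEIVERS = 4  # hypothetical exchange fan-out


def pick_receiver(key: int, seed: int, num_receivers: int = NUM_RECEIVERS) -> int:
    """Map a shuffling key to a destination index using a seeded hash."""
    data = struct.pack("<qQ", key, seed)
    digest = hashlib.blake2b(data, digest_size=8).digest()
    return int.from_bytes(digest, "little") % num_receivers


def seed_from_query_id(hi: int, lo: int) -> int:
    """Fold a 128-bit query id (two 64-bit halves) into a 64-bit seed.

    Illustrative only; not the formula used by the patch.
    """
    return (hi ^ lo) & 0xFFFFFFFFFFFFFFFF


hot_key = 7  # a heavily skewed shuffling key
query_ids = [(q, q * 2654435761 + 1) for q in range(1, 201)]  # synthetic ids

# Same seed for every query: the hot key always lands on one receiver.
fixed = Counter(pick_receiver(hot_key, 0) for _ in query_ids)

# Seed derived from the query id: the hot key moves between receivers.
per_query = Counter(
    pick_receiver(hot_key, seed_from_query_id(hi, lo)) for hi, lo in query_ids
)
```

Within a single query every sender still uses the same seed, so all rows with a given key reach the same receiver; only the mapping across queries changes.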
[Impala-ASF-CR] IMPALA-9183: Convert disjunctive predicates to conjunctive normal form
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15462 ) Change subject: IMPALA-9183: Convert disjunctive predicates to conjunctive normal form .. Patch Set 7: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/5511/ -- To view, visit http://gerrit.cloudera.org:8080/15462 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5a03cd7239333aaf375416ef5f2b7608fcd4a072 Gerrit-Change-Number: 15462 Gerrit-PatchSet: 7 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 21 Mar 2020 22:14:48 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3766: optionally compress spilled data
Hello Sahil Takiar, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15454

to look at the new patch set (#8).

Change subject: IMPALA-3766: optionally compress spilled data
..

IMPALA-3766: optionally compress spilled data

Enabled via --disk_spill_compression_codec, which uses the same syntax as the compression_codec query option. Recommended codecs are LZ4 and ZSTD. ZSTD supports specifying a compression level.

The compression is done in TmpFileMgr using a temporary buffer. Allocation of disk space is reworked slightly so that the allocation can happen after compression.

The default power-of-two disk block sizes would lead to a lot of internal fragmentation, so a new strategy for free space management, similar to that used in the data cache, can be used with --disk_spill_punch_holes=true. TmpFileMgr will allocate a range of the actual compressed size and punch holes in the file for each range that is no longer needed.

UncompressedWriteIoBytes is added to the buffer pool profiles, so that you can see what degree of compression is achieved. Typically I saw ratios of 2-3x for LZ4 and ZSTD (with LZ4 toward the lower end and ZSTD toward the higher end).

TODO:
* Add compression/decompression timers to profile, similar to encryption

Limitations:
The management of the compression buffer memory could be improved. Ideally it would be integrated with the buffer pool and use the buffer pool allocator instead of being done "on the side". We would probably want to do this before making this the default, for resource management and performance reasons (doing a malloc() directly does not use the caching supported by the buffer pool).

Testing:
* Run buffer pool spilling tests with different combinations of the new options.
* Extend existing TmpFileMgr tests for file space allocation to run with hole punching enabled.
* Switch a couple of spilling tests to use the new option.
* Add a metrics test to check for scratch leaks.
* Enable the new options by default for end-to-end dockerized tests to get additional coverage.
* Add a unit test where allocating compression memory fails, both on the read and write path.

TODO:
* stress test

Perf:
I ran this spilling query using an SSD as the scratch disk:

  set mem_limit=200m;
  select count(distinct l_partkey) from tpch30_parquet.lineitem;

The time taken for the second run of each query was:
No compression: 19.59s
LZ4: 18.56s
ZSTD: 20.59s

The peak compression buffer usage after running those queries was 4.02 MB.

Change-Id: I9c08ff9504097f0fee8c32316c5c150136abe659
---
M be/src/runtime/bufferpool/buffer-pool-counters.h
M be/src/runtime/bufferpool/buffer-pool-test.cc
M be/src/runtime/bufferpool/buffer-pool.cc
M be/src/runtime/test-env.cc
M be/src/runtime/test-env.h
M be/src/runtime/tmp-file-mgr-internal.h
M be/src/runtime/tmp-file-mgr-test.cc
M be/src/runtime/tmp-file-mgr.cc
M be/src/runtime/tmp-file-mgr.h
M be/src/service/query-options.cc
M be/src/util/parse-util.cc
M be/src/util/parse-util.h
M bin/jenkins/dockerized-impala-run-tests.sh
M tests/custom_cluster/test_scratch_disk.py
M tests/verifiers/metric_verifier.py
15 files changed, 752 insertions(+), 211 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/15454/8

--
To view, visit http://gerrit.cloudera.org:8080/15454
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9c08ff9504097f0fee8c32316c5c150136abe659
Gerrit-Change-Number: 15454
Gerrit-PatchSet: 8
Gerrit-Owner: Tim Armstrong
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Sahil Takiar
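The internal fragmentation argument in this commit message can be made concrete with a short sketch. It assumes zlib as a stand-in codec (LZ4 and ZSTD are not in the Python standard library), and `spill_page` and `next_power_of_two` are hypothetical helpers, not TmpFileMgr functions.

```python
import zlib


def next_power_of_two(n: int) -> int:
    """Smallest power of two that is >= n (1 for n <= 1)."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


def spill_page(page: bytes, level: int = 1) -> dict:
    """Compress a page, then compare exact-size and power-of-two allocation."""
    compressed = zlib.compress(page, level)  # stand-in for LZ4/ZSTD
    exact = len(compressed)
    rounded = next_power_of_two(exact)
    return {
        "uncompressed_bytes": len(page),
        "compressed_bytes": exact,
        "pow2_allocation": rounded,
        "wasted_bytes": rounded - exact,  # internal fragmentation
        "ratio": len(page) / exact,
    }


# A synthetic, highly compressible page of row-like data.
page = b"".join(b"row-%06d|" % i for i in range(20000))
stats = spill_page(page)
```

Because the compressed size is only known after compression, the disk range has to be allocated afterwards. Rounding that size up to a power of two can waste close to half of the allocation in the worst case, which is the waste that allocating a range of the actual compressed size avoids.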
[Impala-ASF-CR] IMPALA-9191 (part 1): Allow Impala to run tests without Sentry
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/15505 )

Change subject: IMPALA-9191 (part 1): Allow Impala to run tests without Sentry
..

IMPALA-9191 (part 1): Allow Impala to run tests without Sentry

This patch adds an environment variable DISABLE_SENTRY to allow Impala to run tests without Sentry. Specifically, we start up Sentry only when $DISABLE_SENTRY does not evaluate to true. The corresponding Sentry FE and E2E tests will also be skipped if $DISABLE_SENTRY is true. Moreover, in this patch we set DISABLE_SENTRY to true if $USE_CDP_HIVE evaluates to true, allowing one to test Impala's authorization with Ranger only, once support for Sentry is dropped after we switch to the CDP Hive.

Note that in this patch we also change the way we generate hive-site.xml when $DISABLE_SENTRY is true. To be more precise, when generating hive-site.xml, we do not add the Sentry server as a metastore event listener if $DISABLE_SENTRY is true. Recall that both CDH Hive and CDP Hive make an RPC to the registered listeners every time create_database_core() in HiveMetaStore.java is called, which happens when Hive instead of Impala is used to create a database, e.g., when some databases in the TPC-DS data set are created during the execution of create-load-data.sh. Thus removing Sentry as an event listener is necessary when $DISABLE_SENTRY is true: it prevents the HiveMetaStore from repeatedly trying to connect to a Sentry server that is not online, which could make create-load-data.sh time out.

Testing:
Except for two currently known issues, IMPALA-9513 and IMPALA-9451, verified that this patch passes the exhaustive tests in the DEBUG build
- when $USE_CDP_HIVE is false, and
- when $USE_CDP_HIVE is true.

Change-Id: Ifa3f1840a77a7b32310a5c8b78a2c26300ccb41e
Reviewed-on: http://gerrit.cloudera.org:8080/15505
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins
---
M bin/impala-config.sh
M fe/src/test/java/org/apache/impala/authorization/AuthorizationStmtTest.java
M fe/src/test/java/org/apache/impala/authorization/AuthorizationTest.java
M fe/src/test/java/org/apache/impala/authorization/sentry/SentryProxyTest.java
M fe/src/test/resources/hive-site.xml.py
M testdata/bin/run-all.sh
M tests/authorization/test_authorization.py
M tests/authorization/test_authorized_proxy.py
M tests/authorization/test_grant_revoke.py
M tests/authorization/test_owner_privileges.py
M tests/authorization/test_sentry.py
M tests/authorization/test_show_grant.py
M tests/common/skip.py
13 files changed, 61 insertions(+), 17 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/15505
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ifa3f1840a77a7b32310a5c8b78a2c26300ccb41e
Gerrit-Change-Number: 15505
Gerrit-PatchSet: 7
Gerrit-Owner: Fang-Yu Rao
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Fang-Yu Rao
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell
Gerrit-Reviewer: Vihang Karajgaonkar
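The environment-variable logic described in this commit message could be sketched as below. This is an assumption-laden illustration rather than the contents of hive-site.xml.py: the helper names, the listener class name, and the metastore URI are placeholders.

```python
def is_true(value) -> bool:
    """Treat the string 'true' (any case) as true, everything else as false."""
    return str(value).strip().lower() == "true"


def resolve_disable_sentry(env) -> bool:
    """DISABLE_SENTRY is forced on whenever USE_CDP_HIVE is true."""
    if is_true(env.get("USE_CDP_HIVE", "false")):
        return True
    return is_true(env.get("DISABLE_SENTRY", "false"))


# Placeholder: the commit message does not name the real listener class.
SENTRY_LISTENER = "example.SentryEventListener"


def build_hive_site(env) -> dict:
    """Build a minimal hive-site property map for the given environment."""
    config = {"hive.metastore.uris": "thrift://localhost:9083"}  # placeholder URI
    if not resolve_disable_sentry(env):
        # Register Sentry as an event listener only when Sentry is running;
        # otherwise the metastore would keep calling an offline server.
        config["hive.metastore.event.listeners"] = SENTRY_LISTENER
    return config
```

For example, `build_hive_site({"USE_CDP_HIVE": "true"})` omits the listener entry even though DISABLE_SENTRY was never set explicitly.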
[Impala-ASF-CR] IMPALA-9191 (part 1): Allow Impala to run tests without Sentry
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15505 ) Change subject: IMPALA-9191 (part 1): Allow Impala to run tests without Sentry .. Patch Set 6: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/15505 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifa3f1840a77a7b32310a5c8b78a2c26300ccb41e Gerrit-Change-Number: 15505 Gerrit-PatchSet: 6 Gerrit-Owner: Fang-Yu Rao Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Sat, 21 Mar 2020 20:14:32 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3343: Part 3 - Fix py2->3 changes re: libs, built-ins, imports
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15514 ) Change subject: IMPALA-3343: Part 3 - Fix py2->3 changes re: libs, built-ins, imports .. Patch Set 4: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/15514 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibd3d86df524666a98cbfa463756adac48bd1f8a3 Gerrit-Change-Number: 15514 Gerrit-PatchSet: 4 Gerrit-Owner: David Knupp Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 21 Mar 2020 19:52:05 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3343: Part 3 - Fix py2->3 changes re: libs, built-ins, imports
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/15514 )

Change subject: IMPALA-3343: Part 3 - Fix py2->3 changes re: libs, built-ins, imports
..

IMPALA-3343: Part 3 - Fix py2->3 changes re: libs, built-ins, imports

A few built-ins were changed in python 3 -- e.g., xrange became range, ConfigParser became configparser, etc. We can redefine some of those things in a single place, and import them from there as needed. Other items may also be added as we go along.

Change-Id: Ibd3d86df524666a98cbfa463756adac48bd1f8a3
Reviewed-on: http://gerrit.cloudera.org:8080/15514
Reviewed-by: David Knupp
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins
---
M infra/python/deps/compiled-requirements.txt
A shell/compatibility.py
M shell/impala_client.py
M shell/impala_shell.py
M shell/make_shell_tarball.sh
M shell/option_parser.py
M shell/packaging/make_python_package.sh
M shell/shell_output.py
8 files changed, 60 insertions(+), 14 deletions(-)

Approvals:
  David Knupp: Looks good to me, approved
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/15514
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ibd3d86df524666a98cbfa463756adac48bd1f8a3
Gerrit-Change-Number: 15514
Gerrit-PatchSet: 5
Gerrit-Owner: David Knupp
Gerrit-Reviewer: David Knupp
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong
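A single-place compatibility shim of the kind this commit message describes might look like the following. It is a sketch of the general pattern only; the actual contents of shell/compatibility.py are not shown in this message and may differ.

```python
# Redefine names that moved or were renamed between Python 2 and Python 3,
# so other modules can import them from one place.

try:
    import configparser  # Python 3 module name
except ImportError:  # Python 2 fallback
    import ConfigParser as configparser

try:
    xrange = xrange  # Python 2: keep the lazy built-in
except NameError:
    xrange = range  # Python 3: range is already lazy
```

Other modules would then import `xrange` and `configparser` from the shim module instead of branching on the interpreter version themselves.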
[Impala-ASF-CR] IMPALA-3766: optionally compress spilled data
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15454 ) Change subject: IMPALA-3766: optionally compress spilled data .. Patch Set 7: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/5552/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/15454 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9c08ff9504097f0fee8c32316c5c150136abe659 Gerrit-Change-Number: 15454 Gerrit-PatchSet: 7 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Sat, 21 Mar 2020 18:19:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3766: optionally compress spilled data
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15454 )

Change subject: IMPALA-3766: optionally compress spilled data
..

Patch Set 7:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/15454/7/be/src/runtime/bufferpool/buffer-pool-test.cc
File be/src/runtime/bufferpool/buffer-pool-test.cc:

http://gerrit.cloudera.org:8080/#/c/15454/7/be/src/runtime/bufferpool/buffer-pool-test.cc@1628
PS7, Line 1628: void BufferPoolTest::TestTmpFileAllocateError(const string& compression, bool punch_holes) {
line too long (92 > 90)

http://gerrit.cloudera.org:8080/#/c/15454/7/be/src/runtime/tmp-file-mgr-test.cc
File be/src/runtime/tmp-file-mgr-test.cc:

http://gerrit.cloudera.org:8080/#/c/15454/7/be/src/runtime/tmp-file-mgr-test.cc@531
PS7, Line 531: int64_t expected_bytes_allocated = punch_holes ? 0 : expected_scratch_bytes_allocated;
line too long (92 > 90)

http://gerrit.cloudera.org:8080/#/c/15454/7/be/src/runtime/tmp-file-mgr-test.cc@750
PS7, Line 750: ASSERT_OK(tmp_file_mgr.InitCustom(tmp_dir_specs, false, "", punch_holes, metrics_.get()));
line too long (92 > 90)

http://gerrit.cloudera.org:8080/#/c/15454/7/be/src/runtime/tmp-file-mgr-test.cc@998
PS7, Line 998: file_group.Read(uncompressed_handle.get(), MemRange(big_tmp.data(), big_tmp.size(;
line too long (92 > 90)

http://gerrit.cloudera.org:8080/#/c/15454/7/be/src/runtime/tmp-file-mgr.cc
File be/src/runtime/tmp-file-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/15454/7/be/src/runtime/tmp-file-mgr.cc@780
PS7, Line 780: VLOG(3) << "Write " << tmp_file->path() << " " << file_offset << " " << buffer_to_write.len();
line too long (96 > 90)

--
To view, visit http://gerrit.cloudera.org:8080/15454
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9c08ff9504097f0fee8c32316c5c150136abe659
Gerrit-Change-Number: 15454
Gerrit-PatchSet: 7
Gerrit-Owner: Tim Armstrong
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Sahil Takiar
Gerrit-Comment-Date: Sat, 21 Mar 2020 17:35:21 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3766: optionally compress spilled data
Hello Sahil Takiar,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15454

to look at the new patch set (#7).

Change subject: IMPALA-3766: optionally compress spilled data
..

IMPALA-3766: optionally compress spilled data

Enabled via --disk_spill_compression_codec, which uses the same syntax as the compression_codec query option. Recommended codecs are LZ4 and ZSTD. ZSTD supports specifying a compression level.

The compression is done in TmpFileMgr using a temporary buffer. Allocation of disk space is reworked slightly so that the allocation can happen after compression.

The default power-of-two disk block sizes would lead to a lot of internal fragmentation, so a new strategy for free space management, similar to that used in the data cache, can be used with --disk_spill_punch_holes=true. TmpFileMgr will allocate a range of the actual compressed size and punch holes in the file for each range that is no longer needed.

UncompressedWriteIoBytes is added to the buffer pool profiles, so that you can see what degree of compression is achieved. Typically I saw ratios of 2-3x for LZ4 and ZSTD (with LZ4 toward the lower end and ZSTD toward the higher end).

Limitations:
The management of the compression buffer memory could be improved. Ideally it would be integrated with the buffer pool and use the buffer pool allocator instead of being done "on the side". We would probably want to do this before making this the default, for resource management and performance reasons (doing a malloc() directly does not use the caching supported by the buffer pool).

Testing:
* Run buffer pool spilling tests with different combinations of the new options.
* Extend existing TmpFileMgr tests for file space allocation to run with hole punching enabled.
* Switch a couple of spilling tests to use the new option.
* Add a metrics test to check for scratch leaks.
* Enable the new options by default for end-to-end dockerized tests to get additional coverage.
* Add a unit test where allocating compression memory fails, both on the read and write path.

TODO:
* stress test

Perf:
I ran this spilling query using an SSD as the scratch disk:

  set mem_limit=200m;
  select count(distinct l_partkey) from tpch30_parquet.lineitem;

The time taken for the second run of each query was:
No compression: 19.59s
LZ4: 18.56s
ZSTD: 20.59s

The peak compression buffer usage after running those queries was 4.02 MB.

Change-Id: I9c08ff9504097f0fee8c32316c5c150136abe659
---
M be/src/runtime/bufferpool/buffer-pool-counters.h
M be/src/runtime/bufferpool/buffer-pool-test.cc
M be/src/runtime/bufferpool/buffer-pool.cc
M be/src/runtime/test-env.cc
M be/src/runtime/test-env.h
M be/src/runtime/tmp-file-mgr-internal.h
M be/src/runtime/tmp-file-mgr-test.cc
M be/src/runtime/tmp-file-mgr.cc
M be/src/runtime/tmp-file-mgr.h
M be/src/service/query-options.cc
M be/src/util/parse-util.cc
M be/src/util/parse-util.h
M bin/jenkins/dockerized-impala-run-tests.sh
M tests/custom_cluster/test_scratch_disk.py
M tests/verifiers/metric_verifier.py
15 files changed, 752 insertions(+), 211 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/15454/7

--
To view, visit http://gerrit.cloudera.org:8080/15454
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9c08ff9504097f0fee8c32316c5c150136abe659
Gerrit-Change-Number: 15454
Gerrit-PatchSet: 7
Gerrit-Owner: Tim Armstrong
Gerrit-Reviewer: Sahil Takiar
[Impala-ASF-CR] IMPALA-9183: Convert disjunctive predicates to conjunctive normal form
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15462 ) Change subject: IMPALA-9183: Convert disjunctive predicates to conjunctive normal form .. Patch Set 6: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/15462 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5a03cd7239333aaf375416ef5f2b7608fcd4a072 Gerrit-Change-Number: 15462 Gerrit-PatchSet: 6 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 21 Mar 2020 17:12:48 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9183: Convert disjunctive predicates to conjunctive normal form
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15462 ) Change subject: IMPALA-9183: Convert disjunctive predicates to conjunctive normal form .. Patch Set 7: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5511/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/15462 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5a03cd7239333aaf375416ef5f2b7608fcd4a072 Gerrit-Change-Number: 15462 Gerrit-PatchSet: 7 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 21 Mar 2020 17:13:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9183: Convert disjunctive predicates to conjunctive normal form
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15462 ) Change subject: IMPALA-9183: Convert disjunctive predicates to conjunctive normal form .. Patch Set 7: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/15462 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5a03cd7239333aaf375416ef5f2b7608fcd4a072 Gerrit-Change-Number: 15462 Gerrit-PatchSet: 7 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 21 Mar 2020 17:13:03 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3343: Part 3 - Fix py2->3 changes re: libs, built-ins, imports
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15514 ) Change subject: IMPALA-3343: Part 3 - Fix py2->3 changes re: libs, built-ins, imports .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5551/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15514 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibd3d86df524666a98cbfa463756adac48bd1f8a3 Gerrit-Change-Number: 15514 Gerrit-PatchSet: 4 Gerrit-Owner: David Knupp Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 21 Mar 2020 15:34:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9191 (part 1): Allow Impala to run tests without Sentry
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15505 ) Change subject: IMPALA-9191 (part 1): Allow Impala to run tests without Sentry .. Patch Set 6: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/15505 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifa3f1840a77a7b32310a5c8b78a2c26300ccb41e Gerrit-Change-Number: 15505 Gerrit-PatchSet: 6 Gerrit-Owner: Fang-Yu Rao Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Sat, 21 Mar 2020 15:18:00 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9191 (part 1): Allow Impala to run tests without Sentry
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15505 ) Change subject: IMPALA-9191 (part 1): Allow Impala to run tests without Sentry .. Patch Set 6: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5510/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/15505 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifa3f1840a77a7b32310a5c8b78a2c26300ccb41e Gerrit-Change-Number: 15505 Gerrit-PatchSet: 6 Gerrit-Owner: Fang-Yu Rao Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Sat, 21 Mar 2020 15:18:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9191 (part 1): Allow Impala to run tests without Sentry
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/15505 ) Change subject: IMPALA-9191 (part 1): Allow Impala to run tests without Sentry .. Patch Set 5: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/15505 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifa3f1840a77a7b32310a5c8b78a2c26300ccb41e Gerrit-Change-Number: 15505 Gerrit-PatchSet: 5 Gerrit-Owner: Fang-Yu Rao Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Sat, 21 Mar 2020 15:15:03 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3343: Part 3 - Fix py2->3 changes re: libs, built-ins, imports
David Knupp has posted comments on this change. ( http://gerrit.cloudera.org:8080/15514 ) Change subject: IMPALA-3343: Part 3 - Fix py2->3 changes re: libs, built-ins, imports .. Patch Set 3: Code-Review+2 Carrying +2 after rebase. -- To view, visit http://gerrit.cloudera.org:8080/15514 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibd3d86df524666a98cbfa463756adac48bd1f8a3 Gerrit-Change-Number: 15514 Gerrit-PatchSet: 3 Gerrit-Owner: David Knupp Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 21 Mar 2020 14:50:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3343: Part 3 - Fix py2->3 changes re: libs, built-ins, imports
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15514 ) Change subject: IMPALA-3343: Part 3 - Fix py2->3 changes re: libs, built-ins, imports .. Patch Set 4: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5509/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/15514 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibd3d86df524666a98cbfa463756adac48bd1f8a3 Gerrit-Change-Number: 15514 Gerrit-PatchSet: 4 Gerrit-Owner: David Knupp Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 21 Mar 2020 14:50:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3343: Part 2 - Add thrift_sasl library to shell/ext_py/
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15513 ) Change subject: IMPALA-3343: Part 2 - Add thrift_sasl library to shell/ext_py/ .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/15513 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7e66c728883ceb5b3e96bc5fd120d44ab81bbb75 Gerrit-Change-Number: 15513 Gerrit-PatchSet: 2 Gerrit-Owner: David Knupp Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 21 Mar 2020 12:25:54 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3343: Part 2 - Add thrift_sasl library to shell/ext_py/
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/15513 ) Change subject: IMPALA-3343: Part 2 - Add thrift_sasl library to shell/ext_py/ .. IMPALA-3343: Part 2 - Add thrift_sasl library to shell/ext_py/ We've relied on a copied version of thrift_sasl.py, which needs to be updated to be compatible with python 3, so taking this opportunity to add the thrift_sasl 0.4.1 package to ext-py like the other external python libs we use. Change-Id: I7e66c728883ceb5b3e96bc5fd120d44ab81bbb75 Reviewed-on: http://gerrit.cloudera.org:8080/15513 Reviewed-by: David Knupp Tested-by: Impala Public Jenkins --- M LICENSE.txt M bin/rat_exclude_files.txt M shell/.gitignore A shell/ext-py/thrift_sasl-0.4.1/CHANGELOG.md A shell/ext-py/thrift_sasl-0.4.1/LICENSE A shell/ext-py/thrift_sasl-0.4.1/README.md A shell/ext-py/thrift_sasl-0.4.1/setup.py R shell/ext-py/thrift_sasl-0.4.1/thrift_sasl/__init__.py M shell/make_shell_tarball.sh M shell/packaging/requirements.txt M tests/util/thrift_util.py 11 files changed, 323 insertions(+), 33 deletions(-) Approvals: David Knupp: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/15513 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I7e66c728883ceb5b3e96bc5fd120d44ab81bbb75 Gerrit-Change-Number: 15513 Gerrit-PatchSet: 3 Gerrit-Owner: David Knupp Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-3343: Part 2 - Add thrift_sasl library to shell/ext_py/
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15513 ) Change subject: IMPALA-3343: Part 2 - Add thrift_sasl library to shell/ext_py/ .. Patch Set 2: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5507/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/15513 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7e66c728883ceb5b3e96bc5fd120d44ab81bbb75 Gerrit-Change-Number: 15513 Gerrit-PatchSet: 2 Gerrit-Owner: David Knupp Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 21 Mar 2020 07:22:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3343: Part 2 - Add thrift_sasl library to shell/ext_py/
David Knupp has posted comments on this change. ( http://gerrit.cloudera.org:8080/15513 ) Change subject: IMPALA-3343: Part 2 - Add thrift_sasl library to shell/ext_py/ .. Patch Set 2: Code-Review+2 Carrying +2 -- To view, visit http://gerrit.cloudera.org:8080/15513 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7e66c728883ceb5b3e96bc5fd120d44ab81bbb75 Gerrit-Change-Number: 15513 Gerrit-PatchSet: 2 Gerrit-Owner: David Knupp Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 21 Mar 2020 07:21:45 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3343: Part 2 - Add thrift_sasl library to shell/ext_py/
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15513 ) Change subject: IMPALA-3343: Part 2 - Add thrift_sasl library to shell/ext_py/ .. Patch Set 2: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5506/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/15513 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7e66c728883ceb5b3e96bc5fd120d44ab81bbb75 Gerrit-Change-Number: 15513 Gerrit-PatchSet: 2 Gerrit-Owner: David Knupp Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 21 Mar 2020 06:19:07 + Gerrit-HasComments: No