[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. Patch Set 30: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 30 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 28 Feb 2020 17:27:25 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering

IMPALA-8755: Backend support for Z-ordering

This change depends on gerrit.cloudera.org/#/c/13955/ (Frontend support for Z-ordering).

The commit adds a comparator based on Z-ordering. For details, see: https://en.wikipedia.org/wiki/Z-order_curve

Instead of calculating the Z-values of the rows, the comparator looks for the most significant dimension (column) and compares the values of that column only. The most significant dimension is the one where the compared values differ in the highest bit.

The algorithm requires values of the same binary representation, therefore the values are converted into either uint32_t, uint64_t or uint128_t, the smallest in which all data fits. Comparing smaller types with bigger ones would make the bigger type much more dominant, therefore the bits of the smaller types are shifted up.

All primitive types (including string and floating point types) are supported.

Testing:
* Added unit tests.
* Ran manual tests, comparing 4-column values with 4-bit integers, for all possible combinations. Checked the result by calculating the Z-value for each comparison.
* Tested performance on various data, getting great results for selective queries. An example: used the TPCH dataset's lineitem table with scale 25, where the sorting columns are l_partkey and l_suppkey, in that order. Ran selective queries over the value range of the two columns, for both lexical and Z-ordering, and compared the percentage of filtered pages and row groups. While queries with filters on the first column showed almost no difference, queries on the second column are in favour of Z-ordering:

Ordering | Column | Filtered pages % | Filtered row groups %
Lex.     | 1st    | ~99%             | ~90%
Z-ord.   | 1st    | ~99%             | ~89%
Lex.     | 2nd    | ~25%             | 0%
Z-ord.   | 2nd    | ~97%             | 0%

The only drawback is the sorting itself, which takes about 4 times as long as lexical sorting (e.g. sorting the dataset above took 14m for lexical ordering and 55m for Z-ordering). Note, however, that this is a one-time cost: sorting only happens once, when writing the data. Also, lexical ordering is supported by codegen, while codegen is not implemented for Z-ordering yet.

Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Reviewed-on: http://gerrit.cloudera.org:8080/14080
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins
---
M be/src/exec/exchange-node.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partial-sort-node.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/util/CMakeLists.txt
A be/src/util/tuple-row-compare-test.cc
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
18 files changed, 1,119 insertions(+), 95 deletions(-)

Approvals: Impala Public Jenkins: Looks good to me, approved; Verified

-- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 31
Gerrit-Owner: Norbert Luksa
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Norbert Luksa
Gerrit-Reviewer: Zoltan Borok-Nagy
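The comparison described in the commit message (find the most significant dimension, then compare that column only) can be sketched as follows. This is a minimal illustration, not the code from tuple-row-compare.cc: it assumes every column has already been converted to an order-preserving uint64_t, and the names `LessMsb` and `ZOrderCompare` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true if the highest set bit of x is strictly lower than the
// highest set bit of y. Uses the identity: msb(x) < msb(y) iff
// x < y and x < (x ^ y).
inline bool LessMsb(uint64_t x, uint64_t y) {
  return x < y && x < (x ^ y);
}

// Compares two rows in Z-order without building their Z-values.
// Returns a negative value, zero, or a positive value.
// Assumption: both rows hold the same number of columns, each already
// normalized to an unsigned 64-bit representation.
int ZOrderCompare(const std::vector<uint64_t>& lhs,
                  const std::vector<uint64_t>& rhs) {
  assert(lhs.size() == rhs.size());
  std::size_t msd = 0;   // index of the most significant dimension so far
  uint64_t max_xor = 0;  // XOR of the two values in that dimension
  for (std::size_t i = 0; i < lhs.size(); ++i) {
    const uint64_t diff = lhs[i] ^ rhs[i];
    // A column wins only if its highest differing bit is strictly higher,
    // so on a tie the earlier column stays the most significant one.
    if (LessMsb(max_xor, diff)) {
      max_xor = diff;
      msd = i;
    }
  }
  if (max_xor == 0) return 0;  // every column is equal
  return lhs[msd] < rhs[msd] ? -1 : 1;
}
```

Breaking ties in favour of the earlier column corresponds to an interleaving in which the first sorting column contributes the higher bit within each group of bits.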
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. Patch Set 29: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5363/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 29 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 28 Feb 2020 13:00:50 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. Patch Set 29: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 29 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 28 Feb 2020 12:28:54 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. Patch Set 30: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 30 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 28 Feb 2020 12:29:14 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. Patch Set 30: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5430/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 30 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 28 Feb 2020 12:29:15 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. Patch Set 29: Thanks Zoltan, included the header. -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 29 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 28 Feb 2020 12:16:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Norbert Luksa has uploaded a new patch set (#29). ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering

IMPALA-8755: Backend support for Z-ordering

This change depends on gerrit.cloudera.org/#/c/13955/ (Frontend support for Z-ordering).

The commit adds a comparator based on Z-ordering. For details, see: https://en.wikipedia.org/wiki/Z-order_curve

Instead of calculating the Z-values of the rows, the comparator looks for the most significant dimension (column) and compares the values of that column only. The most significant dimension is the one where the compared values differ in the highest bit.

The algorithm requires values of the same binary representation, therefore the values are converted into either uint32_t, uint64_t or uint128_t, the smallest in which all data fits. Comparing smaller types with bigger ones would make the bigger type much more dominant, therefore the bits of the smaller types are shifted up.

All primitive types (including string and floating point types) are supported.

Testing:
* Added unit tests.
* Ran manual tests, comparing 4-column values with 4-bit integers, for all possible combinations. Checked the result by calculating the Z-value for each comparison.
* Tested performance on various data, getting great results for selective queries. An example: used the TPCH dataset's lineitem table with scale 25, where the sorting columns are l_partkey and l_suppkey, in that order. Ran selective queries over the value range of the two columns, for both lexical and Z-ordering, and compared the percentage of filtered pages and row groups. While queries with filters on the first column showed almost no difference, queries on the second column are in favour of Z-ordering:

Ordering | Column | Filtered pages % | Filtered row groups %
Lex.     | 1st    | ~99%             | ~90%
Z-ord.   | 1st    | ~99%             | ~89%
Lex.     | 2nd    | ~25%             | 0%
Z-ord.   | 2nd    | ~97%             | 0%

The only drawback is the sorting itself, which takes about 4 times as long as lexical sorting (e.g. sorting the dataset above took 14m for lexical ordering and 55m for Z-ordering). Note, however, that this is a one-time cost: sorting only happens once, when writing the data. Also, lexical ordering is supported by codegen, while codegen is not implemented for Z-ordering yet.

Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
---
M be/src/exec/exchange-node.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partial-sort-node.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/util/CMakeLists.txt
A be/src/util/tuple-row-compare-test.cc
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
18 files changed, 1,119 insertions(+), 95 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/29

-- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 29
Gerrit-Owner: Norbert Luksa
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Norbert Luksa
Gerrit-Reviewer: Zoltan Borok-Nagy
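The commit message says values are converted to a common unsigned representation and that the bits of smaller types are shifted up. One way this could look is sketched below for a 32-bit target only. The helper names are hypothetical, the actual conversion in tuple-row-compare.cc may differ, and NaN handling is deliberately left out.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Signed 32-bit integer -> unsigned value with the same ordering.
// Flipping the sign bit maps INT32_MIN to 0 and INT32_MAX to UINT32_MAX.
inline uint32_t OrderedBitsInt32(int32_t v) {
  return static_cast<uint32_t>(v) ^ 0x80000000u;
}

// Signed 8-bit integer -> unsigned value in [0, 255] with the same ordering.
inline uint32_t OrderedBitsInt8(int8_t v) {
  return static_cast<uint32_t>(static_cast<uint8_t>(v) ^ 0x80u);
}

// IEEE-754 float -> unsigned value with the same ordering (NaN not handled).
// Negative values have all bits flipped, non-negative values get the sign
// bit set, so -0.0f sorts just below +0.0f.
inline uint32_t OrderedBitsFloat(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));
  return (bits & 0x80000000u) ? ~bits : (bits | 0x80000000u);
}

// Moves a `source_bits`-wide ordered value to the top of the 32-bit word,
// so a narrow column can differ in the same high bits as a wide column.
inline uint32_t ShiftUp(uint32_t ordered, int source_bits) {
  assert(source_bits > 0 && source_bits <= 32);
  return ordered << (32 - source_bits);
}
```

Without the shift, an 8-bit column could never differ above bit 7, so any 32-bit column that differs in a higher bit would always be chosen as the most significant dimension.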
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. Patch Set 28: I checked the verify job failure. TL;DR: include runtime/timestamp-value.inline.h in tuple-row-compare-test.cc. I think it fails because the verify job also does a shared-object (SO) build, and when the linker creates the executable for tuple-row-compare-test it doesn't find the symbol 'impala::TimestampValue::FromDaysSinceUnixEpoch(long)' in the linked shared objects. When we do a static build, the test is linked against a much bigger static library that contains the symbol. -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 28 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 20 Feb 2020 11:34:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. Patch Set 28: Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/5366/ -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 28 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 19 Feb 2020 22:18:57 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. Patch Set 28: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5366/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 28 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 19 Feb 2020 17:51:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. Patch Set 28: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/5363/ -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 28 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 19 Feb 2020 17:49:45 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. Patch Set 28: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5363/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 28 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 19 Feb 2020 11:35:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. Patch Set 28: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 28 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 19 Feb 2020 11:35:29 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. Patch Set 27: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/5356/ -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 27 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 18 Feb 2020 20:29:29 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. Patch Set 27: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 27 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 18 Feb 2020 16:06:10 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. Patch Set 27: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5356/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 27 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 18 Feb 2020 16:06:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. Patch Set 26: The verification failed due to the flaky tests test_exchange_mem_usage_scaling and AuthorizationStmtTest.testSelect. I ran an exhaustive test and it passed: https://master-02.jenkins.cloudera.com/job/impala-private-parameterized/6472/ -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 26 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 18 Feb 2020 08:55:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. Patch Set 26: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/5340/ -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 26 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 14 Feb 2020 19:08:22 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. Patch Set 26: The verification failed because the AllTypeTest added too many columns, exceeding the maximum possible slot size. This resulted in a bitshift overflow when initialising a SlotRef. I added a comment and a DCHECK, and removed some less important columns from the test to prevent this issue from happening. -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 26 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 14 Feb 2020 14:25:27 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. Patch Set 26: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 26 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 14 Feb 2020 14:16:50 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. Patch Set 26: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5340/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 26 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 14 Feb 2020 14:16:51 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. Patch Set 25: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 25 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 14 Feb 2020 14:15:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. Patch Set 25: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5222/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 25 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 14 Feb 2020 12:36:38 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Norbert Luksa has uploaded a new patch set (#25). ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering

IMPALA-8755: Backend support for Z-ordering

This change depends on gerrit.cloudera.org/#/c/13955/ (Frontend support for Z-ordering).

The commit adds a comparator based on Z-ordering. For details, see: https://en.wikipedia.org/wiki/Z-order_curve

Instead of calculating the Z-values of the rows, the comparator looks for the most significant dimension (column) and compares the values of that column only. The most significant dimension is the one where the compared values differ in the highest bit.

The algorithm requires values of the same binary representation, therefore the values are converted into either uint32_t, uint64_t or uint128_t, the smallest in which all data fits. Comparing smaller types with bigger ones would make the bigger type much more dominant, therefore the bits of the smaller types are shifted up.

All primitive types (including string and floating point types) are supported.

Testing:
* Added unit tests.
* Ran manual tests, comparing 4-column values with 4-bit integers, for all possible combinations. Checked the result by calculating the Z-value for each comparison.
* Tested performance on various data, getting great results for selective queries. An example: used the TPCH dataset's lineitem table with scale 25, where the sorting columns are l_partkey and l_suppkey, in that order. Ran selective queries over the value range of the two columns, for both lexical and Z-ordering, and compared the percentage of filtered pages and row groups. While queries with filters on the first column showed almost no difference, queries on the second column are in favour of Z-ordering:

Ordering | Column | Filtered pages % | Filtered row groups %
Lex.     | 1st    | ~99%             | ~90%
Z-ord.   | 1st    | ~99%             | ~89%
Lex.     | 2nd    | ~25%             | 0%
Z-ord.   | 2nd    | ~97%             | 0%

The only drawback is the sorting itself, which takes about 4 times as long as lexical sorting (e.g. sorting the dataset above took 14m for lexical ordering and 55m for Z-ordering). Note, however, that this is a one-time cost: sorting only happens once, when writing the data. Also, lexical ordering is supported by codegen, while codegen is not implemented for Z-ordering yet.

Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
---
M be/src/exec/exchange-node.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partial-sort-node.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/util/CMakeLists.txt
A be/src/util/tuple-row-compare-test.cc
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
18 files changed, 1,118 insertions(+), 95 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/25

-- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 25
Gerrit-Owner: Norbert Luksa
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Norbert Luksa
Gerrit-Reviewer: Zoltan Borok-Nagy
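The manual test mentioned in the commit message (4 columns of 4-bit integers, checked against explicitly computed Z-values) might be reproduced roughly as below. This is a sketch under assumptions, not the test that was actually run: `Row`, `ZValue`, `CompareByMostSignificantDimension`, `Unpack` and `CountMismatches` are hypothetical names, and the `stride` parameter is added here only so that a subset of the row pairs can be checked quickly.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Row = std::array<uint8_t, 4>;  // four columns, each a 4-bit value

// Builds the 16-bit Z-value by interleaving bits, highest bit level first;
// within a level, column 0 contributes the most significant bit.
uint16_t ZValue(const Row& row) {
  uint16_t z = 0;
  for (int bit = 3; bit >= 0; --bit) {
    for (int col = 0; col < 4; ++col) {
      z = static_cast<uint16_t>((z << 1) | ((row[col] >> bit) & 1u));
    }
  }
  return z;
}

// Z-order comparison that never builds the Z-value: pick the column whose
// values differ in the highest bit, then compare that column only.
int CompareByMostSignificantDimension(const Row& a, const Row& b) {
  int msd = 0;
  unsigned max_xor = 0;
  for (int col = 0; col < 4; ++col) {
    const unsigned diff = a[col] ^ b[col];
    // True only if the highest set bit of diff is above that of max_xor.
    if (max_xor < diff && max_xor < (max_xor ^ diff)) {
      max_xor = diff;
      msd = col;
    }
  }
  if (max_xor == 0) return 0;
  return a[msd] < b[msd] ? -1 : 1;
}

// Splits a 16-bit number into four 4-bit column values.
Row Unpack(unsigned packed) {
  return Row{static_cast<uint8_t>(packed & 0xF),
             static_cast<uint8_t>((packed >> 4) & 0xF),
             static_cast<uint8_t>((packed >> 8) & 0xF),
             static_cast<uint8_t>((packed >> 12) & 0xF)};
}

// Counts row pairs where the two methods disagree, visiting every
// `stride`-th row. A stride of 1 covers all 2^32 pairs.
long CountMismatches(unsigned stride) {
  assert(stride > 0);
  long mismatches = 0;
  for (unsigned i = 0; i < 65536; i += stride) {
    const Row a = Unpack(i);
    const uint16_t za = ZValue(a);
    for (unsigned j = 0; j < 65536; j += stride) {
      const Row b = Unpack(j);
      const uint16_t zb = ZValue(b);
      const int expected = (za > zb) - (za < zb);
      if (expected != CompareByMostSignificantDimension(a, b)) ++mismatches;
    }
  }
  return mismatches;
}
```

Calling `CountMismatches` with a prime stride such as 251 samples a varied subset of rows while keeping the number of compared pairs small.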
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. Patch Set 24: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/5285/ -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 24 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 05 Feb 2020 15:00:22 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. Patch Set 24: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5285/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 24 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 05 Feb 2020 10:05:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. Patch Set 24: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 24 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 05 Feb 2020 10:05:57 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. Patch Set 23: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/5490/ -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 23 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 03 Feb 2020 18:17:41 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. Patch Set 23: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5490/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 23 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 03 Feb 2020 13:30:36 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. Patch Set 23: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 23 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 03 Feb 2020 13:30:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. Patch Set 22: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 22 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 03 Feb 2020 13:29:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. Patch Set 22: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5587/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 22 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 03 Feb 2020 11:01:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Norbert Luksa has uploaded a new patch set (#22). ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..

IMPALA-8755: Backend support for Z-ordering

This change depends on gerrit.cloudera.org/#/c/13955/ (Frontend support for Z-ordering).

The commit adds a comparator based on Z-ordering. For details, see: https://en.wikipedia.org/wiki/Z-order_curve

Instead of calculating the Z-values of the rows, the comparator looks for the most significant dimension (column) and compares the values of that column only. The most significant dimension is the one in which the compared values differ at the highest bit position.

The algorithm requires all values to have the same binary representation, so the values are converted into uint32_t, uint64_t or uint128_t, whichever is the smallest type in which all data fits. Comparing smaller types with bigger ones would make the bigger type much more dominant, so the bits of the smaller types are shifted up.

All primitive types (including string and floating point types) are supported.

Testing:
* Added unit tests.
* Ran manual tests, comparing 4-column values with 4-bit integers, for all possible combinations. Checked the result by calculating the Z-value for each comparison.
* Tested performance on various data, getting great results for selective queries. An example: used the TPCH dataset's lineitem table with scale 25, where the sorting columns are l_partkey and l_suppkey, in that order. Ran selective queries over the value range of the two columns, for both lexical and Z-ordering, and compared the percentage of filtered pages and row groups. While queries with filters on the first column showed almost no difference, queries on the second column are in favour of Z-ordering:

Ordering | Column | Filtered pages % | Filtered row groups %
Lex.     | 1st    | ~99%             | ~90%
Z-ord.   | 1st    | ~99%             | ~89%
Lex.     | 2nd    | ~25%             | 0%
Z-ord.   | 2nd    | ~97%             | 0%

The only drawback is the sorting itself, which takes ~4 times longer than lexical sorting (e.g. sorting the dataset above took 14m for lexical ordering and 55m for Z-ordering). Note, however, that this is a one-time cost: sorting happens only once, when the data is written. Also, lexical ordering is supported by codegen, while codegen is not yet implemented for Z-ordering.

Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
---
M be/src/exec/exchange-node.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partial-sort-node.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/util/CMakeLists.txt
A be/src/util/tuple-row-compare-test.cc
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
18 files changed, 1,128 insertions(+), 95 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/22
-- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 22 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy
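The comparison strategy described in the commit message above (find the most significant dimension, then compare only that column) can be sketched roughly as follows. This is a minimal illustration and not the code from tuple-row-compare.cc: the names `LessMsb` and `ZOrderCompare` are hypothetical, the rows are assumed to be already converted to `uint64_t` values whose unsigned order matches the original order, and ties between columns are assumed to go to the earlier column.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true if the highest set bit of x is strictly below the highest
// set bit of y. This avoids computing the bit positions explicitly.
inline bool LessMsb(uint64_t x, uint64_t y) {
  return x < y && x < (x ^ y);
}

// Hypothetical helper: returns -1, 0 or 1 if row a sorts before, equal to,
// or after row b in Z-order. No interleaved Z-value is ever materialized.
int ZOrderCompare(const std::vector<uint64_t>& a, const std::vector<uint64_t>& b) {
  assert(a.size() == b.size());
  if (a.empty()) return 0;
  std::size_t msd = 0;    // index of the most significant dimension so far
  uint64_t msd_diff = 0;  // a[msd] ^ b[msd]
  for (std::size_t i = 0; i < a.size(); ++i) {
    const uint64_t diff = a[i] ^ b[i];
    // Strict comparison: on a tie the earlier column wins (an assumption).
    if (LessMsb(msd_diff, diff)) {
      msd = i;
      msd_diff = diff;
    }
  }
  if (a[msd] < b[msd]) return -1;
  if (a[msd] > b[msd]) return 1;
  return 0;
}
```

For example, for the rows {0, 4} and {3, 0} the second column differs at bit 2 while the first differs only at bit 1, so only the second column is compared and the first row sorts after the second.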
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. Patch Set 21: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/14080/21/be/src/util/tuple-row-compare.h File be/src/util/tuple-row-compare.h: http://gerrit.cloudera.org:8080/#/c/14080/21/be/src/util/tuple-row-compare.h@190 PS21, Line 190: /// INT_MAX would be 111..111. nit: you could mention null values -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 21 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 31 Jan 2020 14:59:47 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. Patch Set 21: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5549/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 21 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 30 Jan 2020 16:37:42 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Norbert Luksa has uploaded a new patch set (#21). ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..

IMPALA-8755: Backend support for Z-ordering

This change depends on gerrit.cloudera.org/#/c/13955/ (Frontend support for Z-ordering).

The commit adds a comparator based on Z-ordering. For details, see: https://en.wikipedia.org/wiki/Z-order_curve

Instead of calculating the Z-values of the rows, the comparator looks for the most significant dimension (column) and compares the values of that column only. The most significant dimension is the one in which the compared values differ at the highest bit position.

The algorithm requires all values to have the same binary representation, so the values are converted into uint32_t, uint64_t or uint128_t, whichever is the smallest type in which all data fits. Comparing smaller types with bigger ones would make the bigger type much more dominant, so the bits of the smaller types are shifted up.

All primitive types (including string and floating point types) are supported.

Testing:
* Added unit tests.
* Ran manual tests, comparing 4-column values with 4-bit integers, for all possible combinations. Checked the result by calculating the Z-value for each comparison.
* Tested performance on various data, getting great results for selective queries. An example: used the TPCH dataset's lineitem table with scale 25, where the sorting columns are l_partkey and l_suppkey, in that order. Ran selective queries over the value range of the two columns, for both lexical and Z-ordering, and compared the percentage of filtered pages and row groups. While queries with filters on the first column showed almost no difference, queries on the second column are in favour of Z-ordering:

Ordering | Column | Filtered pages % | Filtered row groups %
Lex.     | 1st    | ~99%             | ~90%
Z-ord.   | 1st    | ~99%             | ~89%
Lex.     | 2nd    | ~25%             | 0%
Z-ord.   | 2nd    | ~97%             | 0%

The only drawback is the sorting itself, which takes ~4 times longer than lexical sorting (e.g. sorting the dataset above took 14m for lexical ordering and 55m for Z-ordering). Note, however, that this is a one-time cost: sorting happens only once, when the data is written. Also, lexical ordering is supported by codegen, while codegen is not yet implemented for Z-ordering.

Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
---
M be/src/exec/exchange-node.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partial-sort-node.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/util/CMakeLists.txt
A be/src/util/tuple-row-compare-test.cc
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
18 files changed, 1,127 insertions(+), 95 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/21
-- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 21 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..

Patch Set 21:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/14080/20/be/src/util/tuple-row-compare-test.cc
File be/src/util/tuple-row-compare-test.cc:

http://gerrit.cloudera.org:8080/#/c/14080/20/be/src/util/tuple-row-compare-test.cc@42
PS20, Line 42: desc
> nit: add underscore suffix
Done

http://gerrit.cloudera.org:8080/#/c/14080/20/be/src/util/tuple-row-compare-test.cc@164
PS20, Line 164:
> nit: double have
Done

http://gerrit.cloudera.org:8080/#/c/14080/20/be/src/util/tuple-row-compare-test.cc@167
PS20, Line 167: teComperator(ColumnType(TYPE_BOOLEAN
> nit: please add comment about the layout of tuple_row_mem.
Done

http://gerrit.cloudera.org:8080/#/c/14080/20/be/src/util/tuple-row-compare-test.cc@172
PS20, Line 172: This function is responsible for only the char
> Don't we need to set both slots as not nulls?
As discussed offline, we do not even have to set these, since by default the slots are not nullable. However, this pointed out that we do not test nulls, so I added a case for them to the IntIntTest.

http://gerrit.cloudera.org:8080/#/c/14080/20/be/src/util/tuple-row-compare-test.cc@180
PS20, Line 180: memcpy
> nit: use DCHECK_EQ instead
Done

http://gerrit.cloudera.org:8080/#/c/14080/20/be/src/util/tuple-row-compare.h
File be/src/util/tuple-row-compare.h:

http://gerrit.cloudera.org:8080/#/c/14080/20/be/src/util/tuple-row-compare.h@187
PS20, Line 187: We transform the original a and b values to their "sha
> nit: The shared representation has an important property that could be ment
Done, copied your description to the comment.

-- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 21 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 30 Jan 2020 15:51:03 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. Patch Set 20: (1 comment) http://gerrit.cloudera.org:8080/#/c/14080/20/be/src/util/tuple-row-compare-test.cc File be/src/util/tuple-row-compare-test.cc: http://gerrit.cloudera.org:8080/#/c/14080/20/be/src/util/tuple-row-compare-test.cc@42 PS20, Line 42: desc nit: add underscore suffix -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 20 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 30 Jan 2020 14:58:46 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..

Patch Set 20: Code-Review+1

(5 comments)

Found some nits, but I think it's almost done :)

http://gerrit.cloudera.org:8080/#/c/14080/20/be/src/util/tuple-row-compare-test.cc
File be/src/util/tuple-row-compare-test.cc:

http://gerrit.cloudera.org:8080/#/c/14080/20/be/src/util/tuple-row-compare-test.cc@164
PS20, Line 164: have
nit: double have

http://gerrit.cloudera.org:8080/#/c/14080/20/be/src/util/tuple-row-compare-test.cc@167
PS20, Line 167: sizeof(char*) + sizeof(int32_t*) * 2
nit: please add comment about the layout of tuple_row_mem.

http://gerrit.cloudera.org:8080/#/c/14080/20/be/src/util/tuple-row-compare-test.cc@172
PS20, Line 172: tuple_mem->SetNotNull(NullIndicatorOffset(0,1));
Don't we need to set both slots as not null?

http://gerrit.cloudera.org:8080/#/c/14080/20/be/src/util/tuple-row-compare-test.cc@180
PS20, Line 180: DCHECK
nit: use DCHECK_EQ instead

http://gerrit.cloudera.org:8080/#/c/14080/20/be/src/util/tuple-row-compare.h
File be/src/util/tuple-row-compare.h:

http://gerrit.cloudera.org:8080/#/c/14080/20/be/src/util/tuple-row-compare.h@187
PS20, Line 187: The basic concept of getting the shared representation
nit: The shared representation has an important property that could be mentioned. Namely, we transform the original a and b values to their "shared representation" a' and b' in such a way that if a < b, then a' is lexically less than b' with respect to their bits. Thus, for ints, INT_MIN would be 0, INT_MIN+1 would be 1, and so on, and in the end INT_MAX would be 111..111.

-- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 20 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 27 Jan 2020 14:44:45 + Gerrit-HasComments: Yes
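The property of the shared representation described in the review comment above (INT_MIN maps to 0, INT_MAX maps to 111..111) can be illustrated for signed integers by flipping the sign bit. This is only a sketch of the idea under stated assumptions: the function name `ToSharedRepresentation` is hypothetical, it covers signed integer types only, and it is not taken from tuple-row-compare.h.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical helper: maps a signed integer to the unsigned type of the same
// width so that unsigned (bitwise lexical) order matches the signed order.
template <typename T>
typename std::make_unsigned<T>::type ToSharedRepresentation(T value) {
  static_assert(std::is_integral<T>::value && std::is_signed<T>::value,
                "this sketch only covers signed integer types");
  using U = typename std::make_unsigned<T>::type;
  // Mask with only the top (sign) bit set.
  constexpr U sign_mask =
      static_cast<U>(U{1} << (std::numeric_limits<U>::digits - 1));
  // Flipping the sign bit moves negative values below non-negative ones.
  return static_cast<U>(static_cast<U>(value) ^ sign_mask);
}
```

With this mapping, -1 becomes 0x7FFFFFFF and 0 becomes 0x80000000 for 32-bit values, so comparing the results as unsigned integers gives the same answer as comparing the original signed values.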
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. Patch Set 20: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5494/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 20 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 22 Jan 2020 17:52:00 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Norbert Luksa has uploaded a new patch set (#20). ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..

IMPALA-8755: Backend support for Z-ordering

This change depends on gerrit.cloudera.org/#/c/13955/ (Frontend support for Z-ordering).

The commit adds a comparator based on Z-ordering. For details, see: https://en.wikipedia.org/wiki/Z-order_curve

Instead of calculating the Z-values of the rows, the comparator looks for the most significant dimension (column) and compares the values of that column only. The most significant dimension is the one in which the compared values differ at the highest bit position.

The algorithm requires all values to have the same binary representation, so the values are converted into uint32_t, uint64_t or uint128_t, whichever is the smallest type in which all data fits. Comparing smaller types with bigger ones would make the bigger type much more dominant, so the bits of the smaller types are shifted up.

All primitive types (including string and floating point types) are supported.

Testing:
* Added unit tests.
* Ran manual tests, comparing 4-column values with 4-bit integers, for all possible combinations. Checked the result by calculating the Z-value for each comparison.
* Tested performance on various data, getting great results for selective queries. An example: used the TPCH dataset's lineitem table with scale 25, where the sorting columns are l_partkey and l_suppkey, in that order. Ran selective queries over the value range of the two columns, for both lexical and Z-ordering, and compared the percentage of filtered pages and row groups. While queries with filters on the first column showed almost no difference, queries on the second column are in favour of Z-ordering:

Ordering | Column | Filtered pages % | Filtered row groups %
Lex.     | 1st    | ~99%             | ~90%
Z-ord.   | 1st    | ~99%             | ~89%
Lex.     | 2nd    | ~25%             | 0%
Z-ord.   | 2nd    | ~97%             | 0%

The only drawback is the sorting itself, which takes ~4 times longer than lexical sorting (e.g. sorting the dataset above took 14m for lexical ordering and 55m for Z-ordering). Note, however, that this is a one-time cost: sorting happens only once, when the data is written. Also, lexical ordering is supported by codegen, while codegen is not yet implemented for Z-ordering.

Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
---
M be/src/exec/exchange-node.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partial-sort-node.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/util/CMakeLists.txt
A be/src/util/tuple-row-compare-test.cc
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
18 files changed, 1,109 insertions(+), 95 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/20
-- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 20 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..

Patch Set 20:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/14080/19/be/src/util/tuple-row-compare.cc
File be/src/util/tuple-row-compare.cc:

http://gerrit.cloudera.org:8080/#/c/14080/19/be/src/util/tuple-row-compare.cc@319
PS19, Line 319: // The algorithm requires all values having a common type, without loss of data.
: // This means we have to find the biggest type.
: int max_size = ordering_exprs_[0]->type().GetByteSize();
: for (int i = 1; i < ordering_exprs_.size(); ++i) {
> nit: the mask could be calculated in GetSharedRepresentation() instead of p
Done

http://gerrit.cloudera.org:8080/#/c/14080/19/be/src/util/tuple-row-compare.cc@423
PS19, Line 423:
> Local variable U val shadows parameter void* val.
Done

http://gerrit.cloudera.org:8080/#/c/14080/19/be/src/util/tuple-row-compare.cc@424
PS19, Line 424:
> nit: please add comment about it, something like "we copy the bytes from th
Done

http://gerrit.cloudera.org:8080/#/c/14080/19/be/src/util/tuple-row-compare.cc@434
PS19, Line 434: alue = *reinterpret_cast(val);
> It will only have the value of the first char of the string.
Done

http://gerrit.cloudera.org:8080/#/c/14080/19/be/src/util/tuple-row-compare.cc@435
PS19, Line 435: tmp, &floating_value, sizeof(T));
> replace with 'sizeof(U) - std::min(sizeof(U), type.len)'?
Done

-- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 20 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 22 Jan 2020 17:06:06 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..

Patch Set 19:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/14080/19/be/src/util/tuple-row-compare.cc
File be/src/util/tuple-row-compare.cc:

http://gerrit.cloudera.org:8080/#/c/14080/19/be/src/util/tuple-row-compare.cc@319
PS19, Line 319: // The masks are used for setting the sign bit correctly.
: constexpr uint32_t mask32 = (uint32_t)1 << 31;
: constexpr uint64_t mask64 = (uint64_t)1 << 63;
: constexpr uint128_t mask128 = (uint128_t)1 << 127;
nit: the mask could be calculated in GetSharedRepresentation() instead of passing it over

http://gerrit.cloudera.org:8080/#/c/14080/19/be/src/util/tuple-row-compare.cc@423
PS19, Line 423: val
Local variable U val shadows parameter void* val.

http://gerrit.cloudera.org:8080/#/c/14080/19/be/src/util/tuple-row-compare.cc@424
PS19, Line 424: BitUtil::ByteSwap(&val, string_value->ptr, len);
nit: please add comment about it, something like "we copy the bytes from the string but swap the bytes because of integer endianness."

http://gerrit.cloudera.org:8080/#/c/14080/19/be/src/util/tuple-row-compare.cc@434
PS19, Line 434: static_cast(*reinterpret_cast(val)
It will only have the value of the first char of the string. I see there are tests for chars, but do we have tests for fixed size strings, e.g. CHAR(5)?

http://gerrit.cloudera.org:8080/#/c/14080/19/be/src/util/tuple-row-compare.cc@435
PS19, Line 435: (sizeof(U) > 8 ? sizeof(U) * 8 - 64 : 0)
replace with 'sizeof(U) - std::min(sizeof(U), type.len)'?

-- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 19 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 21 Jan 2020 14:41:23 + Gerrit-HasComments: Yes
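The string handling discussed in the comments above (taking more than the first character, and placing the bytes so that integer comparison respects byte order) might look roughly like the sketch below. It is an illustration only: `StringPrefixToUint64` is a hypothetical name, the sketch uses `std::string` and a fixed 64-bit target for simplicity, and it builds the value with shifts rather than the `BitUtil::ByteSwap` call quoted in the review, so the result does not depend on host endianness.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: packs the first (up to) 8 bytes of s into a uint64_t so
// that the first character lands in the most significant byte. Shorter strings
// are zero-padded in the low bytes, i.e. their bits are "shifted up".
inline uint64_t StringPrefixToUint64(const std::string& s) {
  const std::size_t len = std::min(s.size(), sizeof(uint64_t));
  uint64_t result = 0;
  for (std::size_t i = 0; i < len; ++i) {
    // Go through unsigned char to avoid sign extension of bytes >= 0x80.
    const uint64_t byte = static_cast<unsigned char>(s[i]);
    result |= byte << (8 * (sizeof(uint64_t) - 1 - i));
  }
  return result;
}
```

Comparing the returned integers then agrees with bytewise lexicographic order on the first eight bytes; strings that differ only after that prefix map to the same value, which is a limitation of any fixed-width representation.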
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. Patch Set 19: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5480/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 19 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 21 Jan 2020 11:37:06 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. Patch Set 18: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5479/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 18 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 21 Jan 2020 11:35:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Norbert Luksa has uploaded a new patch set (#19). ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering ..

IMPALA-8755: Backend support for Z-ordering

This change depends on gerrit.cloudera.org/#/c/13955/ (Frontend support for Z-ordering)

The commit adds a Comparator based on Z-ordering. See in detail: https://en.wikipedia.org/wiki/Z-order_curve

The comparator, instead of calculating the Z-values of the rows, looks for the column with the most significant dimension, and compares the values of this column only. The most significant dimension will be the one where the compared values have the highest differing bit. The algorithm requires values of the same binary representation, therefore the values are converted into either uint32_t, uint64_t or uint128_t, the smallest in which all data fits. Comparing smaller types with bigger ones would make the bigger type much more dominant, therefore the bits of these smaller types are shifted up. All primitive types (including string and floating point types) are supported.

Testing:
* Added unit tests.
* Ran manual tests, comparing 4-column values with 4-bit integers, for all possible combinations. Checked the result by calculating the Z-value for each comparison.
* Tested performance on various data, getting great results for selective queries. An example: used the TPCH dataset's lineitem table with scale 25, where the sorting columns are l_partkey and l_suppkey, in that order. Ran selective queries for the value range of the two columns, for both lexical and Z-ordering, and compared the percentage of filtered pages and row groups. While queries with filters on the first column showed almost no difference, queries on the second column are in favour of Z-ordering:

Ordering | Column | Filtered pages % | Filtered row groups %
Lex.     | 1st    | ~99%             | ~90%
Z-ord.   | 1st    | ~99%             | ~89%
Lex.     | 2nd    | ~25%             | 0%
Z-ord.   | 2nd    | ~97%             | 0%

The only drawback is the sorting itself, taking ~4 times longer than lexical sorting (e.g. sorting the dataset above took 14m for lexical and 55m for Z-ordering). Note, however, that this is a one-time cost: sorting happens only once, when writing the data. Also, lexical ordering is supported by codegen, while it is not implemented for Z-ordering yet.

Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
---
M be/src/exec/exchange-node.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partial-sort-node.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/util/CMakeLists.txt
A be/src/util/tuple-row-compare-test.cc
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
18 files changed, 1,062 insertions(+), 95 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/19
-- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 19 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy
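The manual check described in the commit message (compare rows with the comparator, then confirm the result against explicitly computed Z-values) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the patch: it uses 2 columns of 4-bit values instead of 4 columns to keep the loop small, and `LessMsb`, `ZValue`, `ZCompare`, `Sign` and `ComparatorAgreesWithZValues` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if the highest set bit of x is lower than that of y.
inline bool LessMsb(uint32_t x, uint32_t y) { return x < y && x < (x ^ y); }

// Reference Z-value: interleaves the low 4 bits of two columns into 8 bits.
// Column `a` takes the more significant bit of each pair (an assumption).
inline uint32_t ZValue(uint32_t a, uint32_t b) {
  uint32_t z = 0;
  for (int bit = 3; bit >= 0; --bit) {
    z = (z << 1) | ((a >> bit) & 1u);
    z = (z << 1) | ((b >> bit) & 1u);
  }
  return z;
}

inline int Sign(uint32_t x, uint32_t y) { return x < y ? -1 : (x > y ? 1 : 0); }

// Compares rows (a1, b1) and (a2, b2) without materializing Z-values:
// pick the column whose values differ in the highest bit, compare only that one.
inline int ZCompare(uint32_t a1, uint32_t b1, uint32_t a2, uint32_t b2) {
  uint32_t msd_lhs = a1;
  uint32_t msd_rhs = a2;
  if (LessMsb(msd_lhs ^ msd_rhs, b1 ^ b2)) {
    msd_lhs = b1;
    msd_rhs = b2;
  }
  return Sign(msd_lhs, msd_rhs);
}

// Brute-force check over all 16^4 pairs of rows.
inline bool ComparatorAgreesWithZValues() {
  for (uint32_t a1 = 0; a1 < 16; ++a1)
    for (uint32_t b1 = 0; b1 < 16; ++b1)
      for (uint32_t a2 = 0; a2 < 16; ++a2)
        for (uint32_t b2 = 0; b2 < 16; ++b2)
          if (ZCompare(a1, b1, a2, b2) != Sign(ZValue(a1, b1), ZValue(a2, b2)))
            return false;
  return true;
}
```

Calling `ComparatorAgreesWithZValues()` should return true when the comparator and the interleaved Z-values order every pair of rows the same way; ties in the highest differing bit fall to the earlier column, which matches the interleaving order chosen above.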
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering ..

Patch Set 14:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14080/14//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14080/14//COMMIT_MSG@34
PS14, Line 34: getting great results
> Could you provide some basic statistics?
Done

http://gerrit.cloudera.org:8080/#/c/14080/14//COMMIT_MSG@35
PS14, Line 35: One negative is the sorting itself, taking
             : 4-7 more times than lexical sorting.
> You could emphasize that it only affects the writes.
Done

-- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 14 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 21 Jan 2020 10:52:51 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Norbert Luksa has uploaded a new patch set (#18). ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering ..

IMPALA-8755: Backend support for Z-ordering

This change depends on gerrit.cloudera.org/#/c/13955/ (Frontend support for Z-ordering)

The commit adds a Comparator based on Z-ordering. See in detail: https://en.wikipedia.org/wiki/Z-order_curve

The comparator, instead of calculating the Z-values of the rows, looks for the column with the most significant dimension, and compares the values of this column only. The most significant dimension will be the one where the compared values have the highest differing bit. The algorithm requires values of the same binary representation, therefore the values are converted into either uint32_t, uint64_t or uint128_t, the smallest in which all data fits. Comparing smaller types with bigger ones would make the bigger type much more dominant, therefore the bits of these smaller types are shifted up. All primitive types (including string and floating point types) are supported.

Testing:
* Added unit tests.
* Ran manual tests, comparing 4-column values with 4-bit integers, for all possible combinations. Checked the result by calculating the Z-value for each comparison.
* Tested performance on various data, getting great results for selective queries. An example: used the TPCH dataset's lineitem table with scale 25, where the sorting columns are l_partkey and l_suppkey, in that order. Ran selective queries for the value range of the two columns, for both lexical and Z-ordering, and compared the percentage of filtered pages and row groups. While queries with filters on the first column showed almost no difference, queries on the second column are in favour of Z-ordering:

Ordering | Column | Filtered pages % | Filtered row groups %
Lex.     | 1st    | ~99%             | ~90%
Z-ord.   | 1st    | ~99%             | ~89%
Lex.     | 2nd    | ~25%             | 0%
Z-ord.   | 2nd    | ~97%             | 0%

The only drawback is the sorting itself, taking ~4 times longer than lexical sorting. Note, however, that this is a one-time cost: sorting happens only once, when writing the data. Also, lexical ordering is supported by codegen, while it is not implemented for Z-ordering yet.

Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
---
M be/src/exec/exchange-node.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partial-sort-node.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/util/CMakeLists.txt
A be/src/util/tuple-row-compare-test.cc
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
18 files changed, 1,062 insertions(+), 95 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/18
-- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 18 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. Patch Set 16: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5478/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 16 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 21 Jan 2020 10:08:50 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Norbert Luksa has uploaded a new patch set (#16). ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. IMPALA-8755: Backend support for Z-ordering This change depends on gerrit.cloudera.org/#/c/13955/ (Frontend support for Z-ordering) The commit adds a Comparator based on Z-ordering. See in detail: https://en.wikipedia.org/wiki/Z-order_curve The comparator, instead of calculating the Z-values of the rows, looks for the column with the most significant dimension, and compares the values of this column only. The most significant dimension will be the one where the compared values have the highest differing bit. The algorithm requires values of the same binary representation, therefore the values are converted into either uint32_t, uint64_t or uint128_t, the smallest in which all data fits. Comparing smaller types with bigger ones would make the bigger type much more dominant, therefore the bits of these smaller types are shifted up. All primitive types (including string and floating point types) are supported. Testing: * Added unit tests. * Ran manual tests, comparing 4-column values with 4-bit integers, for all possible combinations. Checked the result by calculating the Z-value for each comparison. * Tested performance on various data, getting great results for selective queries. One negative is the sorting itself, taking 4-7 more times than lexical sorting.
Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab --- M be/src/exec/exchange-node.cc M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/exec/parquet/hdfs-parquet-table-writer.cc M be/src/exec/partial-sort-node.cc M be/src/exec/partial-sort-node.h M be/src/exec/sort-node.cc M be/src/exec/sort-node.h M be/src/exec/topn-node.cc M be/src/runtime/data-stream-test.cc M be/src/runtime/sorter.cc M be/src/runtime/sorter.h M be/src/util/CMakeLists.txt A be/src/util/tuple-row-compare-test.cc M be/src/util/tuple-row-compare.cc M be/src/util/tuple-row-compare.h M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java 18 files changed, 1,062 insertions(+), 95 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/16 -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 16 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. Patch Set 16: Rebased. -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 16 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 21 Jan 2020 09:24:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering ..

Patch Set 14: Code-Review+1

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14080/14//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14080/14//COMMIT_MSG@34
PS14, Line 34: getting great results
Could you provide some basic statistics?

http://gerrit.cloudera.org:8080/#/c/14080/14//COMMIT_MSG@35
PS14, Line 35: One negative is the sorting itself, taking
             : 4-7 more times than lexical sorting.
You could emphasize that it only affects the writes.

-- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 14 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 07 Jan 2020 14:27:54 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. Patch Set 14: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5267/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 14 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 12 Dec 2019 11:14:43 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. Patch Set 13: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5268/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 13 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 12 Dec 2019 11:14:38 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Norbert Luksa has uploaded a new patch set (#13). ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. IMPALA-8755: Backend support for Z-ordering This change depends on gerrit.cloudera.org/#/c/13955/ (Frontend support for Z-ordering) The commit adds a Comparator based on Z-ordering. See in detail: https://en.wikipedia.org/wiki/Z-order_curve The comparator, instead of calculating the Z-values of the rows, looks for the column with the most significant dimension, and compares the values of this column only. The most significant dimension will be the one where the compared values have the highest differing bit. The algorithm requires values of the same binary representation, therefore the values are converted into either uint32_t, uint64_t or uint128_t, the smallest in which all data fits. Comparing smaller types with bigger ones would make the bigger type much more dominant, therefore the bits of these smaller types are shifted up. All primitive types (including string and floating point types) are supported. Testing: * Added unit tests. * Ran manual tests, comparing 4-column values with 4-bit integers, for all possible combinations. Checked the result by calculating the Z-value for each comparison. * Tested performance on various data, getting great results for selective queries. One negative is the sorting itself, taking 4-7 more times than lexical sorting.
Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab --- M be/src/exec/exchange-node.cc M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/exec/parquet/hdfs-parquet-table-writer.cc M be/src/exec/partial-sort-node.cc M be/src/exec/partial-sort-node.h M be/src/exec/sort-node.cc M be/src/exec/sort-node.h M be/src/exec/topn-node.cc M be/src/runtime/data-stream-test.cc M be/src/runtime/sorter.cc M be/src/runtime/sorter.h M be/src/util/CMakeLists.txt A be/src/util/tuple-row-compare-test.cc M be/src/util/tuple-row-compare.cc M be/src/util/tuple-row-compare.h M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java 18 files changed, 1,060 insertions(+), 95 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/13 -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 13 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering ..

Patch Set 14:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14080/9/be/src/util/tuple-row-compare.cc
File be/src/util/tuple-row-compare.cc:

http://gerrit.cloudera.org:8080/#/c/14080/9/be/src/util/tuple-row-compare.cc@360
PS9, Line 360: msd_lhs = lhsi;
             : msd_rhs = rhsi;
             : }
             : }
> This means the column that uses most bits will likely be the dominating col
Hi, sorry for replying late. Uploaded a patch set where the smaller types are shifted up, so they won't be dominated by the bigger columns. (I do not have a design doc that covered this particular part.)

-- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 14 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 12 Dec 2019 10:45:11 + Gerrit-HasComments: Yes
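To make the "shifted up" fix in the reply above concrete, here is a minimal sketch of one way to widen a narrower unsigned value so that its top bit lines up with the top bit of the common representation. It is an assumption about the general idea, not the code in the patch set; `ShiftUp` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical helper: widens `value` to the unsigned type U and shifts it left
// so that the most significant bit of T becomes the most significant bit of U.
// The relative order of T values is preserved.
template <typename U, typename T>
inline U ShiftUp(T value) {
  static_assert(std::is_unsigned<T>::value && std::is_unsigned<U>::value,
                "sketch handles unsigned types only");
  static_assert(sizeof(T) <= sizeof(U), "target must be at least as wide");
  constexpr int kShift = static_cast<int>(8 * (sizeof(U) - sizeof(T)));
  return static_cast<U>(static_cast<U>(value) << kShift);
}
```

For example, `ShiftUp<uint32_t>(uint8_t{0x01})` yields `0x01000000`, so a difference in the top bit of an 8-bit column now competes with the top bit of a 32-bit column rather than with its bit 7.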
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Norbert Luksa has uploaded a new patch set (#14). ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. IMPALA-8755: Backend support for Z-ordering This change depends on gerrit.cloudera.org/#/c/13955/ (Frontend support for Z-ordering) The commit adds a Comparator based on Z-ordering. See in detail: https://en.wikipedia.org/wiki/Z-order_curve The comparator, instead of calculating the Z-values of the rows, looks for the column with the most significant dimension, and compares the values of this column only. The most significant dimension will be the one where the compared values have the highest differing bit. The algorithm requires values of the same binary representation, therefore the values are converted into either uint32_t, uint64_t or uint128_t, the smallest in which all data fits. Comparing smaller types with bigger ones would make the bigger type much more dominant, therefore the bits of these smaller types are shifted up. All primitive types (including string and floating point types) are supported. Testing: * Added unit tests. * Ran manual tests, comparing 4-column values with 4-bit integers, for all possible combinations. Checked the result by calculating the Z-value for each comparison. * Tested performance on various data, getting great results for selective queries. One negative is the sorting itself, taking 4-7 more times than lexical sorting.
Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab --- M be/src/exec/exchange-node.cc M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/exec/parquet/hdfs-parquet-table-writer.cc M be/src/exec/partial-sort-node.cc M be/src/exec/partial-sort-node.h M be/src/exec/sort-node.cc M be/src/exec/sort-node.h M be/src/exec/topn-node.cc M be/src/runtime/data-stream-test.cc M be/src/runtime/sorter.cc M be/src/runtime/sorter.h M be/src/util/CMakeLists.txt A be/src/util/tuple-row-compare-test.cc M be/src/util/tuple-row-compare.cc M be/src/util/tuple-row-compare.h M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java 18 files changed, 1,059 insertions(+), 95 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/14 -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 14 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. Patch Set 12: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5019/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 12 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 14 Nov 2019 09:00:25 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Norbert Luksa has uploaded a new patch set (#12). ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. IMPALA-8755: Backend support for Z-ordering This change depends on gerrit.cloudera.org/#/c/13955/ (Frontend support for Z-ordering) The commit adds a Comparator based on Z-ordering. See in detail: https://en.wikipedia.org/wiki/Z-order_curve The comparator, instead of calculating the Z-values of the rows, looks for the column with the most significant dimension, and compares the values of this column only. The most significant dimension will be the one where the compared values have the highest differing bit. The algorithm requires values of the same binary representation, but this can be relaxed. All primitive types (including string and floating point types) are supported. Testing: * Added unit tests. * Ran manual tests, comparing 4-column values with 4-bit integers, for all possible combinations. Checked the result by calculating the Z-value for each comparison. * Tested performance on various data, getting great results.
Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab --- M be/src/exec/exchange-node.cc M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/exec/parquet/hdfs-parquet-table-writer.cc M be/src/exec/partial-sort-node.cc M be/src/exec/partial-sort-node.h M be/src/exec/sort-node.cc M be/src/exec/sort-node.h M be/src/exec/topn-node.cc M be/src/runtime/data-stream-test.cc M be/src/runtime/sorter.cc M be/src/runtime/sorter.h M be/src/util/CMakeLists.txt A be/src/util/tuple-row-compare-test.cc M be/src/util/tuple-row-compare.cc M be/src/util/tuple-row-compare.h M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java 18 files changed, 1,002 insertions(+), 95 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/12 -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 12 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Anonymous Coward (520) has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering ..

Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14080/9/be/src/util/tuple-row-compare.cc
File be/src/util/tuple-row-compare.cc:

http://gerrit.cloudera.org:8080/#/c/14080/9/be/src/util/tuple-row-compare.cc@360
PS9, Line 360: if (less_msb(msd_lhs ^ msd_rhs, lhsi ^ rhsi)) {
             : msd_lhs = lhsi;
             : msd_rhs = rhsi;
             : }
This means the column that uses the most bits will likely be the dominating column. E.g. if two columns are selected, one using 8 bits and the other using 4 bits, then the column using 8 bits will likely determine the sorting order. Do you have a design doc covering the details for all types of data?

-- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 9 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 29 Oct 2019 00:06:53 + Gerrit-HasComments: Yes
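The dominance concern raised in this comment can be reproduced with a small sketch. It assumes `less_msb` is the usual XOR-based "highest set bit is lower" test; the body below is the common formulation of that test, not code copied from the patch, and `LessMsb` and `DecidingColumn` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// True if the highest set bit of x is strictly lower than the highest set bit of y.
// If both share the same top bit, x ^ y clears it, so x < (x ^ y) fails.
inline bool LessMsb(uint64_t x, uint64_t y) { return x < y && x < (x ^ y); }

// Returns the index (0 or 1) of the column that decides the comparison of two
// rows: the column whose values differ in the highest bit, ties going to column 0.
inline int DecidingColumn(uint64_t lhs0, uint64_t rhs0, uint64_t lhs1, uint64_t rhs1) {
  return LessMsb(lhs0 ^ rhs0, lhs1 ^ rhs1) ? 1 : 0;
}
```

With column 0 holding 8-bit values and column 1 holding 4-bit values, `DecidingColumn(0x10, 0x00, 0x1, 0xF)` picks column 0, because bit 4 of column 0 differs and column 1 has no bit that high. Column 1 only decides when the column 0 values agree in their upper bits, as in `DecidingColumn(0x31, 0x30, 0x0, 0x8)`.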
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering ..

Patch Set 9:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14080/9/be/src/common/global-flags.cc
File be/src/common/global-flags.cc:

http://gerrit.cloudera.org:8080/#/c/14080/9/be/src/common/global-flags.cc@275
PS9, Line 275: DEFINE_bool(unlock_zorder_sort, false,
             : "(Experimental) If true, enables using ZORDER option for SORT BY.");
I think we can enable it by default. Or maybe in a follow-up commit, since some tests also need to be moved from custom cluster tests to query tests.

http://gerrit.cloudera.org:8080/#/c/14080/9/be/src/util/tuple-row-compare.cc
File be/src/util/tuple-row-compare.cc:

http://gerrit.cloudera.org:8080/#/c/14080/9/be/src/util/tuple-row-compare.cc@325
PS9, Line 325: ((uint128_t) -1) / 2 + 1
nit: how about (uint128_t)1 << 127? Or you could use SetBit from bit-util.h

-- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 9 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 21 Oct 2019 16:23:25 + Gerrit-HasComments: Yes
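For the nit on line 325, the two spellings of the mask can be checked against each other at compile time. The sketch below uses the GCC/Clang `unsigned __int128` extension as a stand-in for `uint128_t`. `FlipSignBit` is a hypothetical helper showing one plausible use of such a mask (mapping signed values to unsigned ones while keeping their order); whether the patch uses the mask this way is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for uint128_t (assumption): GCC/Clang 128-bit integer extension.
using uint128 = unsigned __int128;

// Both expressions evaluate to 2^127, i.e. only the top bit set.
constexpr uint128 kTopBitByDivision = static_cast<uint128>(-1) / 2 + 1;
constexpr uint128 kTopBitByShift = static_cast<uint128>(1) << 127;
static_assert(kTopBitByDivision == kTopBitByShift, "both spell 1 << 127");

// Hypothetical use: flipping the sign bit of a two's-complement value gives an
// unsigned value with the same relative order (negative < zero < positive).
inline uint128 FlipSignBit(__int128 v) {
  return static_cast<uint128>(v) ^ kTopBitByShift;
}
```

The shift form states the intent (set bit 127) directly, which is presumably why the reviewer prefers it.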
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering ..

Patch Set 9:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/14080/8//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14080/8//COMMIT_MSG@12
PS8, Line 12: The commit adds a Comperator based on Z-ordering. See in detail:
Nit: comparator. Also on line 15.

http://gerrit.cloudera.org:8080/#/c/14080/9/be/src/exec/partial-sort-node.cc
File be/src/exec/partial-sort-node.cc:

http://gerrit.cloudera.org:8080/#/c/14080/9/be/src/exec/partial-sort-node.cc@54
PS9, Line 54: sorting_order_ = (TSortingOrder::type)tnode.sort_node.sort_info.sorting_order;
I think we're trying to avoid C-style casts.

http://gerrit.cloudera.org:8080/#/c/14080/9/be/src/exec/sort-node.cc
File be/src/exec/sort-node.cc:

http://gerrit.cloudera.org:8080/#/c/14080/9/be/src/exec/sort-node.cc@50
PS9, Line 50: sorting_order_ = (TSortingOrder::type)tnode.sort_node.sort_info.sorting_order;
I think we're trying to avoid C-style casts.

-- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 9 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 21 Oct 2019 11:48:36 + Gerrit-HasComments: Yes
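The cast comments above would typically be addressed by switching to `static_cast`. A rough sketch follows. `TSortingOrder` and `SortInfo` here are simplified, hypothetical stand-ins for the Thrift-generated types, and the enum values and the `int` field type are assumptions made for illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for the Thrift-generated enum wrapper (values assumed).
struct TSortingOrder {
  enum type { LEXICAL = 0, ZORDER = 1 };
};

// Hypothetical stand-in for the plan-node field, carrying a raw integer here.
struct SortInfo {
  int sorting_order;
};

inline TSortingOrder::type GetSortingOrder(const SortInfo& info) {
  // static_cast allows only the int -> enum conversion and is easy to search for,
  // unlike the C-style form (TSortingOrder::type)info.sorting_order.
  return static_cast<TSortingOrder::type>(info.sorting_order);
}
```

If the field already has the enum type, no cast is needed at all, which would be the simpler fix.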
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. Patch Set 9: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/4829/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 9 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 18 Oct 2019 16:18:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Norbert Luksa has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. IMPALA-8755: Backend support for Z-ordering This change depends on gerrit.cloudera.org/#/c/13955/ (Frontend support for Z-ordering) The commit adds a Comperator based on Z-ordering. See in detail: https://en.wikipedia.org/wiki/Z-order_curve The comperator instead of calculating the Z-values of the rows, looks for the column with the most significant dimension, and compares the values of this column only. The most significant dimension will be the one where the compared values have the highest different bits. The algorithm requires values of the same binary representation, but this can be relaxed. Currently, strings, varchars, floats and doubles are not supported. Testing: * Added unit tests. * Run manual tests, comparing 4-column values with 4-bit integers, for all possible combinations. Checked the result by calculating the Z-value for each comparison. * Tested performance on various data, getting great results. 
Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab --- M be/src/exec/exchange-node.cc M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/exec/parquet/hdfs-parquet-table-writer.cc M be/src/exec/partial-sort-node.cc M be/src/exec/partial-sort-node.h M be/src/exec/sort-node.cc M be/src/exec/sort-node.h M be/src/exec/topn-node.cc M be/src/runtime/data-stream-test.cc M be/src/runtime/sorter.cc M be/src/runtime/sorter.h M be/src/util/CMakeLists.txt A be/src/util/tuple-row-compare-test.cc M be/src/util/tuple-row-compare.cc M be/src/util/tuple-row-compare.h 16 files changed, 784 insertions(+), 58 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 9 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy
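The comparison described in the commit message (pick the column whose values differ in the highest bit, then compare only that column) can be sketched as follows. This is a minimal illustration based on the technique from the linked Z-order curve article, not the code in tuple-row-compare.cc. All names (`Row`, `LessMsb`, `CompareZOrder`, `ZValue`, `AgreesWithZValues`) are hypothetical, and the sketch assumes two unsigned columns of 4 bits each, with the first column taking the more significant position at every bit level.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kDims = 2;  // number of sort columns in this sketch
constexpr int kBits = 4;          // bits per column value
using Row = std::array<uint32_t, kDims>;

// True if the highest set bit of x is strictly lower than that of y.
inline bool LessMsb(uint32_t x, uint32_t y) { return x < y && x < (x ^ y); }

// Returns -1, 0 or 1 without computing Z-values: find the dimension whose
// values differ in the highest bit, then compare that dimension only.
inline int CompareZOrder(const Row& lhs, const Row& rhs) {
  std::size_t msd = 0;  // index of the most significant dimension so far
  for (std::size_t dim = 1; dim < kDims; ++dim) {
    if (LessMsb(lhs[msd] ^ rhs[msd], lhs[dim] ^ rhs[dim])) msd = dim;
  }
  if (lhs[msd] < rhs[msd]) return -1;
  if (lhs[msd] > rhs[msd]) return 1;
  return 0;
}

// Reference implementation: interleave the bits, highest bit level first.
inline uint64_t ZValue(const Row& row) {
  assert(row[0] < (1u << kBits) && row[1] < (1u << kBits));
  uint64_t z = 0;
  for (int bit = kBits - 1; bit >= 0; --bit) {
    for (std::size_t dim = 0; dim < kDims; ++dim) {
      z = (z << 1) | ((row[dim] >> bit) & 1u);
    }
  }
  return z;
}

// Exhaustively checks the comparator against Z-values for all row pairs.
inline bool AgreesWithZValues() {
  const uint32_t limit = 1u << kBits;
  for (uint32_t a = 0; a < limit; ++a)
    for (uint32_t b = 0; b < limit; ++b)
      for (uint32_t c = 0; c < limit; ++c)
        for (uint32_t d = 0; d < limit; ++d) {
          const Row lhs{{a, b}};
          const Row rhs{{c, d}};
          const uint64_t zl = ZValue(lhs);
          const uint64_t zr = ZValue(rhs);
          const int expected = zl < zr ? -1 : (zl > zr ? 1 : 0);
          if (CompareZOrder(lhs, rhs) != expected) return false;
        }
  return true;
}
```

`AgreesWithZValues()` mirrors, on a smaller scale, the manual test listed under Testing: every pair of rows is compared both ways and the results must match.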
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. Patch Set 8: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/4556/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 8 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 13 Sep 2019 13:39:50 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. Patch Set 8: (7 comments) http://gerrit.cloudera.org:8080/#/c/14080/7/be/src/util/tuple-row-compare-test.cc File be/src/util/tuple-row-compare-test.cc: http://gerrit.cloudera.org:8080/#/c/14080/7/be/src/util/tuple-row-compare-test.cc@94 PS7, Line 94: Tuple* tuple_mem = Tuple::Create(sizeof(char) + GetSize(args...), &expr_perm_pool_); > line too long (92 > 90) Done http://gerrit.cloudera.org:8080/#/c/14080/5/be/src/util/tuple-row-compare.cc File be/src/util/tuple-row-compare.cc: http://gerrit.cloudera.org:8080/#/c/14080/5/be/src/util/tuple-row-compare.cc@314 PS5, Line 314: > nit: Can we come up with a better name? Maybe GetZDimensionValue() or somet Done http://gerrit.cloudera.org:8080/#/c/14080/5/be/src/util/tuple-row-compare.cc@334 PS5, Line 334: turn Comp > Maybe you could add a DCHECK(false); as well, and maybe a TODO comment. If Done http://gerrit.cloudera.org:8080/#/c/14080/5/be/src/util/tuple-row-compare.cc@383 PS5, Line 383: rn > nit: since you use 'lhs' and 'rhs' at other places, maybe rename 'v1' and ' Done http://gerrit.cloudera.org:8080/#/c/14080/7/be/src/util/tuple-row-compare.cc File be/src/util/tuple-row-compare.cc: http://gerrit.cloudera.org:8080/#/c/14080/7/be/src/util/tuple-row-compare.cc@209 PS7, Line 209: Status TupleRowLexicalComparator::CodegenCompare(LlvmCodeGen* codegen, > line too long (93 > 90) Done http://gerrit.cloudera.org:8080/#/c/14080/7/be/src/util/tuple-row-compare.cc@323 PS7, Line 323: constexpr uint64_t mask64 = 0x8000; > line too long (95 > 90) Done http://gerrit.cloudera.org:8080/#/c/14080/7/be/src/util/tuple-row-compare.cc@395 PS7, Line 395: case TYPE_TIMESTAMP: { > line too long (91 > 90) Done Gerrit-Project: Impala-ASF Gerrit-Branch: master 
Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 8 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 13 Sep 2019 12:58:41 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Norbert Luksa has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. IMPALA-8755: Backend support for Z-ordering This change depends on gerrit.cloudera.org/#/c/13955/ (Frontend support for Z-ordering) The commit adds a Comperator based on Z-ordering. See in detail: https://en.wikipedia.org/wiki/Z-order_curve The comperator instead of calculating the Z-values of the rows, looks for the column with the most significant dimension, and compares the values of this column only. The most significant dimension will be the one where the compared values have the highest different bits. The algorithm requires values of the same binary representation, but this can be relaxed. Currently, strings, varchars, floats and doubles are not supported. Testing: * Added unit tests. * Currently, some tests are missing. * Run manual tests, comparing 4-column values with 4-bit integers, for all possible combinations. Checked the result by calculating the Z-value for each comparison. * Tested performance on various data, getting great results. 
Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab --- M be/src/exec/exchange-node.cc M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/exec/parquet/hdfs-parquet-table-writer.cc M be/src/exec/partial-sort-node.cc M be/src/exec/partial-sort-node.h M be/src/exec/sort-node.cc M be/src/exec/sort-node.h M be/src/exec/topn-node.cc M be/src/runtime/data-stream-test.cc M be/src/runtime/sorter.cc M be/src/runtime/sorter.h M be/src/util/CMakeLists.txt A be/src/util/tuple-row-compare-test.cc M be/src/util/tuple-row-compare.cc M be/src/util/tuple-row-compare.h 16 files changed, 776 insertions(+), 58 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 8 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. Patch Set 7: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/4553/ : Initial code review checks failed. See linked job for details on the failure. Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 7 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 12 Sep 2019 14:25:44 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. Patch Set 7: (4 comments) http://gerrit.cloudera.org:8080/#/c/14080/7/be/src/util/tuple-row-compare-test.cc File be/src/util/tuple-row-compare-test.cc: http://gerrit.cloudera.org:8080/#/c/14080/7/be/src/util/tuple-row-compare-test.cc@94 PS7, Line 94: uint8_t* tuple_row_mem = expr_perm_pool_.Allocate(sizeof(char*) + sizeof(int32_t*) * 2); line too long (92 > 90) http://gerrit.cloudera.org:8080/#/c/14080/7/be/src/util/tuple-row-compare.cc File be/src/util/tuple-row-compare.cc: http://gerrit.cloudera.org:8080/#/c/14080/7/be/src/util/tuple-row-compare.cc@209 PS7, Line 209: Status TupleRowLexicalComparator::CodegenCompare(LlvmCodeGen* codegen, llvm::Function** fn) { line too long (93 > 90) http://gerrit.cloudera.org:8080/#/c/14080/7/be/src/util/tuple-row-compare.cc@323 PS7, Line 323: constexpr uint128_t mask128 = ((uint128_t) -1) / 2 + 1; //0x8000; line too long (95 > 90) http://gerrit.cloudera.org:8080/#/c/14080/7/be/src/util/tuple-row-compare.cc@395 PS7, Line 395: const uint128_t nanoseconds = static_cast<uint128_t>(ts->time().total_nanoseconds()); line too long (91 > 90) Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 7 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 12 Sep 2019 13:46:02 + Gerrit-HasComments: Yes
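The `mask128` expression quoted for line 323 evaluates to a value with only the top bit set: the all-ones value divided by two clears the top bit, and adding one carries into it. A small sketch of the same arithmetic on standard-width types, using a hypothetical `HighBitMask` template (the patch itself uses named constants, and what the mask is used for is not shown in this thread):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns a value of unsigned type U with only the
// highest bit set, computed the same way as the mask128 expression.
template <typename U>
constexpr U HighBitMask() {
  return static_cast<U>(static_cast<U>(-1) / 2 + 1);
}

// Runtime spot checks against the literal masks.
inline void CheckMasks() {
  assert(HighBitMask<uint8_t>() == 0x80u);
  assert(HighBitMask<uint32_t>() == 0x80000000u);
  assert(HighBitMask<uint64_t>() == 0x8000000000000000ULL);
}
```

Computing the mask from the type avoids writing a long hexadecimal literal, which matters most for 128-bit values that have no portable literal syntax.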
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Norbert Luksa has uploaded this change for review. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. IMPALA-8755: Backend support for Z-ordering This change depends on gerrit.cloudera.org/#/c/13955/ (Frontend support for Z-ordering) The commit adds a Comperator based on Z-ordering. See in detail: https://en.wikipedia.org/wiki/Z-order_curve The comperator instead of calculating the Z-values of the rows, looks for the column with the most significant dimension, and compares the values of this column only. The most significant dimension will be the one where the compared values have the highest different bits. The algorithm requires values of the same binary representation, but this can be relaxed. Currently, strings, varchars, floats and doubles are not supported. Testing: * Added unit tests. * Currently, some tests are missing. * Run manual tests, comparing 4-column values with 4-bit integers, for all possible combinations. Checked the result by calculating the Z-value for each comparison. * Tested performance on various data, getting great results. 
Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab --- M be/src/exec/exchange-node.cc M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/exec/parquet/hdfs-parquet-table-writer.cc M be/src/exec/partial-sort-node.cc M be/src/exec/partial-sort-node.h M be/src/exec/sort-node.cc M be/src/exec/sort-node.h M be/src/exec/topn-node.cc M be/src/runtime/data-stream-test.cc M be/src/runtime/sorter.cc M be/src/runtime/sorter.h M be/src/util/CMakeLists.txt A be/src/util/tuple-row-compare-test.cc M be/src/util/tuple-row-compare.cc M be/src/util/tuple-row-compare.h 16 files changed, 630 insertions(+), 58 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 7 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy