[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-02-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 30: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/14080
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 30
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 28 Feb 2020 17:27:25 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-02-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..

IMPALA-8755: Backend support for Z-ordering

This change depends on gerrit.cloudera.org/#/c/13955/
(Frontend support for Z-ordering)

The commit adds a comparator based on Z-ordering. For details, see:
https://en.wikipedia.org/wiki/Z-order_curve

Instead of calculating the Z-values of the rows, the comparator
looks for the most significant dimension (column) and compares
the values of that column only. The most significant dimension
is the one in which the compared values differ at the highest
bit position. The algorithm requires all values to have the same
binary representation, so they are converted to uint32_t,
uint64_t or uint128_t, whichever is the smallest type that fits
all the data. Comparing smaller types with bigger ones would make
the bigger type much more dominant, so the bits of the smaller
types are shifted up.
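
The comparison described above can be sketched as follows. This is a
minimal illustration, not the code from tuple-row-compare.cc: the names
LessMsb, CompareZOrder and ShiftUp32 are hypothetical, and the sketch
assumes every column has already been converted to uint64_t.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true if the highest set bit of a is lower than the highest set bit of b.
inline bool LessMsb(uint64_t a, uint64_t b) {
  return a < b && a < (a ^ b);
}

// Hypothetical helper: shifts a 32-bit value into the upper half of a 64-bit
// word so that it is not dominated by wider columns.
inline uint64_t ShiftUp32(uint32_t v) {
  return static_cast<uint64_t>(v) << 32;
}

// Compares two rows in Z-order without interleaving their bits.
// Returns a negative value, zero, or a positive value.
// Assumption: both rows hold the same number of uint64_t columns.
int CompareZOrder(const std::vector<uint64_t>& lhs,
                  const std::vector<uint64_t>& rhs) {
  assert(lhs.size() == rhs.size());
  if (lhs.empty()) return 0;
  size_t msd = 0;        // index of the most significant dimension so far
  uint64_t max_xor = 0;  // XOR of the two values in that dimension
  for (size_t i = 0; i < lhs.size(); ++i) {
    const uint64_t x = lhs[i] ^ rhs[i];
    // Strict comparison: on a tie the earlier column stays dominant.
    if (LessMsb(max_xor, x)) {
      max_xor = x;
      msd = i;
    }
  }
  if (lhs[msd] < rhs[msd]) return -1;
  if (lhs[msd] > rhs[msd]) return 1;
  return 0;
}
```

For example, for the rows (1, 8) and (2, 0) the XOR values are 3 and 8,
so the second column is the most significant dimension and the first
row sorts after the second.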

All primitive types (including string and floating point types)
are supported.

Testing:
 * Added unit tests.
 * Ran manual tests comparing 4-column values of 4-bit
   integers, for all possible combinations. Checked the results by
   calculating the Z-value for each comparison.
 * Tested performance on various data, with great results for
   selective queries. An example: used the TPCH dataset's
   lineitem table at scale 25, with l_partkey and l_suppkey as
   the sorting columns, in that order. Ran selective queries
   over the value ranges of the two columns, for both lexical and
   Z-ordering, and compared the percentage of filtered pages and
   row groups. Queries with filters on the first column showed
   almost no difference, while queries on the second column
   are in favour of Z-ordering:

   Ordering | Column | Filtered pages % | Filtered row groups %
   Lex.     | 1st    | ~99%             | ~90%
   Z-ord.   | 1st    | ~99%             | ~89%
   Lex.     | 2nd    | ~25%             | 0%
   Z-ord.   | 2nd    | ~97%             | 0%

   The only drawback is the sorting itself, which takes ~4 times
   longer than lexical sorting (e.g. sorting the dataset above
   took 14m with lexical ordering and 55m with Z-ordering).
   Note, however, that this is a one-time cost: sorting only
   happens once, when the data is written.
   Also, lexical ordering is supported by codegen, while codegen
   is not implemented for Z-ordering yet.
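
The manual check described in the testing notes (comparing every pair of
4-column rows against explicitly computed Z-values) could look roughly
like the sketch below. It is an assumption about how such a check might
be written, not the test that was actually run: Row, ZValue,
CompareByMsd, Decode and CountMismatches are hypothetical names, and
column 0 is assumed to supply the most significant bit of each
interleaved group.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Row = std::array<uint32_t, 4>;

// Interleaves the low `bits` bits of the four columns into one Z-value.
uint32_t ZValue(const Row& row, int bits) {
  uint32_t z = 0;
  for (int b = bits - 1; b >= 0; --b) {
    for (uint32_t v : row) z = (z << 1) | ((v >> b) & 1u);
  }
  return z;
}

// Z-order comparison that only inspects the most significant dimension.
int CompareByMsd(const Row& l, const Row& r) {
  size_t msd = 0;
  uint32_t max_xor = 0;
  for (size_t i = 0; i < l.size(); ++i) {
    const uint32_t x = l[i] ^ r[i];
    // True when the highest set bit of max_xor is below that of x.
    if (max_xor < x && max_xor < (max_xor ^ x)) {
      max_xor = x;
      msd = i;
    }
  }
  return (l[msd] > r[msd]) - (l[msd] < r[msd]);
}

// Unpacks a packed code into four columns of `bits` bits each.
Row Decode(uint32_t code, int bits) {
  Row row{};
  const uint32_t mask = (1u << bits) - 1;
  for (int i = 0; i < 4; ++i) row[i] = (code >> (i * bits)) & mask;
  return row;
}

// Counts row pairs where the two comparison methods disagree.
// With bits == 4 this visits 2^32 pairs, so it takes a while.
uint64_t CountMismatches(int bits) {
  assert(bits >= 1 && bits <= 4);
  const uint32_t n = 1u << (4 * bits);
  uint64_t mismatches = 0;
  for (uint32_t a = 0; a < n; ++a) {
    const Row l = Decode(a, bits);
    const uint32_t zl = ZValue(l, bits);
    for (uint32_t b = 0; b < n; ++b) {
      const Row r = Decode(b, bits);
      const uint32_t zr = ZValue(r, bits);
      const int expected = (zl > zr) - (zl < zr);
      if (CompareByMsd(l, r) != expected) ++mismatches;
    }
  }
  return mismatches;
}
```

Calling CountMismatches(2) covers all pairs of 2-bit rows quickly; a
result of zero means both methods agree on every pair.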

Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Reviewed-on: http://gerrit.cloudera.org:8080/14080
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M be/src/exec/exchange-node.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partial-sort-node.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/util/CMakeLists.txt
A be/src/util/tuple-row-compare-test.cc
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
18 files changed, 1,119 insertions(+), 95 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified



[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-02-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 29:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5363/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


Gerrit-Comment-Date: Fri, 28 Feb 2020 13:00:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-02-28 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 29: Code-Review+2


Gerrit-Comment-Date: Fri, 28 Feb 2020 12:28:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-02-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 30: Code-Review+2


Gerrit-Comment-Date: Fri, 28 Feb 2020 12:29:14 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-02-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 30:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5430/ 
DRY_RUN=false


Gerrit-Comment-Date: Fri, 28 Feb 2020 12:29:15 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-02-28 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 29:

Thanks Zoltan, included the header.


Gerrit-Comment-Date: Fri, 28 Feb 2020 12:16:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-02-28 Thread Norbert Luksa (Code Review)
Norbert Luksa has uploaded a new patch set (#29). ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..

Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
---
M be/src/exec/exchange-node.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partial-sort-node.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/util/CMakeLists.txt
A be/src/util/tuple-row-compare-test.cc
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
18 files changed, 1,119 insertions(+), 95 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/29


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-02-20 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 28:

I checked the verify job failure. TL;DR: include
runtime/timestamp-value.inline.h in tuple-row-compare-test.cc.

I think it fails because the verify job also does a shared-object (SO) build,
and when the linker creates the executable for tuple-row-compare-test, it
doesn't find the symbol 'impala::TimestampValue::FromDaysSinceUnixEpoch(long)'
in the linked shared objects. In a static build, the test is linked against a
much bigger static library that contains the symbol.


Gerrit-Comment-Date: Thu, 20 Feb 2020 11:34:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-02-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 28:

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/5366/


Gerrit-Comment-Date: Wed, 19 Feb 2020 22:18:57 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-02-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 28:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5366/ 
DRY_RUN=false


Gerrit-Comment-Date: Wed, 19 Feb 2020 17:51:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-02-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 28: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/5363/


Gerrit-Comment-Date: Wed, 19 Feb 2020 17:49:45 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-02-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 28:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5363/ 
DRY_RUN=false


Gerrit-Comment-Date: Wed, 19 Feb 2020 11:35:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-02-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 28: Code-Review+2


Gerrit-Comment-Date: Wed, 19 Feb 2020 11:35:29 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-02-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 27: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/5356/


Gerrit-Comment-Date: Tue, 18 Feb 2020 20:29:29 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-02-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 27: Code-Review+2


Gerrit-Comment-Date: Tue, 18 Feb 2020 16:06:10 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-02-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 27:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5356/ 
DRY_RUN=false


Gerrit-Comment-Date: Tue, 18 Feb 2020 16:06:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-02-18 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 26:

The verification failed due to the flaky tests test_exchange_mem_usage_scaling
and AuthorizationStmtTest.testSelect.
Ran an exhaustive test, and it passed:
https://master-02.jenkins.cloudera.com/job/impala-private-parameterized/6472/


Gerrit-Comment-Date: Tue, 18 Feb 2020 08:55:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-02-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 26: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/5340/


Gerrit-Comment-Date: Fri, 14 Feb 2020 19:08:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-02-14 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 26:

The verification failed because AllTypeTest added too many columns, with a
larger total slot size than is possible. This resulted in a bitshift overflow
when initialising a SlotRef. Added a comment and a DCHECK, and removed some
less important columns from the test to prevent this issue from happening.


Gerrit-Comment-Date: Fri, 14 Feb 2020 14:25:27 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-02-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 26:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5340/ 
DRY_RUN=false


Gerrit-Comment-Date: Fri, 14 Feb 2020 14:16:51 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-02-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 26: Code-Review+2


Gerrit-Comment-Date: Fri, 14 Feb 2020 14:16:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-02-14 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 25: Code-Review+2


Gerrit-Comment-Date: Fri, 14 Feb 2020 14:15:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-02-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 25:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5222/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


Gerrit-Comment-Date: Fri, 14 Feb 2020 12:36:38 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-02-14 Thread Norbert Luksa (Code Review)
Norbert Luksa has uploaded a new patch set (#25). ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..

Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
---
M be/src/exec/exchange-node.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partial-sort-node.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/util/CMakeLists.txt
A be/src/util/tuple-row-compare-test.cc
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
18 files changed, 1,118 insertions(+), 95 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/25
--
To view, visit http://gerrit.cloudera.org:8080/14080
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 25
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-02-05 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 24: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/5285/


--
To view, visit http://gerrit.cloudera.org:8080/14080
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 24
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 05 Feb 2020 15:00:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-02-05 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 24:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5285/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/14080
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 24
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 05 Feb 2020 10:05:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-02-05 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 24: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/14080
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 24
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 05 Feb 2020 10:05:57 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-02-03 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 23: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/5490/


--
To view, visit http://gerrit.cloudera.org:8080/14080
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 23
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 03 Feb 2020 18:17:41 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-02-03 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 23:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5490/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/14080
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 23
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 03 Feb 2020 13:30:36 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-02-03 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 23: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/14080
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 23
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 03 Feb 2020 13:30:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-02-03 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 22: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/14080
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 22
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 03 Feb 2020 13:29:49 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-02-03 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 22:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5587/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/14080
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 22
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 03 Feb 2020 11:01:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-02-03 Thread Norbert Luksa (Code Review)
Norbert Luksa has uploaded a new patch set (#22). ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..

IMPALA-8755: Backend support for Z-ordering

This change depends on gerrit.cloudera.org/#/c/13955/
(Frontend support for Z-ordering)

The commit adds a Comparator based on Z-ordering. See in detail:
https://en.wikipedia.org/wiki/Z-order_curve

Instead of calculating the Z-values of the rows, the comparator
looks for the most significant dimension and compares the values
of that column only. The most significant dimension is the one
in which the compared values differ at the highest bit position.
The algorithm requires all values to have the same binary
representation, therefore the values are converted into
uint32_t, uint64_t or uint128_t, whichever is the smallest type
in which all data fits. Comparing smaller types with bigger ones
would make the bigger type much more dominant, therefore the
bits of the smaller types are shifted up.

All primitive types (including string and floating point types)
are supported.

Testing:
 * Added unit tests.
 * Ran manual tests, comparing 4-column values with 4-bit
   integers, for all possible combinations. Checked the result by
   calculating the Z-value for each comparison.
 * Tested performance on various data, getting great results for
   selective queries. An example: used the TPCH dataset's
   lineitem table with scale 25, where the sorting columns are
   l_partkey and l_suppkey, in that order. Ran selective queries
   over the value range of the two columns, for both lexical and
   Z-ordering, and compared the percentage of filtered pages and
   row groups. While queries with filters on the first column
   showed almost no difference, queries on the second column
   are in favour of Z-ordering:
   Ordering | Column | Filtered pages % | Filtered row groups %
   Lex.     | 1st    | ~99%             | ~90%
   Z-ord.   | 1st    | ~99%             | ~89%
   Lex.     | 2nd    | ~25%             | 0%
   Z-ord.   | 2nd    | ~97%             | 0%
   The only drawback is the sorting itself, taking ~4 times longer
   than lexical sorting (e.g. sorting the dataset above took
   14m for lexical, and 55m for Z-ordering).
   Note however that this is a one-time cost: sorting
   only happens once, when writing the data.
   Also, lexical ordering is supported by codegen, while it is
   not implemented for Z-ordering yet.

Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
---
M be/src/exec/exchange-node.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partial-sort-node.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/util/CMakeLists.txt
A be/src/util/tuple-row-compare-test.cc
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
18 files changed, 1,128 insertions(+), 95 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/22
--
To view, visit http://gerrit.cloudera.org:8080/14080
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 22
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-01-31 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 21: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14080/21/be/src/util/tuple-row-compare.h
File be/src/util/tuple-row-compare.h:

http://gerrit.cloudera.org:8080/#/c/14080/21/be/src/util/tuple-row-compare.h@190
PS21, Line 190:   /// INT_MAX would be 111..111.
nit: you could mention null values



--
To view, visit http://gerrit.cloudera.org:8080/14080
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 21
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 31 Jan 2020 14:59:47 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-01-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 21:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5549/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/14080
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 21
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 30 Jan 2020 16:37:42 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-01-30 Thread Norbert Luksa (Code Review)
Norbert Luksa has uploaded a new patch set (#21). ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..

IMPALA-8755: Backend support for Z-ordering

This change depends on gerrit.cloudera.org/#/c/13955/
(Frontend support for Z-ordering)

The commit adds a Comparator based on Z-ordering. See in detail:
https://en.wikipedia.org/wiki/Z-order_curve

Instead of calculating the Z-values of the rows, the comparator
looks for the most significant dimension and compares the values
of that column only. The most significant dimension is the one
in which the compared values differ at the highest bit position.
The algorithm requires all values to have the same binary
representation, therefore the values are converted into
uint32_t, uint64_t or uint128_t, whichever is the smallest type
in which all data fits. Comparing smaller types with bigger ones
would make the bigger type much more dominant, therefore the
bits of the smaller types are shifted up.

All primitive types (including string and floating point types)
are supported.

Testing:
 * Added unit tests.
 * Ran manual tests, comparing 4-column values with 4-bit
   integers, for all possible combinations. Checked the result by
   calculating the Z-value for each comparison.
 * Tested performance on various data, getting great results for
   selective queries. An example: used the TPCH dataset's
   lineitem table with scale 25, where the sorting columns are
   l_partkey and l_suppkey, in that order. Ran selective queries
   over the value range of the two columns, for both lexical and
   Z-ordering, and compared the percentage of filtered pages and
   row groups. While queries with filters on the first column
   showed almost no difference, queries on the second column
   are in favour of Z-ordering:
   Ordering | Column | Filtered pages % | Filtered row groups %
   Lex.     | 1st    | ~99%             | ~90%
   Z-ord.   | 1st    | ~99%             | ~89%
   Lex.     | 2nd    | ~25%             | 0%
   Z-ord.   | 2nd    | ~97%             | 0%
   The only drawback is the sorting itself, taking ~4 times longer
   than lexical sorting (e.g. sorting the dataset above took
   14m for lexical, and 55m for Z-ordering).
   Note however that this is a one-time cost: sorting
   only happens once, when writing the data.
   Also, lexical ordering is supported by codegen, while it is
   not implemented for Z-ordering yet.

Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
---
M be/src/exec/exchange-node.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partial-sort-node.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/util/CMakeLists.txt
A be/src/util/tuple-row-compare-test.cc
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
18 files changed, 1,127 insertions(+), 95 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/21
--
To view, visit http://gerrit.cloudera.org:8080/14080
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 21
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-01-30 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 20:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14080/20/be/src/util/tuple-row-compare-test.cc
File be/src/util/tuple-row-compare-test.cc:

http://gerrit.cloudera.org:8080/#/c/14080/20/be/src/util/tuple-row-compare-test.cc@42
PS20, Line 42: desc
nit: add underscore suffix



--
To view, visit http://gerrit.cloudera.org:8080/14080
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 20
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 30 Jan 2020 14:58:46 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-01-27 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 20: Code-Review+1

(5 comments)

Found some nits, but I think it's almost done :)

http://gerrit.cloudera.org:8080/#/c/14080/20/be/src/util/tuple-row-compare-test.cc
File be/src/util/tuple-row-compare-test.cc:

http://gerrit.cloudera.org:8080/#/c/14080/20/be/src/util/tuple-row-compare-test.cc@164
PS20, Line 164: have
nit: double have


http://gerrit.cloudera.org:8080/#/c/14080/20/be/src/util/tuple-row-compare-test.cc@167
PS20, Line 167: sizeof(char*) + sizeof(int32_t*) * 2
nit: please add comment about the layout of tuple_row_mem.


http://gerrit.cloudera.org:8080/#/c/14080/20/be/src/util/tuple-row-compare-test.cc@172
PS20, Line 172: tuple_mem->SetNotNull(NullIndicatorOffset(0,1));
Don't we need to set both slots as not null?


http://gerrit.cloudera.org:8080/#/c/14080/20/be/src/util/tuple-row-compare-test.cc@180
PS20, Line 180: DCHECK
nit: use DCHECK_EQ instead


http://gerrit.cloudera.org:8080/#/c/14080/20/be/src/util/tuple-row-compare.h
File be/src/util/tuple-row-compare.h:

http://gerrit.cloudera.org:8080/#/c/14080/20/be/src/util/tuple-row-compare.h@187
PS20, Line 187: The basic concept of getting the shared representation
nit: The shared representation has an important property that could be 
mentioned. Namely, we transform the original values a and b to their 
"shared representation" a' and b' in such a way that if a < b then a' is 
lexically less than b' with respect to their bits. Thus for ints, INT_MIN 
would be 0, INT_MIN+1 would be 1, and so on, and in the end INT_MAX would 
be 111..111.
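
A small sketch of that property for signed integers, assuming the mapping
is done by flipping the sign bit (consistent with the sign-bit masks quoted
elsewhere in this review, but not copied from the patch). The second
function illustrates shifting a smaller type up into the wider shared type,
as the commit message describes. Both function names are hypothetical.

```cpp
#include <cstdint>

// Maps int32_t to uint32_t so that a < b implies shared(a) < shared(b).
// Flipping the sign bit sends INT32_MIN to 0 and INT32_MAX to 0xFFFFFFFF.
inline uint32_t Int32ToShared32(int32_t v) {
  constexpr uint32_t kSignMask = uint32_t{1} << 31;
  return static_cast<uint32_t>(v) ^ kSignMask;
}

// Same idea for an 8-bit value stored in a 32-bit shared type: flip the
// sign bit, then shift the bits up so the value occupies the top byte and
// is not dominated by wider columns.
inline uint32_t Int8ToShared32(int8_t v) {
  const uint8_t flipped =
      static_cast<uint8_t>(static_cast<uint8_t>(v) ^ 0x80u);
  return static_cast<uint32_t>(flipped) << 24;
}
```

Null handling, which a later comment in this thread asks to be documented,
is deliberately left out of this sketch.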



--
To view, visit http://gerrit.cloudera.org:8080/14080
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 20
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 27 Jan 2020 14:44:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-01-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 20:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5494/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/14080
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 20
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 22 Jan 2020 17:52:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-01-22 Thread Norbert Luksa (Code Review)
Norbert Luksa has uploaded a new patch set (#20). ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..

IMPALA-8755: Backend support for Z-ordering

This change depends on gerrit.cloudera.org/#/c/13955/
(Frontend support for Z-ordering)

The commit adds a Comparator based on Z-ordering. See in detail:
https://en.wikipedia.org/wiki/Z-order_curve

Instead of calculating the Z-values of the rows, the comparator
looks for the most significant dimension and compares the values
of that column only. The most significant dimension is the one
in which the compared values differ at the highest bit position.
The algorithm requires all values to have the same binary
representation, therefore the values are converted into
uint32_t, uint64_t or uint128_t, whichever is the smallest type
in which all data fits. Comparing smaller types with bigger ones
would make the bigger type much more dominant, therefore the
bits of the smaller types are shifted up.

All primitive types (including string and floating point types)
are supported.

Testing:
 * Added unit tests.
 * Ran manual tests, comparing 4-column values with 4-bit
   integers, for all possible combinations. Checked the result by
   calculating the Z-value for each comparison.
 * Tested performance on various data, getting great results for
   selective queries. An example: used the TPCH dataset's
   lineitem table with scale 25, where the sorting columns are
   l_partkey and l_suppkey, in that order. Ran selective queries
   over the value range of the two columns, for both lexical and
   Z-ordering, and compared the percentage of filtered pages and
   row groups. While queries with filters on the first column
   showed almost no difference, queries on the second column
   are in favour of Z-ordering:
   Ordering | Column | Filtered pages % | Filtered row groups %
   Lex.     | 1st    | ~99%             | ~90%
   Z-ord.   | 1st    | ~99%             | ~89%
   Lex.     | 2nd    | ~25%             | 0%
   Z-ord.   | 2nd    | ~97%             | 0%
   The only drawback is the sorting itself, taking ~4 times longer
   than lexical sorting (e.g. sorting the dataset above took
   14m for lexical, and 55m for Z-ordering).
   Note however that this is a one-time cost: sorting
   only happens once, when writing the data.
   Also, lexical ordering is supported by codegen, while it is
   not implemented for Z-ordering yet.

Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
---
M be/src/exec/exchange-node.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partial-sort-node.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/util/CMakeLists.txt
A be/src/util/tuple-row-compare-test.cc
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
18 files changed, 1,109 insertions(+), 95 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/20
--
To view, visit http://gerrit.cloudera.org:8080/14080
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 20
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-01-22 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 20:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/14080/19/be/src/util/tuple-row-compare.cc
File be/src/util/tuple-row-compare.cc:

http://gerrit.cloudera.org:8080/#/c/14080/19/be/src/util/tuple-row-compare.cc@319
PS19, Line 319:   // The algorithm requires all values having a common type, 
without loss of data.
  :   // This means we have to find the biggest type.
  :   int max_size = ordering_exprs_[0]->type().GetByteSize();
  :   for (int i = 1; i < ordering_exprs_.size(); ++i) {
> nit: the mask could be calculated in GetSharedRepresentation() instead of p
Done


http://gerrit.cloudera.org:8080/#/c/14080/19/be/src/util/tuple-row-compare.cc@423
PS19, Line 423: 
> Local variable U val shadows parameter void* val.
Done


http://gerrit.cloudera.org:8080/#/c/14080/19/be/src/util/tuple-row-compare.cc@424
PS19, Line 424:
> nit: please add comment about it, something like "we copy the bytes from th
Done


http://gerrit.cloudera.org:8080/#/c/14080/19/be/src/util/tuple-row-compare.cc@434
PS19, Line 434: alue = *reinterpret_cast(val);
> It will only have the value of the first char of the string.
Done


http://gerrit.cloudera.org:8080/#/c/14080/19/be/src/util/tuple-row-compare.cc@435
PS19, Line 435: tmp, _value, sizeof(T));
> replace with 'sizeof(U) - std::min(sizeof(U), type.len)'?
Done



--
To view, visit http://gerrit.cloudera.org:8080/14080
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 20
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 22 Jan 2020 17:06:06 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-01-21 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 19:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/14080/19/be/src/util/tuple-row-compare.cc
File be/src/util/tuple-row-compare.cc:

http://gerrit.cloudera.org:8080/#/c/14080/19/be/src/util/tuple-row-compare.cc@319
PS19, Line 319:   // The masks are used for setting the sign bit correctly.
  :   constexpr uint32_t mask32 = (uint32_t)1 << 31;
  :   constexpr uint64_t mask64 = (uint64_t)1 << 63;
  :   constexpr uint128_t mask128 = (uint128_t)1 << 127;
nit: the mask could be calculated in GetSharedRepresentation() instead of 
passing it over


http://gerrit.cloudera.org:8080/#/c/14080/19/be/src/util/tuple-row-compare.cc@423
PS19, Line 423: val
Local variable U val shadows parameter void* val.


http://gerrit.cloudera.org:8080/#/c/14080/19/be/src/util/tuple-row-compare.cc@424
PS19, Line 424:   BitUtil::ByteSwap(, string_value->ptr, len);
nit: please add a comment about it, something like "we copy the bytes from 
the string but swap the bytes because of integer endianness."
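
To illustrate why the byte order matters, here is a standalone sketch that
does not use BitUtil::ByteSwap: it packs the leading bytes of a string into
a uint64_t with explicit shifts, so that the first character ends up in the
most significant byte. On a little-endian machine this gives the same
result as copying the bytes and then swapping them. The function name and
the choice of uint64_t are assumptions made for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>

// Packs up to 8 leading bytes of s into an integer whose unsigned ordering
// matches the byte-wise (lexical) ordering of those prefixes.
uint64_t StringPrefixToUint64(const std::string& s) {
  uint64_t result = 0;
  const std::size_t len = std::min(s.size(), sizeof(result));
  for (std::size_t i = 0; i < len; ++i) {
    const uint64_t byte = static_cast<unsigned char>(s[i]);
    // Byte 0 of the string goes to the top byte of the integer.
    result |= byte << (8 * (sizeof(result) - 1 - i));
  }
  return result;
}
```

Without the reordering, a plain copy on a little-endian machine would put
the first character in the least significant byte, and integer comparison
would no longer agree with string comparison.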


http://gerrit.cloudera.org:8080/#/c/14080/19/be/src/util/tuple-row-compare.cc@434
PS19, Line 434: static_cast(*reinterpret_cast(val)
It will only have the value of the first char of the string.

I see there are tests for chars, but do we have tests for fixed size strings, 
e.g. CHAR(5)?


http://gerrit.cloudera.org:8080/#/c/14080/19/be/src/util/tuple-row-compare.cc@435
PS19, Line 435: (sizeof(U) > 8 ? sizeof(U) * 8 - 64 : 0)
replace with 'sizeof(U) - std::min(sizeof(U), type.len)'?



--
To view, visit http://gerrit.cloudera.org:8080/14080
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 19
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 21 Jan 2020 14:41:23 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-01-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 19:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5480/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 19
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 21 Jan 2020 11:37:06 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-01-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 18:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5479/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 18
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 21 Jan 2020 11:35:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-01-21 Thread Norbert Luksa (Code Review)
Norbert Luksa has uploaded a new patch set (#19). ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..

IMPALA-8755: Backend support for Z-ordering

This change depends on gerrit.cloudera.org/#/c/13955/
(Frontend support for Z-ordering)

The commit adds a Comparator based on Z-ordering. See in detail:
https://en.wikipedia.org/wiki/Z-order_curve

The comparator instead of calculating the Z-values of the rows,
looks for the column with the most significant dimension, and
compares the values of this column only. The most significant
dimension will be the one where the compared values have the
highest different bits. The algorithm requires values of
the same binary representation, therefore the values are
converted into either uint32_t, uint64_t or uint128_t, the
smallest in which all data fits. Comparing smaller types with
bigger ones would make the bigger type much more dominant
therefore the bits of these smaller types are shifted up.
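
The paragraph above can be sketched as follows. This is a minimal illustration that assumes every column has already been converted to uint32_t; LessMsb and ZOrderCompare are hypothetical names, not the code in tuple-row-compare.cc. The less-msb test follows the Z-order curve article linked above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// True if the most significant set bit of x is lower than that of y.
inline bool LessMsb(uint32_t x, uint32_t y) {
  return x < y && x < (x ^ y);
}

// Returns a negative, zero or positive value, like a three-way comparison.
// Picks the column whose two values differ in the highest bit (earlier
// columns win ties) and compares the rows on that column alone.
inline int ZOrderCompare(const std::vector<uint32_t>& lhs,
                         const std::vector<uint32_t>& rhs) {
  assert(lhs.size() == rhs.size());
  uint32_t msd_lhs = 0;
  uint32_t msd_rhs = 0;
  for (std::size_t i = 0; i < lhs.size(); ++i) {
    if (LessMsb(msd_lhs ^ msd_rhs, lhs[i] ^ rhs[i])) {
      msd_lhs = lhs[i];
      msd_rhs = rhs[i];
    }
  }
  if (msd_lhs < msd_rhs) return -1;
  return msd_lhs > msd_rhs ? 1 : 0;
}
```

No interleaved Z-value is ever materialised: the XOR of the two values in a column exposes their highest differing bit, and only the winning column is compared.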

All primitive types (including string and floating point types)
are supported.

Testing:
 * Added unit tests.
 * Ran manual tests, comparing 4-column values with 4-bit
   integers, for all possible combinations. Checked the result by
   calculating the Z-value for each comparison.
 * Tested performance on various data, getting great results for
   selective queries. An example: used the TPCH dataset's
   lineitem table with scale 25, where the sorting columns are
   l_partkey and l_suppkey, in that order. Ran selective queries
   over the value range of the two columns, for both lexical and
   Z-ordering, and compared the percentage of filtered pages and
   row groups. While queries with filters on the first column
   showed almost no difference, queries with filters on the
   second column clearly favour Z-ordering:
   Ordering | Column | Filtered pages % | Filtered row groups %
   Lex.     | 1st    | ~99%             | ~90%
   Z-ord.   | 1st    | ~99%             | ~89%
   Lex.     | 2nd    | ~25%             | 0%
   Z-ord.   | 2nd    | ~97%             | 0%
   The only drawback is the sorting itself, which takes ~4 times
   longer than lexical sorting (e.g. sorting the dataset above
   took 14m for lexical and 55m for Z-ordering).
   Note, however, that this is a one-time cost: sorting only
   happens once, when writing the data.
   Also, lexical ordering is supported by codegen, while codegen
   is not implemented for Z-ordering yet.
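
The manual check described in the testing notes (4 columns of 4-bit integers, verified against explicitly computed Z-values) could be reproduced along these lines. This is a sketch under assumptions: all names are hypothetical, column 0 is taken as the most significant column within each bit level, and the step parameter exists only so that a quick run can sample the pairs instead of visiting all 2^32 of them.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kCols = 4;  // columns per row
constexpr int kBits = 4;  // bits per column value

// True if the most significant set bit of x is lower than that of y.
inline bool LessMsb(uint32_t x, uint32_t y) { return x < y && x < (x ^ y); }

// Comparison under test: compare only the column whose values differ in the
// highest bit (earlier columns win ties). Returns -1, 0 or 1.
inline int CompareByMsd(const uint32_t* lhs, const uint32_t* rhs) {
  uint32_t msd_lhs = 0, msd_rhs = 0;
  for (int col = 0; col < kCols; ++col) {
    if (LessMsb(msd_lhs ^ msd_rhs, lhs[col] ^ rhs[col])) {
      msd_lhs = lhs[col];
      msd_rhs = rhs[col];
    }
  }
  return (msd_lhs > msd_rhs) - (msd_lhs < msd_rhs);
}

// Reference: the explicit Z-value, interleaving bits from the highest bit
// level down, with column 0 first within each level.
inline uint32_t ZValue(const uint32_t* row) {
  uint32_t z = 0;
  for (int bit = kBits - 1; bit >= 0; --bit) {
    for (int col = 0; col < kCols; ++col) {
      z = (z << 1) | ((row[col] >> bit) & 1u);
    }
  }
  return z;
}

// Splits a 16-bit number into four 4-bit column values.
inline void Unpack(uint32_t packed, uint32_t* row) {
  for (int col = 0; col < kCols; ++col) {
    row[col] = (packed >> (col * kBits)) & ((1u << kBits) - 1);
  }
}

// Counts row pairs on which the two comparisons disagree. step == 1 visits
// every pair; larger steps sample the space. step must be at least 1.
inline uint64_t CountMismatches(uint32_t step) {
  assert(step >= 1);
  const uint32_t limit = 1u << (kCols * kBits);
  uint64_t mismatches = 0;
  uint32_t lhs[kCols], rhs[kCols];
  for (uint32_t a = 0; a < limit; a += step) {
    Unpack(a, lhs);
    const uint32_t za = ZValue(lhs);
    for (uint32_t b = 0; b < limit; b += step) {
      Unpack(b, rhs);
      const uint32_t zb = ZValue(rhs);
      const int expected = (za > zb) - (za < zb);
      if (CompareByMsd(lhs, rhs) != expected) ++mismatches;
    }
  }
  return mismatches;
}
```

Calling CountMismatches(1) performs the exhaustive comparison; a larger step trades coverage for run time.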

Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
---
M be/src/exec/exchange-node.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partial-sort-node.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/util/CMakeLists.txt
A be/src/util/tuple-row-compare-test.cc
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
18 files changed, 1,062 insertions(+), 95 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/19
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 19
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-01-21 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 14:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14080/14//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14080/14//COMMIT_MSG@34
PS14, Line 34: getting great results
> Could you provide some basic statistics?
Done


http://gerrit.cloudera.org:8080/#/c/14080/14//COMMIT_MSG@35
PS14, Line 35: One negative is the sorting itself, taking
 :4-7 more times than lexical sorting.
> You could emphasize that it only affects the writes.
Done



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 14
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 21 Jan 2020 10:52:51 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-01-21 Thread Norbert Luksa (Code Review)
Norbert Luksa has uploaded a new patch set (#18). ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..

IMPALA-8755: Backend support for Z-ordering

This change depends on gerrit.cloudera.org/#/c/13955/
(Frontend support for Z-ordering)

The commit adds a Comparator based on Z-ordering. See in detail:
https://en.wikipedia.org/wiki/Z-order_curve

The comparator instead of calculating the Z-values of the rows,
looks for the column with the most significant dimension, and
compares the values of this column only. The most significant
dimension will be the one where the compared values have the
highest different bits. The algorithm requires values of
the same binary representation, therefore the values are
converted into either uint32_t, uint64_t or uint128_t, the
smallest in which all data fits. Comparing smaller types with
bigger ones would make the bigger type much more dominant
therefore the bits of these smaller types are shifted up.

All primitive types (including string and floating point types)
are supported.

Testing:
 * Added unit tests.
 * Ran manual tests, comparing 4-column values with 4-bit
   integers, for all possible combinations. Checked the result by
   calculating the Z-value for each comparison.
 * Tested performance on various data, getting great results for
   selective queries. An example: used the TPCH dataset's
   lineitem table with scale 25, where the sorting columns are
   l_partkey and l_suppkey, in that order. Ran selective queries
   over the value range of the two columns, for both lexical and
   Z-ordering, and compared the percentage of filtered pages and
   row groups. While queries with filters on the first column
   showed almost no difference, queries with filters on the
   second column clearly favour Z-ordering:
   Ordering | Column | Filtered pages % | Filtered row groups %
   Lex.     | 1st    | ~99%             | ~90%
   Z-ord.   | 1st    | ~99%             | ~89%
   Lex.     | 2nd    | ~25%             | 0%
   Z-ord.   | 2nd    | ~97%             | 0%
   The only drawback is the sorting itself, which takes ~4 times
   longer than lexical sorting. Note, however, that this is a
   one-time cost: sorting only happens once, when writing the data.
   Also, lexical ordering is supported by codegen, while codegen
   is not implemented for Z-ordering yet.

Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
---
M be/src/exec/exchange-node.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partial-sort-node.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/util/CMakeLists.txt
A be/src/util/tuple-row-compare-test.cc
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
18 files changed, 1,062 insertions(+), 95 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/18
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 18
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-01-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 16:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5478/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 16
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 21 Jan 2020 10:08:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-01-21 Thread Norbert Luksa (Code Review)
Norbert Luksa has uploaded a new patch set (#16). ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..

IMPALA-8755: Backend support for Z-ordering

This change depends on gerrit.cloudera.org/#/c/13955/
(Frontend support for Z-ordering)

The commit adds a Comparator based on Z-ordering. See in detail:
https://en.wikipedia.org/wiki/Z-order_curve

The comparator instead of calculating the Z-values of the rows,
looks for the column with the most significant dimension, and
compares the values of this column only. The most significant
dimension will be the one where the compared values have the
highest different bits. The algorithm requires values of
the same binary representation, therefore the values are
converted into either uint32_t, uint64_t or uint128_t, the
smallest in which all data fits. Comparing smaller types with
bigger ones would make the bigger type much more dominant
therefore the bits of these smaller types are shifted up.

All primitive types (including string and floating point types)
are supported.

Testing:
 * Added unit tests.
 * Run manual tests, comparing 4-column values with 4-bit
   integers, for all possible combinations. Checked the result by
   calculating the Z-value for each comparison.
 * Tested performance on various data, getting great results for
   selective queries. One negative is the sorting itself, taking
   4-7 more times than lexical sorting.

Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
---
M be/src/exec/exchange-node.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partial-sort-node.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/util/CMakeLists.txt
A be/src/util/tuple-row-compare-test.cc
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
18 files changed, 1,062 insertions(+), 95 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/16
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 16
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-01-21 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 16:

Rebased.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 16
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 21 Jan 2020 09:24:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-01-07 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 14: Code-Review+1

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14080/14//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14080/14//COMMIT_MSG@34
PS14, Line 34: getting great results
Could you provide some basic statistics?


http://gerrit.cloudera.org:8080/#/c/14080/14//COMMIT_MSG@35
PS14, Line 35: One negative is the sorting itself, taking
 :4-7 more times than lexical sorting.
You could emphasize that it only affects the writes.



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 14
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 07 Jan 2020 14:27:54 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2019-12-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 14:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5267/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 14
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 12 Dec 2019 11:14:43 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2019-12-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 13:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5268/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 13
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 12 Dec 2019 11:14:38 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2019-12-12 Thread Norbert Luksa (Code Review)
Norbert Luksa has uploaded a new patch set (#13). ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..

IMPALA-8755: Backend support for Z-ordering

This change depends on gerrit.cloudera.org/#/c/13955/
(Frontend support for Z-ordering)

The commit adds a Comparator based on Z-ordering. See in detail:
https://en.wikipedia.org/wiki/Z-order_curve

The comparator instead of calculating the Z-values of the rows,
looks for the column with the most significant dimension, and
compares the values of this column only. The most significant
dimension will be the one where the compared values have the
highest different bits. The algorithm requires values of
the same binary representation, therefore the values are
converted into either uint32_t, uint64_t or uint128_t, the
smallest in which all data fits. Comparing smaller types with
bigger ones would make the bigger type much more dominant
therefore the bits of these smaller types are shifted up.

All primitive types (including string and floating point types)
are supported.

Testing:
 * Added unit tests.
 * Run manual tests, comparing 4-column values with 4-bit
   integers, for all possible combinations. Checked the result by
   calculating the Z-value for each comparison.
 * Tested performance on various data, getting great results for
   selective queries. One negative is the sorting itself, taking
   4-7 more times than lexical sorting.

Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
---
M be/src/exec/exchange-node.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partial-sort-node.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/util/CMakeLists.txt
A be/src/util/tuple-row-compare-test.cc
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
18 files changed, 1,060 insertions(+), 95 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/13
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 13
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2019-12-12 Thread Norbert Luksa (Code Review)
Norbert Luksa has uploaded a new patch set (#14). ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..

IMPALA-8755: Backend support for Z-ordering

This change depends on gerrit.cloudera.org/#/c/13955/
(Frontend support for Z-ordering)

The commit adds a Comparator based on Z-ordering. See in detail:
https://en.wikipedia.org/wiki/Z-order_curve

The comparator instead of calculating the Z-values of the rows,
looks for the column with the most significant dimension, and
compares the values of this column only. The most significant
dimension will be the one where the compared values have the
highest different bits. The algorithm requires values of
the same binary representation, therefore the values are
converted into either uint32_t, uint64_t or uint128_t, the
smallest in which all data fits. Comparing smaller types with
bigger ones would make the bigger type much more dominant
therefore the bits of these smaller types are shifted up.

All primitive types (including string and floating point types)
are supported.

Testing:
 * Added unit tests.
 * Run manual tests, comparing 4-column values with 4-bit
   integers, for all possible combinations. Checked the result by
   calculating the Z-value for each comparison.
 * Tested performance on various data, getting great results for
   selective queries. One negative is the sorting itself, taking
   4-7 more times than lexical sorting.

Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
---
M be/src/exec/exchange-node.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partial-sort-node.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/util/CMakeLists.txt
A be/src/util/tuple-row-compare-test.cc
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
18 files changed, 1,059 insertions(+), 95 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/14
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 14
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2019-11-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 12:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5019/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 12
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 14 Nov 2019 09:00:25 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2019-11-14 Thread Norbert Luksa (Code Review)
Norbert Luksa has uploaded a new patch set (#12). ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..

IMPALA-8755: Backend support for Z-ordering

This change depends on gerrit.cloudera.org/#/c/13955/
(Frontend support for Z-ordering)

The commit adds a Comparator based on Z-ordering. See in detail:
https://en.wikipedia.org/wiki/Z-order_curve

The comparator instead of calculating the Z-values of the rows,
looks for the column with the most significant dimension, and
compares the values of this column only. The most significant
dimension will be the one where the compared values have the
highest different bits. The algorithm requires values of
the same binary representation, but this can be relaxed.

All primitive types (including string and floating point types)
are supported.

Testing:
 * Added unit tests.
 * Run manual tests, comparing 4-column values with 4-bit
   integers, for all possible combinations. Checked the result by
   calculating the Z-value for each comparison.
 * Tested performance on various data, getting great results.

Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
---
M be/src/exec/exchange-node.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partial-sort-node.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/util/CMakeLists.txt
A be/src/util/tuple-row-compare-test.cc
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
18 files changed, 1,002 insertions(+), 95 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/12
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 12
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2019-10-28 Thread Anonymous Coward (Code Review)
Anonymous Coward (520) has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14080/9/be/src/util/tuple-row-compare.cc
File be/src/util/tuple-row-compare.cc:

http://gerrit.cloudera.org:8080/#/c/14080/9/be/src/util/tuple-row-compare.cc@360
PS9, Line 360: if (less_msb(msd_lhs ^ msd_rhs, lhsi ^ rhsi)) {
 :   msd_lhs = lhsi;
 :   msd_rhs = rhsi;
 : }
This means the column that uses the most bits will likely be the dominating 
column. E.g. if two columns are selected, one using 8 bits and the other using 
4 bits, then the column using 8 bits will likely determine the sorting order. 
Do you have a design doc covering the details for all data types?
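
The concern can be reproduced with a small sketch, together with one way of removing the bias: aligning the top bit of every column before comparing, which is the approach the commit message of later patch sets describes as shifting the bits of smaller types up. LessMsb and ShiftUpTo64 are hypothetical names for illustration, and 64 bits is assumed as the shared width.

```cpp
#include <cstdint>

// True if the most significant set bit of x is lower than that of y.
inline bool LessMsb(uint64_t x, uint64_t y) {
  return x < y && x < (x ^ y);
}

// Hypothetical helper: widens an unsigned value to 64 bits and shifts it up
// so that the top bit of T lands on bit 63 of the shared representation.
template <typename T>
inline uint64_t ShiftUpTo64(T value) {
  return static_cast<uint64_t>(value) << ((sizeof(uint64_t) - sizeof(T)) * 8);
}
```

For example, an 8-bit column holding 1 and 2 differs at bit 1, while a 64-bit column holding 0x100 and 0x200 differs at bit 9, so without alignment the wider column always wins. After ShiftUpTo64 the 8-bit difference sits at bit 57 and the narrow column is no longer drowned out.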



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 9
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 29 Oct 2019 00:06:53 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2019-10-21 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 9:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14080/9/be/src/common/global-flags.cc
File be/src/common/global-flags.cc:

http://gerrit.cloudera.org:8080/#/c/14080/9/be/src/common/global-flags.cc@275
PS9, Line 275: DEFINE_bool(unlock_zorder_sort, false,
 : "(Experimental) If true, enables using ZORDER option for SORT BY.");
I think we can enable it by default. Or maybe in a follow-up commit, since some 
tests also need to be moved from custom cluster tests to query tests.


http://gerrit.cloudera.org:8080/#/c/14080/9/be/src/util/tuple-row-compare.cc
File be/src/util/tuple-row-compare.cc:

http://gerrit.cloudera.org:8080/#/c/14080/9/be/src/util/tuple-row-compare.cc@325
PS9, Line 325: ((uint128_t) -1) / 2 + 1
nit: how about (uint128_t)1 << 127? Or you could use SetBit from bit-util.h
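
A quick compile-time check that the two spellings produce the same mask. This sketch assumes uint128_t is an alias for the GCC/Clang extension type unsigned __int128; the constant names are hypothetical, and SetBit from bit-util.h is not shown because its signature does not appear in this thread.

```cpp
#include <cstdint>

// Assumption: a 128-bit unsigned type provided by the compiler extension.
using uint128_t = unsigned __int128;

// Spelling used in patch set 9: (2^128 - 1) / 2 + 1 == 2^127.
constexpr uint128_t kMaskByDivision = (static_cast<uint128_t>(-1)) / 2 + 1;

// Spelling suggested in the review: shift a one into the top bit.
constexpr uint128_t kMaskByShift = static_cast<uint128_t>(1) << 127;

static_assert(kMaskByDivision == kMaskByShift,
              "both expressions set only the top bit");
```

The shift form states the intent (set bit 127) directly, which is presumably why it reads better.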



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 9
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 21 Oct 2019 16:23:25 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2019-10-21 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 9:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/14080/8//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14080/8//COMMIT_MSG@12
PS8, Line 12: The commit adds a Comperator based on Z-ordering. See in detail:
Nit: comparator. Also on line 15.


http://gerrit.cloudera.org:8080/#/c/14080/9/be/src/exec/partial-sort-node.cc
File be/src/exec/partial-sort-node.cc:

http://gerrit.cloudera.org:8080/#/c/14080/9/be/src/exec/partial-sort-node.cc@54
PS9, Line 54:   sorting_order_ = (TSortingOrder::type)tnode.sort_node.sort_info.sorting_order;
I think we're trying to avoid C-style casts.
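
A named cast would look roughly like the sketch below. The types here are hypothetical stand-ins for the Thrift-generated code: the LEXICAL and ZORDER values, the TSortInfo struct, and the field being stored as a plain integer are all assumptions, so this shows the cast style only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the Thrift-generated types; the real
// definitions live in generated code and may differ.
struct TSortingOrder {
  enum type { LEXICAL = 0, ZORDER = 1 };
};

struct TSortInfo {
  int32_t sorting_order = 0;  // Assumed to arrive as a plain integer.
};

// C-style cast, as quoted from PS9:
//   (TSortingOrder::type)info.sorting_order
// Named cast, which is easier to search for and states the intent:
inline TSortingOrder::type GetSortingOrder(const TSortInfo& info) {
  assert(info.sorting_order == TSortingOrder::LEXICAL
      || info.sorting_order == TSortingOrder::ZORDER);
  return static_cast<TSortingOrder::type>(info.sorting_order);
}
```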


http://gerrit.cloudera.org:8080/#/c/14080/9/be/src/exec/sort-node.cc
File be/src/exec/sort-node.cc:

http://gerrit.cloudera.org:8080/#/c/14080/9/be/src/exec/sort-node.cc@50
PS9, Line 50:   sorting_order_ = (TSortingOrder::type)tnode.sort_node.sort_info.sorting_order;
I think we're trying to avoid C-style casts.



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 9
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 21 Oct 2019 11:48:36 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2019-10-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 9:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/4829/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 9
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 18 Oct 2019 16:18:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2019-10-18 Thread Norbert Luksa (Code Review)
Norbert Luksa has uploaded a new patch set (#9). ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..

IMPALA-8755: Backend support for Z-ordering

This change depends on gerrit.cloudera.org/#/c/13955/
(Frontend support for Z-ordering)

The commit adds a comparator based on Z-ordering. For details, see:
https://en.wikipedia.org/wiki/Z-order_curve

Instead of calculating the Z-values of the rows, the comparator
looks for the column with the most significant dimension and
compares the values of that column only. The most significant
dimension is the one where the compared values differ in the
highest bit. The algorithm requires values of the same binary
representation, but this can be relaxed.

Currently, strings, varchars, floats and doubles are not
supported.
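
The idea described above can be sketched as follows. This is an illustrative version, not the code in tuple-row-compare.cc: it assumes every column has already been converted to uint32_t, and the names LessMsb and ZOrderCompare are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// True if the highest set bit of x is strictly below the highest set bit of y.
inline bool LessMsb(uint32_t x, uint32_t y) { return x < y && x < (x ^ y); }

// Three-way Z-order comparison of two rows with the same number of columns.
// Returns a negative value, zero, or a positive value.
inline int ZOrderCompare(const std::vector<uint32_t>& lhs,
    const std::vector<uint32_t>& rhs) {
  assert(lhs.size() == rhs.size());
  if (lhs.empty()) return 0;
  size_t msd = 0;        // Index of the most significant dimension so far.
  uint32_t max_xor = 0;  // XOR of the two values in that dimension.
  for (size_t i = 0; i < lhs.size(); ++i) {
    const uint32_t diff = lhs[i] ^ rhs[i];
    // Strict comparison: on a tie the earlier column stays dominant.
    if (LessMsb(max_xor, diff)) {
      msd = i;
      max_xor = diff;
    }
  }
  // Only the dominant column decides the result.
  return (lhs[msd] > rhs[msd]) - (lhs[msd] < rhs[msd]);
}
```

For example, the rows {0, 2} and {1, 0} differ in bit 0 of the first column and bit 1 of the second, so the second column dominates and the first row sorts after the second.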

Testing:
 * Added unit tests.
 * Ran manual tests, comparing 4-column values with 4-bit
   integers, for all possible combinations. Checked the result by
   calculating the Z-value for each comparison.
 * Tested performance on various data, getting great results.

Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
---
M be/src/exec/exchange-node.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partial-sort-node.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/util/CMakeLists.txt
A be/src/util/tuple-row-compare-test.cc
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
16 files changed, 784 insertions(+), 58 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/9
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 9
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2019-09-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 8:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/4556/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 8
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 13 Sep 2019 13:39:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2019-09-13 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 8:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/14080/7/be/src/util/tuple-row-compare-test.cc
File be/src/util/tuple-row-compare-test.cc:

http://gerrit.cloudera.org:8080/#/c/14080/7/be/src/util/tuple-row-compare-test.cc@94
PS7, Line 94: Tuple* tuple_mem = Tuple::Create(sizeof(char) + GetSize(args...), &expr_perm_pool_);
> line too long (92 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/14080/5/be/src/util/tuple-row-compare.cc
File be/src/util/tuple-row-compare.cc:

http://gerrit.cloudera.org:8080/#/c/14080/5/be/src/util/tuple-row-compare.cc@314
PS5, Line 314:
> nit: Can we come up with a better name? Maybe GetZDimensionValue() or somet
Done


http://gerrit.cloudera.org:8080/#/c/14080/5/be/src/util/tuple-row-compare.cc@334
PS5, Line 334: turn Comp
> Maybe you could add a DCHECK(false); as well, and maybe a TODO comment. If
Done


http://gerrit.cloudera.org:8080/#/c/14080/5/be/src/util/tuple-row-compare.cc@383
PS5, Line 383: rn
> nit: since you use 'lhs' and 'rhs' at other places, maybe rename 'v1' and '
Done


http://gerrit.cloudera.org:8080/#/c/14080/7/be/src/util/tuple-row-compare.cc
File be/src/util/tuple-row-compare.cc:

http://gerrit.cloudera.org:8080/#/c/14080/7/be/src/util/tuple-row-compare.cc@209
PS7, Line 209: Status TupleRowLexicalComparator::CodegenCompare(LlvmCodeGen* codegen,
> line too long (93 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/14080/7/be/src/util/tuple-row-compare.cc@323
PS7, Line 323:   constexpr uint64_t mask64 = 0x8000;
> line too long (95 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/14080/7/be/src/util/tuple-row-compare.cc@395
PS7, Line 395: case TYPE_TIMESTAMP: {
> line too long (91 > 90)
Done



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 8
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 13 Sep 2019 12:58:41 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2019-09-13 Thread Norbert Luksa (Code Review)
Norbert Luksa has uploaded a new patch set (#8). ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..

IMPALA-8755: Backend support for Z-ordering

This change depends on gerrit.cloudera.org/#/c/13955/
(Frontend support for Z-ordering)

The commit adds a comparator based on Z-ordering. For details, see:
https://en.wikipedia.org/wiki/Z-order_curve

Instead of calculating the Z-values of the rows, the comparator
looks for the column with the most significant dimension and
compares the values of that column only. The most significant
dimension is the one where the compared values differ in the
highest bit. The algorithm requires values of the same binary
representation, but this can be relaxed.

Currently, strings, varchars, floats and doubles are not
supported.

Testing:
 * Added unit tests.
 * Currently, some tests are missing.
 * Ran manual tests, comparing 4-column values with 4-bit
   integers, for all possible combinations. Checked the result by
   calculating the Z-value for each comparison.
 * Tested performance on various data, getting great results.
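
An exhaustive check of the kind described in the manual test could look like the sketch below. It is an assumption about how such a test might be written, not the test that was actually run: all names are hypothetical, values are treated as unsigned, and the reference Z-value interleaves bits with column 0 as the most significant column.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Row = std::vector<uint32_t>;

// Comparator under test: picks the column holding the highest differing bit.
inline int CompareByDominantColumn(const Row& lhs, const Row& rhs) {
  size_t msd = 0;
  uint32_t max_xor = 0;
  for (size_t i = 0; i < lhs.size(); ++i) {
    const uint32_t diff = lhs[i] ^ rhs[i];
    // True when the top set bit of diff is above the top set bit of max_xor.
    if (max_xor < diff && max_xor < (max_xor ^ diff)) {
      msd = i;
      max_xor = diff;
    }
  }
  return (lhs[msd] > rhs[msd]) - (lhs[msd] < rhs[msd]);
}

// Reference: interleave the low 'bits' bits of every column, column 0 first.
inline uint64_t ZValue(const Row& row, int bits) {
  uint64_t z = 0;
  for (int bit = bits - 1; bit >= 0; --bit) {
    for (uint32_t v : row) z = (z << 1) | ((v >> bit) & 1u);
  }
  return z;
}

// Enumerates every row and every pair of rows, and returns how often the
// comparator disagrees with comparing the interleaved Z-values.
inline uint64_t CountMismatches(int num_cols, int bits) {
  assert(num_cols > 0 && bits > 0 && num_cols * bits < 32);
  const uint32_t num_rows = 1u << (num_cols * bits);
  const uint32_t value_mask = (1u << bits) - 1;
  std::vector<Row> rows;
  std::vector<uint64_t> z_values;
  for (uint32_t code = 0; code < num_rows; ++code) {
    Row row(num_cols);
    for (int c = 0; c < num_cols; ++c) row[c] = (code >> (c * bits)) & value_mask;
    z_values.push_back(ZValue(row, bits));
    rows.push_back(row);
  }
  uint64_t mismatches = 0;
  for (size_t a = 0; a < rows.size(); ++a) {
    for (size_t b = 0; b < rows.size(); ++b) {
      const int expected = (z_values[a] > z_values[b]) - (z_values[a] < z_values[b]);
      if (CompareByDominantColumn(rows[a], rows[b]) != expected) ++mismatches;
    }
  }
  return mismatches;
}
```

Calling CountMismatches(4, 4) corresponds to the 4-column, 4-bit case, but that enumerates about 4.3 billion pairs; a smaller configuration such as CountMismatches(3, 3) exercises the same logic in well under a second.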

Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
---
M be/src/exec/exchange-node.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partial-sort-node.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/util/CMakeLists.txt
A be/src/util/tuple-row-compare-test.cc
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
16 files changed, 776 insertions(+), 58 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/8
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 8
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2019-09-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 7:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/4553/ : Initial code 
review checks failed. See linked job for details on the failure.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 7
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 12 Sep 2019 14:25:44 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2019-09-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 7:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/14080/7/be/src/util/tuple-row-compare-test.cc
File be/src/util/tuple-row-compare-test.cc:

http://gerrit.cloudera.org:8080/#/c/14080/7/be/src/util/tuple-row-compare-test.cc@94
PS7, Line 94: uint8_t* tuple_row_mem = expr_perm_pool_.Allocate(sizeof(char*) + sizeof(int32_t*) * 2);
line too long (92 > 90)


http://gerrit.cloudera.org:8080/#/c/14080/7/be/src/util/tuple-row-compare.cc
File be/src/util/tuple-row-compare.cc:

http://gerrit.cloudera.org:8080/#/c/14080/7/be/src/util/tuple-row-compare.cc@209
PS7, Line 209: Status TupleRowLexicalComparator::CodegenCompare(LlvmCodeGen* codegen, llvm::Function** fn) {
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/14080/7/be/src/util/tuple-row-compare.cc@323
PS7, Line 323:   constexpr uint128_t mask128 = ((uint128_t) -1) / 2 + 1; //0x8000;
line too long (95 > 90)


http://gerrit.cloudera.org:8080/#/c/14080/7/be/src/util/tuple-row-compare.cc@395
PS7, Line 395:   const uint128_t nanoseconds = static_cast<uint128_t>(ts->time().total_nanoseconds());
line too long (91 > 90)



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 7
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 12 Sep 2019 13:46:02 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2019-09-12 Thread Norbert Luksa (Code Review)
Norbert Luksa has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/14080 )


Change subject: IMPALA-8755: Backend support for Z-ordering
..

IMPALA-8755: Backend support for Z-ordering

This change depends on gerrit.cloudera.org/#/c/13955/
(Frontend support for Z-ordering)

The commit adds a comparator based on Z-ordering. For details, see:
https://en.wikipedia.org/wiki/Z-order_curve

Instead of calculating the Z-values of the rows, the comparator
looks for the column with the most significant dimension and
compares the values of that column only. The most significant
dimension is the one where the compared values differ in the
highest bit. The algorithm requires values of the same binary
representation, but this can be relaxed.

Currently, strings, varchars, floats and doubles are not
supported.
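
The "same binary representation" requirement matters because the comparison works on raw bits, where a negative two's-complement integer would look larger than any positive one. One common way to normalize signed values, shown here as an assumption rather than as what the patch does, is to flip the sign bit so that unsigned order matches signed order. The function name ToOrderPreservingUnsigned is hypothetical.

```cpp
#include <cstdint>

// Maps a signed 64-bit value to an unsigned one so that unsigned comparison
// of the results agrees with signed comparison of the inputs. Flipping the
// sign bit moves negative values below non-negative ones.
constexpr uint64_t ToOrderPreservingUnsigned(int64_t v) {
  return static_cast<uint64_t>(v) ^ (uint64_t{1} << 63);
}

static_assert(ToOrderPreservingUnsigned(-1) < ToOrderPreservingUnsigned(0),
    "negative values sort before zero after the mapping");
```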

Testing:
 * Added unit tests.
 * Currently, some tests are missing.
 * Ran manual tests, comparing 4-column values with 4-bit
   integers, for all possible combinations. Checked the result by
   calculating the Z-value for each comparison.
 * Tested performance on various data, getting great results.

Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
---
M be/src/exec/exchange-node.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partial-sort-node.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/util/CMakeLists.txt
A be/src/util/tuple-row-compare-test.cc
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
16 files changed, 630 insertions(+), 58 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/7
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 7
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy