[Impala-ASF-CR] IMPALA-12630: Deflake TestOrcStats.test orc stats

2023-12-20 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20794 )

Change subject: IMPALA-12630: Deflake TestOrcStats.test_orc_stats
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/20794
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb7bb25b44878df3e037e750a985656ed6857a7d
Gerrit-Change-Number: 20794
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Wed, 20 Dec 2023 17:01:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12403: Fix Kerberos authentication when connecting with a proxy user that does not delegate another user

2023-09-15 Thread Norbert Luksa (Code Review)
Norbert Luksa has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/20421 )

Change subject: IMPALA-12403: Fix Kerberos authentication when connecting with 
a proxy user that does not delegate another user
..

IMPALA-12403: Fix Kerberos authentication when connecting with a proxy
user that does not delegate another user

This change improves LDAP checking in the Hiveserver2 API where
previously LDAP filter checks were run on the user principal received
in Kerberos authentication, but after the modification they are run on
the short username taken from the Kerberos principal.

Tested with new custom cluster tests.

Change-Id: I0141cd341cda84ed40cd72f8a038c8d5a215e5e1
Reviewed-on: http://gerrit.cloudera.org:8080/20421
Tested-by: Impala Public Jenkins 
Reviewed-by: Norbert Luksa 
---
M be/src/rpc/authentication.cc
M be/src/rpc/thrift-server.h
M be/src/service/impala-hs2-server.cc
M be/src/transport/TSasl.cpp
M be/src/transport/TSasl.h
M be/src/transport/TSaslTransport.cpp
M be/src/transport/TSaslTransport.h
M be/src/util/auth-util.cc
M be/src/util/auth-util.h
M fe/src/test/java/org/apache/impala/customcluster/LdapImpalaShellTest.java
M 
fe/src/test/java/org/apache/impala/customcluster/LdapKerberosImpalaShellTest.java
M 
fe/src/test/java/org/apache/impala/customcluster/LdapSearchBindImpalaShellTest.java
12 files changed, 128 insertions(+), 22 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Norbert Luksa: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/20421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I0141cd341cda84ed40cd72f8a038c8d5a215e5e1
Gerrit-Change-Number: 20421
Gerrit-PatchSet: 8
Gerrit-Owner: Gergely Farkas 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gergely Farkas 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Norbert Luksa 


[Impala-ASF-CR] IMPALA-12403: Fix Kerberos authentication when connecting with a proxy user that does not delegate another user

2023-09-15 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20421 )

Change subject: IMPALA-12403: Fix Kerberos authentication when connecting with 
a proxy user that does not delegate another user
..


Patch Set 7: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/20421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0141cd341cda84ed40cd72f8a038c8d5a215e5e1
Gerrit-Change-Number: 20421
Gerrit-PatchSet: 7
Gerrit-Owner: Gergely Farkas 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gergely Farkas 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Comment-Date: Fri, 15 Sep 2023 07:17:49 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12403: Fix Kerberos authentication when connecting with a proxy user that does not delegate another user

2023-09-14 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20421 )

Change subject: IMPALA-12403: Fix Kerberos authentication when connecting with 
a proxy user that does not delegate another user
..


Patch Set 5: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/20421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0141cd341cda84ed40cd72f8a038c8d5a215e5e1
Gerrit-Change-Number: 20421
Gerrit-PatchSet: 5
Gerrit-Owner: Gergely Farkas 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gergely Farkas 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Comment-Date: Thu, 14 Sep 2023 14:44:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9290: ORC scanner should support schema evolution between date and timestamp types

2023-03-10 Thread Norbert Luksa (Code Review)
Norbert Luksa has abandoned this change. ( 
http://gerrit.cloudera.org:8080/15152 )

Change subject: IMPALA-9290: ORC scanner should support schema evolution 
between date and timestamp types
..


Abandoned
--
To view, visit http://gerrit.cloudera.org:8080/15152
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: I7979ecc61b2ab900090d01bc81e7bb7b28c99c9e
Gerrit-Change-Number: 15152
Gerrit-PatchSet: 3
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-10233: zorder sort node should output rows in lexical order of partition keys

2020-10-14 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16590 )

Change subject: IMPALA-10233: zorder sort node should output rows in lexical 
order of partition keys
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16590/2/be/src/util/tuple-row-compare.cc
File be/src/util/tuple-row-compare.cc:

http://gerrit.cloudera.org:8080/#/c/16590/2/be/src/util/tuple-row-compare.cc@45
PS2, Line 45: num_lexical_keys_(tsort_info.num_lexical_keys),
This should be tsort_info.num_lexical_keys_in_zorder (the build failed because 
of this).



--
To view, visit http://gerrit.cloudera.org:8080/16590
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30cbad711167b8b63c81837e497b36fd41be9b54
Gerrit-Change-Number: 16590
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 14 Oct 2020 09:31:34 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10233: zorder sort node should output rows in lexical order of partition keys

2020-10-14 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16590 )

Change subject: IMPALA-10233: zorder sort node should output rows in lexical 
order of partition keys
..


Patch Set 2: Code-Review+2

LGTM! Thanks Quanlong for fixing this!


--
To view, visit http://gerrit.cloudera.org:8080/16590
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30cbad711167b8b63c81837e497b36fd41be9b54
Gerrit-Change-Number: 16590
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 14 Oct 2020 09:28:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8755: Unlock Z-ordering by default

2020-06-24 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16003 )

Change subject: IMPALA-8755: Unlock Z-ordering by default
..


Patch Set 4:

Sorry, didn't notice your comment, Zoltán. Will move the flag to the graveyard 
in a separate commit.


--
To view, visit http://gerrit.cloudera.org:8080/16003
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I653e0e2db8f7bc2dd077943b3acf667514d45811
Gerrit-Change-Number: 16003
Gerrit-PatchSet: 4
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 24 Jun 2020 07:44:06 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8755: Unlock Z-ordering by default

2020-06-02 Thread Norbert Luksa (Code Review)
Norbert Luksa has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/16003


Change subject: IMPALA-8755: Unlock Z-ordering by default
..

IMPALA-8755: Unlock Z-ordering by default

Z-ordering has been around for a while behind a feature flag
(unlock_zorder_sort). It's around time to turn this flag on by default.

This commit beside setting the flag to true, merges the Z-order tests
from custom cluster tests into the normal test files.

Tests:
  - Run all related tests.

Change-Id: I653e0e2db8f7bc2dd077943b3acf667514d45811
---
M be/src/common/global-flags.cc
M fe/src/main/cup/sql-parser.cup
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeKuduDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlUtilsTest.java
M fe/src/test/java/org/apache/impala/authorization/AuthorizationStmtTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
D testdata/workloads/functional-query/queries/QueryTest/alter-table-zorder.test
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
D 
testdata/workloads/functional-query/queries/QueryTest/create-table-as-select-zorder.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test
D 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file-zorder.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test
D 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-table-zorder.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test
D testdata/workloads/functional-query/queries/QueryTest/create-table-zorder.test
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
D tests/custom_cluster/test_zorder.py
20 files changed, 315 insertions(+), 465 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/16003/2
--
To view, visit http://gerrit.cloudera.org:8080/16003
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I653e0e2db8f7bc2dd077943b3acf667514d45811
Gerrit-Change-Number: 16003
Gerrit-PatchSet: 2
Gerrit-Owner: Norbert Luksa 


[Impala-ASF-CR] IMPALA-9625: Convert the name of a TAccessEvent to lowercase

2020-04-14 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15689 )

Change subject: IMPALA-9625: Convert the name of a TAccessEvent to lowercase
..


Patch Set 6:

Hi Fang-Yu, thanks for restarting the verification build, I see this time it 
passed.


--
To view, visit http://gerrit.cloudera.org:8080/15689
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If0d9ba58da891921fafbfe7c6db358b51965e178
Gerrit-Change-Number: 15689
Gerrit-PatchSet: 6
Gerrit-Owner: Fang-Yu Rao 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Vincent Tran 
Gerrit-Comment-Date: Tue, 14 Apr 2020 07:44:25 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9625: Convert the name of a TAccessEvent to lowercase

2020-04-10 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15689 )

Change subject: IMPALA-9625: Convert the name of a TAccessEvent to lowercase
..


Patch Set 4: Code-Review+2

(1 comment)

Thanks Fang-Yu, since the exhaustive tests passed, I think it is good to go in 
now.

http://gerrit.cloudera.org:8080/#/c/15689/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/15689/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2758
PS1, Line 2758:   public void addAccessEvent(TAccessEvent event) {
> Hi Norbert, I just realized that simply adding "Preconditions.checkState(ev
Hi Fang-Yu, I agree, this looks like the safer option.



--
To view, visit http://gerrit.cloudera.org:8080/15689
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If0d9ba58da891921fafbfe7c6db358b51965e178
Gerrit-Change-Number: 15689
Gerrit-PatchSet: 4
Gerrit-Owner: Fang-Yu Rao 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Vincent Tran 
Gerrit-Comment-Date: Fri, 10 Apr 2020 14:52:24 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9625: Convert fully-qualified table name to lowercase in getTable()

2020-04-09 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15689 )

Change subject: IMPALA-9625: Convert fully-qualified table name to lowercase in 
getTable()
..


Patch Set 1: Code-Review+1

(2 comments)

Hi Fang-Yu, left some comments, otherwise LGTM.

http://gerrit.cloudera.org:8080/#/c/15689/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/15689/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2758
PS1, Line 2758:   public void addAccessEvent(TAccessEvent event) {
I think it would be safer to check if the name is lower case here with a 
precondition.
Another option is to convert the name to lower case here.


http://gerrit.cloudera.org:8080/#/c/15689/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2949
PS1, Line 2949: globalState_.accessEvents.add(new TAccessEvent(
Here we should also use the addAccessEvent method.



--
To view, visit http://gerrit.cloudera.org:8080/15689
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If0d9ba58da891921fafbfe7c6db358b51965e178
Gerrit-Change-Number: 15689
Gerrit-PatchSet: 1
Gerrit-Owner: Fang-Yu Rao 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Vincent Tran 
Gerrit-Comment-Date: Thu, 09 Apr 2020 08:53:47 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema

2020-03-18 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15395 )

Change subject: IMPALA-9042: Milestone 1: properly scan files that has full 
ACID schema
..


Patch Set 3:

(6 comments)

Thanks for working on this. Left some comments, mostly questions to get a 
better understanding of how ACID works in Impala.
Later I will look at the tests as well.

http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.cc
File be/src/exec/orc-metadata-utils.cc:

http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.cc@110
PS3, Line 110: file_idx -= num_part_cols;
nit: This and the same @115 could be moved outside the if.


http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.cc@118
PS3, Line 118:   table_idx += 1 + num_part_cols;
Could you explain the values here?
In this case the file_idx should be the original table_idx?


http://gerrit.cloudera.org:8080/#/c/15395/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/15395/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@a297
PS3, Line 297:
Removing this mean that we support write operations on full ACID tables? Do we 
really need to remove this one?


http://gerrit.cloudera.org:8080/#/c/15395/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@224
PS3, Line 224: ensureTableNotFullAcid
There are now two ensureTableNotFullAcid methods, that only differ in the 
parameters and the error msg thrown.
Is this method and the INSERT_ONLY_ACID_TABLE_SUPPORTED_ERROR_MSG message used 
anymore? Can they be removed?


http://gerrit.cloudera.org:8080/#/c/15395/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@237
PS3, Line 237: String
nit: maybe change this string to enum?


http://gerrit.cloudera.org:8080/#/c/15395/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@238
PS3, Line 238:   throws AnalysisException {
nit: missing space



--
To view, visit http://gerrit.cloudera.org:8080/15395
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb
Gerrit-Change-Number: 15395
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 18 Mar 2020 17:34:36 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6360: Don't show full query statement on Impala WebUI by default

2020-03-18 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15288 )

Change subject: IMPALA-6360: Don't show full query statement on Impala WebUI by 
default
..


Patch Set 15: Code-Review+1

(1 comment)

Found a nit, otherwise lgtm.

http://gerrit.cloudera.org:8080/#/c/15288/15/tests/webserver/test_web_pages.py
File tests/webserver/test_web_pages.py:

http://gerrit.cloudera.org:8080/#/c/15288/15/tests/webserver/test_web_pages.py@422
PS15, Line 422: x
nit: I would explicitly write it as "x " (notice the space after the x), like 
in the other test file.



--
To view, visit http://gerrit.cloudera.org:8080/15288
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib7109a0be5d1022b4f8d6e72441cf5dc1dc42605
Gerrit-Change-Number: 15288
Gerrit-PatchSet: 15
Gerrit-Owner: Adam Tamas 
Gerrit-Reviewer: Adam Tamas 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Comment-Date: Wed, 18 Mar 2020 15:09:44 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6505: Min-Max predicate push down in ORC scanner

2020-03-17 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15403 )

Change subject: IMPALA-6505: Min-Max predicate push down in ORC scanner
..


Patch Set 1:

(6 comments)

Thanks Quanlong and Csaba for reviewing. Addressed some comments, will do the 
testing with the next patch.
Also, I will play with different predicates since now I see huge regression for 
some queries. Eg. the following query takes around twice as much as before:
select * from lineitem where l_suppkey = 10

http://gerrit.cloudera.org:8080/#/c/15403/1/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/15403/1/be/src/exec/hdfs-orc-scanner.cc@875
PS1, Line 875: UtcToUnixTime
> Looks like it's an ORC bug that the ORC writers (both Java and C++ versions
Changed to FloorUtcToUnixTimeMillis.
I see Csaba opened ORC-611 regarding the bug.


http://gerrit.cloudera.org:8080/#/c/15403/1/be/src/exec/hdfs-orc-scanner.cc@888
PS1, Line 888: case TYPE_VARCHAR:
 : case TYPE_CHAR: {
> I am not sure about the correctness in case of CHAR(N) and VARCHAR(N)
Added padding.
Maybe we should wait here with ORC-612?


http://gerrit.cloudera.org:8080/#/c/15403/1/be/src/exec/hdfs-orc-scanner.cc@902
PS1, Line 902: type.GetByteSize()
> I think this should be "literal_expr->type().GetByteSize()" since it's cons
Done


http://gerrit.cloudera.org:8080/#/c/15403/1/be/src/exec/hdfs-orc-scanner.cc@914
PS1, Line 914: static_cast((dv16->value() << 64) >> 64)
> Can we cast int128_t to uint64_t directly?
Looks like we can, done.


http://gerrit.cloudera.org:8080/#/c/15403/1/be/src/exec/hdfs-orc-scanner.cc@959
PS1, Line 959: orc::Literal literal =
 : GetLiteralSearchArguments(eval, slot_desc->type(), 
_type);
 : 
 : if (fn_name == "lt") {
> This logic is not enough if there is some mismatch between Impala's and ORC
Same as above, wait for ORC-612?


http://gerrit.cloudera.org:8080/#/c/15403/1/be/src/exec/hdfs-orc-scanner.cc@984
PS1, Line 984:   
row_reader_options_.setSearchArgument(std::move(final_sarg));
> Could you add a VLOG_FILE level logging for 'final_sarg'? It would be helpf
Done



--
To view, visit http://gerrit.cloudera.org:8080/15403
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I136622413db21e0941d238ab6aeea901a6464845
Gerrit-Change-Number: 15403
Gerrit-PatchSet: 1
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 17 Mar 2020 12:55:39 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6505: Min-Max predicate push down in ORC scanner

2020-03-17 Thread Norbert Luksa (Code Review)
Norbert Luksa has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/15403 )

Change subject: IMPALA-6505: Min-Max predicate push down in ORC scanner
..

IMPALA-6505: Min-Max predicate push down in ORC scanner

This commit implements min/max predicate pushdown for the ORC scanner
leveraging on the external ORC library's search arguments. We build
the search arguments when we open the scanner as we need not to
modify them later.

Also added a query option orc_read_statistics. If the option is set
to true (it is by default) predicate pushdown will take effect,
otherwise it will be skipped.

Tests:
 - Run scanner tests on ORC files.
 - Run TPCH benchmark, here is no improvement, nor regression.
   On the other hand, certain selective queries gained significant
   speed-up.

Further TODO: Bump ORC version since predicate pushdown is not yet
implemented in the upstream ORC lib (in review).

Change-Id: I136622413db21e0941d238ab6aeea901a6464845
---
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/orc-metadata-utils.cc
M be/src/exec/orc-metadata-utils.h
M be/src/exprs/scalar-expr.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
10 files changed, 250 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/15403/2
--
To view, visit http://gerrit.cloudera.org:8080/15403
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I136622413db21e0941d238ab6aeea901a6464845
Gerrit-Change-Number: 15403
Gerrit-PatchSet: 2
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-6505: Min-Max predicate push down in ORC scanner

2020-03-11 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15403 )

Change subject: IMPALA-6505: Min-Max predicate push down in ORC scanner
..


Patch Set 1:

The build should fail since we do not have the version of ORC with search 
arguments (in fact it's under review there as well).


--
To view, visit http://gerrit.cloudera.org:8080/15403
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I136622413db21e0941d238ab6aeea901a6464845
Gerrit-Change-Number: 15403
Gerrit-PatchSet: 1
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Comment-Date: Wed, 11 Mar 2020 12:41:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6505: Min-Max predicate push down in ORC scanner

2020-03-11 Thread Norbert Luksa (Code Review)
Norbert Luksa has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/15403


Change subject: IMPALA-6505: Min-Max predicate push down in ORC scanner
..

IMPALA-6505: Min-Max predicate push down in ORC scanner

This commit implements min/max predicate pushdown for the ORC scanner
leveraging on the external ORC library's search arguments. We build
the search arguments when we open the scanner as we need not to
modify them later.

Also added a query option orc_read_statistics. If the option is set
to true (it is by default) predicate pushdown will take effect,
otherwise it will be skipped.

Tests:
 - Run scanner tests on ORC files.
 - TODO: test performance
 - TODO: run test_scanners_fuzz
 - Bump ORC version since predicate pushdown is not yet
   implemented in the upstream ORC lib (in review).
Change-Id: I136622413db21e0941d238ab6aeea901a6464845
---
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/orc-metadata-utils.cc
M be/src/exec/orc-metadata-utils.h
M be/src/exprs/scalar-expr.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
10 files changed, 228 insertions(+), 8 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/15403/1
--
To view, visit http://gerrit.cloudera.org:8080/15403
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I136622413db21e0941d238ab6aeea901a6464845
Gerrit-Change-Number: 15403
Gerrit-PatchSet: 1
Gerrit-Owner: Norbert Luksa 


[Impala-ASF-CR] IMPALA-6360: Don't show full query statement on Impala WebUI by default

2020-03-05 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15288 )

Change subject: IMPALA-6360: Don't show full query statement on Impala WebUI by 
default
..


Patch Set 12: Code-Review+1

(1 comment)

Hi, left a comment, otherwise LGTM.

http://gerrit.cloudera.org:8080/#/c/15288/12/tests/webserver/test_web_pages.py
File tests/webserver/test_web_pages.py:

http://gerrit.cloudera.org:8080/#/c/15288/12/tests/webserver/test_web_pages.py@423
PS12, Line 423: query
I think it would be easier to check this test if you would create it with an 
expression like:
"select \"{0}\"".format("x" * 439)
That way you still get a 448 long string, but it's easier to count.



--
To view, visit http://gerrit.cloudera.org:8080/15288
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib7109a0be5d1022b4f8d6e72441cf5dc1dc42605
Gerrit-Change-Number: 15288
Gerrit-PatchSet: 12
Gerrit-Owner: Adam Tamas 
Gerrit-Reviewer: Adam Tamas 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Comment-Date: Thu, 05 Mar 2020 13:12:38 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4080 [part 3]: Moved codegen code for TupleRowComparator into a Config object

2020-03-05 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15281 )

Change subject: IMPALA-4080 [part 3]: Moved codegen code for TupleRowComparator 
into a Config object
..


Patch Set 5: Code-Review+1

(1 comment)

Left a comment, but LGTM.

http://gerrit.cloudera.org:8080/#/c/15281/5/be/src/util/tuple-row-compare.h
File be/src/util/tuple-row-compare.h:

http://gerrit.cloudera.org:8080/#/c/15281/5/be/src/util/tuple-row-compare.h@74
PS5, Line 74:   TSortingOrder::type sorting_order_;
nit: you could add a comment here, something like "Indicates the ordering used 
to instantiate the comparator." And maybe: "Only used with SORT BY queries".



--
To view, visit http://gerrit.cloudera.org:8080/15281
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I265604ecbae2c80ff59e697fddfeea4976b25cd9
Gerrit-Change-Number: 15281
Gerrit-PatchSet: 5
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 05 Mar 2020 12:28:55 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9447: Fix for TupleRowCompareTest.DecimalTest crash

2020-03-03 Thread Norbert Luksa (Code Review)
Norbert Luksa has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/15347


Change subject: IMPALA-9447: Fix for TupleRowCompareTest.DecimalTest crash
..

IMPALA-9447: Fix for TupleRowCompareTest.DecimalTest crash

TupleRowCompareTest.DecimalTest crashes on release build
because of a faulty test. The problem occures with Decimal16Value
when the test tries to copy the values into the tuple in FillMem().

The solution is to use memcpy to avoid gcc generating unaligned
instructions like movaps for int128_t. They would raise SegmentFault
when addresses are not aligned to 16 bytes. This works with the
other types as well.

Tests:
 - Run TupleRowCompareTest.

Change-Id: If2a53c88b57cec3a0bfcf02c492fbb560a80682d
---
M be/src/util/tuple-row-compare-test.cc
1 file changed, 11 insertions(+), 6 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/15347/1
--
To view, visit http://gerrit.cloudera.org:8080/15347
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: If2a53c88b57cec3a0bfcf02c492fbb560a80682d
Gerrit-Change-Number: 15347
Gerrit-PatchSet: 1
Gerrit-Owner: Norbert Luksa 


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-02-28 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 29:

Thanks Zoltan, included the header.


--
To view, visit http://gerrit.cloudera.org:8080/14080
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 29
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 28 Feb 2020 12:16:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-02-28 Thread Norbert Luksa (Code Review)
Norbert Luksa has uploaded a new patch set (#29). ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..

IMPALA-8755: Backend support for Z-ordering

This change depends on gerrit.cloudera.org/#/c/13955/
(Frontend support for Z-ordering)

The commit adds a Comparator based on Z-ordering. See in detail:
https://en.wikipedia.org/wiki/Z-order_curve

The comparator instead of calculating the Z-values of the rows,
looks for the column with the most significant dimension, and
compares the values of this column only. The most significant
dimension will be the one where the compared values have the
highest different bits. The algorithm requires values of
the same binary representation, therefore the values are
converted into either uint32_t, uint63_t or uint128_t, the
smallest in which all data fits. Comparing smaller types with
bigger ones would make the bigger type much more dominant
therefore the bits of these smaller types are shifted up.

All primitive types (including string and floating point types)
are supported.

Testing:
 * Added unit tests.
 * Run manual tests, comparing 4-column values with 4-bit
   integers, for all possible combinations. Checked the result by
   calculating the Z-value for each comparison.
 * Tested performance on various data, getting great results for
   selective queries. An example: used the TPCH dataset's
   lineitem table with scale 25, where the sorting columns are
   l_partkey and l_suppkey, in that order. Run selective queries
   for the value range of the two columns, for both lexical and
   Z-ordering and compared the percentage of filtered pages and
   row groups. While queries with filters on the first column
   showed almost no difference, queries on the second column
   is in favour of Z-ordering:
   Ordering | Column | Filtered pages % | Filtered row groups %
   Lex.   1st  ~99%   ~90%
   Z-ord. 1st  ~99%   ~89%
   Lex.   2nd  ~25%   0%
   Z-ord. 2nd  ~97%   0%
   The only drawback is the sorting itself, taking ~4 times more
   than lexical sorting (eg. sorting for the dataset above took
   14m for Lexical, and 55m for Z-ordering).
   Note however, that this is a one-time thing to do, sorting
   only happens once, when writing the data.
   Also, lexical ordering is supported by codegen, while it is
   not implemented for Z-ordering yet.

Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
---
M be/src/exec/exchange-node.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partial-sort-node.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/util/CMakeLists.txt
A be/src/util/tuple-row-compare-test.cc
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
18 files changed, 1,119 insertions(+), 95 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/29
--
To view, visit http://gerrit.cloudera.org:8080/14080
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 29
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-9226: Improve string allocations of the ORC scanner

2020-02-27 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15051 )

Change subject: IMPALA-9226: Improve string allocations of the ORC scanner
..


Patch Set 14:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/15051/12/be/src/exec/orc-column-readers.cc
File be/src/exec/orc-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/15051/12/be/src/exec/orc-column-readers.cc@180
PS12, Line 180: >(
> I think that >= offsets.size() - 1 is needed, because of the offsets[index
Done


http://gerrit.cloudera.org:8080/#/c/15051/12/be/src/exec/orc-column-readers.cc@184
PS12, Line 184: src_ptr = blob_ + offsets[index];
  : src_len = offsets[index + 1] - offsets[index];
> I think that we cannot trust completely in the length values at the moment
Thanks Csaba for investigating, uploaded a PS for the above check, will open 
ORC jiras for both of your comments.



--
To view, visit http://gerrit.cloudera.org:8080/15051
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f
Gerrit-Change-Number: 15051
Gerrit-PatchSet: 14
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 27 Feb 2020 11:02:05 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9226: Improve string allocations of the ORC scanner

2020-02-27 Thread Norbert Luksa (Code Review)
Norbert Luksa has uploaded a new patch set (#14). ( 
http://gerrit.cloudera.org:8080/15051 )

Change subject: IMPALA-9226: Improve string allocations of the ORC scanner
..

IMPALA-9226: Improve string allocations of the ORC scanner

Currently the OrcColumnReader copies values from the
orc::StringVectorBatch one-by-one. Since ORC 1.6, the blob which
contains the pointed values is moved to the StringVectorBatch,
so we can copy it.

This commit beside the above improvement also enables the
LazyEncoding option for the ORC reader. This way, for stripes
with DICTIONARY_ENCODING[_V2], EncodedStringVectorBatch contains
the data in a dictionaryBlob from which the data can be acquired
with the given indices and lengths.

Tests:
 * Run ORC scanner tests (query_tests/test_scanners.py::TestOrc)
   and tpch query tests.
 * Tested performance on tpch.lineitem table with scale=25,
   running queries that selects min of string columns.
   Some results:
   col_name | encoding | before | after | speedup
   =
   l_comment  DIRECT 16.42s   14.38s  14%
   l_shipinstruct DICTIONARY 5.26s3.80s   32%
   l_commitdate   DICTIONARY 5.46s5.19s   5%
   all string col BOTH   39.06s   32.18s  21%

   The queries were run on a desktop PC with MT_DOP and NUM_NODES
   set to 1.
 * Also run TPC-H queries on the TPC-H benchmark where some
   queries' runtime improved by around 10-15%, while there were
   no regression for the others.

Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f
---
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/orc-column-readers.cc
M be/src/exec/orc-column-readers.h
4 files changed, 135 insertions(+), 42 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/15051/14
--
To view, visit http://gerrit.cloudera.org:8080/15051
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f
Gerrit-Change-Number: 15051
Gerrit-PatchSet: 14
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-9226: Bump ORC version to 1.6.2-p7

2020-02-25 Thread Norbert Luksa (Code Review)
Norbert Luksa has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/15287


Change subject: IMPALA-9226: Bump ORC version to 1.6.2-p7
..

IMPALA-9226: Bump ORC version to 1.6.2-p7

Bump ORC version to include patch for ORC-600 that
unblocks IMPALA-9226.

Tests:
 - Run scanner tests for orc/def/block.

Change-Id: I444bfac435e5b05eee1ff7c8cf6a32ff5b65
---
M bin/impala-config.sh
1 file changed, 2 insertions(+), 2 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/15287/1
--
To view, visit http://gerrit.cloudera.org:8080/15287
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I444bfac435e5b05eee1ff7c8cf6a32ff5b65
Gerrit-Change-Number: 15287
Gerrit-PatchSet: 1
Gerrit-Owner: Norbert Luksa 


[native-toolchain-CR] IMPALA-9226: Add patch to ORC-1.6.2 in the toolchain

2020-02-21 Thread Norbert Luksa (Code Review)
Norbert Luksa has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/15265


Change subject: IMPALA-9226: Add patch to ORC-1.6.2 in the toolchain
..

IMPALA-9226: Add patch to ORC-1.6.2 in the toolchain

This commit adds a bugfix patch that blocks IMPALA-9226.

Tests:
 - Run query_test for orc/def/block locally.
 - Builds succeeded in all supported platforms.

Change-Id: I0f86d9493d3907e51a8d559adeb4f4b042379457
---
M buildall.sh
A 
source/orc/orc-1.6.2-patches/0007-ORC-600-Fix-StringDictionaryColumnReader-to-update-i.patch
2 files changed, 154 insertions(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/native-toolchain 
refs/changes/65/15265/1
--
To view, visit http://gerrit.cloudera.org:8080/15265
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I0f86d9493d3907e51a8d559adeb4f4b042379457
Gerrit-Change-Number: 15265
Gerrit-PatchSet: 1
Gerrit-Owner: Norbert Luksa 


[Impala-ASF-CR] IMPALA-9351: Change path in TestCreateTableLikeFileOrc

2020-02-21 Thread Norbert Luksa (Code Review)
Norbert Luksa has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/15266


Change subject: IMPALA-9351: Change path in TestCreateTableLikeFileOrc
..

IMPALA-9351: Change path in TestCreateTableLikeFileOrc

There are some builds where AnalyzeDDLTest.TestCreateTableLikeFileOrc
failed due to non-existing path. The problem was with the complex
table check:
/test-warehouse/functional_orc_def.db/complextypes_fileformat/00_0'

As Joe suggested, this commit changes the path to another complex
table, "complextypestbl", which is loaded by Impala and not written
by Hive, so the location should always be the same.

Tests:
 - Run the changed test.

Change-Id: I94ac0f09a738e6c3edb068b036c1d50dc9861a7f
---
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
1 file changed, 1 insertion(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/66/15266/1
--
To view, visit http://gerrit.cloudera.org:8080/15266
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I94ac0f09a738e6c3edb068b036c1d50dc9861a7f
Gerrit-Change-Number: 15266
Gerrit-PatchSet: 1
Gerrit-Owner: Norbert Luksa 


[Impala-ASF-CR] IMPALA-9385: Unix time conversion cleanup + ORC fix

2020-02-21 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15222 )

Change subject: IMPALA-9385: Unix time conversion cleanup + ORC fix
..


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15222/11/be/src/common/global-types.h
File be/src/common/global-types.h:

http://gerrit.cloudera.org:8080/#/c/15222/11/be/src/common/global-types.h@33
PS11, Line 33: UTCPTR
> I prefer it all capital for some reason, not sure why :)
I'm okay with all capital, too, just thought the style guide requires it in 
CamelCase.



--
To view, visit http://gerrit.cloudera.org:8080/15222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 11
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 21 Feb 2020 09:51:10 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9385: Unix time conversion cleanup + ORC fix

2020-02-20 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15222 )

Change subject: IMPALA-9385: Unix time conversion cleanup + ORC fix
..


Patch Set 11: Code-Review+1

(3 comments)

Left some comments, otherwise it LGTM.
Will you open a Jira for the postponed changes/TODOs?

http://gerrit.cloudera.org:8080/#/c/15222/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15222/11//COMMIT_MSG@9
PS11, Line 9: Orc
nit: ORC


http://gerrit.cloudera.org:8080/#/c/15222/11/be/src/common/global-types.h
File be/src/common/global-types.h:

http://gerrit.cloudera.org:8080/#/c/15222/11/be/src/common/global-types.h@33
PS11, Line 33: UTCPTR
Since it's not a macro anymore, maybe UtcPtr would be a better name?


http://gerrit.cloudera.org:8080/#/c/15222/11/be/src/runtime/timestamp-value.h
File be/src/runtime/timestamp-value.h:

http://gerrit.cloudera.org:8080/#/c/15222/11/be/src/runtime/timestamp-value.h@a35
PS11, Line 35:
 :
Shouldn't we keep this comment?



--
To view, visit http://gerrit.cloudera.org:8080/15222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 11
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 20 Feb 2020 16:28:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9290: ORC scanner should support schema evolution between date and timestamp types

2020-02-18 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15152 )

Change subject: IMPALA-9290: ORC scanner should support schema evolution 
between date and timestamp types
..


Patch Set 3:

Regarding Csaba's concerns, I checked if Hive supports the same kind of schema 
evolution for Parquet, since we know that Impala doesn't. Found out that Hive 
doesn't support it either, only for ORC.


--
To view, visit http://gerrit.cloudera.org:8080/15152
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7979ecc61b2ab900090d01bc81e7bb7b28c99c9e
Gerrit-Change-Number: 15152
Gerrit-PatchSet: 3
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 18 Feb 2020 10:22:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-02-18 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 26:

The verification failed due to a the flaky test_exchange_mem_usage_scaling and 
AuthorizationStmtTest.testSelect.
Run an exhaustive test, it passed: 
https://master-02.jenkins.cloudera.com/job/impala-private-parameterized/6472/


--
To view, visit http://gerrit.cloudera.org:8080/14080
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 26
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 18 Feb 2020 08:55:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] Add log of created files for data load

2020-02-17 Thread Norbert Luksa (Code Review)
Norbert Luksa has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/15234


Change subject: Add log of created files for data load
..

Add log of created files for data load

As Joe pointed out in IMPALA-9351, it would help debugging issues with
missing files if we had logged the created files when loading the data.

With this commit, running create-load-data.sh now logs the created
files into created-files.log.

Change-Id: I4f413810c6202a07c19ad1893088feedd9f7278f
---
M testdata/bin/create-load-data.sh
1 file changed, 3 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/34/15234/1
--
To view, visit http://gerrit.cloudera.org:8080/15234
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I4f413810c6202a07c19ad1893088feedd9f7278f
Gerrit-Change-Number: 15234
Gerrit-PatchSet: 1
Gerrit-Owner: Norbert Luksa 


[Impala-ASF-CR] WIP: IMPALA-9385: Fix Orc scanner's TimestampValue conversion

2020-02-17 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15222 )

Change subject: WIP: IMPALA-9385: Fix Orc scanner's TimestampValue conversion
..


Patch Set 2:

(5 comments)

Hi, left some minor comments.

http://gerrit.cloudera.org:8080/#/c/15222/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15222/2//COMMIT_MSG@7
PS2, Line 7: Orc
nit: ORC (here and below)


http://gerrit.cloudera.org:8080/#/c/15222/2//COMMIT_MSG@36
PS2, Line 36: Further planned changes
Is it possible to gain some performance with this change? If yes, adding some 
measurements would be nice, I think.


http://gerrit.cloudera.org:8080/#/c/15222/2/be/src/exec/parquet/parquet-common.h
File be/src/exec/parquet/parquet-common.h:

http://gerrit.cloudera.org:8080/#/c/15222/2/be/src/exec/parquet/parquet-common.h@743
PS2, Line 743: is
nit: typo


http://gerrit.cloudera.org:8080/#/c/15222/2/be/src/runtime/runtime-state.cc
File be/src/runtime/runtime-state.cc:

http://gerrit.cloudera.org:8080/#/c/15222/2/be/src/runtime/runtime-state.cc@150
PS2, Line 150: nullptr
nit: maybe it would be worth to mention here as a side note that UTCPTR is in 
fact nullptr


http://gerrit.cloudera.org:8080/#/c/15222/2/be/src/runtime/runtime-state.cc@152
PS2, Line 152:   local_time_zone_ = tz == 
::GetUtcTimezone() ? UTCPTR : tz;
Not related to this point, but as I see, the only other place (outside of some 
tests) you left GetUtcTimezone() in the code is in 
ImpalaServer::PrepareQueryContext. Wouldn't it make sense to refactor the code 
there as welll?



--
To view, visit http://gerrit.cloudera.org:8080/15222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 17 Feb 2020 13:06:01 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-02-14 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 26:

The verification failed because the AllTypeTest added too many columns, with 
more slot size than possible. This resulted in a bitshift overflow when 
initialising a SlotRef. Added a comment and DCHECK, and removed some not too 
important columns from the test to prevent this issue from happening.


--
To view, visit http://gerrit.cloudera.org:8080/14080
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 26
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 14 Feb 2020 14:25:27 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9256: Refactor constraint information into a separate class.

2020-02-14 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15213 )

Change subject: IMPALA-9256: Refactor constraint information into a separate 
class.
..


Patch Set 1:

(3 comments)

Hi, left some comments.

http://gerrit.cloudera.org:8080/#/c/15213/1/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

http://gerrit.cloudera.org:8080/#/c/15213/1/fe/src/main/java/org/apache/impala/catalog/Table.java@117
PS1, Line 117: sql
nit: SQL


http://gerrit.cloudera.org:8080/#/c/15213/1/fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
File fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java:

http://gerrit.cloudera.org:8080/#/c/15213/1/fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java@544
PS1, Line 544: primary
nit: or foreign keys/SQL constraints


http://gerrit.cloudera.org:8080/#/c/15213/1/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
File fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java:

http://gerrit.cloudera.org:8080/#/c/15213/1/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java@213
PS1, Line 213: return new SqlConstraints(new ArrayList<>(), new 
ArrayList<>());
Since IMPALA-9158 is merged, can't we return the SQL constraints here with 
primary and foreign keys?



--
To view, visit http://gerrit.cloudera.org:8080/15213
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3f1c441c24df84d2d0791ffe94dff60d039a3341
Gerrit-Change-Number: 15213
Gerrit-PatchSet: 1
Gerrit-Owner: Anurag Mantripragada 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 14 Feb 2020 13:03:20 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-02-14 Thread Norbert Luksa (Code Review)
Norbert Luksa has uploaded a new patch set (#25). ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..

IMPALA-8755: Backend support for Z-ordering

This change depends on gerrit.cloudera.org/#/c/13955/
(Frontend support for Z-ordering)

The commit adds a Comparator based on Z-ordering. See in detail:
https://en.wikipedia.org/wiki/Z-order_curve

The comparator instead of calculating the Z-values of the rows,
looks for the column with the most significant dimension, and
compares the values of this column only. The most significant
dimension will be the one where the compared values have the
highest different bits. The algorithm requires values of
the same binary representation, therefore the values are
converted into either uint32_t, uint63_t or uint128_t, the
smallest in which all data fits. Comparing smaller types with
bigger ones would make the bigger type much more dominant
therefore the bits of these smaller types are shifted up.

All primitive types (including string and floating point types)
are supported.

Testing:
 * Added unit tests.
 * Run manual tests, comparing 4-column values with 4-bit
   integers, for all possible combinations. Checked the result by
   calculating the Z-value for each comparison.
 * Tested performance on various data, getting great results for
   selective queries. An example: used the TPCH dataset's
   lineitem table with scale 25, where the sorting columns are
   l_partkey and l_suppkey, in that order. Run selective queries
   for the value range of the two columns, for both lexical and
   Z-ordering and compared the percentage of filtered pages and
   row groups. While queries with filters on the first column
   showed almost no difference, queries on the second column
   is in favour of Z-ordering:
   Ordering | Column | Filtered pages % | Filtered row groups %
   Lex.   1st  ~99%   ~90%
   Z-ord. 1st  ~99%   ~89%
   Lex.   2nd  ~25%   0%
   Z-ord. 2nd  ~97%   0%
   The only drawback is the sorting itself, taking ~4 times more
   than lexical sorting (eg. sorting for the dataset above took
   14m for Lexical, and 55m for Z-ordering).
   Note however, that this is a one-time thing to do, sorting
   only happens once, when writing the data.
   Also, lexical ordering is supported by codegen, while it is
   not implemented for Z-ordering yet.

Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
---
M be/src/exec/exchange-node.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partial-sort-node.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/util/CMakeLists.txt
A be/src/util/tuple-row-compare-test.cc
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
18 files changed, 1,118 insertions(+), 95 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/25
--
To view, visit http://gerrit.cloudera.org:8080/14080
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 25
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-6663: Expose current DDL metrics on WebUI

2020-02-13 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13806 )

Change subject: IMPALA-6663: Expose current DDL metrics on WebUI
..


Patch Set 13: Code-Review+1

(2 comments)

Left a small nit, but lgtm.

http://gerrit.cloudera.org:8080/#/c/13806/8/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/13806/8/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@427
PS8, Line 427:   case DROP_FUNCTION:
> It is possible that I misunderstood the concerns here, but I will try to an
I see, I misunderstood something, thanks for clarifying.


http://gerrit.cloudera.org:8080/#/c/13806/13/fe/src/test/java/org/apache/impala/catalog/monitor/CatalogFinalizeDmlCounterTest.java
File 
fe/src/test/java/org/apache/impala/catalog/monitor/CatalogFinalizeDmlCounterTest.java:

http://gerrit.cloudera.org:8080/#/c/13806/13/fe/src/test/java/org/apache/impala/catalog/monitor/CatalogFinalizeDmlCounterTest.java@34
PS13, Line 34:  private final String TEST_UPDATE_SQL = "UPDATE table SET c3 = "
 :   + "'test';";
nit: this string could be written without concatenation.



--
To view, visit http://gerrit.cloudera.org:8080/13806
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0ed76f134bad6d3b3d4dce132365a53a01e9512a
Gerrit-Change-Number: 13806
Gerrit-PatchSet: 13
Gerrit-Owner: Tamas Mate 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 13 Feb 2020 11:41:28 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9290: ORC scanner should support schema evolution between date and timestamp types

2020-02-04 Thread Norbert Luksa (Code Review)
Norbert Luksa has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/15152 )

Change subject: IMPALA-9290: ORC scanner should support schema evolution 
between date and timestamp types
..

IMPALA-9290: ORC scanner should support schema evolution between date and 
timestamp types

This feature adds support for schema evolution between date and
timestamp for the ORC scanner. This means that we can have two
tables, one with a date column, another with a timestamp column,
and they can both point to the same set of ORC files. The result
will be that for the first table everything will be converted to
date, and for the second, everything to timestamp.

In order to do that, the OrcTimestampReader and OrcDateColumnReader
are modified to be able to handle batches of the two types. Their
name now represents the destination Impala type.

Note that the life cycle of a OrcColumnReader is within the life
cycle of the HdfsOrcScanner which only reads a split of an ORC
file, and an ORC file can't have two types for one column.

Tests:
 * Added type conversion tests.
 * Tested manually following the use case steps of the Jira.

Change-Id: I7979ecc61b2ab900090d01bc81e7bb7b28c99c9e
---
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/orc-column-readers.cc
M be/src/exec/orc-column-readers.h
M be/src/exec/orc-metadata-utils.cc
M 
testdata/workloads/functional-query/queries/DataErrorsTest/orc-type-checks.test
M tests/query_test/test_scanners.py
6 files changed, 132 insertions(+), 53 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/15152/3
--
To view, visit http://gerrit.cloudera.org:8080/15152
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7979ecc61b2ab900090d01bc81e7bb7b28c99c9e
Gerrit-Change-Number: 15152
Gerrit-PatchSet: 3
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-9290: ORC scanner should support schema evolution between date and timestamp types

2020-02-04 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15152 )

Change subject: IMPALA-9290: ORC scanner should support schema evolution 
between date and timestamp types
..


Patch Set 3:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/15152/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15152/2//COMMIT_MSG@7
PS2, Line 7: between date and timestamp types
> Looking at IMPALA-6373 it seems like Impala usually supports these conversi
Parquet does not support this kind of compatibility between date and timestamp. 
I created date and timestamp tables and set them to point to the same location 
(as described in the Jira), and inserted some values to both tables. When I try 
to query any of them, I get an error that the schemas of the files are 
incompatible.

I think here we do not wish to follow Parquet and other file formats, but Hive.


http://gerrit.cloudera.org:8080/#/c/15152/2//COMMIT_MSG@12
PS2, Line 12: to the same set of O
> nit: "to the same set of ORC files" probably expresses the intent better
Done


http://gerrit.cloudera.org:8080/#/c/15152/2/be/src/exec/orc-column-readers.h
File be/src/exec/orc-column-readers.h:

http://gerrit.cloudera.org:8080/#/c/15152/2/be/src/exec/orc-column-readers.h@121
PS2, Line 121:
> nit: Please add comment about what is an invalid value and how it is handle
Done


http://gerrit.cloudera.org:8080/#/c/15152/2/be/src/exec/orc-column-readers.h@214
PS2, Line 214:  split of an ORC file, an
> Does Hive convert in both directions?
Yes, it does.


http://gerrit.cloudera.org:8080/#/c/15152/2/be/src/exec/orc-column-readers.h@214
PS2, Line 214: HdfsOrcScanner which only re
> What does it mean "ORC support schema evolution"?
Done


http://gerrit.cloudera.org:8080/#/c/15152/2/be/src/exec/orc-column-readers.h@253
PS2, Line 253:
 : class OrcDateColumnReader : public OrcDateAndTimeStampReader {
 :  public:
 :   OrcDateColumnReader(const orc::Type* node, const 
SlotDescriptor* slot_desc,
 :   HdfsOrcScanner* scanner)
 :   : OrcDateAndTimeStampReader(node, slot_desc, scanne
> nit: complicated and duplicated in OrcTimestampReader. Can you put it into
Done



--
To view, visit http://gerrit.cloudera.org:8080/15152
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7979ecc61b2ab900090d01bc81e7bb7b28c99c9e
Gerrit-Change-Number: 15152
Gerrit-PatchSet: 3
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 04 Feb 2020 10:01:59 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9290: ORC scanner should support schema evolution between date and timestamp types

2020-02-03 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15152 )

Change subject: IMPALA-9290: ORC scanner should support schema evolution 
between date and timestamp types
..


Patch Set 2:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/15152/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15152/1//COMMIT_MSG@20
PS1, Line 20:
> nit: wrap at 72 chars
Done


http://gerrit.cloudera.org:8080/#/c/15152/1/be/src/exec/orc-column-readers.h
File be/src/exec/orc-column-readers.h:

http://gerrit.cloudera.org:8080/#/c/15152/1/be/src/exec/orc-column-readers.h@224
PS1, Line 224: tampReader(const orc::Type* node, const SlotDes
> Do we need this, can't we just compare check != nullprt?
I just followed the convention of the other UpdateInputBatch functions where 
the static casted batch is compared to the dynamic casted one. Maybe we could 
simplify the DCHECKs everywhere?


http://gerrit.cloudera.org:8080/#/c/15152/1/be/src/exec/orc-column-readers.cc
File be/src/exec/orc-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/15152/1/be/src/exec/orc-column-readers.cc@211
PS1, Line 211:   if (IsNull(DCHECK_NOTNULL(batch_), row_idx)) {
> nit: extra line
Done


http://gerrit.cloudera.org:8080/#/c/15152/1/be/src/exec/orc-column-readers.cc@226
PS1, Line 226: }
 : *slot = DateValue(ts.DaysSinceUnixEpoch());
 :   }
> This is probably not speed critical, but it could be done faster by ignorin
Done


http://gerrit.cloudera.org:8080/#/c/15152/1/be/src/exec/orc-column-readers.cc@229
PS1, Line 229: alid())) {
> This will hit a DCHECK if the timestamp is not valid, see https://github.co
Done



--
To view, visit http://gerrit.cloudera.org:8080/15152
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7979ecc61b2ab900090d01bc81e7bb7b28c99c9e
Gerrit-Change-Number: 15152
Gerrit-PatchSet: 2
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Comment-Date: Mon, 03 Feb 2020 15:47:41 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9290: ORC scanner should support schema evolution between date and timestamp types

2020-02-03 Thread Norbert Luksa (Code Review)
Norbert Luksa has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/15152 )

Change subject: IMPALA-9290: ORC scanner should support schema evolution 
between date and timestamp types
..

IMPALA-9290: ORC scanner should support schema evolution between date and 
timestamp types

This feature adds support for schema evolution between date and
timestamp for the ORC scanner. This means that we can have two
tables, one with a date column, another with a timestamp column,
and they can both point to the same ORC file. The result will be
that for the first table everything will be converted to date,
and for the second, everything to timestamp.

In order to do that, the OrcTimestampReader and OrcDateColumnReader
are modified to be able to handle batches of the two types. Their
name now represents the destination Impala type.

Note that the life cycle of a OrcColumnReader is within the life
cycle of the HdfsOrcScanner which only reads a split of an ORC
file, and an ORC file can't have two types for one column.

Tests:
 * Added type conversion tests.
 * Tested manually following the use case steps of the Jira.

Change-Id: I7979ecc61b2ab900090d01bc81e7bb7b28c99c9e
---
M be/src/exec/orc-column-readers.cc
M be/src/exec/orc-column-readers.h
M be/src/exec/orc-metadata-utils.cc
M 
testdata/workloads/functional-query/queries/DataErrorsTest/orc-type-checks.test
M tests/query_test/test_scanners.py
5 files changed, 125 insertions(+), 43 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/15152/2
--
To view, visit http://gerrit.cloudera.org:8080/15152
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7979ecc61b2ab900090d01bc81e7bb7b28c99c9e
Gerrit-Change-Number: 15152
Gerrit-PatchSet: 2
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-02-03 Thread Norbert Luksa (Code Review)
Norbert Luksa has uploaded a new patch set (#22). ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..

IMPALA-8755: Backend support for Z-ordering

This change depends on gerrit.cloudera.org/#/c/13955/
(Frontend support for Z-ordering)

The commit adds a Comparator based on Z-ordering. See in detail:
https://en.wikipedia.org/wiki/Z-order_curve

The comparator instead of calculating the Z-values of the rows,
looks for the column with the most significant dimension, and
compares the values of this column only. The most significant
dimension will be the one where the compared values have the
highest different bits. The algorithm requires values of
the same binary representation, therefore the values are
converted into either uint32_t, uint63_t or uint128_t, the
smallest in which all data fits. Comparing smaller types with
bigger ones would make the bigger type much more dominant
therefore the bits of these smaller types are shifted up.

All primitive types (including string and floating point types)
are supported.

Testing:
 * Added unit tests.
 * Run manual tests, comparing 4-column values with 4-bit
   integers, for all possible combinations. Checked the result by
   calculating the Z-value for each comparison.
 * Tested performance on various data, getting great results for
   selective queries. An example: used the TPCH dataset's
   lineitem table with scale 25, where the sorting columns are
   l_partkey and l_suppkey, in that order. Run selective queries
   for the value range of the two columns, for both lexical and
   Z-ordering and compared the percentage of filtered pages and
   row groups. While queries with filters on the first column
   showed almost no difference, queries on the second column
   is in favour of Z-ordering:
   Ordering | Column | Filtered pages % | Filtered row groups %
   Lex.   1st  ~99%   ~90%
   Z-ord. 1st  ~99%   ~89%
   Lex.   2nd  ~25%   0%
   Z-ord. 2nd  ~97%   0%
   The only drawback is the sorting itself, taking ~4 times more
   than lexical sorting (eg. sorting for the dataset above took
   14m for Lexical, and 55m for Z-ordering).
   Note however, that this is a one-time thing to do, sorting
   only happens once, when writing the data.
   Also, lexical ordering is supported by codegen, while it is
   not implemented for Z-ordering yet.

Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
---
M be/src/exec/exchange-node.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partial-sort-node.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/util/CMakeLists.txt
A be/src/util/tuple-row-compare-test.cc
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
18 files changed, 1,128 insertions(+), 95 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/22
--
To view, visit http://gerrit.cloudera.org:8080/14080
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 22
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-9226: Improve string allocations of the ORC scanner

2020-02-03 Thread Norbert Luksa (Code Review)
Norbert Luksa has uploaded a new patch set (#8). ( 
http://gerrit.cloudera.org:8080/15051 )

Change subject: IMPALA-9226: Improve string allocations of the ORC scanner
..

IMPALA-9226: Improve string allocations of the ORC scanner

Currently the OrcColumnReader copies values from the
orc::StringVectorBatch one-by-one. Since ORC 1.6, the blob which
contains the pointed values is moved to the StringVectorBatch,
so we can copy it.

This commit beside the above improvement also enables the
LazyEncoding option for the ORC reader. This way, for stripes
with DICTIONARY_ENCODING[_V2], EncodedStringVectorBatch contains
the data in a dictionaryBlob from which the data can be acquired
with the given indices and lengths.

Tests:
 * Run ORC scanner tests (query_tests/test_scanners.py::TestOrc)
   and tpch query tests.
 * Tested performance on tpch.lineitem table with scale=25,
   running queries that selects min of string columns.
   Some results:
   col_name | encoding | before | after | speedup
   =
   l_comment  DIRECT 16.42s   14.38s  14%
   l_shipinstruct DICTIONARY 5.26s3.80s   32%
   l_commitdate   DICTIONARY 5.46s5.19s   5%
   all string col BOTH   39.06s   32.18s  21%

   The queries were run on a desktop PC with MT_DOP and NUM_NODES
   set to 1.
 * Also run TPC-H queries on the TPC-H benchmark where some
   queries' runtime improved by around 10-15%, while there were
   no regression for the others.

Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f
---
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/orc-column-readers.cc
M be/src/exec/orc-column-readers.h
4 files changed, 135 insertions(+), 42 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/15051/8
--
To view, visit http://gerrit.cloudera.org:8080/15051
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f
Gerrit-Change-Number: 15051
Gerrit-PatchSet: 8
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-9226: Improve string allocations of the ORC scanner

2020-02-03 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15051 )

Change subject: IMPALA-9226: Improve string allocations of the ORC scanner
..


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15051/7/be/src/exec/orc-column-readers.h
File be/src/exec/orc-column-readers.h:

http://gerrit.cloudera.org:8080/#/c/15051/7/be/src/exec/orc-column-readers.h@212
PS7, Line 212: static_cast(batch_) ==
 : dynamic_cast(orc_batch)
> it will be true even if orc_batch is just a StringVectorBatch, because it w
Done



--
To view, visit http://gerrit.cloudera.org:8080/15051
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f
Gerrit-Change-Number: 15051
Gerrit-PatchSet: 8
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 03 Feb 2020 10:07:49 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9290: ORC scanner should support schema evolution between date and timestamp types

2020-02-03 Thread Norbert Luksa (Code Review)
Norbert Luksa has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/15152


Change subject: IMPALA-9290: ORC scanner should support schema evolution 
between date and timestamp types
..

IMPALA-9290: ORC scanner should support schema evolution between date and 
timestamp types

This feature adds support for schema evolution between date and
timestamp for the ORC scanner. This means that we can have two
tables, one with a date column, another with a timestamp column,
and they can both point to the same ORC file. The result will be
that for the first table everything will be converted to date,
and for the second, everything to timestamp.

In order to do that, the OrcTimestampReader and OrcDateColumnReader
are modified to be able to handle batches of the two types. Their
name now represents the destination Impala type.

Note that the life cycle of a OrcColumnReader is within the life cycle of the
HdfsOrcScanner which only reads a split of an ORC file, and an ORC file can't 
have
two types for one column.

Tests:
 * Added type conversion tests.
 * Tested manually following the use case steps of the Jira.

Change-Id: I7979ecc61b2ab900090d01bc81e7bb7b28c99c9e
---
M be/src/exec/orc-column-readers.cc
M be/src/exec/orc-column-readers.h
M be/src/exec/orc-metadata-utils.cc
M 
testdata/workloads/functional-query/queries/DataErrorsTest/orc-type-checks.test
M tests/query_test/test_scanners.py
5 files changed, 115 insertions(+), 36 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/15152/1
--
To view, visit http://gerrit.cloudera.org:8080/15152
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I7979ecc61b2ab900090d01bc81e7bb7b28c99c9e
Gerrit-Change-Number: 15152
Gerrit-PatchSet: 1
Gerrit-Owner: Norbert Luksa 


[Impala-ASF-CR] IMPALA-9226: Improve string allocations of the ORC scanner

2020-01-30 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15051 )

Change subject: IMPALA-9226: Improve string allocations of the ORC scanner
..


Patch Set 7:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/15051/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15051/6//COMMIT_MSG@35
PS6, Line 35: Also run TPC-H queries on the TPC-H benchmark where some
:queries' runtime improved by around 10-15%,
> You could also state explicitly whether other queries have regressed or not
Done


http://gerrit.cloudera.org:8080/#/c/15051/3/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/15051/3/be/src/exec/hdfs-orc-scanner.cc@683
PS3, Line 683:
> It is still used for CollectionValueBuilder.
Oh, right.


http://gerrit.cloudera.org:8080/#/c/15051/6/be/src/exec/orc-column-readers.h
File be/src/exec/orc-column-readers.h:

http://gerrit.cloudera.org:8080/#/c/15051/6/be/src/exec/orc-column-readers.h@202
PS6, Line 202: batch_ = static_cast(orc_batch);
 : if (orc_batch == nullptr) return Status::OK();
> Is it ok that batch_ is not set to null if orc_batch is null?
Swapped the two lines.


http://gerrit.cloudera.org:8080/#/c/15051/6/be/src/exec/orc-column-readers.h@210
PS6, Line 210: _-
> Does Orc support mixed encoding, e.g. a stripe that has both dictionary and
Yes, according to the ORC documentation it cannot change through stripes.
I'm not sure if it's worth to add an extra variable just to make sure that this 
behaviour is preserved by the ORC lib. There are DCHECKs which should also fail 
in that case (dynamic casting to the Vector batch is based on the encoding).
I think a comment to clarify this behaviour would be enough, but please raise 
your concern if you think otherwise.


http://gerrit.cloudera.org:8080/#/c/15051/6/be/src/exec/orc-column-readers.h@211
PS6, Line 211:
> "RETURN_IF_ERROR" + "return Status::OK()" could be simplified to "return".
Done



--
To view, visit http://gerrit.cloudera.org:8080/15051
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f
Gerrit-Change-Number: 15051
Gerrit-PatchSet: 7
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 30 Jan 2020 17:17:10 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9226: Improve string allocations of the ORC scanner

2020-01-30 Thread Norbert Luksa (Code Review)
Norbert Luksa has uploaded a new patch set (#7). ( 
http://gerrit.cloudera.org:8080/15051 )

Change subject: IMPALA-9226: Improve string allocations of the ORC scanner
..

IMPALA-9226: Improve string allocations of the ORC scanner

Currently the OrcColumnReader copies values from the
orc::StringVectorBatch one-by-one. Since ORC 1.6, the blob which
contains the pointed values is moved to the StringVectorBatch,
so we can copy it.

This commit beside the above improvement also enables the
LazyEncoding option for the ORC reader. This way, for stripes
with DICTIONARY_ENCODING[_V2], EncodedStringVectorBatch contains
the data in a dictionaryBlob from which the data can be acquired
with the given indices and lengths.

Tests:
 * Run ORC scanner tests (query_tests/test_scanners.py::TestOrc)
   and tpch query tests.
 * Tested performance on tpch.lineitem table with scale=25,
   running queries that selects min of string columns.
   Some results:
   col_name | encoding | before | after | speedup
   =
   l_comment  DIRECT 16.42s   14.38s  14%
   l_shipinstruct DICTIONARY 5.26s3.80s   32%
   l_commitdate   DICTIONARY 5.46s5.19s   5%
   all string col BOTH   39.06s   32.18s  21%

   The queries were run on a desktop PC with MT_DOP and NUM_NODES
   set to 1.
 * Also run TPC-H queries on the TPC-H benchmark where some
   queries' runtime improved by around 10-15%, while there were
   no regression for the others.

Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f
---
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/orc-column-readers.cc
M be/src/exec/orc-column-readers.h
4 files changed, 135 insertions(+), 42 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/15051/7
--
To view, visit http://gerrit.cloudera.org:8080/15051
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f
Gerrit-Change-Number: 15051
Gerrit-PatchSet: 7
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-01-30 Thread Norbert Luksa (Code Review)
Norbert Luksa has uploaded a new patch set (#21). ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..

IMPALA-8755: Backend support for Z-ordering

This change depends on gerrit.cloudera.org/#/c/13955/
(Frontend support for Z-ordering)

The commit adds a Comparator based on Z-ordering. See in detail:
https://en.wikipedia.org/wiki/Z-order_curve

The comparator instead of calculating the Z-values of the rows,
looks for the column with the most significant dimension, and
compares the values of this column only. The most significant
dimension will be the one where the compared values have the
highest different bits. The algorithm requires values of
the same binary representation, therefore the values are
converted into either uint32_t, uint63_t or uint128_t, the
smallest in which all data fits. Comparing smaller types with
bigger ones would make the bigger type much more dominant
therefore the bits of these smaller types are shifted up.

All primitive types (including string and floating point types)
are supported.

Testing:
 * Added unit tests.
 * Run manual tests, comparing 4-column values with 4-bit
   integers, for all possible combinations. Checked the result by
   calculating the Z-value for each comparison.
 * Tested performance on various data, getting great results for
   selective queries. An example: used the TPCH dataset's
   lineitem table with scale 25, where the sorting columns are
   l_partkey and l_suppkey, in that order. Run selective queries
   for the value range of the two columns, for both lexical and
   Z-ordering and compared the percentage of filtered pages and
   row groups. While queries with filters on the first column
   showed almost no difference, queries on the second column
   is in favour of Z-ordering:
   Ordering | Column | Filtered pages % | Filtered row groups %
   Lex.   1st  ~99%   ~90%
   Z-ord. 1st  ~99%   ~89%
   Lex.   2nd  ~25%   0%
   Z-ord. 2nd  ~97%   0%
   The only drawback is the sorting itself, taking ~4 times more
   than lexical sorting (eg. sorting for the dataset above took
   14m for Lexical, and 55m for Z-ordering).
   Note however, that this is a one-time thing to do, sorting
   only happens once, when writing the data.
   Also, lexical ordering is supported by codegen, while it is
   not implemented for Z-ordering yet.

Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
---
M be/src/exec/exchange-node.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partial-sort-node.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/util/CMakeLists.txt
A be/src/util/tuple-row-compare-test.cc
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
18 files changed, 1,127 insertions(+), 95 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/21
--
To view, visit http://gerrit.cloudera.org:8080/14080
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 21
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-9226: Improve string allocations of the ORC scanner

2020-01-30 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15051 )

Change subject: IMPALA-9226: Improve string allocations of the ORC scanner
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15051/3/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/15051/3/be/src/exec/hdfs-orc-scanner.cc@683
PS3, Line 683:  dst_batch->tuple_data_pool()
> Thank you, Csaba for the detailed comment.
As I see (and as you also suggested offline), we do not need to pass the 
mempool here, right?



--
To view, visit http://gerrit.cloudera.org:8080/15051
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f
Gerrit-Change-Number: 15051
Gerrit-PatchSet: 3
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 30 Jan 2020 12:31:51 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9226: Improve string allocations of the ORC scanner

2020-01-30 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15051 )

Change subject: IMPALA-9226: Improve string allocations of the ORC scanner
..


Patch Set 6:

(5 comments)

Thank you all for taking a look, addressed your comments.

http://gerrit.cloudera.org:8080/#/c/15051/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15051/3//COMMIT_MSG@31
PS3, Line 31:   39.06s
> nit: probably 'a desktop PC', or 'a desktop PC with CPU X'
Done


http://gerrit.cloudera.org:8080/#/c/15051/2/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/15051/2/be/src/exec/hdfs-orc-scanner.cc@613
PS2, Line 613:
> why did you restore the original narrow try-block?
I do not call the ORC library anymore in the UpdateInputBatch, therefore the 
change here is not necessary.


http://gerrit.cloudera.org:8080/#/c/15051/3/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/15051/3/be/src/exec/hdfs-orc-scanner.cc@625
PS3, Line 625:   orc_root_batch_ = 
row_reader_->createRowBatch(row_batch->capacity());
> nit: this extra new line is not needed
Done


http://gerrit.cloudera.org:8080/#/c/15051/3/be/src/exec/hdfs-orc-scanner.cc@683
PS3, Line 683:
> As we have discussed offline, this is no longer correct. Using the current
Thank you, Csaba for the detailed comment.
Added mempools for the dictionary and the non-dictionary cases, looks like it 
works now.


http://gerrit.cloudera.org:8080/#/c/15051/3/be/src/exec/orc-column-readers.h
File be/src/exec/orc-column-readers.h:

http://gerrit.cloudera.org:8080/#/c/15051/3/be/src/exec/orc-column-readers.h@198
PS3, Line 198:   HdfsOrcScanne
> Looking at the ORC library, it should be possible, since it passes the dict
Added mempools for the scanner, now we can reuse the blobs.



--
To view, visit http://gerrit.cloudera.org:8080/15051
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f
Gerrit-Change-Number: 15051
Gerrit-PatchSet: 6
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 30 Jan 2020 12:04:37 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9226: Improve string allocations of the ORC scanner

2020-01-30 Thread Norbert Luksa (Code Review)
Norbert Luksa has uploaded a new patch set (#6). ( 
http://gerrit.cloudera.org:8080/15051 )

Change subject: IMPALA-9226: Improve string allocations of the ORC scanner
..

IMPALA-9226: Improve string allocations of the ORC scanner

Currently the OrcColumnReader copies values from the
orc::StringVectorBatch one-by-one. Since ORC 1.6, the blob which
contains the pointed values is moved to the StringVectorBatch,
so we can copy it.

This commit beside the above improvement also enables the
LazyEncoding option for the ORC reader. This way, for stripes
with DICTIONARY_ENCODING[_V2], EncodedStringVectorBatch contains
the data in a dictionaryBlob from which the data can be acquired
with the given indices and lengths.

Tests:
 * Run ORC scanner tests (query_tests/test_scanners.py::TestOrc)
   and tpch query tests.
 * Tested performance on tpch.lineitem table with scale=25,
   running queries that selects min of string columns.
   Some results:
   col_name | encoding | before | after | speedup
   =
   l_comment  DIRECT 16.42s   14.38s  14%
   l_shipinstruct DICTIONARY 5.26s3.80s   32%
   l_commitdate   DICTIONARY 5.46s5.19s   5%
   all string col BOTH   39.06s   32.18s  21%

   The queries were run on a desktop PC with MT_DOP and NUM_NODES
   set to 1.
 * Also run TPC-H queries on the TPC-H benchmark where some
   queries' runtime improved by around 10-15%.

Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f
---
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/orc-column-readers.cc
M be/src/exec/orc-column-readers.h
4 files changed, 137 insertions(+), 42 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/15051/6
-- 
To view, visit http://gerrit.cloudera.org:8080/15051
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f
Gerrit-Change-Number: 15051
Gerrit-PatchSet: 6
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-9324: Correctly handle ORC UNION type in scanner

2020-01-27 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15103 )

Change subject: IMPALA-9324: Correctly handle ORC UNION type in scanner
..


Patch Set 5: Code-Review+1

(2 comments)

http://gerrit.cloudera.org:8080/#/c/15103/1/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/15103/1/be/src/exec/hdfs-orc-scanner.cc@203
PS1, Line 203:   root_type->toString(), filename());
> Yes, added an error code.
I see, thanks for the clarification!


http://gerrit.cloudera.org:8080/#/c/15103/1/testdata/workloads/functional-query/queries/DataErrorsTest/orc-type-checks.test
File 
testdata/workloads/functional-query/queries/DataErrorsTest/orc-type-checks.test:

http://gerrit.cloudera.org:8080/#/c/15103/1/testdata/workloads/functional-query/queries/DataErrorsTest/orc-type-checks.test@172
PS1, Line 172: 
> Yes, they can be ommitted if the expected string is not a regex. Substring
Great to know, thanks!
I'm okay if you leave it that way, as you find it best.



--
To view, visit http://gerrit.cloudera.org:8080/15103
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I452d27b4e281eada00b62ac58af773a3479163ec
Gerrit-Change-Number: 15103
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 27 Jan 2020 10:36:07 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9226: Improve string allocations of the ORC scanner

2020-01-24 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15051 )

Change subject: IMPALA-9226: Improve string allocations of the ORC scanner
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/15051/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15051/3//COMMIT_MSG@24
PS3, Line 24: Some results:
> As discussed offline: These measurement were run with a debug build. Let' s
Right, unfortunately when I run the queries in release build, the results are 
not so bright. There is still a little speedup, but it's far away from the 
speedup in the debug build.
Will upload the exact numbers when I tested it with more runs.


http://gerrit.cloudera.org:8080/#/c/15051/3/be/src/exec/orc-column-readers.h
File be/src/exec/orc-column-readers.h:

http://gerrit.cloudera.org:8080/#/c/15051/3/be/src/exec/orc-column-readers.h@198
PS3, Line 198: blob_ = nullptr
> So we cannot reuse blobs?
Looking at the ORC library, it should be possible, since it passes the 
dictionary along the way.
However, I did not find a simple solution for the same in Impala. After some 
digging, I found that the memory where the blob is currently allocated is owned 
by the row batch, and we cannot use this memory with the next row batch.
I'll try to check if I keep track of the row batch and reuse the blob between 
them, will come back to you with the results.



--
To view, visit http://gerrit.cloudera.org:8080/15051
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f
Gerrit-Change-Number: 15051
Gerrit-PatchSet: 3
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 24 Jan 2020 15:51:05 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9324: Correctly handle ORC UNION type in scanner

2020-01-24 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15103 )

Change subject: IMPALA-9324: Correctly handle ORC UNION type in scanner
..


Patch Set 1: Code-Review+1

(3 comments)

Left some minor comments, but it LGTM.

http://gerrit.cloudera.org:8080/#/c/15103/1/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/15103/1/be/src/exec/hdfs-orc-scanner.cc@203
PS1, Line 203:   "Root of the selected type returned by the ORC lib is 
not STRUCT: $0. "
nit: this could be unified with orc-metadata-utils.cc:28, would it make sense 
to add an error code for this case as well?

Or are both checks necessary?


http://gerrit.cloudera.org:8080/#/c/15103/1/testdata/workloads/functional-query/queries/DataErrorsTest/orc-type-checks.test
File 
testdata/workloads/functional-query/queries/DataErrorsTest/orc-type-checks.test:

http://gerrit.cloudera.org:8080/#/c/15103/1/testdata/workloads/functional-query/queries/DataErrorsTest/orc-type-checks.test@160
PS1, Line 160: Note that before
 : # IMPALA-9324 this query will crash.
nit: I think this comment here is not necessary.


http://gerrit.cloudera.org:8080/#/c/15103/1/testdata/workloads/functional-query/queries/DataErrorsTest/orc-type-checks.test@172
PS1, Line 172: has an incompatible ORC schema for column 
'$DATABASE.ill_complextypes.u',
Just a question: Isn't the fie is also part of the error msg? Can they be 
omitted?



--
To view, visit http://gerrit.cloudera.org:8080/15103
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I452d27b4e281eada00b62ac58af773a3479163ec
Gerrit-Change-Number: 15103
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Comment-Date: Fri, 24 Jan 2020 10:50:26 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9226: Improve string allocations of the ORC scanner

2020-01-24 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15051 )

Change subject: IMPALA-9226: Improve string allocations of the ORC scanner
..


Patch Set 4:

Rebased.


--
To view, visit http://gerrit.cloudera.org:8080/15051
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f
Gerrit-Change-Number: 15051
Gerrit-PatchSet: 4
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 24 Jan 2020 10:10:39 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9226: Improve string allocations of the ORC scanner

2020-01-24 Thread Norbert Luksa (Code Review)
Norbert Luksa has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/15051 )

Change subject: IMPALA-9226: Improve string allocations of the ORC scanner
..

IMPALA-9226: Improve string allocations of the ORC scanner

Currently the OrcColumnReader copies values from the
orc::StringVectorBatch one-by-one. Since ORC 1.6, the blob which
contains the pointed values is moved to the StringVectorBatch,
so we can copy it.

This commit beside the above improvement also enables the
LazyEncoding option for the ORC reader. This way, for stripes
with DICTIONARY_ENCODING[_V2], EncodedStringVectorBatch contains
the data in a dictionaryBlob from which the data can be acquired
with the given indices and lengths.

Tests:
 * Run ORC scanner tests (query_tests/test_scanners.py::TestOrc)
 * Tested performance on tpch.lineitem table with scale=25,
   running queries that selects min of string columns.
   Some results:
   col_name | encoding | before | after | speedup
   =
   l_comment  DIRECT 97s  30s 3.23
   l_shipinstruct DICTIONARY 86s  18s 4.77
   l_commitdate   DICTIONARY 85s  20s 4.25

   The queries were run on my machine with MT_DOP and NUM_NODES
   set to 1.

Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f
---
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/orc-column-readers.cc
M be/src/exec/orc-column-readers.h
3 files changed, 56 insertions(+), 14 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/15051/4
--
To view, visit http://gerrit.cloudera.org:8080/15051
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f
Gerrit-Change-Number: 15051
Gerrit-PatchSet: 4
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-01-22 Thread Norbert Luksa (Code Review)
Norbert Luksa has uploaded a new patch set (#20). ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..

IMPALA-8755: Backend support for Z-ordering

This change depends on gerrit.cloudera.org/#/c/13955/
(Frontend support for Z-ordering)

The commit adds a Comparator based on Z-ordering. See in detail:
https://en.wikipedia.org/wiki/Z-order_curve

The comparator instead of calculating the Z-values of the rows,
looks for the column with the most significant dimension, and
compares the values of this column only. The most significant
dimension will be the one where the compared values have the
highest different bits. The algorithm requires values of
the same binary representation, therefore the values are
converted into either uint32_t, uint63_t or uint128_t, the
smallest in which all data fits. Comparing smaller types with
bigger ones would make the bigger type much more dominant
therefore the bits of these smaller types are shifted up.

All primitive types (including string and floating point types)
are supported.

Testing:
 * Added unit tests.
 * Run manual tests, comparing 4-column values with 4-bit
   integers, for all possible combinations. Checked the result by
   calculating the Z-value for each comparison.
 * Tested performance on various data, getting great results for
   selective queries. An example: used the TPCH dataset's
   lineitem table with scale 25, where the sorting columns are
   l_partkey and l_suppkey, in that order. Run selective queries
   for the value range of the two columns, for both lexical and
   Z-ordering and compared the percentage of filtered pages and
   row groups. While queries with filters on the first column
   showed almost no difference, queries on the second column
   is in favour of Z-ordering:
   Ordering | Column | Filtered pages % | Filtered row groups %
   Lex.   1st  ~99%   ~90%
   Z-ord. 1st  ~99%   ~89%
   Lex.   2nd  ~25%   0%
   Z-ord. 2nd  ~97%   0%
   The only drawback is the sorting itself, taking ~4 times more
   than lexical sorting (eg. sorting for the dataset above took
   14m for Lexical, and 55m for Z-ordering).
   Note however, that this is a one-time thing to do, sorting
   only happens once, when writing the data.
   Also, lexical ordering is supported by codegen, while it is
   not implemented for Z-ordering yet.

Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
---
M be/src/exec/exchange-node.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partial-sort-node.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/util/CMakeLists.txt
A be/src/util/tuple-row-compare-test.cc
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
18 files changed, 1,109 insertions(+), 95 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/20
--
To view, visit http://gerrit.cloudera.org:8080/14080
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 20
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-01-22 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 20:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/14080/19/be/src/util/tuple-row-compare.cc
File be/src/util/tuple-row-compare.cc:

http://gerrit.cloudera.org:8080/#/c/14080/19/be/src/util/tuple-row-compare.cc@319
PS19, Line 319:   // The algorithm requires all values having a common type, 
without loss of data.
  :   // This means we have to find the biggest type.
  :   int max_size = ordering_exprs_[0]->type().GetByteSize();
  :   for (int i = 1; i < ordering_exprs_.size(); ++i) {
> nit: the mask could be calculated in GetSharedRepresentation() instead of p
Done


http://gerrit.cloudera.org:8080/#/c/14080/19/be/src/util/tuple-row-compare.cc@423
PS19, Line 423: 
> Local variable U val shadows patameter void* val.
Done


http://gerrit.cloudera.org:8080/#/c/14080/19/be/src/util/tuple-row-compare.cc@424
PS19, Line 424:
> nit: please add comment about it, something like "we copy the bytes from th
Done


http://gerrit.cloudera.org:8080/#/c/14080/19/be/src/util/tuple-row-compare.cc@434
PS19, Line 434: alue = *reinterpret_cast(val);
> It will only have the value of the first char of the string.
Done


http://gerrit.cloudera.org:8080/#/c/14080/19/be/src/util/tuple-row-compare.cc@435
PS19, Line 435: tmp, _value, sizeof(T));
> replace with 'sizeof(U) - std::min(sizeof(U), type.len)'?
Done



--
To view, visit http://gerrit.cloudera.org:8080/14080
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 20
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 22 Jan 2020 17:06:06 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9291: MemTracker's TryConsume() should behave consistent with Consume()/Release()

2020-01-22 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15050 )

Change subject: IMPALA-9291: MemTracker's TryConsume() should behave consistent 
with Consume()/Release()
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15050/5/be/src/runtime/mem-tracker.h
File be/src/runtime/mem-tracker.h:

http://gerrit.cloudera.org:8080/#/c/15050/5/be/src/runtime/mem-tracker.h@175
PS5, Line 175: if (UNLIKELY(bytes == 0)) return true;
> The above DCHECK allows the value 0 and it should return success. We need t
Right, for some reasons I thought it's a DCHECK_LE, apologies!



--
To view, visit http://gerrit.cloudera.org:8080/15050
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I22e613c49e4a6bd218e7f7f7c0d6bec95b75cff5
Gerrit-Change-Number: 15050
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 22 Jan 2020 14:20:36 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9291: MemTracker's TryConsume() should behave consistent with Consume()/Release()

2020-01-22 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15050 )

Change subject: IMPALA-9291: MemTracker's TryConsume() should behave consistent 
with Consume()/Release()
..


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/15050/5/be/src/runtime/mem-tracker.h
File be/src/runtime/mem-tracker.h:

http://gerrit.cloudera.org:8080/#/c/15050/5/be/src/runtime/mem-tracker.h@175
PS5, Line 175: if (UNLIKELY(bytes == 0)) return true;
Shouldn't it be handled like at Release? Since there is a DCHECK for this 
above, we should not even reach this line. You also say in the commit message 
that the consumption is not updated when the bytes are zero.


http://gerrit.cloudera.org:8080/#/c/15050/5/be/src/runtime/mem-tracker.h@218
PS5, Line 218: < 0
nit: did you mean <= ?



--
To view, visit http://gerrit.cloudera.org:8080/15050
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I22e613c49e4a6bd218e7f7f7c0d6bec95b75cff5
Gerrit-Change-Number: 15050
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 22 Jan 2020 13:57:38 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9226: Improve string allocations of the ORC scanner

2020-01-21 Thread Norbert Luksa (Code Review)
Norbert Luksa has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/15051 )

Change subject: IMPALA-9226: Improve string allocations of the ORC scanner
..

IMPALA-9226: Improve string allocations of the ORC scanner

Currently the OrcColumnReader copies values from the
orc::StringVectorBatch one-by-one. Since ORC 1.6, the blob which
contains the pointed values is moved to the StringVectorBatch,
so we can copy it.

This commit beside the above improvement also enables the
LazyEncoding option for the ORC reader. This way, for stripes
with DICTIONARY_ENCODING[_V2], EncodedStringVectorBatch contains
the data in a dictionaryBlob from which the data can be acquired
with the given indices and lengths.

Tests:
 * Run ORC scanner tests (query_tests/test_scanners.py::TestOrc)
 * Tested performance on tpch.lineitem table with scale=25,
   running queries that selects min of string columns.
   Some results:
   col_name | encoding | before | after | speedup
   =
   l_comment  DIRECT 97s  30s 3.23
   l_shipinstruct DICTIONARY 86s  18s 4.77
   l_commitdate   DICTIONARY 85s  20s 4.25

   The queries were run on my machine with MT_DOP and NUM_NODES
   set to 1.

Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f
---
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/orc-column-readers.cc
M be/src/exec/orc-column-readers.h
3 files changed, 57 insertions(+), 14 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/15051/3
--
To view, visit http://gerrit.cloudera.org:8080/15051
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f
Gerrit-Change-Number: 15051
Gerrit-PatchSet: 3
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[native-toolchain-CR] IMPALA-6772: Add ORC 1.6.2 to toolchain

2020-01-21 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15073 )

Change subject: IMPALA-6772: Add ORC 1.6.2 to toolchain
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15073/3/buildall.sh
File buildall.sh:

http://gerrit.cloudera.org:8080/#/c/15073/3/buildall.sh@347
PS3, Line 347:   export SNAPPY_VERSION=1.1.4
> IIUC, we use these versions in our native-toolchain to have consistent beha
I see, thanks for clarifying!



--
To view, visit http://gerrit.cloudera.org:8080/15073
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I62734f791fd8fc098579f488a869a2153ebad919
Gerrit-Change-Number: 15073
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 21 Jan 2020 14:53:06 +
Gerrit-HasComments: Yes


[native-toolchain-CR] IMPALA-6772: Add ORC 1.6.2 to toolchain

2020-01-21 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15073 )

Change subject: IMPALA-6772: Add ORC 1.6.2 to toolchain
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15073/3/buildall.sh
File buildall.sh:

http://gerrit.cloudera.org:8080/#/c/15073/3/buildall.sh@347
PS3, Line 347:   export SNAPPY_VERSION=1.1.4
Shouldn't these versions get updated to match the ones in the ORC lib?
https://github.com/apache/orc/blob/master/cmake_modules/ThirdpartyToolchain.cmake#L14



--
To view, visit http://gerrit.cloudera.org:8080/15073
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I62734f791fd8fc098579f488a869a2153ebad919
Gerrit-Change-Number: 15073
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 21 Jan 2020 14:29:06 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-01-21 Thread Norbert Luksa (Code Review)
Norbert Luksa has uploaded a new patch set (#19). ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..

IMPALA-8755: Backend support for Z-ordering

This change depends on gerrit.cloudera.org/#/c/13955/
(Frontend support for Z-ordering)

The commit adds a Comparator based on Z-ordering. See in detail:
https://en.wikipedia.org/wiki/Z-order_curve

The comparator instead of calculating the Z-values of the rows,
looks for the column with the most significant dimension, and
compares the values of this column only. The most significant
dimension will be the one where the compared values have the
highest different bits. The algorithm requires values of
the same binary representation, therefore the values are
converted into either uint32_t, uint63_t or uint128_t, the
smallest in which all data fits. Comparing smaller types with
bigger ones would make the bigger type much more dominant
therefore the bits of these smaller types are shifted up.

All primitive types (including string and floating point types)
are supported.

Testing:
 * Added unit tests.
 * Run manual tests, comparing 4-column values with 4-bit
   integers, for all possible combinations. Checked the result by
   calculating the Z-value for each comparison.
 * Tested performance on various data, getting great results for
   selective queries. An example: used the TPCH dataset's
   lineitem table with scale 25, where the sorting columns are
   l_partkey and l_suppkey, in that order. Run selective queries
   for the value range of the two columns, for both lexical and
   Z-ordering and compared the percentage of filtered pages and
   row groups. While queries with filters on the first column
   showed almost no difference, queries on the second column
   is in favour of Z-ordering:
   Ordering | Column | Filtered pages % | Filtered row groups %
   Lex.   1st  ~99%   ~90%
   Z-ord. 1st  ~99%   ~89%
   Lex.   2nd  ~25%   0%
   Z-ord. 2nd  ~97%   0%
   The only drawback is the sorting itself, taking ~4 times more
   than lexical sorting (eg. sorting for the dataset above took
   14m for Lexical, and 55m for Z-ordering).
   Note however, that this is a one-time thing to do, sorting
   only happens once, when writing the data.
   Also, lexical ordering is supported by codegen, while it is
   not implemented for Z-ordering yet.

Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
---
M be/src/exec/exchange-node.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partial-sort-node.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/util/CMakeLists.txt
A be/src/util/tuple-row-compare-test.cc
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
18 files changed, 1,062 insertions(+), 95 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/19
--
To view, visit http://gerrit.cloudera.org:8080/14080
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 19
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-01-21 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 14:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14080/14//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14080/14//COMMIT_MSG@34
PS14, Line 34: getting great results
> Could you provide some basic statistics?
Done


http://gerrit.cloudera.org:8080/#/c/14080/14//COMMIT_MSG@35
PS14, Line 35: One negative is the sorting itself, taking
 :4-7 more times than lexical sorting.
> You could emphasize that it only affects the writes.
Done



--
To view, visit http://gerrit.cloudera.org:8080/14080
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 14
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 21 Jan 2020 10:52:51 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-01-21 Thread Norbert Luksa (Code Review)
Norbert Luksa has uploaded a new patch set (#18). ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..

IMPALA-8755: Backend support for Z-ordering

This change depends on gerrit.cloudera.org/#/c/13955/
(Frontend support for Z-ordering)

The commit adds a Comparator based on Z-ordering. See in detail:
https://en.wikipedia.org/wiki/Z-order_curve

The comparator instead of calculating the Z-values of the rows,
looks for the column with the most significant dimension, and
compares the values of this column only. The most significant
dimension will be the one where the compared values have the
highest different bits. The algorithm requires values of
the same binary representation, therefore the values are
converted into either uint32_t, uint63_t or uint128_t, the
smallest in which all data fits. Comparing smaller types with
bigger ones would make the bigger type much more dominant
therefore the bits of these smaller types are shifted up.

All primitive types (including string and floating point types)
are supported.

Testing:
 * Added unit tests.
 * Run manual tests, comparing 4-column values with 4-bit
   integers, for all possible combinations. Checked the result by
   calculating the Z-value for each comparison.
 * Tested performance on various data, getting great results for
   selective queries. An example: used the TPCH dataset's
   lineitem table with scale 25, where the sorting columns are
   l_partkey and l_suppkey, in that order. Run selective queries
   for the value range of the two columns, for both lexical and
   Z-ordering and compared the percentage of filtered pages and
   row groups. While queries with filters on the first column
   showed almost no difference, queries on the second column
   is in favour of Z-ordering:
   Ordering | Column | Filtered pages % | Filtered row groups %
   Lex.   1st  ~99%   ~90%
   Z-ord. 1st  ~99%   ~89%
   Lex.   2nd  ~25%   0%
   Z-ord. 2nd  ~97%   0%
   A only drawback is the sorting itself, taking ~4 times more
   than lexical sorting. Note however, that this is a one-time
   thing to do, sorting only happens once, when writing the data.
   Also, lexical ordering is supported by codegen, while it is
   not implemented for Z-ordering yet.

Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
---
M be/src/exec/exchange-node.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partial-sort-node.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/util/CMakeLists.txt
A be/src/util/tuple-row-compare-test.cc
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
18 files changed, 1,062 insertions(+), 95 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/18
--
To view, visit http://gerrit.cloudera.org:8080/14080
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 18
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-01-21 Thread Norbert Luksa (Code Review)
Norbert Luksa has uploaded a new patch set (#16). ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..

IMPALA-8755: Backend support for Z-ordering

This change depends on gerrit.cloudera.org/#/c/13955/
(Frontend support for Z-ordering)

The commit adds a Comparator based on Z-ordering. See in detail:
https://en.wikipedia.org/wiki/Z-order_curve

The comparator instead of calculating the Z-values of the rows,
looks for the column with the most significant dimension, and
compares the values of this column only. The most significant
dimension will be the one where the compared values have the
highest different bits. The algorithm requires values of
the same binary representation, therefore the values are
converted into either uint32_t, uint63_t or uint128_t, the
smallest in which all data fits. Comparing smaller types with
bigger ones would make the bigger type much more dominant
therefore the bits of these smaller types are shifted up.

All primitive types (including string and floating point types)
are supported.

Testing:
 * Added unit tests.
 * Run manual tests, comparing 4-column values with 4-bit
   integers, for all possible combinations. Checked the result by
   calculating the Z-value for each comparison.
 * Tested performance on various data, getting great results for
   selective queries. One negative is the sorting itself, taking
   4-7 more times than lexical sorting.

Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
---
M be/src/exec/exchange-node.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partial-sort-node.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/util/CMakeLists.txt
A be/src/util/tuple-row-compare-test.cc
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
18 files changed, 1,062 insertions(+), 95 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/16
--
To view, visit http://gerrit.cloudera.org:8080/14080
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 16
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2020-01-21 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..


Patch Set 16:

Rebased.


--
To view, visit http://gerrit.cloudera.org:8080/14080
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 16
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 21 Jan 2020 09:24:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9226: Improve string allocations of the ORC scanner

2020-01-17 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15051 )

Change subject: IMPALA-9226: Improve string allocations of the ORC scanner
..


Patch Set 2:

As Zoltán suggested, I tried to add an optimisation, where I would not have to 
copy the blob in each batch, but only for stripes. I have not managed to figure 
it out, looks like the memory is freed between the batches. I left this part of 
the code in the commit in a comment for now, will try to solve it next week.


--
To view, visit http://gerrit.cloudera.org:8080/15051
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f
Gerrit-Change-Number: 15051
Gerrit-PatchSet: 2
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 17 Jan 2020 17:21:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9226: Improve string allocations of the ORC scanner

2020-01-17 Thread Norbert Luksa (Code Review)
Norbert Luksa has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/15051


Change subject: IMPALA-9226: Improve string allocations of the ORC scanner
..

WIP: IMPALA-9226: Improve string allocations of the ORC scanner

Currently the OrcColumnReader copies values from the
orc::StringVectorBatch one-by-one. Since ORC 1.6, the blob which
contains the pointed values is moved to the StringVectorBatch,
so we can copy it.

This commit beside the above improvement also enables the
LazyEncoding option for the ORC reader. This way, for stripes
with DICTIONARY_ENCODING[_V2], EncodedStringVectorBatch contains
the data in a dictionaryBlob from which the data can be aquired
with the given indices and lengths.

Tests:
 * TODO: performance tests.

Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f
---
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/orc-column-readers.cc
M be/src/exec/orc-column-readers.h
3 files changed, 118 insertions(+), 49 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/15051/1
--
To view, visit http://gerrit.cloudera.org:8080/15051
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f
Gerrit-Change-Number: 15051
Gerrit-PatchSet: 1
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-9226: Improve string allocations of the ORC scanner

2020-01-17 Thread Norbert Luksa (Code Review)
Norbert Luksa has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/15051 )

Change subject: IMPALA-9226: Improve string allocations of the ORC scanner
..

IMPALA-9226: Improve string allocations of the ORC scanner

Currently the OrcColumnReader copies values from the
orc::StringVectorBatch one-by-one. Since ORC 1.6, the blob which
contains the pointed values is moved to the StringVectorBatch,
so we can copy it.

This commit beside the above improvement also enables the
LazyEncoding option for the ORC reader. This way, for stripes
with DICTIONARY_ENCODING[_V2], EncodedStringVectorBatch contains
the data in a dictionaryBlob from which the data can be aquired
with the given indices and lengths.

Tests:
 * Run ORC scanner tests (query_tests/test_scanners.py::TestOrc)
 * Tested performance on tpch.lineitem table with scale=25,
   running queries that selects min of string columns.
   Some results:
   col_name  | encoding | before  | after
   =
   l_comment   DIRECT 14.05s (97s)  4.11s (30s)
   l_shipinstruct  DICTIONARY 12.36s (86s)  2.16s (18s)
   l_commitdateDICTIONARY 12.56s (85s)  2.54s (20s)

   The queries are run on my machine with 8 core, while
   in the brackets the durations are when MT_DOP and NUM_NODES
   are set to 1.

Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f
---
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/orc-column-readers.cc
M be/src/exec/orc-column-readers.h
3 files changed, 95 insertions(+), 44 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/15051/2
--
To view, visit http://gerrit.cloudera.org:8080/15051
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f
Gerrit-Change-Number: 15051
Gerrit-PatchSet: 2
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-8801: Date type support for ORC scanner

2020-01-17 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14982 )

Change subject: IMPALA-8801: Date type support for ORC scanner
..


Patch Set 8: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14982/8/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

http://gerrit.cloudera.org:8080/#/c/14982/8/tests/query_test/test_scanners.py@1302
PS8, Line 1302: an
nit: the comments could be updated a bit, since not only one illtypes table is 
created



--
To view, visit http://gerrit.cloudera.org:8080/14982
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956
Gerrit-Change-Number: 14982
Gerrit-PatchSet: 8
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 17 Jan 2020 12:26:44 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9291: MemTracker's TryConsume() should behave consistent with Consume()/Release()

2020-01-16 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15050 )

Change subject: IMPALA-9291: MemTracker's TryConsume() should behave consistent 
with Consume()/Release()
..


Patch Set 1: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/15050
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I22e613c49e4a6bd218e7f7f7c0d6bec95b75cff5
Gerrit-Change-Number: 15050
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 16 Jan 2020 15:17:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8046: Support CREATE TABLE from an ORC file

2020-01-10 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14811 )

Change subject: IMPALA-8046: Support CREATE TABLE from an ORC file
..


Patch Set 11:

Rebased.


--
To view, visit http://gerrit.cloudera.org:8080/14811
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I77cd84cda2ed86516937a67eb320fd41e3f1cf2d
Gerrit-Change-Number: 14811
Gerrit-PatchSet: 11
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 10 Jan 2020 12:48:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8046: Support CREATE TABLE from an ORC file

2020-01-10 Thread Norbert Luksa (Code Review)
Norbert Luksa has uploaded a new patch set (#11). ( 
http://gerrit.cloudera.org:8080/14811 )

Change subject: IMPALA-8046: Support CREATE TABLE from an ORC file
..

IMPALA-8046: Support CREATE TABLE from an ORC file

Impala supports creating a table using the schema of a file.
However, only Parquet is supported currently. This commit adds
support for creating tables from ORC files

The change relies on the ORC Java API with version 1.5 or
greater, because of a bug in earlier versions. Therefore, ORC is
listed as an external dependency, instead of relying on Hive's
ORC version (from Hive3, Hive also lists it as a dependency).

Also, the commit performs a little clean-up on the ParquetHelper
class, renaming it to ParquetSchemaExtractor and removing outdated
comments.

To create a table from an ORC file, run:
CREATE TABLE tablename LIKE ORC '/path/to/file'

Tests:
 * Added analysis tests for primitive and complex types.
 * Added e2e tests for creating tables from ORC files.

Change-Id: I77cd84cda2ed86516937a67eb320fd41e3f1cf2d
---
M bin/impala-config.sh
M fe/pom.xml
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
A fe/src/main/java/org/apache/impala/analysis/OrcSchemaExtractor.java
R fe/src/main/java/org/apache/impala/analysis/ParquetSchemaExtractor.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
A fe/src/main/java/org/apache/impala/util/FileAnalysisUtil.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M impala-parent/pom.xml
M shaded-deps/pom.xml
A 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file-orc.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test
M tests/common/skip.py
M tests/metadata/test_ddl.py
15 files changed, 495 insertions(+), 81 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/14811/11
-- 
To view, visit http://gerrit.cloudera.org:8080/14811
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I77cd84cda2ed86516937a67eb320fd41e3f1cf2d
Gerrit-Change-Number: 14811
Gerrit-PatchSet: 11
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-9277: Catch exception thrown from orc::ColumnSelector::updateSelectedByTypeId

2020-01-09 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14994 )

Change subject: IMPALA-9277: Catch exception thrown from 
orc::ColumnSelector::updateSelectedByTypeId
..


Patch Set 1: Code-Review+1

Lgtm, thanks for fixing!


--
To view, visit http://gerrit.cloudera.org:8080/14994
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2f706bc832298cb5089e539b7a818cb86d02199f
Gerrit-Change-Number: 14994
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 09 Jan 2020 18:13:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] [WIP]IMPALA-8778: Support Apache Hudi Read Optimized Table

2020-01-09 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14711 )

Change subject: [WIP]IMPALA-8778: Support Apache Hudi Read Optimized Table
..


Patch Set 7:

(3 comments)

Hi! Thanks for working on this, left some comments there.

http://gerrit.cloudera.org:8080/#/c/14711/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14711/7//COMMIT_MSG@13
PS7, Line 13:
Could you please list the tests you did here?


http://gerrit.cloudera.org:8080/#/c/14711/7/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
File fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java:

http://gerrit.cloudera.org:8080/#/c/14711/7/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java@88
PS7, Line 88: fileFormat
Could you update the comments above?


http://gerrit.cloudera.org:8080/#/c/14711/7/fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
File fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java:

http://gerrit.cloudera.org:8080/#/c/14711/7/fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java@165
PS7, Line 165:   // HUDI_PARQUET doesn't support converting from thrift
Maybe it would be better to add an empty HUDI_PARQUET case over the default 
case, just to have it listed here.



--
To view, visit http://gerrit.cloudera.org:8080/14711
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I65e146b347714df32fe968409ef2dde1f6a25cdf
Gerrit-Change-Number: 14711
Gerrit-PatchSet: 7
Gerrit-Owner: Yanjia Gary Li 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Yanjia Gary Li 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 09 Jan 2020 17:18:32 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8801: Date type support for ORC scanner

2020-01-08 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14982 )

Change subject: IMPALA-8801: Date type support for ORC scanner
..


Patch Set 3: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/14982
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956
Gerrit-Change-Number: 14982
Gerrit-PatchSet: 3
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 08 Jan 2020 13:02:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9010: Add builtin mask functions

2020-01-07 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14963 )

Change subject: IMPALA-9010: Add builtin mask functions
..


Patch Set 2: Code-Review+1

(4 comments)

That seems a lot of work, nice job! Added some small comments, otherwise lgtm.

http://gerrit.cloudera.org:8080/#/c/14963/2/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

http://gerrit.cloudera.org:8080/#/c/14963/2/be/src/exprs/expr-test.cc@10449
PS2, Line 10449:   // Error handling
What happens when one would mask the day in 2019-02-02 to 30? Could you add a 
test for it here?


http://gerrit.cloudera.org:8080/#/c/14963/2/be/src/exprs/mask-functions-ir.cc
File be/src/exprs/mask-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/14963/2/be/src/exprs/mask-functions-ir.cc@223
PS2, Line 223: !(1 <= day_value && day_value <= 31)
This considers 31 as a valid day number for eg. February. Shouldn't this be 
checked here? What is the behaviour of Hive?


http://gerrit.cloudera.org:8080/#/c/14963/2/be/src/exprs/mask-functions-ir.cc@266
PS2, Line 266: 4
Wouldn't it be nicer if this constant were defined at the beginning of the file 
with the other constants?
It appears here and many times below.


http://gerrit.cloudera.org:8080/#/c/14963/2/be/src/exprs/mask-functions-ir.cc@697
PS2, Line 697:   (void)SHA256(val.ptr, val.len, sha256_hash.ptr);
nit: Wouldn't using "discard_result" be nicer here?



--
To view, visit http://gerrit.cloudera.org:8080/14963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ica779a1bf63a085d51f3b533f654cbaac102a664
Gerrit-Change-Number: 14963
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 07 Jan 2020 15:52:01 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8801: Date type support for ORC scanner

2020-01-07 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14982 )

Change subject: IMPALA-8801: Date type support for ORC scanner
..


Patch Set 2:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/14982/2/be/src/exec/orc-column-readers.h
File be/src/exec/orc-column-readers.h:

http://gerrit.cloudera.org:8080/#/c/14982/2/be/src/exec/orc-column-readers.h@224
PS2, Line 224: public:
nit: space is missing?


http://gerrit.cloudera.org:8080/#/c/14982/2/be/src/exec/orc-column-readers.h@235
PS2, Line 235: private:
nit: space is missing? (see other classes below)


http://gerrit.cloudera.org:8080/#/c/14982/2/be/src/exec/orc-column-readers.cc
File be/src/exec/orc-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/14982/2/be/src/exec/orc-column-readers.cc@101
PS2, Line 101:   case TYPE_DATE: {
nit: the braces can be omitted


http://gerrit.cloudera.org:8080/#/c/14982/2/be/src/exec/orc-column-readers.cc@213
PS2, Line 213: !dv.IsValid()
I think you could add UNLIKELY here.


http://gerrit.cloudera.org:8080/#/c/14982/2/fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
File fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java:

http://gerrit.cloudera.org:8080/#/c/14982/2/fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java@30
PS2, Line 30: Supported HDFS file formats. Every file format specifies:
Not related to this commit specifically, but would be good to have this comment 
updated with the sixth parameter's description.



--
To view, visit http://gerrit.cloudera.org:8080/14982
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956
Gerrit-Change-Number: 14982
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Comment-Date: Tue, 07 Jan 2020 13:12:11 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8046: Support CREATE TABLE from an ORC file

2019-12-20 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14811 )

Change subject: IMPALA-8046: Support CREATE TABLE from an ORC file
..


Patch Set 10:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/14811/8/fe/pom.xml
File fe/pom.xml:

http://gerrit.cloudera.org:8080/#/c/14811/8/fe/pom.xml@309
PS8, Line 309:   org.apache.orc
> I think we need to add ORC stuffs in shaded-deps/pom.xml so we don't use or
Done


http://gerrit.cloudera.org:8080/#/c/14811/8/fe/src/main/java/org/apache/impala/analysis/OrcSchemaExtractor.java
File fe/src/main/java/org/apache/impala/analysis/OrcSchemaExtractor.java:

http://gerrit.cloudera.org:8080/#/c/14811/8/fe/src/main/java/org/apache/impala/analysis/OrcSchemaExtractor.java@181
PS8, Line 181:* parsing the ORC schema, or if the ORC types cannot be 
represented in Impala.
> Change Parquet to ORC
Done


http://gerrit.cloudera.org:8080/#/c/14811/8/fe/src/main/java/org/apache/impala/util/FileAnalysisUtil.java
File fe/src/main/java/org/apache/impala/util/FileAnalysisUtil.java:

http://gerrit.cloudera.org:8080/#/c/14811/8/fe/src/main/java/org/apache/impala/util/FileAnalysisUtil.java@28
PS8, Line 28: ORCSchemeExtractor and ParquetScheme
> nit: OrcSchemaExtractor and ParquetSchemaExtractor
Done


http://gerrit.cloudera.org:8080/#/c/14811/8/fe/src/main/java/org/apache/impala/util/FileAnalysisUtil.java@35
PS8, Line 35:   public static void CheckIfFile(Path pathToFile) throws 
AnalysisException {
> What about merging this into FileSystemUtil.java? Other utilities for files
Thought about that, but this file contains Analysis specific utilities and as I 
saw FileSystemUtil.java has nothing of the sort, so I did not want to mingle 
them.



--
To view, visit http://gerrit.cloudera.org:8080/14811
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I77cd84cda2ed86516937a67eb320fd41e3f1cf2d
Gerrit-Change-Number: 14811
Gerrit-PatchSet: 10
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 20 Dec 2019 09:00:41 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8046: Support CREATE TABLE from an ORC file

2019-12-20 Thread Norbert Luksa (Code Review)
Norbert Luksa has uploaded a new patch set (#10). ( 
http://gerrit.cloudera.org:8080/14811 )

Change subject: IMPALA-8046: Support CREATE TABLE from an ORC file
..

IMPALA-8046: Support CREATE TABLE from an ORC file

Impala supports creating a table using the schema of a file.
However, only Parquet is supported currently. This commit adds
support for creating tables from ORC files

The change relies on the ORC Java API with version 1.5 or
greater, because of a bug in earlier versions. Therefore, ORC is
listed as an external dependency, instead of relying on Hive's
ORC version (from Hive3, Hive also lists it as a dependency).

Also, the commit performs a little clean-up on the ParquetHelper
class, renaming it to ParquetSchemaExtractor and removing outdated
comments.

To create a table from an ORC file, run:
CREATE TABLE tablename LIKE ORC '/path/to/file'

Tests:
 * Added analysis tests for primitive and complex types.
 * Added e2e tests for creating tables from ORC files.

Change-Id: I77cd84cda2ed86516937a67eb320fd41e3f1cf2d
---
M bin/impala-config.sh
M fe/pom.xml
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
A fe/src/main/java/org/apache/impala/analysis/OrcSchemaExtractor.java
R fe/src/main/java/org/apache/impala/analysis/ParquetSchemaExtractor.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
A fe/src/main/java/org/apache/impala/util/FileAnalysisUtil.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M impala-parent/pom.xml
M shaded-deps/pom.xml
A 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file-orc.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test
M tests/common/skip.py
M tests/metadata/test_ddl.py
15 files changed, 495 insertions(+), 81 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/14811/10
-- 
To view, visit http://gerrit.cloudera.org:8080/14811
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I77cd84cda2ed86516937a67eb320fd41e3f1cf2d
Gerrit-Change-Number: 14811
Gerrit-PatchSet: 10
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] Fix bug in report benchmark results.py

2019-12-19 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14925 )

Change subject: Fix bug in report_benchmark_results.py
..


Patch Set 1: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/14925
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6a163e74fefc896464e35a6a1b91ce57de592f1a
Gerrit-Change-Number: 14925
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Comment-Date: Thu, 19 Dec 2019 08:35:07 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2019-12-12 Thread Norbert Luksa (Code Review)
Norbert Luksa has uploaded a new patch set (#13). ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..

IMPALA-8755: Backend support for Z-ordering

This change depends on gerrit.cloudera.org/#/c/13955/
(Frontend support for Z-ordering)

The commit adds a Comparator based on Z-ordering. See in detail:
https://en.wikipedia.org/wiki/Z-order_curve

The comparator instead of calculating the Z-values of the rows,
looks for the column with the most significant dimension, and
compares the values of this column only. The most significant
dimension will be the one where the compared values have the
highest different bits. The algorithm requires values of
the same binary representation, therefore the values are
converted into either uint32_t, uint63_t or uint128_t, the
smallest in which all data fits. Comparing smaller types with
bigger ones would make the bigger type much more dominant
therefore the bits of these smaller types are shifted up.

All primitive types (including string and floating point types)
are supported.

Testing:
 * Added unit tests.
 * Run manual tests, comparing 4-column values with 4-bit
   integers, for all possible combinations. Checked the result by
   calculating the Z-value for each comparison.
 * Tested performance on various data, getting great results for
   selective queries. One negative is the sorting itself, taking
   4-7 more times than lexical sorting.

Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
---
M be/src/exec/exchange-node.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partial-sort-node.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/util/CMakeLists.txt
A be/src/util/tuple-row-compare-test.cc
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
18 files changed, 1,060 insertions(+), 95 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/13
--
To view, visit http://gerrit.cloudera.org:8080/14080
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 13
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering

2019-12-12 Thread Norbert Luksa (Code Review)
Norbert Luksa has uploaded a new patch set (#14). ( 
http://gerrit.cloudera.org:8080/14080 )

Change subject: IMPALA-8755: Backend support for Z-ordering
..

IMPALA-8755: Backend support for Z-ordering

This change depends on gerrit.cloudera.org/#/c/13955/
(Frontend support for Z-ordering)

The commit adds a Comparator based on Z-ordering. See in detail:
https://en.wikipedia.org/wiki/Z-order_curve

The comparator instead of calculating the Z-values of the rows,
looks for the column with the most significant dimension, and
compares the values of this column only. The most significant
dimension will be the one where the compared values have the
highest different bits. The algorithm requires values of
the same binary representation, therefore the values are
converted into either uint32_t, uint63_t or uint128_t, the
smallest in which all data fits. Comparing smaller types with
bigger ones would make the bigger type much more dominant
therefore the bits of these smaller types are shifted up.

All primitive types (including string and floating point types)
are supported.

Testing:
 * Added unit tests.
 * Run manual tests, comparing 4-column values with 4-bit
   integers, for all possible combinations. Checked the result by
   calculating the Z-value for each comparison.
 * Tested performance on various data, getting great results for
   selective queries. One negative is the sorting itself, taking
   4-7 more times than lexical sorting.

Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
---
M be/src/exec/exchange-node.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partial-sort-node.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/util/CMakeLists.txt
A be/src/util/tuple-row-compare-test.cc
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
18 files changed, 1,059 insertions(+), 95 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/14
--
To view, visit http://gerrit.cloudera.org:8080/14080
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
Gerrit-Change-Number: 14080
Gerrit-PatchSet: 14
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-9207: [DOCS] Documented the #Inst in exec summary

2019-12-08 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14860 )

Change subject: IMPALA-9207: [DOCS] Documented the #Inst in exec summary
..


Patch Set 2: Code-Review+1

(2 comments)

Found some more nits, otherwise lgtm.

http://gerrit.cloudera.org:8080/#/c/14860/2/docs/topics/impala_explain_plan.xml
File docs/topics/impala_explain_plan.xml:

http://gerrit.cloudera.org:8080/#/c/14860/2/docs/topics/impala_explain_plan.xml@175
PS2, Line 175:
nit: extra space


http://gerrit.cloudera.org:8080/#/c/14860/2/docs/topics/impala_mem_limit.xml
File docs/topics/impala_mem_limit.xml:

http://gerrit.cloudera.org:8080/#/c/14860/2/docs/topics/impala_mem_limit.xml@209
PS2, Line 209: #Hosts
nit: #Inst



--
To view, visit http://gerrit.cloudera.org:8080/14860
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I938930c66144ba6bce766981d363abe4b28ba524
Gerrit-Change-Number: 14860
Gerrit-PatchSet: 2
Gerrit-Owner: Alex Rodoni 
Gerrit-Reviewer: Alex Rodoni 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Comment-Date: Sun, 08 Dec 2019 13:57:36 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9207: [DOCS] Documented the #Inst in exec summary

2019-12-06 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14860 )

Change subject: IMPALA-9207: [DOCS] Documented the #Inst in exec summary
..


Patch Set 1:

(2 comments)

Found a few more occurrences of a summary query when I grepped on "#Hosts":
* docs/topics/impala_mem_limit.xml
* docs/topics/impala_live_summary.xml
should also be updated.

http://gerrit.cloudera.org:8080/#/c/14860/1/docs/topics/impala_explain_plan.xml
File docs/topics/impala_explain_plan.xml:

http://gerrit.cloudera.org:8080/#/c/14860/1/docs/topics/impala_explain_plan.xml@147
PS1, Line 147: When the MT_DOP query option is set to a value 
larger than
 : 0, #Inst column in the 
output shows the number of
 : fragment instances. Impala decomposes each query into 
smaller units of work that are
 : distributed across the cluster, and these units are 
referred as fragments.
The #Inst column in the output will be also shown when MT_DOP=0, with the same 
value as #Hosts, since there will be exactly one fragment for each host. (Might 
be good to clarify, here or at the example.)


http://gerrit.cloudera.org:8080/#/c/14860/1/docs/topics/impala_explain_plan.xml@168
PS1, Line 168: 0
This and the zeros below should be ones.



--
To view, visit http://gerrit.cloudera.org:8080/14860
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I938930c66144ba6bce766981d363abe4b28ba524
Gerrit-Change-Number: 14860
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Rodoni 
Gerrit-Reviewer: Alex Rodoni 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Comment-Date: Fri, 06 Dec 2019 12:23:36 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8046: Support CREATE TABLE from an ORC file

2019-12-05 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14811 )

Change subject: IMPALA-8046: Support CREATE TABLE from an ORC file
..


Patch Set 8:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/14811/7/fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java:

http://gerrit.cloudera.org:8080/#/c/14811/7/fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java@74
PS7, Line 74: Schema
> nit: Isn't this meant to be Schema
Done


http://gerrit.cloudera.org:8080/#/c/14811/7/fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java@77
PS7, Line 77: Schema
> nit: same as above
Done


http://gerrit.cloudera.org:8080/#/c/14811/7/fe/src/main/java/org/apache/impala/analysis/ORCSchemeExtractor.java
File fe/src/main/java/org/apache/impala/analysis/ORCSchemeExtractor.java:

http://gerrit.cloudera.org:8080/#/c/14811/7/fe/src/main/java/org/apache/impala/analysis/ORCSchemeExtractor.java@49
PS7, Line 49:
> nit: Schema
Done


http://gerrit.cloudera.org:8080/#/c/14811/7/fe/src/main/java/org/apache/impala/analysis/ORCSchemeExtractor.java@49
PS7, Line 49:
> nit: use camel case for acronyms, e.g. OrcSchemaExtractor
Done


http://gerrit.cloudera.org:8080/#/c/14811/7/fe/src/main/java/org/apache/impala/analysis/ORCSchemeExtractor.java@50
PS7, Line 50:
> I haven't seen any tests for this error msg. Did I miss something?
Tried to create an ORC file with a non-primitive map key, but did not find an 
easy way to do so with the available tools.


http://gerrit.cloudera.org:8080/#/c/14811/7/fe/src/main/java/org/apache/impala/analysis/ORCSchemeExtractor.java@74
PS7, Line 74:
> nit: Orc
Done


http://gerrit.cloudera.org:8080/#/c/14811/7/fe/src/main/java/org/apache/impala/analysis/ORCSchemeExtractor.java@75
PS7, Line 75:
> Can it really throw AnalysisException?
Done


http://gerrit.cloudera.org:8080/#/c/14811/7/fe/src/main/java/org/apache/impala/analysis/ORCSchemeExtractor.java@139
PS7, Line 139:
> Might worth a DCHECK here as well that the size of fieldNames equals to the
Done


http://gerrit.cloudera.org:8080/#/c/14811/7/fe/src/main/java/org/apache/impala/analysis/ORCSchemeExtractor.java@148
PS7, Line 148:
> nit: Orc
Done


http://gerrit.cloudera.org:8080/#/c/14811/7/fe/src/main/java/org/apache/impala/analysis/ORCSchemeExtractor.java@170
PS7, Line 170:
> nit: Orc
Done


http://gerrit.cloudera.org:8080/#/c/14811/7/fe/src/main/java/org/apache/impala/analysis/ParquetSchemeExtractor.java
File fe/src/main/java/org/apache/impala/analysis/ParquetSchemeExtractor.java:

http://gerrit.cloudera.org:8080/#/c/14811/7/fe/src/main/java/org/apache/impala/analysis/ParquetSchemeExtractor.java@50
PS7, Line 50:
> nit: Schema
Done


http://gerrit.cloudera.org:8080/#/c/14811/7/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

http://gerrit.cloudera.org:8080/#/c/14811/7/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@2029
PS7, Line 2029:
> To be in line with TestCreateTableLikeFile() could you rename this to TestC
Done


http://gerrit.cloudera.org:8080/#/c/14811/7/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@2029
PS7, Line 2029:
> I'd also add the "file does't exist" coverage to this function.
Done


http://gerrit.cloudera.org:8080/#/c/14811/7/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@2029
PS7, Line 2029: @Test
> nit: put @Test into a separate line
Done


http://gerrit.cloudera.org:8080/#/c/14811/7/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@2030
PS7, Line 2030: leLikeFileOrc() throws AnalysisException {
> I though Apache Impala has ORC support regardless of the Hive version. I ca
Right, this comment is misleading. What I meant to write was that the Java API 
that comes with the CDH Hive ORC fails getting the schema of the ORC file, 
because of a bug in ORC.
This is solved however by listing ORC as an external dependency (like Hive does 
from version 3), instead of relying on Hive's ORC version.
(The bug is fixed in later versions.)


http://gerrit.cloudera.org:8080/#/c/14811/7/testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test
File 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test:

http://gerrit.cloudera.org:8080/#/c/14811/7/testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test@254
PS7, Line 254: --- QUERY
> Why was this parquet test necessary for this ORC related patch?
This test is not related to ORC support, was only at the wrong place (in 
create-table-like-file.test)



--
To view, visit http://gerrit.cloudera.org:8080/14811
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment

[Impala-ASF-CR] IMPALA-8046: Support CREATE TABLE from an ORC file

2019-12-05 Thread Norbert Luksa (Code Review)
Norbert Luksa has uploaded a new patch set (#8). ( 
http://gerrit.cloudera.org:8080/14811 )

Change subject: IMPALA-8046: Support CREATE TABLE from an ORC file
..

IMPALA-8046: Support CREATE TABLE from an ORC file

Impala supports creating a table using the schema of a file.
However, only Parquet is supported currently. This commit adds
support for creating tables from ORC files

The change relies on the ORC Java API with version 1.5 or
greater, because of a bug in earlier versions. Therefore, ORC is
listed as an external dependency, instead of relying on Hive's
ORC version (from Hive3, Hive also lists it as a dependency).

Also, the commit performs a little clean-up on the ParquetHelper
class, renaming it to ParquetSchemaExtractor and removing outdated
comments.

To create a table from an ORC file, run:
CREATE TABLE tablename LIKE ORC '/path/to/file'

Tests:
 * Added analysis tests for primitive and complex types.
 * Added e2e tests for creating tables from ORC files.

Change-Id: I77cd84cda2ed86516937a67eb320fd41e3f1cf2d
---
M bin/impala-config.sh
M fe/pom.xml
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
A fe/src/main/java/org/apache/impala/analysis/OrcSchemaExtractor.java
R fe/src/main/java/org/apache/impala/analysis/ParquetSchemaExtractor.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
A fe/src/main/java/org/apache/impala/util/FileAnalysisUtil.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M impala-parent/pom.xml
A 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file-orc.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test
M tests/common/skip.py
M tests/metadata/test_ddl.py
14 files changed, 460 insertions(+), 79 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/14811/8
--
To view, visit http://gerrit.cloudera.org:8080/14811
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I77cd84cda2ed86516937a67eb320fd41e3f1cf2d
Gerrit-Change-Number: 14811
Gerrit-PatchSet: 8
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-9217: Adjust limits for TZH and TZM datetime tokens

2019-12-05 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14848 )

Change subject: IMPALA-9217: Adjust limits for TZH and TZM datetime tokens
..


Patch Set 1: Code-Review+1

Had a look at other tests in the file, as I see they did not use invalid 
values. Lgtm.


--
To view, visit http://gerrit.cloudera.org:8080/14848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6fe2534d51396bb5652af6301866e2dd0f3282c2
Gerrit-Change-Number: 14848
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Comment-Date: Thu, 05 Dec 2019 12:33:24 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8184: Add timestamp validation to Orc scanner

2019-12-05 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14832 )

Change subject: IMPALA-8184: Add timestamp validation to Orc scanner
..


Patch Set 2: Code-Review+1

(4 comments)

Found some nits, otherwise lgtm.

http://gerrit.cloudera.org:8080/#/c/14832/2/common/thrift/generate_error_codes.py
File common/thrift/generate_error_codes.py:

http://gerrit.cloudera.org:8080/#/c/14832/2/common/thrift/generate_error_codes.py@443
PS2, Line 443: Orc
nit: ORC


http://gerrit.cloudera.org:8080/#/c/14832/2/testdata/data/README
File testdata/data/README:

http://gerrit.cloudera.org:8080/#/c/14832/2/testdata/data/README@456
PS2, Line 456: Orc
nit: ORC


http://gerrit.cloudera.org:8080/#/c/14832/2/testdata/workloads/functional-query/queries/DataErrorsTest/orc-out-of-range-timestamp.test
File 
testdata/workloads/functional-query/queries/DataErrorsTest/orc-out-of-range-timestamp.test:

http://gerrit.cloudera.org:8080/#/c/14832/2/testdata/workloads/functional-query/queries/DataErrorsTest/orc-out-of-range-timestamp.test@6
PS2, Line 6: Orc
nit: ORC


http://gerrit.cloudera.org:8080/#/c/14832/2/testdata/workloads/functional-query/queries/DataErrorsTest/orc-out-of-range-timestamp.test@16
PS2, Line 16: Orc
nit: same here



--
To view, visit http://gerrit.cloudera.org:8080/14832
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8ee2ba83a54f93d37e8832e064f2c8418b503490
Gerrit-Change-Number: 14832
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 05 Dec 2019 10:54:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9215: report benchmark results.py fails with missing key

2019-12-05 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14830 )

Change subject: IMPALA-9215: report_benchmark_results.py fails with missing key
..


Patch Set 1: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/14830
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I822c86f621f5a348b56d672c263a2cf9321767ee
Gerrit-Change-Number: 14830
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Comment-Date: Thu, 05 Dec 2019 08:13:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP: IMPALA-8046: Support CREATE TABLE from an ORC file

2019-11-29 Thread Norbert Luksa (Code Review)
Norbert Luksa has uploaded a new patch set (#7). ( 
http://gerrit.cloudera.org:8080/14811 )

Change subject: WIP: IMPALA-8046: Support CREATE TABLE from an ORC file
..

WIP: IMPALA-8046: Support CREATE TABLE from an ORC file

Impala supports creating a table using the schema of a file.
However, only Parquet is supported currently. This commit adds
support for creating tables from ORC files. The change relies on
ORC 1.5, therefore can only be used with USE_CDP_HIVE=true.

Also, the commit performs a little clean-up on the ParquetHelper
class, renaming it to ParquetSchemeExtractor and removing outdated
comments.

To create a table from an ORC file, run:
CREATE TABLE tablename LIKE ORC '/path/to/file'

Tests:
 * Added analysis tests for primitive and complex types.
 * Added e2e tests for creating tables from ORC files.
   TODO: these tests might fail

Change-Id: I77cd84cda2ed86516937a67eb320fd41e3f1cf2d
---
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
A fe/src/main/java/org/apache/impala/analysis/ORCSchemeExtractor.java
R fe/src/main/java/org/apache/impala/analysis/ParquetSchemeExtractor.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
A fe/src/main/java/org/apache/impala/util/FileAnalysisUtil.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
A 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file-orc.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test
M tests/common/skip.py
M tests/metadata/test_ddl.py
11 files changed, 435 insertions(+), 79 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/14811/7
--
To view, visit http://gerrit.cloudera.org:8080/14811
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I77cd84cda2ed86516937a67eb320fd41e3f1cf2d
Gerrit-Change-Number: 14811
Gerrit-PatchSet: 7
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] WIP: IMPALA-8046: Support CREATE TABLE from an ORC file

2019-11-29 Thread Norbert Luksa (Code Review)
Norbert Luksa has uploaded a new patch set (#6). ( 
http://gerrit.cloudera.org:8080/14811 )

Change subject: WIP: IMPALA-8046: Support CREATE TABLE from an ORC file
..

WIP: IMPALA-8046: Support CREATE TABLE from an ORC file

Impala supports creating a table using the schema of a file.
However, only Parquet is supported currently. This commit adds
support for creating tables from ORC files. The change relies on
ORC 1.5, therefore can only be used with USE_CDP_HIVE=true.

Also, the commit performs a little clean-up on the ParquetHelper
class, renaming it to ParquetSchemeExtractor and removing outdated
comments.

To create a table from an ORC file, run:
CREATE TABLE tablename LIKE ORC '/path/to/file'

Tests:
 * Added analysis tests for primitive and complex types.
 * Added e2e tests for creating tables from ORC files.
   TODO: these tests might fail

Change-Id: I77cd84cda2ed86516937a67eb320fd41e3f1cf2d
---
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
A fe/src/main/java/org/apache/impala/analysis/ORCSchemeExtractor.java
R fe/src/main/java/org/apache/impala/analysis/ParquetSchemeExtractor.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
A fe/src/main/java/org/apache/impala/util/FileAnalysisUtil.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
A 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file-orc.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test
M tests/common/skip.py
M tests/metadata/test_ddl.py
11 files changed, 435 insertions(+), 79 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/14811/6
--
To view, visit http://gerrit.cloudera.org:8080/14811
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I77cd84cda2ed86516937a67eb320fd41e3f1cf2d
Gerrit-Change-Number: 14811
Gerrit-PatchSet: 6
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] WIP: IMPALA-8046: Support CREATE TABLE from an ORC file

2019-11-29 Thread Norbert Luksa (Code Review)
Norbert Luksa has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/14811 )

Change subject: WIP: IMPALA-8046: Support CREATE TABLE from an ORC file
..

IMPALA-8046: Support CREATE TABLE from an ORC file

Impala supports creating a table using the schema of a file.
However, only Parquet is supported currently. This commit adds
support for creating tables from ORC files. The change relies on
ORC 1.5, therefore can only be used with USE_CDP_HIVE=true.

Also, the commit performs a little clean-up on the ParquetHelper
class, renaming it to ParquetSchemeExtractor and removing outdated
comments.

To create a table from an ORC file, run:
CREATE TABLE tablename LIKE ORC '/path/to/file'

Tests:
 * Added analysis tests for primitive and complex types.
 * Added e2e tests for creating tables from ORC files.

Change-Id: I77cd84cda2ed86516937a67eb320fd41e3f1cf2d
---
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
A fe/src/main/java/org/apache/impala/analysis/ORCSchemeExtractor.java
R fe/src/main/java/org/apache/impala/analysis/ParquetSchemeExtractor.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
A fe/src/main/java/org/apache/impala/util/FileAnalysisUtil.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
A 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file-orc.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test
M tests/common/skip.py
M tests/metadata/test_ddl.py
11 files changed, 418 insertions(+), 79 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/14811/4
--
To view, visit http://gerrit.cloudera.org:8080/14811
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I77cd84cda2ed86516937a67eb320fd41e3f1cf2d
Gerrit-Change-Number: 14811
Gerrit-PatchSet: 4
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] WIP: IMPALA-8046: Support CREATE TABLE from an ORC file

2019-11-29 Thread Norbert Luksa (Code Review)
Norbert Luksa has uploaded a new patch set (#5). ( 
http://gerrit.cloudera.org:8080/14811 )

Change subject: WIP: IMPALA-8046: Support CREATE TABLE from an ORC file
..

WIP: IMPALA-8046: Support CREATE TABLE from an ORC file

Impala supports creating a table using the schema of a file.
However, only Parquet is supported currently. This commit adds
support for creating tables from ORC files. The change relies on
ORC 1.5, therefore can only be used with USE_CDP_HIVE=true.

Also, the commit performs a little clean-up on the ParquetHelper
class, renaming it to ParquetSchemeExtractor and removing outdated
comments.

To create a table from an ORC file, run:
CREATE TABLE tablename LIKE ORC '/path/to/file'

Tests:
 * Added analysis tests for primitive and complex types.
 * Added e2e tests for creating tables from ORC files.

Change-Id: I77cd84cda2ed86516937a67eb320fd41e3f1cf2d
---
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
A fe/src/main/java/org/apache/impala/analysis/ORCSchemeExtractor.java
R fe/src/main/java/org/apache/impala/analysis/ParquetSchemeExtractor.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
A fe/src/main/java/org/apache/impala/util/FileAnalysisUtil.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
A 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file-orc.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test
M tests/common/skip.py
M tests/metadata/test_ddl.py
11 files changed, 418 insertions(+), 79 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/14811/5
--
To view, visit http://gerrit.cloudera.org:8080/14811
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I77cd84cda2ed86516937a67eb320fd41e3f1cf2d
Gerrit-Change-Number: 14811
Gerrit-PatchSet: 5
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-8046: Support CREATE TABLE from an ORC file

2019-11-29 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14811 )

Change subject: IMPALA-8046: Support CREATE TABLE from an ORC file
..


Patch Set 3:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/14811/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14811/3//COMMIT_MSG@19
PS3, Line 19: CREATE TABLE tablename LIKE ORC '/path/to/file'
> CREATE TABLE LIKE without file format name always means Parquet? It would b
CREATE TABLE LIKE  'path/to/file' is not a valid query and the file format has 
to be added explicitly, there is no default value.


http://gerrit.cloudera.org:8080/#/c/14811/3/fe/src/main/java/org/apache/impala/analysis/ORCSchemeExtractor.java
File fe/src/main/java/org/apache/impala/analysis/ORCSchemeExtractor.java:

http://gerrit.cloudera.org:8080/#/c/14811/3/fe/src/main/java/org/apache/impala/analysis/ORCSchemeExtractor.java@80
PS3, Line 80:   case BYTE: return Type.TINYINT;
> I saw different type names in https://orc.apache.org/docs/types.html, e.g.
In the documentation the ORC types are listed under the enum 
TypeDescription.Category, where these type names were present. It looks like 
that in the Java API the Java type equivalents were used.

https://orc.apache.org/api/orc-core/org/apache/orc/TypeDescription.Category.html


http://gerrit.cloudera.org:8080/#/c/14811/3/fe/src/main/java/org/apache/impala/util/FileAnalysisUtil.java
File fe/src/main/java/org/apache/impala/util/FileAnalysisUtil.java:

http://gerrit.cloudera.org:8080/#/c/14811/3/fe/src/main/java/org/apache/impala/util/FileAnalysisUtil.java@1
PS3, Line 1: package org.apache.impala.util;
> Missing apache license (this also led to the build failure)
Done


http://gerrit.cloudera.org:8080/#/c/14811/3/testdata/workloads/functional-query/queries/QueryTest/create-table-like-file-orc.test
File 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file-orc.test:

http://gerrit.cloudera.org:8080/#/c/14811/3/testdata/workloads/functional-query/queries/QueryTest/create-table-like-file-orc.test@58
PS3, Line 58:
> nit: extra space
Done


http://gerrit.cloudera.org:8080/#/c/14811/3/testdata/workloads/functional-query/queries/QueryTest/create-table-like-file-orc.test@74
PS3, Line 74: 
:  QUERY
: drop table allcomplextypes_clone_orc
:  RESULTS
: 'Table has been dropped.'
> Why is this needed? Other tables are not dropped + it is done automatically
Done


http://gerrit.cloudera.org:8080/#/c/14811/3/testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test
File 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test:

http://gerrit.cloudera.org:8080/#/c/14811/3/testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test@255
PS3, Line 255: drop table if exists allcomplextypes_clone
> We shouldn't need this - test_create_table_like_table() uses a unique_datab
Done



--
To view, visit http://gerrit.cloudera.org:8080/14811
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I77cd84cda2ed86516937a67eb320fd41e3f1cf2d
Gerrit-Change-Number: 14811
Gerrit-PatchSet: 3
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 29 Nov 2019 15:12:06 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8046: Support CREATE TABLE from an ORC file

2019-11-29 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14811 )

Change subject: IMPALA-8046: Support CREATE TABLE from an ORC file
..


Patch Set 3:

The e2e tests did not run yet, because of the ORC dependency problem mentioned 
in the commit message (sorry for not mentioning before).


--
To view, visit http://gerrit.cloudera.org:8080/14811
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I77cd84cda2ed86516937a67eb320fd41e3f1cf2d
Gerrit-Change-Number: 14811
Gerrit-PatchSet: 3
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 29 Nov 2019 14:53:15 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8046: Support CREATE TABLE from an ORC file

2019-11-29 Thread Norbert Luksa (Code Review)
Norbert Luksa has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/14811


Change subject: IMPALA-8046: Support CREATE TABLE from an ORC file
..

IMPALA-8046: Support CREATE TABLE from an ORC file

Impala supports creating a table using the schema of a file.
However, only Parquet is supported currently. This commit adds
support for creating tables from ORC files. The change relies on
ORC 1.5, therefore can only be used with USE_CDP_HIVE=true.

Also, the commit performs a little clean-up on the ParquetHelper
class, renaming it to ParquetSchemeExtractor and removing outdated
comments.

To create a table from an ORC file, run:
CREATE TABLE tablename LIKE ORC '/path/to/file'

Tests:
 * Added analysis tests for primitive and complex types.
 * Added e2e tests for creating tables from ORC files.

Change-Id: I77cd84cda2ed86516937a67eb320fd41e3f1cf2d
---
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
A fe/src/main/java/org/apache/impala/analysis/ORCSchemeExtractor.java
R fe/src/main/java/org/apache/impala/analysis/ParquetSchemeExtractor.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
A fe/src/main/java/org/apache/impala/util/FileAnalysisUtil.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
A 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file-orc.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test
M tests/common/skip.py
M tests/metadata/test_ddl.py
11 files changed, 417 insertions(+), 79 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/14811/3
--
To view, visit http://gerrit.cloudera.org:8080/14811
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I77cd84cda2ed86516937a67eb320fd41e3f1cf2d
Gerrit-Change-Number: 14811
Gerrit-PatchSet: 3
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 


  1   2   >