[Impala-ASF-CR] IMPALA-12630: Deflake TestOrcStats.test orc stats
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/20794 ) Change subject: IMPALA-12630: Deflake TestOrcStats.test_orc_stats .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/20794 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibb7bb25b44878df3e037e750a985656ed6857a7d Gerrit-Change-Number: 20794 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 20 Dec 2023 17:01:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12403: Fix Kerberos authentication when connecting with a proxy user that does not delegate another user
Norbert Luksa has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/20421 ) Change subject: IMPALA-12403: Fix Kerberos authentication when connecting with a proxy user that does not delegate another user .. IMPALA-12403: Fix Kerberos authentication when connecting with a proxy user that does not delegate another user This change improves LDAP checking in the Hiveserver2 API where previously LDAP filter checks were run on the user principal received in Kerberos authentication, but after the modification they are run on the short username taken from the Kerberos principal. Tested with new custom cluster tests. Change-Id: I0141cd341cda84ed40cd72f8a038c8d5a215e5e1 Reviewed-on: http://gerrit.cloudera.org:8080/20421 Tested-by: Impala Public Jenkins Reviewed-by: Norbert Luksa --- M be/src/rpc/authentication.cc M be/src/rpc/thrift-server.h M be/src/service/impala-hs2-server.cc M be/src/transport/TSasl.cpp M be/src/transport/TSasl.h M be/src/transport/TSaslTransport.cpp M be/src/transport/TSaslTransport.h M be/src/util/auth-util.cc M be/src/util/auth-util.h M fe/src/test/java/org/apache/impala/customcluster/LdapImpalaShellTest.java M fe/src/test/java/org/apache/impala/customcluster/LdapKerberosImpalaShellTest.java M fe/src/test/java/org/apache/impala/customcluster/LdapSearchBindImpalaShellTest.java 12 files changed, 128 insertions(+), 22 deletions(-) Approvals: Impala Public Jenkins: Verified Norbert Luksa: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/20421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I0141cd341cda84ed40cd72f8a038c8d5a215e5e1 Gerrit-Change-Number: 20421 Gerrit-PatchSet: 8 Gerrit-Owner: Gergely Farkas Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gergely Farkas Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Norbert Luksa
[Impala-ASF-CR] IMPALA-12403: Fix Kerberos authentication when connecting with a proxy user that does not delegate another user
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/20421 ) Change subject: IMPALA-12403: Fix Kerberos authentication when connecting with a proxy user that does not delegate another user .. Patch Set 7: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/20421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0141cd341cda84ed40cd72f8a038c8d5a215e5e1 Gerrit-Change-Number: 20421 Gerrit-PatchSet: 7 Gerrit-Owner: Gergely Farkas Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gergely Farkas Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Norbert Luksa Gerrit-Comment-Date: Fri, 15 Sep 2023 07:17:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12403: Fix Kerberos authentication when connecting with a proxy user that does not delegate another user
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/20421 ) Change subject: IMPALA-12403: Fix Kerberos authentication when connecting with a proxy user that does not delegate another user .. Patch Set 5: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/20421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0141cd341cda84ed40cd72f8a038c8d5a215e5e1 Gerrit-Change-Number: 20421 Gerrit-PatchSet: 5 Gerrit-Owner: Gergely Farkas Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gergely Farkas Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Norbert Luksa Gerrit-Comment-Date: Thu, 14 Sep 2023 14:44:09 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9290: ORC scanner should support schema evolution between date and timestamp types
Norbert Luksa has abandoned this change. ( http://gerrit.cloudera.org:8080/15152 ) Change subject: IMPALA-9290: ORC scanner should support schema evolution between date and timestamp types .. Abandoned -- To view, visit http://gerrit.cloudera.org:8080/15152 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: abandon Gerrit-Change-Id: I7979ecc61b2ab900090d01bc81e7bb7b28c99c9e Gerrit-Change-Number: 15152 Gerrit-PatchSet: 3 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-10233: zorder sort node should output rows in lexical order of partition keys
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/16590 ) Change subject: IMPALA-10233: zorder sort node should output rows in lexical order of partition keys .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/16590/2/be/src/util/tuple-row-compare.cc File be/src/util/tuple-row-compare.cc: http://gerrit.cloudera.org:8080/#/c/16590/2/be/src/util/tuple-row-compare.cc@45 PS2, Line 45: num_lexical_keys_(tsort_info.num_lexical_keys), This should be tsort_info.num_lexical_keys_in_zorder (the build failed because of this). -- To view, visit http://gerrit.cloudera.org:8080/16590 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30cbad711167b8b63c81837e497b36fd41be9b54 Gerrit-Change-Number: 16590 Gerrit-PatchSet: 2 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 14 Oct 2020 09:31:34 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10233: zorder sort node should output rows in lexical order of partition keys
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/16590 ) Change subject: IMPALA-10233: zorder sort node should output rows in lexical order of partition keys .. Patch Set 2: Code-Review+2 LGTM! Thanks Quanlong for fixing this! -- To view, visit http://gerrit.cloudera.org:8080/16590 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30cbad711167b8b63c81837e497b36fd41be9b54 Gerrit-Change-Number: 16590 Gerrit-PatchSet: 2 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 14 Oct 2020 09:28:22 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8755: Unlock Z-ordering by default
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/16003 ) Change subject: IMPALA-8755: Unlock Z-ordering by default .. Patch Set 4: Sorry, didn't notice your comment, Zoltán. Will move the flag to the graveyard in a separate commit. -- To view, visit http://gerrit.cloudera.org:8080/16003 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I653e0e2db8f7bc2dd077943b3acf667514d45811 Gerrit-Change-Number: 16003 Gerrit-PatchSet: 4 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 24 Jun 2020 07:44:06 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8755: Unlock Z-ordering by default
Norbert Luksa has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16003 Change subject: IMPALA-8755: Unlock Z-ordering by default .. IMPALA-8755: Unlock Z-ordering by default Z-ordering has been around for a while behind a feature flag (unlock_zorder_sort). It's around time to turn this flag on by default. This commit beside setting the flag to true, merges the Z-order tests from custom cluster tests into the normal test files. Tests: - Run all related tests. Change-Id: I653e0e2db8f7bc2dd077943b3acf667514d45811 --- M be/src/common/global-flags.cc M fe/src/main/cup/sql-parser.cup M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeKuduDDLTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlUtilsTest.java M fe/src/test/java/org/apache/impala/authorization/AuthorizationStmtTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java D testdata/workloads/functional-query/queries/QueryTest/alter-table-zorder.test M testdata/workloads/functional-query/queries/QueryTest/alter-table.test D testdata/workloads/functional-query/queries/QueryTest/create-table-as-select-zorder.test M testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test D testdata/workloads/functional-query/queries/QueryTest/create-table-like-file-zorder.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test D testdata/workloads/functional-query/queries/QueryTest/create-table-like-table-zorder.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test D testdata/workloads/functional-query/queries/QueryTest/create-table-zorder.test M testdata/workloads/functional-query/queries/QueryTest/create-table.test D tests/custom_cluster/test_zorder.py 20 files changed, 315 insertions(+), 465 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/16003/2 -- To view, visit http://gerrit.cloudera.org:8080/16003 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I653e0e2db8f7bc2dd077943b3acf667514d45811 Gerrit-Change-Number: 16003 Gerrit-PatchSet: 2 Gerrit-Owner: Norbert Luksa
[Impala-ASF-CR] IMPALA-9625: Convert the name of a TAccessEvent to lowercase
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/15689 ) Change subject: IMPALA-9625: Convert the name of a TAccessEvent to lowercase .. Patch Set 6: Hi Fang-Yu, thanks for restarting the verification build, I see this time it passed. -- To view, visit http://gerrit.cloudera.org:8080/15689 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If0d9ba58da891921fafbfe7c6db358b51965e178 Gerrit-Change-Number: 15689 Gerrit-PatchSet: 6 Gerrit-Owner: Fang-Yu Rao Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Vincent Tran Gerrit-Comment-Date: Tue, 14 Apr 2020 07:44:25 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9625: Convert the name of a TAccessEvent to lowercase
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/15689 ) Change subject: IMPALA-9625: Convert the name of a TAccessEvent to lowercase .. Patch Set 4: Code-Review+2 (1 comment) Thanks Fang-Yu, since the exhaustive tests passed, I think it is good to go in now. http://gerrit.cloudera.org:8080/#/c/15689/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/15689/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2758 PS1, Line 2758: public void addAccessEvent(TAccessEvent event) { > Hi Norbert, I just realized that simply adding "Preconditions.checkState(ev Hi Fang-Yu, I agree, this looks like the safer option. -- To view, visit http://gerrit.cloudera.org:8080/15689 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If0d9ba58da891921fafbfe7c6db358b51965e178 Gerrit-Change-Number: 15689 Gerrit-PatchSet: 4 Gerrit-Owner: Fang-Yu Rao Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Vincent Tran Gerrit-Comment-Date: Fri, 10 Apr 2020 14:52:24 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9625: Convert fully-qualified table name to lowercase in getTable()
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/15689 ) Change subject: IMPALA-9625: Convert fully-qualified table name to lowercase in getTable() .. Patch Set 1: Code-Review+1 (2 comments) Hi Fang-Yu, left some comments, otherwise LGTM. http://gerrit.cloudera.org:8080/#/c/15689/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/15689/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2758 PS1, Line 2758: public void addAccessEvent(TAccessEvent event) { I think it would be safer to check if the name is lower case here with a precondition. Another option is to convert the name to lower case here. http://gerrit.cloudera.org:8080/#/c/15689/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2949 PS1, Line 2949: globalState_.accessEvents.add(new TAccessEvent( Here we should also use the addAccessEvent method. -- To view, visit http://gerrit.cloudera.org:8080/15689 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If0d9ba58da891921fafbfe7c6db358b51965e178 Gerrit-Change-Number: 15689 Gerrit-PatchSet: 1 Gerrit-Owner: Fang-Yu Rao Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Vincent Tran Gerrit-Comment-Date: Thu, 09 Apr 2020 08:53:47 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/15395 ) Change subject: IMPALA-9042: Milestone 1: properly scan files that has full ACID schema .. Patch Set 3: (6 comments) Thanks for working on this. Left some comments, mostly questions to get a better understanding of how ACID works in Impala. Later I will look at the tests as well. http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.cc File be/src/exec/orc-metadata-utils.cc: http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.cc@110 PS3, Line 110: file_idx -= num_part_cols; nit: This and the same @115 could be moved outside the if. http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.cc@118 PS3, Line 118: table_idx += 1 + num_part_cols; Could you explain the values here? In this case the file_idx should be the original table_idx? http://gerrit.cloudera.org:8080/#/c/15395/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/15395/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@a297 PS3, Line 297: Removing this mean that we support write operations on full ACID tables? Do we really need to remove this one? http://gerrit.cloudera.org:8080/#/c/15395/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@224 PS3, Line 224: ensureTableNotFullAcid There are now two ensureTableNotFullAcid methods, that only differ in the parameters and the error msg thrown. Is this method and the INSERT_ONLY_ACID_TABLE_SUPPORTED_ERROR_MSG message used anymore? Can they be removed? http://gerrit.cloudera.org:8080/#/c/15395/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@237 PS3, Line 237: String nit: maybe change this string to enum? http://gerrit.cloudera.org:8080/#/c/15395/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@238 PS3, Line 238: throws AnalysisException { nit: missing space -- To view, visit http://gerrit.cloudera.org:8080/15395 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb Gerrit-Change-Number: 15395 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 18 Mar 2020 17:34:36 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6360: Don't show full query statement on Impala WebUI by default
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/15288 ) Change subject: IMPALA-6360: Don't show full query statement on Impala WebUI by default .. Patch Set 15: Code-Review+1 (1 comment) Found a nit, otherwise lgtm. http://gerrit.cloudera.org:8080/#/c/15288/15/tests/webserver/test_web_pages.py File tests/webserver/test_web_pages.py: http://gerrit.cloudera.org:8080/#/c/15288/15/tests/webserver/test_web_pages.py@422 PS15, Line 422: x nit: I would explicitly write it as "x " (notice the space after the x), like in the other test file. -- To view, visit http://gerrit.cloudera.org:8080/15288 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib7109a0be5d1022b4f8d6e72441cf5dc1dc42605 Gerrit-Change-Number: 15288 Gerrit-PatchSet: 15 Gerrit-Owner: Adam Tamas Gerrit-Reviewer: Adam Tamas Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Tamas Mate Gerrit-Comment-Date: Wed, 18 Mar 2020 15:09:44 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6505: Min-Max predicate push down in ORC scanner
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/15403 ) Change subject: IMPALA-6505: Min-Max predicate push down in ORC scanner .. Patch Set 1: (6 comments) Thanks Quanlong and Csaba for reviewing. Addressed some comments, will do the testing with the next patch. Also, I will play with different predicates since now I see huge regression for some queries. Eg. the following query takes around twice as much as before: select * from lineitem where l_suppkey = 10 http://gerrit.cloudera.org:8080/#/c/15403/1/be/src/exec/hdfs-orc-scanner.cc File be/src/exec/hdfs-orc-scanner.cc: http://gerrit.cloudera.org:8080/#/c/15403/1/be/src/exec/hdfs-orc-scanner.cc@875 PS1, Line 875: UtcToUnixTime > Looks like it's an ORC bug that the ORC writers (both Java and C++ versions Changed to FloorUtcToUnixTimeMillis. I see Csaba opened ORC-611 regarding the bug. http://gerrit.cloudera.org:8080/#/c/15403/1/be/src/exec/hdfs-orc-scanner.cc@888 PS1, Line 888: case TYPE_VARCHAR: : case TYPE_CHAR: { > I am not sure about the correctness in case of CHAR(N) and VARCHAR(N) Added padding. Maybe we should wait here with ORC-612? http://gerrit.cloudera.org:8080/#/c/15403/1/be/src/exec/hdfs-orc-scanner.cc@902 PS1, Line 902: type.GetByteSize() > I think this should be "literal_expr->type().GetByteSize()" since it's cons Done http://gerrit.cloudera.org:8080/#/c/15403/1/be/src/exec/hdfs-orc-scanner.cc@914 PS1, Line 914: static_cast((dv16->value() << 64) >> 64) > Can we cast int128_t to uint64_t directly? Looks like we can, done. http://gerrit.cloudera.org:8080/#/c/15403/1/be/src/exec/hdfs-orc-scanner.cc@959 PS1, Line 959: orc::Literal literal = : GetLiteralSearchArguments(eval, slot_desc->type(), _type); : : if (fn_name == "lt") { > This logic is not enough if there is some mismatch between Impala's and ORC Same as above, wait for ORC-612? http://gerrit.cloudera.org:8080/#/c/15403/1/be/src/exec/hdfs-orc-scanner.cc@984 PS1, Line 984: row_reader_options_.setSearchArgument(std::move(final_sarg)); > Could you add a VLOG_FILE level logging for 'final_sarg'? It would be helpf Done -- To view, visit http://gerrit.cloudera.org:8080/15403 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I136622413db21e0941d238ab6aeea901a6464845 Gerrit-Change-Number: 15403 Gerrit-PatchSet: 1 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 17 Mar 2020 12:55:39 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6505: Min-Max predicate push down in ORC scanner
Norbert Luksa has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/15403 ) Change subject: IMPALA-6505: Min-Max predicate push down in ORC scanner .. IMPALA-6505: Min-Max predicate push down in ORC scanner This commit implements min/max predicate pushdown for the ORC scanner leveraging on the external ORC library's search arguments. We build the search arguments when we open the scanner as we need not to modify them later. Also added a query option orc_read_statistics. If the option is set to true (it is by default) predicate pushdown will take effect, otherwise it will be skipped. Tests: - Run scanner tests on ORC files. - Run TPCH benchmark, here is no improvement, nor regression. On the other hand, certain selective queries gained significant speed-up. Further TODO: Bump ORC version since predicate pushdown is not yet implemented in the upstream ORC lib (in review). Change-Id: I136622413db21e0941d238ab6aeea901a6464845 --- M be/src/exec/hdfs-orc-scanner.cc M be/src/exec/hdfs-orc-scanner.h M be/src/exec/orc-metadata-utils.cc M be/src/exec/orc-metadata-utils.h M be/src/exprs/scalar-expr.h M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java 10 files changed, 250 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/15403/2 -- To view, visit http://gerrit.cloudera.org:8080/15403 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I136622413db21e0941d238ab6aeea901a6464845 Gerrit-Change-Number: 15403 Gerrit-PatchSet: 2 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-6505: Min-Max predicate push down in ORC scanner
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/15403 ) Change subject: IMPALA-6505: Min-Max predicate push down in ORC scanner .. Patch Set 1: The build should fail since we do not have the version of ORC with search arguments (in fact it's under review there as well). -- To view, visit http://gerrit.cloudera.org:8080/15403 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I136622413db21e0941d238ab6aeea901a6464845 Gerrit-Change-Number: 15403 Gerrit-PatchSet: 1 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Comment-Date: Wed, 11 Mar 2020 12:41:55 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6505: Min-Max predicate push down in ORC scanner
Norbert Luksa has uploaded this change for review. ( http://gerrit.cloudera.org:8080/15403 Change subject: IMPALA-6505: Min-Max predicate push down in ORC scanner .. IMPALA-6505: Min-Max predicate push down in ORC scanner This commit implements min/max predicate pushdown for the ORC scanner leveraging on the external ORC library's search arguments. We build the search arguments when we open the scanner as we need not to modify them later. Also added a query option orc_read_statistics. If the option is set to true (it is by default) predicate pushdown will take effect, otherwise it will be skipped. Tests: - Run scanner tests on ORC files. - TODO: test performance - TODO: run test_scanners_fuzz - Bump ORC version since predicate pushdown is not yet implemented in the upstream ORC lib (in review). Change-Id: I136622413db21e0941d238ab6aeea901a6464845 --- M be/src/exec/hdfs-orc-scanner.cc M be/src/exec/hdfs-orc-scanner.h M be/src/exec/orc-metadata-utils.cc M be/src/exec/orc-metadata-utils.h M be/src/exprs/scalar-expr.h M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java 10 files changed, 228 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/15403/1 -- To view, visit http://gerrit.cloudera.org:8080/15403 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I136622413db21e0941d238ab6aeea901a6464845 Gerrit-Change-Number: 15403 Gerrit-PatchSet: 1 Gerrit-Owner: Norbert Luksa
[Impala-ASF-CR] IMPALA-6360: Don't show full query statement on Impala WebUI by default
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/15288 ) Change subject: IMPALA-6360: Don't show full query statement on Impala WebUI by default .. Patch Set 12: Code-Review+1 (1 comment) Hi, left a comment, otherwise LGTM. http://gerrit.cloudera.org:8080/#/c/15288/12/tests/webserver/test_web_pages.py File tests/webserver/test_web_pages.py: http://gerrit.cloudera.org:8080/#/c/15288/12/tests/webserver/test_web_pages.py@423 PS12, Line 423: query I think it would be easier to check this test if you would create it with an expression like: "select \"{0}\"".format("x" * 439) That way you still get a 448 long string, but it's easier to count. -- To view, visit http://gerrit.cloudera.org:8080/15288 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib7109a0be5d1022b4f8d6e72441cf5dc1dc42605 Gerrit-Change-Number: 15288 Gerrit-PatchSet: 12 Gerrit-Owner: Adam Tamas Gerrit-Reviewer: Adam Tamas Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Tamas Mate Gerrit-Comment-Date: Thu, 05 Mar 2020 13:12:38 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4080 [part 3]: Moved codegen code for TupleRowComparator into a Config object
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/15281 ) Change subject: IMPALA-4080 [part 3]: Moved codegen code for TupleRowComparator into a Config object .. Patch Set 5: Code-Review+1 (1 comment) Left a comment, but LGTM. http://gerrit.cloudera.org:8080/#/c/15281/5/be/src/util/tuple-row-compare.h File be/src/util/tuple-row-compare.h: http://gerrit.cloudera.org:8080/#/c/15281/5/be/src/util/tuple-row-compare.h@74 PS5, Line 74: TSortingOrder::type sorting_order_; nit: you could add a comment here, something like "Indicates the ordering used to instantiate the comparator." And maybe: "Only used with SORT BY queries". -- To view, visit http://gerrit.cloudera.org:8080/15281 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I265604ecbae2c80ff59e697fddfeea4976b25cd9 Gerrit-Change-Number: 15281 Gerrit-PatchSet: 5 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 05 Mar 2020 12:28:55 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9447: Fix for TupleRowCompareTest.DecimalTest crash
Norbert Luksa has uploaded this change for review. ( http://gerrit.cloudera.org:8080/15347 Change subject: IMPALA-9447: Fix for TupleRowCompareTest.DecimalTest crash .. IMPALA-9447: Fix for TupleRowCompareTest.DecimalTest crash TupleRowCompareTest.DecimalTest crashes on release build because of a faulty test. The problem occures with Decimal16Value when the test tries to copy the values into the tuple in FillMem(). The solution is to use memcpy to avoid gcc generating unaligned instructions like movaps for int128_t. They would raise SegmentFault when addresses are not aligned to 16 bytes. This works with the other types as well. Tests: - Run TupleRowCompareTest. Change-Id: If2a53c88b57cec3a0bfcf02c492fbb560a80682d --- M be/src/util/tuple-row-compare-test.cc 1 file changed, 11 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/15347/1 -- To view, visit http://gerrit.cloudera.org:8080/15347 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: If2a53c88b57cec3a0bfcf02c492fbb560a80682d Gerrit-Change-Number: 15347 Gerrit-PatchSet: 1 Gerrit-Owner: Norbert Luksa
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. Patch Set 29: Thanks Zoltan, included the header. -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 29 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 28 Feb 2020 12:16:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Norbert Luksa has uploaded a new patch set (#29). ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. IMPALA-8755: Backend support for Z-ordering This change depends on gerrit.cloudera.org/#/c/13955/ (Frontend support for Z-ordering) The commit adds a Comparator based on Z-ordering. See in detail: https://en.wikipedia.org/wiki/Z-order_curve The comparator instead of calculating the Z-values of the rows, looks for the column with the most significant dimension, and compares the values of this column only. The most significant dimension will be the one where the compared values have the highest different bits. The algorithm requires values of the same binary representation, therefore the values are converted into either uint32_t, uint63_t or uint128_t, the smallest in which all data fits. Comparing smaller types with bigger ones would make the bigger type much more dominant therefore the bits of these smaller types are shifted up. All primitive types (including string and floating point types) are supported. Testing: * Added unit tests. * Run manual tests, comparing 4-column values with 4-bit integers, for all possible combinations. Checked the result by calculating the Z-value for each comparison. * Tested performance on various data, getting great results for selective queries. An example: used the TPCH dataset's lineitem table with scale 25, where the sorting columns are l_partkey and l_suppkey, in that order. Run selective queries for the value range of the two columns, for both lexical and Z-ordering and compared the percentage of filtered pages and row groups. While queries with filters on the first column showed almost no difference, queries on the second column is in favour of Z-ordering: Ordering | Column | Filtered pages % | Filtered row groups % Lex. 1st ~99% ~90% Z-ord. 1st ~99% ~89% Lex. 2nd ~25% 0% Z-ord. 2nd ~97% 0% The only drawback is the sorting itself, taking ~4 times more than lexical sorting (eg. sorting for the dataset above took 14m for Lexical, and 55m for Z-ordering). Note however, that this is a one-time thing to do, sorting only happens once, when writing the data. Also, lexical ordering is supported by codegen, while it is not implemented for Z-ordering yet. Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab --- M be/src/exec/exchange-node.cc M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/exec/parquet/hdfs-parquet-table-writer.cc M be/src/exec/partial-sort-node.cc M be/src/exec/partial-sort-node.h M be/src/exec/sort-node.cc M be/src/exec/sort-node.h M be/src/exec/topn-node.cc M be/src/runtime/data-stream-test.cc M be/src/runtime/sorter.cc M be/src/runtime/sorter.h M be/src/util/CMakeLists.txt A be/src/util/tuple-row-compare-test.cc M be/src/util/tuple-row-compare.cc M be/src/util/tuple-row-compare.h M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java 18 files changed, 1,119 insertions(+), 95 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/29 -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 29 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-9226: Improve string allocations of the ORC scanner
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/15051 ) Change subject: IMPALA-9226: Improve string allocations of the ORC scanner .. Patch Set 14: (2 comments) http://gerrit.cloudera.org:8080/#/c/15051/12/be/src/exec/orc-column-readers.cc File be/src/exec/orc-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/15051/12/be/src/exec/orc-column-readers.cc@180 PS12, Line 180: >( > I think that >= offsets.size() - 1 is needed, because of the offsets[index Done http://gerrit.cloudera.org:8080/#/c/15051/12/be/src/exec/orc-column-readers.cc@184 PS12, Line 184: src_ptr = blob_ + offsets[index]; : src_len = offsets[index + 1] - offsets[index]; > I think that we cannot trust completely in the length values at the moment Thanks Csaba for investigating, uploaded a PS for the above check, will open ORC jiras for both of your comments. -- To view, visit http://gerrit.cloudera.org:8080/15051 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f Gerrit-Change-Number: 15051 Gerrit-PatchSet: 14 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 27 Feb 2020 11:02:05 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9226: Improve string allocations of the ORC scanner
Norbert Luksa has uploaded a new patch set (#14). ( http://gerrit.cloudera.org:8080/15051 ) Change subject: IMPALA-9226: Improve string allocations of the ORC scanner .. IMPALA-9226: Improve string allocations of the ORC scanner Currently the OrcColumnReader copies values from the orc::StringVectorBatch one-by-one. Since ORC 1.6, the blob which contains the pointed values is moved to the StringVectorBatch, so we can copy it. This commit beside the above improvement also enables the LazyEncoding option for the ORC reader. This way, for stripes with DICTIONARY_ENCODING[_V2], EncodedStringVectorBatch contains the data in a dictionaryBlob from which the data can be acquired with the given indices and lengths. Tests: * Run ORC scanner tests (query_tests/test_scanners.py::TestOrc) and tpch query tests. * Tested performance on tpch.lineitem table with scale=25, running queries that selects min of string columns. Some results: col_name | encoding | before | after | speedup = l_comment DIRECT 16.42s 14.38s 14% l_shipinstruct DICTIONARY 5.26s3.80s 32% l_commitdate DICTIONARY 5.46s5.19s 5% all string col BOTH 39.06s 32.18s 21% The queries were run on a desktop PC with MT_DOP and NUM_NODES set to 1. * Also run TPC-H queries on the TPC-H benchmark where some queries' runtime improved by around 10-15%, while there were no regression for the others. Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f --- M be/src/exec/hdfs-orc-scanner.cc M be/src/exec/hdfs-orc-scanner.h M be/src/exec/orc-column-readers.cc M be/src/exec/orc-column-readers.h 4 files changed, 135 insertions(+), 42 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/15051/14 -- To view, visit http://gerrit.cloudera.org:8080/15051 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f Gerrit-Change-Number: 15051 Gerrit-PatchSet: 14 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-9226: Bump ORC version to 1.6.2-p7
Norbert Luksa has uploaded this change for review. ( http://gerrit.cloudera.org:8080/15287 Change subject: IMPALA-9226: Bump ORC version to 1.6.2-p7 .. IMPALA-9226: Bump ORC version to 1.6.2-p7 Bump ORC version to include patch for ORC-600 that unblocks IMPALA-9226. Tests: - Run scanner tests for orc/def/block. Change-Id: I444bfac435e5b05eee1ff7c8cf6a32ff5b65 --- M bin/impala-config.sh 1 file changed, 2 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/15287/1 -- To view, visit http://gerrit.cloudera.org:8080/15287 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I444bfac435e5b05eee1ff7c8cf6a32ff5b65 Gerrit-Change-Number: 15287 Gerrit-PatchSet: 1 Gerrit-Owner: Norbert Luksa
[native-toolchain-CR] IMPALA-9226: Add patch to ORC-1.6.2 in the toolchain
Norbert Luksa has uploaded this change for review. ( http://gerrit.cloudera.org:8080/15265 Change subject: IMPALA-9226: Add patch to ORC-1.6.2 in the toolchain .. IMPALA-9226: Add patch to ORC-1.6.2 in the toolchain This commit adds a bugfix patch that blocks IMPALA-9226. Tests: - Run query_test for orc/def/block locally. - Builds succeeded in all supported platforms. Change-Id: I0f86d9493d3907e51a8d559adeb4f4b042379457 --- M buildall.sh A source/orc/orc-1.6.2-patches/0007-ORC-600-Fix-StringDictionaryColumnReader-to-update-i.patch 2 files changed, 154 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/native-toolchain refs/changes/65/15265/1 -- To view, visit http://gerrit.cloudera.org:8080/15265 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I0f86d9493d3907e51a8d559adeb4f4b042379457 Gerrit-Change-Number: 15265 Gerrit-PatchSet: 1 Gerrit-Owner: Norbert Luksa
[Impala-ASF-CR] IMPALA-9351: Change path in TestCreateTableLikeFileOrc
Norbert Luksa has uploaded this change for review. ( http://gerrit.cloudera.org:8080/15266 Change subject: IMPALA-9351: Change path in TestCreateTableLikeFileOrc .. IMPALA-9351: Change path in TestCreateTableLikeFileOrc There are some builds where AnalyzeDDLTest.TestCreateTableLikeFileOrc failed due to non-existing path. The problem was with the complex table check: /test-warehouse/functional_orc_def.db/complextypes_fileformat/00_0' As Joe suggested, this commit changes the path to another complex table, "complextypestbl", which is loaded by Impala and not written by Hive, so the location should always be the same. Tests: - Run the changed test. Change-Id: I94ac0f09a738e6c3edb068b036c1d50dc9861a7f --- M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java 1 file changed, 1 insertion(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/66/15266/1 -- To view, visit http://gerrit.cloudera.org:8080/15266 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I94ac0f09a738e6c3edb068b036c1d50dc9861a7f Gerrit-Change-Number: 15266 Gerrit-PatchSet: 1 Gerrit-Owner: Norbert Luksa
[Impala-ASF-CR] IMPALA-9385: Unix time conversion cleanup + ORC fix
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/15222 ) Change subject: IMPALA-9385: Unix time conversion cleanup + ORC fix .. Patch Set 11: (1 comment) http://gerrit.cloudera.org:8080/#/c/15222/11/be/src/common/global-types.h File be/src/common/global-types.h: http://gerrit.cloudera.org:8080/#/c/15222/11/be/src/common/global-types.h@33 PS11, Line 33: UTCPTR > I prefer it all capital for some reason, not sure why :) I'm okay with all capital, too, just thought the style guide requires it in CamelCase. -- To view, visit http://gerrit.cloudera.org:8080/15222 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e Gerrit-Change-Number: 15222 Gerrit-PatchSet: 11 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 21 Feb 2020 09:51:10 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9385: Unix time conversion cleanup + ORC fix
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/15222 ) Change subject: IMPALA-9385: Unix time conversion cleanup + ORC fix .. Patch Set 11: Code-Review+1 (3 comments) Left some comments, otherwise it LGTM. Will you open a Jira for the postponed changes/TODOs? http://gerrit.cloudera.org:8080/#/c/15222/11//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15222/11//COMMIT_MSG@9 PS11, Line 9: Orc nit: ORC http://gerrit.cloudera.org:8080/#/c/15222/11/be/src/common/global-types.h File be/src/common/global-types.h: http://gerrit.cloudera.org:8080/#/c/15222/11/be/src/common/global-types.h@33 PS11, Line 33: UTCPTR Since it's not a macro anymore, maybe UtcPtr would be a better name? http://gerrit.cloudera.org:8080/#/c/15222/11/be/src/runtime/timestamp-value.h File be/src/runtime/timestamp-value.h: http://gerrit.cloudera.org:8080/#/c/15222/11/be/src/runtime/timestamp-value.h@a35 PS11, Line 35: : Shouldn't we keep this comment? -- To view, visit http://gerrit.cloudera.org:8080/15222 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e Gerrit-Change-Number: 15222 Gerrit-PatchSet: 11 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 20 Feb 2020 16:28:45 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9290: ORC scanner should support schema evolution between date and timestamp types
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/15152 ) Change subject: IMPALA-9290: ORC scanner should support schema evolution between date and timestamp types .. Patch Set 3: Regarding Csaba's concerns, I checked if Hive supports the same kind of schema evolution for Parquet, since we know that Impala doesn't. Found out that Hive doesn't support it either, only for ORC. -- To view, visit http://gerrit.cloudera.org:8080/15152 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7979ecc61b2ab900090d01bc81e7bb7b28c99c9e Gerrit-Change-Number: 15152 Gerrit-PatchSet: 3 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 18 Feb 2020 10:22:31 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. Patch Set 26: The verification failed due to a the flaky test_exchange_mem_usage_scaling and AuthorizationStmtTest.testSelect. Run an exhaustive test, it passed: https://master-02.jenkins.cloudera.com/job/impala-private-parameterized/6472/ -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 26 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 18 Feb 2020 08:55:28 + Gerrit-HasComments: No
[Impala-ASF-CR] Add log of created files for data load
Norbert Luksa has uploaded this change for review. ( http://gerrit.cloudera.org:8080/15234 Change subject: Add log of created files for data load .. Add log of created files for data load As Joe pointed out in IMPALA-9351, it would help debugging issues with missing files if we had logged the created files when loading the data. With this commit, running create-load-data.sh now logs the created files into created-files.log. Change-Id: I4f413810c6202a07c19ad1893088feedd9f7278f --- M testdata/bin/create-load-data.sh 1 file changed, 3 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/34/15234/1 -- To view, visit http://gerrit.cloudera.org:8080/15234 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I4f413810c6202a07c19ad1893088feedd9f7278f Gerrit-Change-Number: 15234 Gerrit-PatchSet: 1 Gerrit-Owner: Norbert Luksa
[Impala-ASF-CR] WIP: IMPALA-9385: Fix Orc scanner's TimestampValue conversion
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/15222 ) Change subject: WIP: IMPALA-9385: Fix Orc scanner's TimestampValue conversion .. Patch Set 2: (5 comments) Hi, left some minor comments. http://gerrit.cloudera.org:8080/#/c/15222/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15222/2//COMMIT_MSG@7 PS2, Line 7: Orc nit: ORC (here and below) http://gerrit.cloudera.org:8080/#/c/15222/2//COMMIT_MSG@36 PS2, Line 36: Further planned changes Is it possible to gain some performance with this change? If yes, adding some measurements would be nice, I think. http://gerrit.cloudera.org:8080/#/c/15222/2/be/src/exec/parquet/parquet-common.h File be/src/exec/parquet/parquet-common.h: http://gerrit.cloudera.org:8080/#/c/15222/2/be/src/exec/parquet/parquet-common.h@743 PS2, Line 743: is nit: typo http://gerrit.cloudera.org:8080/#/c/15222/2/be/src/runtime/runtime-state.cc File be/src/runtime/runtime-state.cc: http://gerrit.cloudera.org:8080/#/c/15222/2/be/src/runtime/runtime-state.cc@150 PS2, Line 150: nullptr nit: maybe it would be worth to mention here as a side note that UTCPTR is in fact nullptr http://gerrit.cloudera.org:8080/#/c/15222/2/be/src/runtime/runtime-state.cc@152 PS2, Line 152: local_time_zone_ = tz == ::GetUtcTimezone() ? UTCPTR : tz; Not related to this point, but as I see, the only other place (outside of some tests) you left GetUtcTimezone() in the code is in ImpalaServer::PrepareQueryContext. Wouldn't it make sense to refactor the code there as welll? -- To view, visit http://gerrit.cloudera.org:8080/15222 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e Gerrit-Change-Number: 15222 Gerrit-PatchSet: 2 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 17 Feb 2020 13:06:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. Patch Set 26: The verification failed because the AllTypeTest added too many columns, with more slot size than possible. This resulted in a bitshift overflow when initialising a SlotRef. Added a comment and DCHECK, and removed some not too important columns from the test to prevent this issue from happening. -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 26 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 14 Feb 2020 14:25:27 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9256: Refactor constraint information into a separate class.
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/15213 ) Change subject: IMPALA-9256: Refactor constraint information into a separate class. .. Patch Set 1: (3 comments) Hi, left some comments. http://gerrit.cloudera.org:8080/#/c/15213/1/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: http://gerrit.cloudera.org:8080/#/c/15213/1/fe/src/main/java/org/apache/impala/catalog/Table.java@117 PS1, Line 117: sql nit: SQL http://gerrit.cloudera.org:8080/#/c/15213/1/fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java File fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java: http://gerrit.cloudera.org:8080/#/c/15213/1/fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java@544 PS1, Line 544: primary nit: or foreign keys/SQL constraints http://gerrit.cloudera.org:8080/#/c/15213/1/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java File fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java: http://gerrit.cloudera.org:8080/#/c/15213/1/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java@213 PS1, Line 213: return new SqlConstraints(new ArrayList<>(), new ArrayList<>()); Since IMPALA-9158 is merged, can't we return the SQL constraints here with primary and foreign keys? -- To view, visit http://gerrit.cloudera.org:8080/15213 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3f1c441c24df84d2d0791ffe94dff60d039a3341 Gerrit-Change-Number: 15213 Gerrit-PatchSet: 1 Gerrit-Owner: Anurag Mantripragada Gerrit-Reviewer: Anurag Mantripragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 14 Feb 2020 13:03:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Norbert Luksa has uploaded a new patch set (#25). ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. IMPALA-8755: Backend support for Z-ordering This change depends on gerrit.cloudera.org/#/c/13955/ (Frontend support for Z-ordering) The commit adds a Comparator based on Z-ordering. See in detail: https://en.wikipedia.org/wiki/Z-order_curve The comparator instead of calculating the Z-values of the rows, looks for the column with the most significant dimension, and compares the values of this column only. The most significant dimension will be the one where the compared values have the highest different bits. The algorithm requires values of the same binary representation, therefore the values are converted into either uint32_t, uint63_t or uint128_t, the smallest in which all data fits. Comparing smaller types with bigger ones would make the bigger type much more dominant therefore the bits of these smaller types are shifted up. All primitive types (including string and floating point types) are supported. Testing: * Added unit tests. * Run manual tests, comparing 4-column values with 4-bit integers, for all possible combinations. Checked the result by calculating the Z-value for each comparison. * Tested performance on various data, getting great results for selective queries. An example: used the TPCH dataset's lineitem table with scale 25, where the sorting columns are l_partkey and l_suppkey, in that order. Run selective queries for the value range of the two columns, for both lexical and Z-ordering and compared the percentage of filtered pages and row groups. While queries with filters on the first column showed almost no difference, queries on the second column is in favour of Z-ordering: Ordering | Column | Filtered pages % | Filtered row groups % Lex. 1st ~99% ~90% Z-ord. 1st ~99% ~89% Lex. 2nd ~25% 0% Z-ord. 2nd ~97% 0% The only drawback is the sorting itself, taking ~4 times more than lexical sorting (eg. sorting for the dataset above took 14m for Lexical, and 55m for Z-ordering). Note however, that this is a one-time thing to do, sorting only happens once, when writing the data. Also, lexical ordering is supported by codegen, while it is not implemented for Z-ordering yet. Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab --- M be/src/exec/exchange-node.cc M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/exec/parquet/hdfs-parquet-table-writer.cc M be/src/exec/partial-sort-node.cc M be/src/exec/partial-sort-node.h M be/src/exec/sort-node.cc M be/src/exec/sort-node.h M be/src/exec/topn-node.cc M be/src/runtime/data-stream-test.cc M be/src/runtime/sorter.cc M be/src/runtime/sorter.h M be/src/util/CMakeLists.txt A be/src/util/tuple-row-compare-test.cc M be/src/util/tuple-row-compare.cc M be/src/util/tuple-row-compare.h M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java 18 files changed, 1,118 insertions(+), 95 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/25 -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 25 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-6663: Expose current DDL metrics on WebUI
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/13806 ) Change subject: IMPALA-6663: Expose current DDL metrics on WebUI .. Patch Set 13: Code-Review+1 (2 comments) Left a small nit, but lgtm. http://gerrit.cloudera.org:8080/#/c/13806/8/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/13806/8/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@427 PS8, Line 427: case DROP_FUNCTION: > It is possible that I misunderstood the concerns here, but I will try to an I see, I misunderstood something, thanks for clarifying. http://gerrit.cloudera.org:8080/#/c/13806/13/fe/src/test/java/org/apache/impala/catalog/monitor/CatalogFinalizeDmlCounterTest.java File fe/src/test/java/org/apache/impala/catalog/monitor/CatalogFinalizeDmlCounterTest.java: http://gerrit.cloudera.org:8080/#/c/13806/13/fe/src/test/java/org/apache/impala/catalog/monitor/CatalogFinalizeDmlCounterTest.java@34 PS13, Line 34: private final String TEST_UPDATE_SQL = "UPDATE table SET c3 = " : + "'test';"; nit: this string could be written without concatenation. -- To view, visit http://gerrit.cloudera.org:8080/13806 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0ed76f134bad6d3b3d4dce132365a53a01e9512a Gerrit-Change-Number: 13806 Gerrit-PatchSet: 13 Gerrit-Owner: Tamas Mate Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 13 Feb 2020 11:41:28 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9290: ORC scanner should support schema evolution between date and timestamp types
Norbert Luksa has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/15152 ) Change subject: IMPALA-9290: ORC scanner should support schema evolution between date and timestamp types .. IMPALA-9290: ORC scanner should support schema evolution between date and timestamp types This feature adds support for schema evolution between date and timestamp for the ORC scanner. This means that we can have two tables, one with a date column, another with a timestamp column, and they can both point to the same set of ORC files. The result will be that for the first table everything will be converted to date, and for the second, everything to timestamp. In order to do that, the OrcTimestampReader and OrcDateColumnReader are modified to be able to handle batches of the two types. Their name now represents the destination Impala type. Note that the life cycle of a OrcColumnReader is within the life cycle of the HdfsOrcScanner which only reads a split of an ORC file, and an ORC file can't have two types for one column. Tests: * Added type conversion tests. * Tested manually following the use case steps of the Jira. Change-Id: I7979ecc61b2ab900090d01bc81e7bb7b28c99c9e --- M be/src/exec/hdfs-orc-scanner.h M be/src/exec/orc-column-readers.cc M be/src/exec/orc-column-readers.h M be/src/exec/orc-metadata-utils.cc M testdata/workloads/functional-query/queries/DataErrorsTest/orc-type-checks.test M tests/query_test/test_scanners.py 6 files changed, 132 insertions(+), 53 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/15152/3 -- To view, visit http://gerrit.cloudera.org:8080/15152 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7979ecc61b2ab900090d01bc81e7bb7b28c99c9e Gerrit-Change-Number: 15152 Gerrit-PatchSet: 3 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-9290: ORC scanner should support schema evolution between date and timestamp types
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/15152 ) Change subject: IMPALA-9290: ORC scanner should support schema evolution between date and timestamp types .. Patch Set 3: (6 comments) http://gerrit.cloudera.org:8080/#/c/15152/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15152/2//COMMIT_MSG@7 PS2, Line 7: between date and timestamp types > Looking at IMPALA-6373 it seems like Impala usually supports these conversi Parquet does not support this kind of compatibility between date and timestamp. I created date and timestamp tables and set them to point to the same location (as described in the Jira), and inserted some values to both tables. When I try to query any of them, I get an error that the schemas of the files are incompatible. I think here we do not wish to follow Parquet and other file formats, but Hive. http://gerrit.cloudera.org:8080/#/c/15152/2//COMMIT_MSG@12 PS2, Line 12: to the same set of O > nit: "to the same set of ORC files" probably expresses the intent better Done http://gerrit.cloudera.org:8080/#/c/15152/2/be/src/exec/orc-column-readers.h File be/src/exec/orc-column-readers.h: http://gerrit.cloudera.org:8080/#/c/15152/2/be/src/exec/orc-column-readers.h@121 PS2, Line 121: > nit: Please add comment about what is an invalid value and how it is handle Done http://gerrit.cloudera.org:8080/#/c/15152/2/be/src/exec/orc-column-readers.h@214 PS2, Line 214: split of an ORC file, an > Does Hive convert in both directions? Yes, it does. http://gerrit.cloudera.org:8080/#/c/15152/2/be/src/exec/orc-column-readers.h@214 PS2, Line 214: HdfsOrcScanner which only re > What does it mean "ORC support schema evolution"? Done http://gerrit.cloudera.org:8080/#/c/15152/2/be/src/exec/orc-column-readers.h@253 PS2, Line 253: : class OrcDateColumnReader : public OrcDateAndTimeStampReader { : public: : OrcDateColumnReader(const orc::Type* node, const SlotDescriptor* slot_desc, : HdfsOrcScanner* scanner) : : OrcDateAndTimeStampReader(node, slot_desc, scanne > nit: complicated and duplicated in OrcTimestampReader. Can you put it into Done -- To view, visit http://gerrit.cloudera.org:8080/15152 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7979ecc61b2ab900090d01bc81e7bb7b28c99c9e Gerrit-Change-Number: 15152 Gerrit-PatchSet: 3 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 04 Feb 2020 10:01:59 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9290: ORC scanner should support schema evolution between date and timestamp types
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/15152 ) Change subject: IMPALA-9290: ORC scanner should support schema evolution between date and timestamp types .. Patch Set 2: (5 comments) http://gerrit.cloudera.org:8080/#/c/15152/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15152/1//COMMIT_MSG@20 PS1, Line 20: > nit: wrap at 72 chars Done http://gerrit.cloudera.org:8080/#/c/15152/1/be/src/exec/orc-column-readers.h File be/src/exec/orc-column-readers.h: http://gerrit.cloudera.org:8080/#/c/15152/1/be/src/exec/orc-column-readers.h@224 PS1, Line 224: tampReader(const orc::Type* node, const SlotDes > Do we need this, can't we just compare check != nullprt? I just followed the convention of the other UpdateInputBatch functions where the static casted batch is compared to the dynamic casted one. Maybe we could simplify the DCHECKs everywhere? http://gerrit.cloudera.org:8080/#/c/15152/1/be/src/exec/orc-column-readers.cc File be/src/exec/orc-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/15152/1/be/src/exec/orc-column-readers.cc@211 PS1, Line 211: if (IsNull(DCHECK_NOTNULL(batch_), row_idx)) { > nit: extra line Done http://gerrit.cloudera.org:8080/#/c/15152/1/be/src/exec/orc-column-readers.cc@226 PS1, Line 226: } : *slot = DateValue(ts.DaysSinceUnixEpoch()); : } > This is probably not speed critical, but it could be done faster by ignorin Done http://gerrit.cloudera.org:8080/#/c/15152/1/be/src/exec/orc-column-readers.cc@229 PS1, Line 229: alid())) { > This will hit a DCHECK if the timestamp is not valid, see https://github.co Done -- To view, visit http://gerrit.cloudera.org:8080/15152 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7979ecc61b2ab900090d01bc81e7bb7b28c99c9e Gerrit-Change-Number: 15152 Gerrit-PatchSet: 2 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Comment-Date: Mon, 03 Feb 2020 15:47:41 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9290: ORC scanner should support schema evolution between date and timestamp types
Norbert Luksa has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/15152 ) Change subject: IMPALA-9290: ORC scanner should support schema evolution between date and timestamp types .. IMPALA-9290: ORC scanner should support schema evolution between date and timestamp types This feature adds support for schema evolution between date and timestamp for the ORC scanner. This means that we can have two tables, one with a date column, another with a timestamp column, and they can both point to the same ORC file. The result will be that for the first table everything will be converted to date, and for the second, everything to timestamp. In order to do that, the OrcTimestampReader and OrcDateColumnReader are modified to be able to handle batches of the two types. Their name now represents the destination Impala type. Note that the life cycle of a OrcColumnReader is within the life cycle of the HdfsOrcScanner which only reads a split of an ORC file, and an ORC file can't have two types for one column. Tests: * Added type conversion tests. * Tested manually following the use case steps of the Jira. Change-Id: I7979ecc61b2ab900090d01bc81e7bb7b28c99c9e --- M be/src/exec/orc-column-readers.cc M be/src/exec/orc-column-readers.h M be/src/exec/orc-metadata-utils.cc M testdata/workloads/functional-query/queries/DataErrorsTest/orc-type-checks.test M tests/query_test/test_scanners.py 5 files changed, 125 insertions(+), 43 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/15152/2 -- To view, visit http://gerrit.cloudera.org:8080/15152 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7979ecc61b2ab900090d01bc81e7bb7b28c99c9e Gerrit-Change-Number: 15152 Gerrit-PatchSet: 2 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Norbert Luksa has uploaded a new patch set (#22). ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. IMPALA-8755: Backend support for Z-ordering This change depends on gerrit.cloudera.org/#/c/13955/ (Frontend support for Z-ordering) The commit adds a Comparator based on Z-ordering. See in detail: https://en.wikipedia.org/wiki/Z-order_curve The comparator instead of calculating the Z-values of the rows, looks for the column with the most significant dimension, and compares the values of this column only. The most significant dimension will be the one where the compared values have the highest different bits. The algorithm requires values of the same binary representation, therefore the values are converted into either uint32_t, uint63_t or uint128_t, the smallest in which all data fits. Comparing smaller types with bigger ones would make the bigger type much more dominant therefore the bits of these smaller types are shifted up. All primitive types (including string and floating point types) are supported. Testing: * Added unit tests. * Run manual tests, comparing 4-column values with 4-bit integers, for all possible combinations. Checked the result by calculating the Z-value for each comparison. * Tested performance on various data, getting great results for selective queries. An example: used the TPCH dataset's lineitem table with scale 25, where the sorting columns are l_partkey and l_suppkey, in that order. Run selective queries for the value range of the two columns, for both lexical and Z-ordering and compared the percentage of filtered pages and row groups. While queries with filters on the first column showed almost no difference, queries on the second column is in favour of Z-ordering: Ordering | Column | Filtered pages % | Filtered row groups % Lex. 1st ~99% ~90% Z-ord. 1st ~99% ~89% Lex. 2nd ~25% 0% Z-ord. 2nd ~97% 0% The only drawback is the sorting itself, taking ~4 times more than lexical sorting (eg. sorting for the dataset above took 14m for Lexical, and 55m for Z-ordering). Note however, that this is a one-time thing to do, sorting only happens once, when writing the data. Also, lexical ordering is supported by codegen, while it is not implemented for Z-ordering yet. Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab --- M be/src/exec/exchange-node.cc M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/exec/parquet/hdfs-parquet-table-writer.cc M be/src/exec/partial-sort-node.cc M be/src/exec/partial-sort-node.h M be/src/exec/sort-node.cc M be/src/exec/sort-node.h M be/src/exec/topn-node.cc M be/src/runtime/data-stream-test.cc M be/src/runtime/sorter.cc M be/src/runtime/sorter.h M be/src/util/CMakeLists.txt A be/src/util/tuple-row-compare-test.cc M be/src/util/tuple-row-compare.cc M be/src/util/tuple-row-compare.h M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java 18 files changed, 1,128 insertions(+), 95 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/22 -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 22 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-9226: Improve string allocations of the ORC scanner
Norbert Luksa has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/15051 ) Change subject: IMPALA-9226: Improve string allocations of the ORC scanner .. IMPALA-9226: Improve string allocations of the ORC scanner Currently the OrcColumnReader copies values from the orc::StringVectorBatch one-by-one. Since ORC 1.6, the blob which contains the pointed values is moved to the StringVectorBatch, so we can copy it. This commit beside the above improvement also enables the LazyEncoding option for the ORC reader. This way, for stripes with DICTIONARY_ENCODING[_V2], EncodedStringVectorBatch contains the data in a dictionaryBlob from which the data can be acquired with the given indices and lengths. Tests: * Run ORC scanner tests (query_tests/test_scanners.py::TestOrc) and tpch query tests. * Tested performance on tpch.lineitem table with scale=25, running queries that selects min of string columns. Some results: col_name | encoding | before | after | speedup = l_comment DIRECT 16.42s 14.38s 14% l_shipinstruct DICTIONARY 5.26s3.80s 32% l_commitdate DICTIONARY 5.46s5.19s 5% all string col BOTH 39.06s 32.18s 21% The queries were run on a desktop PC with MT_DOP and NUM_NODES set to 1. * Also run TPC-H queries on the TPC-H benchmark where some queries' runtime improved by around 10-15%, while there were no regression for the others. Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f --- M be/src/exec/hdfs-orc-scanner.cc M be/src/exec/hdfs-orc-scanner.h M be/src/exec/orc-column-readers.cc M be/src/exec/orc-column-readers.h 4 files changed, 135 insertions(+), 42 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/15051/8 -- To view, visit http://gerrit.cloudera.org:8080/15051 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f Gerrit-Change-Number: 15051 Gerrit-PatchSet: 8 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-9226: Improve string allocations of the ORC scanner
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/15051 ) Change subject: IMPALA-9226: Improve string allocations of the ORC scanner .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/15051/7/be/src/exec/orc-column-readers.h File be/src/exec/orc-column-readers.h: http://gerrit.cloudera.org:8080/#/c/15051/7/be/src/exec/orc-column-readers.h@212 PS7, Line 212: static_cast(batch_) == : dynamic_cast(orc_batch) > it will be true even if orc_batch is just a StringVectorBatch, because it w Done -- To view, visit http://gerrit.cloudera.org:8080/15051 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f Gerrit-Change-Number: 15051 Gerrit-PatchSet: 8 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 03 Feb 2020 10:07:49 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9290: ORC scanner should support schema evolution between date and timestamp types
Norbert Luksa has uploaded this change for review. ( http://gerrit.cloudera.org:8080/15152 Change subject: IMPALA-9290: ORC scanner should support schema evolution between date and timestamp types .. IMPALA-9290: ORC scanner should support schema evolution between date and timestamp types This feature adds support for schema evolution between date and timestamp for the ORC scanner. This means that we can have two tables, one with a date column, another with a timestamp column, and they can both point to the same ORC file. The result will be that for the first table everything will be converted to date, and for the second, everything to timestamp. In order to do that, the OrcTimestampReader and OrcDateColumnReader are modified to be able to handle batches of the two types. Their name now represents the destination Impala type. Note that the life cycle of a OrcColumnReader is within the life cycle of the HdfsOrcScanner which only reads a split of an ORC file, and an ORC file can't have two types for one column. Tests: * Added type conversion tests. * Tested manually following the use case steps of the Jira. Change-Id: I7979ecc61b2ab900090d01bc81e7bb7b28c99c9e --- M be/src/exec/orc-column-readers.cc M be/src/exec/orc-column-readers.h M be/src/exec/orc-metadata-utils.cc M testdata/workloads/functional-query/queries/DataErrorsTest/orc-type-checks.test M tests/query_test/test_scanners.py 5 files changed, 115 insertions(+), 36 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/15152/1 -- To view, visit http://gerrit.cloudera.org:8080/15152 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I7979ecc61b2ab900090d01bc81e7bb7b28c99c9e Gerrit-Change-Number: 15152 Gerrit-PatchSet: 1 Gerrit-Owner: Norbert Luksa
[Impala-ASF-CR] IMPALA-9226: Improve string allocations of the ORC scanner
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/15051 ) Change subject: IMPALA-9226: Improve string allocations of the ORC scanner .. Patch Set 7: (5 comments) http://gerrit.cloudera.org:8080/#/c/15051/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15051/6//COMMIT_MSG@35 PS6, Line 35: Also run TPC-H queries on the TPC-H benchmark where some :queries' runtime improved by around 10-15%, > You could also state explicitly whether other queries have regressed or not Done http://gerrit.cloudera.org:8080/#/c/15051/3/be/src/exec/hdfs-orc-scanner.cc File be/src/exec/hdfs-orc-scanner.cc: http://gerrit.cloudera.org:8080/#/c/15051/3/be/src/exec/hdfs-orc-scanner.cc@683 PS3, Line 683: > It is still used for CollectionValueBuilder. Oh, right. http://gerrit.cloudera.org:8080/#/c/15051/6/be/src/exec/orc-column-readers.h File be/src/exec/orc-column-readers.h: http://gerrit.cloudera.org:8080/#/c/15051/6/be/src/exec/orc-column-readers.h@202 PS6, Line 202: batch_ = static_cast(orc_batch); : if (orc_batch == nullptr) return Status::OK(); > Is it ok that batch_ is not set to null if orc_batch is null? Swapped the two lines. http://gerrit.cloudera.org:8080/#/c/15051/6/be/src/exec/orc-column-readers.h@210 PS6, Line 210: _- > Does Orc support mixed encoding, e.g. a stripe that has both dictionary and Yes, according to the ORC documentation it cannot change through stripes. I'm not sure if it's worth to add an extra variable just to make sure that this behaviour is preserved by the ORC lib. There are DCHECKs which should also fail in that case (dynamic casting to the Vector batch is based on the encoding). I think a comment to clarify this behaviour would be enough, but please raise your concern if you think otherwise. http://gerrit.cloudera.org:8080/#/c/15051/6/be/src/exec/orc-column-readers.h@211 PS6, Line 211: > "RETURN_IF_ERROR" + "return Status::OK()" could be simplified to "return". Done -- To view, visit http://gerrit.cloudera.org:8080/15051 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f Gerrit-Change-Number: 15051 Gerrit-PatchSet: 7 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 30 Jan 2020 17:17:10 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9226: Improve string allocations of the ORC scanner
Norbert Luksa has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/15051 ) Change subject: IMPALA-9226: Improve string allocations of the ORC scanner .. IMPALA-9226: Improve string allocations of the ORC scanner Currently the OrcColumnReader copies values from the orc::StringVectorBatch one-by-one. Since ORC 1.6, the blob which contains the pointed values is moved to the StringVectorBatch, so we can copy it. This commit beside the above improvement also enables the LazyEncoding option for the ORC reader. This way, for stripes with DICTIONARY_ENCODING[_V2], EncodedStringVectorBatch contains the data in a dictionaryBlob from which the data can be acquired with the given indices and lengths. Tests: * Run ORC scanner tests (query_tests/test_scanners.py::TestOrc) and tpch query tests. * Tested performance on tpch.lineitem table with scale=25, running queries that selects min of string columns. Some results: col_name | encoding | before | after | speedup = l_comment DIRECT 16.42s 14.38s 14% l_shipinstruct DICTIONARY 5.26s3.80s 32% l_commitdate DICTIONARY 5.46s5.19s 5% all string col BOTH 39.06s 32.18s 21% The queries were run on a desktop PC with MT_DOP and NUM_NODES set to 1. * Also run TPC-H queries on the TPC-H benchmark where some queries' runtime improved by around 10-15%, while there were no regression for the others. Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f --- M be/src/exec/hdfs-orc-scanner.cc M be/src/exec/hdfs-orc-scanner.h M be/src/exec/orc-column-readers.cc M be/src/exec/orc-column-readers.h 4 files changed, 135 insertions(+), 42 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/15051/7 -- To view, visit http://gerrit.cloudera.org:8080/15051 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f Gerrit-Change-Number: 15051 Gerrit-PatchSet: 7 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Norbert Luksa has uploaded a new patch set (#21). ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. IMPALA-8755: Backend support for Z-ordering This change depends on gerrit.cloudera.org/#/c/13955/ (Frontend support for Z-ordering) The commit adds a Comparator based on Z-ordering. See in detail: https://en.wikipedia.org/wiki/Z-order_curve The comparator instead of calculating the Z-values of the rows, looks for the column with the most significant dimension, and compares the values of this column only. The most significant dimension will be the one where the compared values have the highest different bits. The algorithm requires values of the same binary representation, therefore the values are converted into either uint32_t, uint63_t or uint128_t, the smallest in which all data fits. Comparing smaller types with bigger ones would make the bigger type much more dominant therefore the bits of these smaller types are shifted up. All primitive types (including string and floating point types) are supported. Testing: * Added unit tests. * Run manual tests, comparing 4-column values with 4-bit integers, for all possible combinations. Checked the result by calculating the Z-value for each comparison. * Tested performance on various data, getting great results for selective queries. An example: used the TPCH dataset's lineitem table with scale 25, where the sorting columns are l_partkey and l_suppkey, in that order. Run selective queries for the value range of the two columns, for both lexical and Z-ordering and compared the percentage of filtered pages and row groups. While queries with filters on the first column showed almost no difference, queries on the second column is in favour of Z-ordering: Ordering | Column | Filtered pages % | Filtered row groups % Lex. 1st ~99% ~90% Z-ord. 1st ~99% ~89% Lex. 2nd ~25% 0% Z-ord. 2nd ~97% 0% The only drawback is the sorting itself, taking ~4 times more than lexical sorting (eg. sorting for the dataset above took 14m for Lexical, and 55m for Z-ordering). Note however, that this is a one-time thing to do, sorting only happens once, when writing the data. Also, lexical ordering is supported by codegen, while it is not implemented for Z-ordering yet. Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab --- M be/src/exec/exchange-node.cc M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/exec/parquet/hdfs-parquet-table-writer.cc M be/src/exec/partial-sort-node.cc M be/src/exec/partial-sort-node.h M be/src/exec/sort-node.cc M be/src/exec/sort-node.h M be/src/exec/topn-node.cc M be/src/runtime/data-stream-test.cc M be/src/runtime/sorter.cc M be/src/runtime/sorter.h M be/src/util/CMakeLists.txt A be/src/util/tuple-row-compare-test.cc M be/src/util/tuple-row-compare.cc M be/src/util/tuple-row-compare.h M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java 18 files changed, 1,127 insertions(+), 95 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/21 -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 21 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-9226: Improve string allocations of the ORC scanner
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/15051 ) Change subject: IMPALA-9226: Improve string allocations of the ORC scanner .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/15051/3/be/src/exec/hdfs-orc-scanner.cc File be/src/exec/hdfs-orc-scanner.cc: http://gerrit.cloudera.org:8080/#/c/15051/3/be/src/exec/hdfs-orc-scanner.cc@683 PS3, Line 683: dst_batch->tuple_data_pool() > Thank you, Csaba for the detailed comment. As I see (and as you also suggested offline), we do not need to pass the mempool here, right? -- To view, visit http://gerrit.cloudera.org:8080/15051 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f Gerrit-Change-Number: 15051 Gerrit-PatchSet: 3 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 30 Jan 2020 12:31:51 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9226: Improve string allocations of the ORC scanner
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/15051 ) Change subject: IMPALA-9226: Improve string allocations of the ORC scanner .. Patch Set 6: (5 comments) Thank you all for taking a look, addressed your comments. http://gerrit.cloudera.org:8080/#/c/15051/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15051/3//COMMIT_MSG@31 PS3, Line 31: 39.06s > nit: probably 'a desktop PC', or 'a desktop PC with CPU X' Done http://gerrit.cloudera.org:8080/#/c/15051/2/be/src/exec/hdfs-orc-scanner.cc File be/src/exec/hdfs-orc-scanner.cc: http://gerrit.cloudera.org:8080/#/c/15051/2/be/src/exec/hdfs-orc-scanner.cc@613 PS2, Line 613: > why did you restore the original narrow try-block? I do not call the ORC library anymore in the UpdateInputBatch, therefore the change here is not necessary. http://gerrit.cloudera.org:8080/#/c/15051/3/be/src/exec/hdfs-orc-scanner.cc File be/src/exec/hdfs-orc-scanner.cc: http://gerrit.cloudera.org:8080/#/c/15051/3/be/src/exec/hdfs-orc-scanner.cc@625 PS3, Line 625: orc_root_batch_ = row_reader_->createRowBatch(row_batch->capacity()); > nit: this extra new line is not needed Done http://gerrit.cloudera.org:8080/#/c/15051/3/be/src/exec/hdfs-orc-scanner.cc@683 PS3, Line 683: > As we have discussed offline, this is no longer correct. Using the current Thank you, Csaba for the detailed comment. Added mempools for the dictionary and the non-dictionary cases, looks like it works now. http://gerrit.cloudera.org:8080/#/c/15051/3/be/src/exec/orc-column-readers.h File be/src/exec/orc-column-readers.h: http://gerrit.cloudera.org:8080/#/c/15051/3/be/src/exec/orc-column-readers.h@198 PS3, Line 198: HdfsOrcScanne > Looking at the ORC library, it should be possible, since it passes the dict Added mempools for the scanner, now we can reuse the blobs. -- To view, visit http://gerrit.cloudera.org:8080/15051 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f Gerrit-Change-Number: 15051 Gerrit-PatchSet: 6 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 30 Jan 2020 12:04:37 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9226: Improve string allocations of the ORC scanner
Norbert Luksa has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/15051 ) Change subject: IMPALA-9226: Improve string allocations of the ORC scanner .. IMPALA-9226: Improve string allocations of the ORC scanner Currently the OrcColumnReader copies values from the orc::StringVectorBatch one-by-one. Since ORC 1.6, the blob which contains the pointed values is moved to the StringVectorBatch, so we can copy it. This commit beside the above improvement also enables the LazyEncoding option for the ORC reader. This way, for stripes with DICTIONARY_ENCODING[_V2], EncodedStringVectorBatch contains the data in a dictionaryBlob from which the data can be acquired with the given indices and lengths. Tests: * Run ORC scanner tests (query_tests/test_scanners.py::TestOrc) and tpch query tests. * Tested performance on tpch.lineitem table with scale=25, running queries that selects min of string columns. Some results: col_name | encoding | before | after | speedup = l_comment DIRECT 16.42s 14.38s 14% l_shipinstruct DICTIONARY 5.26s3.80s 32% l_commitdate DICTIONARY 5.46s5.19s 5% all string col BOTH 39.06s 32.18s 21% The queries were run on a desktop PC with MT_DOP and NUM_NODES set to 1. * Also run TPC-H queries on the TPC-H benchmark where some queries' runtime improved by around 10-15%. Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f --- M be/src/exec/hdfs-orc-scanner.cc M be/src/exec/hdfs-orc-scanner.h M be/src/exec/orc-column-readers.cc M be/src/exec/orc-column-readers.h 4 files changed, 137 insertions(+), 42 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/15051/6 -- To view, visit http://gerrit.cloudera.org:8080/15051 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f Gerrit-Change-Number: 15051 Gerrit-PatchSet: 6 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-9324: Correctly handle ORC UNION type in scanner
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/15103 ) Change subject: IMPALA-9324: Correctly handle ORC UNION type in scanner .. Patch Set 5: Code-Review+1 (2 comments) http://gerrit.cloudera.org:8080/#/c/15103/1/be/src/exec/hdfs-orc-scanner.cc File be/src/exec/hdfs-orc-scanner.cc: http://gerrit.cloudera.org:8080/#/c/15103/1/be/src/exec/hdfs-orc-scanner.cc@203 PS1, Line 203: root_type->toString(), filename()); > Yes, added an error code. I see, thanks for the clarification! http://gerrit.cloudera.org:8080/#/c/15103/1/testdata/workloads/functional-query/queries/DataErrorsTest/orc-type-checks.test File testdata/workloads/functional-query/queries/DataErrorsTest/orc-type-checks.test: http://gerrit.cloudera.org:8080/#/c/15103/1/testdata/workloads/functional-query/queries/DataErrorsTest/orc-type-checks.test@172 PS1, Line 172: > Yes, they can be ommitted if the expected string is not a regex. Substring Great to know, thanks! I'm okay if you leave it that way, as you find it best. -- To view, visit http://gerrit.cloudera.org:8080/15103 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I452d27b4e281eada00b62ac58af773a3479163ec Gerrit-Change-Number: 15103 Gerrit-PatchSet: 5 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 27 Jan 2020 10:36:07 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9226: Improve string allocations of the ORC scanner
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/15051 ) Change subject: IMPALA-9226: Improve string allocations of the ORC scanner .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/15051/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15051/3//COMMIT_MSG@24 PS3, Line 24: Some results: > As discussed offline: These measurement were run with a debug build. Let' s Right, unfortunately when I run the queries in release build, the results are not so bright. There is still a little speedup, but it's far away from the speedup in the debug build. Will upload the exact numbers when I tested it with more runs. http://gerrit.cloudera.org:8080/#/c/15051/3/be/src/exec/orc-column-readers.h File be/src/exec/orc-column-readers.h: http://gerrit.cloudera.org:8080/#/c/15051/3/be/src/exec/orc-column-readers.h@198 PS3, Line 198: blob_ = nullptr > So we cannot reuse blobs? Looking at the ORC library, it should be possible, since it passes the dictionary along the way. However, I did not find a simple solution for the same in Impala. After some digging, I found that the memory where the blob is currently allocated is owned by the row batch, and we cannot use this memory with the next row batch. I'll try to check if I keep track of the row batch and reuse the blob between them, will come back to you with the results. -- To view, visit http://gerrit.cloudera.org:8080/15051 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f Gerrit-Change-Number: 15051 Gerrit-PatchSet: 3 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 24 Jan 2020 15:51:05 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9324: Correctly handle ORC UNION type in scanner
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/15103 ) Change subject: IMPALA-9324: Correctly handle ORC UNION type in scanner .. Patch Set 1: Code-Review+1 (3 comments) Left some minor comments, but it LGTM. http://gerrit.cloudera.org:8080/#/c/15103/1/be/src/exec/hdfs-orc-scanner.cc File be/src/exec/hdfs-orc-scanner.cc: http://gerrit.cloudera.org:8080/#/c/15103/1/be/src/exec/hdfs-orc-scanner.cc@203 PS1, Line 203: "Root of the selected type returned by the ORC lib is not STRUCT: $0. " nit: this could be unified with orc-metadata-utils.cc:28, would it make sense to add an error code for this case as well? Or are both checks necessary? http://gerrit.cloudera.org:8080/#/c/15103/1/testdata/workloads/functional-query/queries/DataErrorsTest/orc-type-checks.test File testdata/workloads/functional-query/queries/DataErrorsTest/orc-type-checks.test: http://gerrit.cloudera.org:8080/#/c/15103/1/testdata/workloads/functional-query/queries/DataErrorsTest/orc-type-checks.test@160 PS1, Line 160: Note that before : # IMPALA-9324 this query will crash. nit: I think this comment here is not necessary. http://gerrit.cloudera.org:8080/#/c/15103/1/testdata/workloads/functional-query/queries/DataErrorsTest/orc-type-checks.test@172 PS1, Line 172: has an incompatible ORC schema for column '$DATABASE.ill_complextypes.u', Just a question: Isn't the fie is also part of the error msg? Can they be omitted? -- To view, visit http://gerrit.cloudera.org:8080/15103 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I452d27b4e281eada00b62ac58af773a3479163ec Gerrit-Change-Number: 15103 Gerrit-PatchSet: 1 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Comment-Date: Fri, 24 Jan 2020 10:50:26 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9226: Improve string allocations of the ORC scanner
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/15051 ) Change subject: IMPALA-9226: Improve string allocations of the ORC scanner .. Patch Set 4: Rebased. -- To view, visit http://gerrit.cloudera.org:8080/15051 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f Gerrit-Change-Number: 15051 Gerrit-PatchSet: 4 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 24 Jan 2020 10:10:39 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9226: Improve string allocations of the ORC scanner
Norbert Luksa has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/15051 ) Change subject: IMPALA-9226: Improve string allocations of the ORC scanner .. IMPALA-9226: Improve string allocations of the ORC scanner Currently the OrcColumnReader copies values from the orc::StringVectorBatch one-by-one. Since ORC 1.6, the blob which contains the pointed values is moved to the StringVectorBatch, so we can copy it. This commit beside the above improvement also enables the LazyEncoding option for the ORC reader. This way, for stripes with DICTIONARY_ENCODING[_V2], EncodedStringVectorBatch contains the data in a dictionaryBlob from which the data can be acquired with the given indices and lengths. Tests: * Run ORC scanner tests (query_tests/test_scanners.py::TestOrc) * Tested performance on tpch.lineitem table with scale=25, running queries that selects min of string columns. Some results: col_name | encoding | before | after | speedup = l_comment DIRECT 97s 30s 3.23 l_shipinstruct DICTIONARY 86s 18s 4.77 l_commitdate DICTIONARY 85s 20s 4.25 The queries were run on my machine with MT_DOP and NUM_NODES set to 1. Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f --- M be/src/exec/hdfs-orc-scanner.cc M be/src/exec/orc-column-readers.cc M be/src/exec/orc-column-readers.h 3 files changed, 56 insertions(+), 14 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/15051/4 -- To view, visit http://gerrit.cloudera.org:8080/15051 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f Gerrit-Change-Number: 15051 Gerrit-PatchSet: 4 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Norbert Luksa has uploaded a new patch set (#20). ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. IMPALA-8755: Backend support for Z-ordering This change depends on gerrit.cloudera.org/#/c/13955/ (Frontend support for Z-ordering) The commit adds a Comparator based on Z-ordering. See in detail: https://en.wikipedia.org/wiki/Z-order_curve The comparator instead of calculating the Z-values of the rows, looks for the column with the most significant dimension, and compares the values of this column only. The most significant dimension will be the one where the compared values have the highest different bits. The algorithm requires values of the same binary representation, therefore the values are converted into either uint32_t, uint63_t or uint128_t, the smallest in which all data fits. Comparing smaller types with bigger ones would make the bigger type much more dominant therefore the bits of these smaller types are shifted up. All primitive types (including string and floating point types) are supported. Testing: * Added unit tests. * Run manual tests, comparing 4-column values with 4-bit integers, for all possible combinations. Checked the result by calculating the Z-value for each comparison. * Tested performance on various data, getting great results for selective queries. An example: used the TPCH dataset's lineitem table with scale 25, where the sorting columns are l_partkey and l_suppkey, in that order. Run selective queries for the value range of the two columns, for both lexical and Z-ordering and compared the percentage of filtered pages and row groups. While queries with filters on the first column showed almost no difference, queries on the second column is in favour of Z-ordering: Ordering | Column | Filtered pages % | Filtered row groups % Lex. 1st ~99% ~90% Z-ord. 1st ~99% ~89% Lex. 2nd ~25% 0% Z-ord. 2nd ~97% 0% The only drawback is the sorting itself, taking ~4 times more than lexical sorting (eg. sorting for the dataset above took 14m for Lexical, and 55m for Z-ordering). Note however, that this is a one-time thing to do, sorting only happens once, when writing the data. Also, lexical ordering is supported by codegen, while it is not implemented for Z-ordering yet. Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab --- M be/src/exec/exchange-node.cc M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/exec/parquet/hdfs-parquet-table-writer.cc M be/src/exec/partial-sort-node.cc M be/src/exec/partial-sort-node.h M be/src/exec/sort-node.cc M be/src/exec/sort-node.h M be/src/exec/topn-node.cc M be/src/runtime/data-stream-test.cc M be/src/runtime/sorter.cc M be/src/runtime/sorter.h M be/src/util/CMakeLists.txt A be/src/util/tuple-row-compare-test.cc M be/src/util/tuple-row-compare.cc M be/src/util/tuple-row-compare.h M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java 18 files changed, 1,109 insertions(+), 95 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/20 -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 20 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. Patch Set 20: (5 comments) http://gerrit.cloudera.org:8080/#/c/14080/19/be/src/util/tuple-row-compare.cc File be/src/util/tuple-row-compare.cc: http://gerrit.cloudera.org:8080/#/c/14080/19/be/src/util/tuple-row-compare.cc@319 PS19, Line 319: // The algorithm requires all values having a common type, without loss of data. : // This means we have to find the biggest type. : int max_size = ordering_exprs_[0]->type().GetByteSize(); : for (int i = 1; i < ordering_exprs_.size(); ++i) { > nit: the mask could be calculated in GetSharedRepresentation() instead of p Done http://gerrit.cloudera.org:8080/#/c/14080/19/be/src/util/tuple-row-compare.cc@423 PS19, Line 423: > Local variable U val shadows patameter void* val. Done http://gerrit.cloudera.org:8080/#/c/14080/19/be/src/util/tuple-row-compare.cc@424 PS19, Line 424: > nit: please add comment about it, something like "we copy the bytes from th Done http://gerrit.cloudera.org:8080/#/c/14080/19/be/src/util/tuple-row-compare.cc@434 PS19, Line 434: alue = *reinterpret_cast(val); > It will only have the value of the first char of the string. Done http://gerrit.cloudera.org:8080/#/c/14080/19/be/src/util/tuple-row-compare.cc@435 PS19, Line 435: tmp, _value, sizeof(T)); > replace with 'sizeof(U) - std::min(sizeof(U), type.len)'? Done -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 20 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 22 Jan 2020 17:06:06 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9291: MemTracker's TryConsume() should behave consistent with Consume()/Release()
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/15050 ) Change subject: IMPALA-9291: MemTracker's TryConsume() should behave consistent with Consume()/Release() .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/15050/5/be/src/runtime/mem-tracker.h File be/src/runtime/mem-tracker.h: http://gerrit.cloudera.org:8080/#/c/15050/5/be/src/runtime/mem-tracker.h@175 PS5, Line 175: if (UNLIKELY(bytes == 0)) return true; > The above DCHECK allows the value 0 and it should return success. We need t Right, for some reasons I thought it's a DCHECK_LE, apologies! -- To view, visit http://gerrit.cloudera.org:8080/15050 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I22e613c49e4a6bd218e7f7f7c0d6bec95b75cff5 Gerrit-Change-Number: 15050 Gerrit-PatchSet: 5 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 22 Jan 2020 14:20:36 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9291: MemTracker's TryConsume() should behave consistent with Consume()/Release()
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/15050 ) Change subject: IMPALA-9291: MemTracker's TryConsume() should behave consistent with Consume()/Release() .. Patch Set 5: (2 comments) http://gerrit.cloudera.org:8080/#/c/15050/5/be/src/runtime/mem-tracker.h File be/src/runtime/mem-tracker.h: http://gerrit.cloudera.org:8080/#/c/15050/5/be/src/runtime/mem-tracker.h@175 PS5, Line 175: if (UNLIKELY(bytes == 0)) return true; Shouldn't it be handled like at Release? Since there is a DCHECK for this above, we should not even reach this line. You also say in the commit message that the consumption is not updated when the bytes are zero. http://gerrit.cloudera.org:8080/#/c/15050/5/be/src/runtime/mem-tracker.h@218 PS5, Line 218: < 0 nit: did you mean <= ? -- To view, visit http://gerrit.cloudera.org:8080/15050 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I22e613c49e4a6bd218e7f7f7c0d6bec95b75cff5 Gerrit-Change-Number: 15050 Gerrit-PatchSet: 5 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 22 Jan 2020 13:57:38 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9226: Improve string allocations of the ORC scanner
Norbert Luksa has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/15051 ) Change subject: IMPALA-9226: Improve string allocations of the ORC scanner .. IMPALA-9226: Improve string allocations of the ORC scanner Currently the OrcColumnReader copies values from the orc::StringVectorBatch one-by-one. Since ORC 1.6, the blob which contains the pointed values is moved to the StringVectorBatch, so we can copy it. This commit beside the above improvement also enables the LazyEncoding option for the ORC reader. This way, for stripes with DICTIONARY_ENCODING[_V2], EncodedStringVectorBatch contains the data in a dictionaryBlob from which the data can be acquired with the given indices and lengths. Tests: * Run ORC scanner tests (query_tests/test_scanners.py::TestOrc) * Tested performance on tpch.lineitem table with scale=25, running queries that selects min of string columns. Some results: col_name | encoding | before | after | speedup = l_comment DIRECT 97s 30s 3.23 l_shipinstruct DICTIONARY 86s 18s 4.77 l_commitdate DICTIONARY 85s 20s 4.25 The queries were run on my machine with MT_DOP and NUM_NODES set to 1. Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f --- M be/src/exec/hdfs-orc-scanner.cc M be/src/exec/orc-column-readers.cc M be/src/exec/orc-column-readers.h 3 files changed, 57 insertions(+), 14 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/15051/3 -- To view, visit http://gerrit.cloudera.org:8080/15051 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f Gerrit-Change-Number: 15051 Gerrit-PatchSet: 3 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy
[native-toolchain-CR] IMPALA-6772: Add ORC 1.6.2 to toolchain
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/15073 ) Change subject: IMPALA-6772: Add ORC 1.6.2 to toolchain .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/15073/3/buildall.sh File buildall.sh: http://gerrit.cloudera.org:8080/#/c/15073/3/buildall.sh@347 PS3, Line 347: export SNAPPY_VERSION=1.1.4 > IIUC, we use these versions in our native-toolchain to have consistent beha I see, thanks for clarifying! -- To view, visit http://gerrit.cloudera.org:8080/15073 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I62734f791fd8fc098579f488a869a2153ebad919 Gerrit-Change-Number: 15073 Gerrit-PatchSet: 3 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 21 Jan 2020 14:53:06 + Gerrit-HasComments: Yes
[native-toolchain-CR] IMPALA-6772: Add ORC 1.6.2 to toolchain
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/15073 ) Change subject: IMPALA-6772: Add ORC 1.6.2 to toolchain .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/15073/3/buildall.sh File buildall.sh: http://gerrit.cloudera.org:8080/#/c/15073/3/buildall.sh@347 PS3, Line 347: export SNAPPY_VERSION=1.1.4 Shouldn't these versions get updated to match the ones in the ORC lib? https://github.com/apache/orc/blob/master/cmake_modules/ThirdpartyToolchain.cmake#L14 -- To view, visit http://gerrit.cloudera.org:8080/15073 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I62734f791fd8fc098579f488a869a2153ebad919 Gerrit-Change-Number: 15073 Gerrit-PatchSet: 3 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 21 Jan 2020 14:29:06 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Norbert Luksa has uploaded a new patch set (#19). ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. IMPALA-8755: Backend support for Z-ordering This change depends on gerrit.cloudera.org/#/c/13955/ (Frontend support for Z-ordering) The commit adds a Comparator based on Z-ordering. See in detail: https://en.wikipedia.org/wiki/Z-order_curve The comparator instead of calculating the Z-values of the rows, looks for the column with the most significant dimension, and compares the values of this column only. The most significant dimension will be the one where the compared values have the highest different bits. The algorithm requires values of the same binary representation, therefore the values are converted into either uint32_t, uint63_t or uint128_t, the smallest in which all data fits. Comparing smaller types with bigger ones would make the bigger type much more dominant therefore the bits of these smaller types are shifted up. All primitive types (including string and floating point types) are supported. Testing: * Added unit tests. * Run manual tests, comparing 4-column values with 4-bit integers, for all possible combinations. Checked the result by calculating the Z-value for each comparison. * Tested performance on various data, getting great results for selective queries. An example: used the TPCH dataset's lineitem table with scale 25, where the sorting columns are l_partkey and l_suppkey, in that order. Run selective queries for the value range of the two columns, for both lexical and Z-ordering and compared the percentage of filtered pages and row groups. While queries with filters on the first column showed almost no difference, queries on the second column is in favour of Z-ordering: Ordering | Column | Filtered pages % | Filtered row groups % Lex. 1st ~99% ~90% Z-ord. 1st ~99% ~89% Lex. 2nd ~25% 0% Z-ord. 2nd ~97% 0% The only drawback is the sorting itself, taking ~4 times more than lexical sorting (eg. sorting for the dataset above took 14m for Lexical, and 55m for Z-ordering). Note however, that this is a one-time thing to do, sorting only happens once, when writing the data. Also, lexical ordering is supported by codegen, while it is not implemented for Z-ordering yet. Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab --- M be/src/exec/exchange-node.cc M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/exec/parquet/hdfs-parquet-table-writer.cc M be/src/exec/partial-sort-node.cc M be/src/exec/partial-sort-node.h M be/src/exec/sort-node.cc M be/src/exec/sort-node.h M be/src/exec/topn-node.cc M be/src/runtime/data-stream-test.cc M be/src/runtime/sorter.cc M be/src/runtime/sorter.h M be/src/util/CMakeLists.txt A be/src/util/tuple-row-compare-test.cc M be/src/util/tuple-row-compare.cc M be/src/util/tuple-row-compare.h M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java 18 files changed, 1,062 insertions(+), 95 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/19 -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 19 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. Patch Set 14: (2 comments) http://gerrit.cloudera.org:8080/#/c/14080/14//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/14080/14//COMMIT_MSG@34 PS14, Line 34: getting great results > Could you provide some basic statistics? Done http://gerrit.cloudera.org:8080/#/c/14080/14//COMMIT_MSG@35 PS14, Line 35: One negative is the sorting itself, taking :4-7 more times than lexical sorting. > You could emphasize that it only affects the writes. Done -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 14 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 21 Jan 2020 10:52:51 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Norbert Luksa has uploaded a new patch set (#18). ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. IMPALA-8755: Backend support for Z-ordering This change depends on gerrit.cloudera.org/#/c/13955/ (Frontend support for Z-ordering) The commit adds a Comparator based on Z-ordering. See in detail: https://en.wikipedia.org/wiki/Z-order_curve The comparator instead of calculating the Z-values of the rows, looks for the column with the most significant dimension, and compares the values of this column only. The most significant dimension will be the one where the compared values have the highest different bits. The algorithm requires values of the same binary representation, therefore the values are converted into either uint32_t, uint63_t or uint128_t, the smallest in which all data fits. Comparing smaller types with bigger ones would make the bigger type much more dominant therefore the bits of these smaller types are shifted up. All primitive types (including string and floating point types) are supported. Testing: * Added unit tests. * Run manual tests, comparing 4-column values with 4-bit integers, for all possible combinations. Checked the result by calculating the Z-value for each comparison. * Tested performance on various data, getting great results for selective queries. An example: used the TPCH dataset's lineitem table with scale 25, where the sorting columns are l_partkey and l_suppkey, in that order. Run selective queries for the value range of the two columns, for both lexical and Z-ordering and compared the percentage of filtered pages and row groups. While queries with filters on the first column showed almost no difference, queries on the second column is in favour of Z-ordering: Ordering | Column | Filtered pages % | Filtered row groups % Lex. 1st ~99% ~90% Z-ord. 1st ~99% ~89% Lex. 2nd ~25% 0% Z-ord. 2nd ~97% 0% A only drawback is the sorting itself, taking ~4 times more than lexical sorting. Note however, that this is a one-time thing to do, sorting only happens once, when writing the data. Also, lexical ordering is supported by codegen, while it is not implemented for Z-ordering yet. Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab --- M be/src/exec/exchange-node.cc M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/exec/parquet/hdfs-parquet-table-writer.cc M be/src/exec/partial-sort-node.cc M be/src/exec/partial-sort-node.h M be/src/exec/sort-node.cc M be/src/exec/sort-node.h M be/src/exec/topn-node.cc M be/src/runtime/data-stream-test.cc M be/src/runtime/sorter.cc M be/src/runtime/sorter.h M be/src/util/CMakeLists.txt A be/src/util/tuple-row-compare-test.cc M be/src/util/tuple-row-compare.cc M be/src/util/tuple-row-compare.h M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java 18 files changed, 1,062 insertions(+), 95 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/18 -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 18 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Norbert Luksa has uploaded a new patch set (#16). ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. IMPALA-8755: Backend support for Z-ordering This change depends on gerrit.cloudera.org/#/c/13955/ (Frontend support for Z-ordering) The commit adds a Comparator based on Z-ordering. See in detail: https://en.wikipedia.org/wiki/Z-order_curve The comparator instead of calculating the Z-values of the rows, looks for the column with the most significant dimension, and compares the values of this column only. The most significant dimension will be the one where the compared values have the highest different bits. The algorithm requires values of the same binary representation, therefore the values are converted into either uint32_t, uint63_t or uint128_t, the smallest in which all data fits. Comparing smaller types with bigger ones would make the bigger type much more dominant therefore the bits of these smaller types are shifted up. All primitive types (including string and floating point types) are supported. Testing: * Added unit tests. * Run manual tests, comparing 4-column values with 4-bit integers, for all possible combinations. Checked the result by calculating the Z-value for each comparison. * Tested performance on various data, getting great results for selective queries. One negative is the sorting itself, taking 4-7 more times than lexical sorting. Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab --- M be/src/exec/exchange-node.cc M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/exec/parquet/hdfs-parquet-table-writer.cc M be/src/exec/partial-sort-node.cc M be/src/exec/partial-sort-node.h M be/src/exec/sort-node.cc M be/src/exec/sort-node.h M be/src/exec/topn-node.cc M be/src/runtime/data-stream-test.cc M be/src/runtime/sorter.cc M be/src/runtime/sorter.h M be/src/util/CMakeLists.txt A be/src/util/tuple-row-compare-test.cc M be/src/util/tuple-row-compare.cc M be/src/util/tuple-row-compare.h M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java 18 files changed, 1,062 insertions(+), 95 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/16 -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 16 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. Patch Set 16: Rebased. -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 16 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 21 Jan 2020 09:24:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9226: Improve string allocations of the ORC scanner
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/15051 ) Change subject: IMPALA-9226: Improve string allocations of the ORC scanner .. Patch Set 2: As Zoltán suggested, I tried to add an optimisation, where I would not have to copy the blob in each batch, but only for stripes. I have not managed to figure it out, looks like the memory is freed between the batches. I left this part of the code in the commit in a comment for now, will try to solve it next week. -- To view, visit http://gerrit.cloudera.org:8080/15051 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f Gerrit-Change-Number: 15051 Gerrit-PatchSet: 2 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 17 Jan 2020 17:21:31 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9226: Improve string allocations of the ORC scanner
Norbert Luksa has uploaded this change for review. ( http://gerrit.cloudera.org:8080/15051 Change subject: IMPALA-9226: Improve string allocations of the ORC scanner .. WIP: IMPALA-9226: Improve string allocations of the ORC scanner Currently the OrcColumnReader copies values from the orc::StringVectorBatch one-by-one. Since ORC 1.6, the blob which contains the pointed values is moved to the StringVectorBatch, so we can copy it. This commit beside the above improvement also enables the LazyEncoding option for the ORC reader. This way, for stripes with DICTIONARY_ENCODING[_V2], EncodedStringVectorBatch contains the data in a dictionaryBlob from which the data can be aquired with the given indices and lengths. Tests: * TODO: performance tests. Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f --- M be/src/exec/hdfs-orc-scanner.cc M be/src/exec/orc-column-readers.cc M be/src/exec/orc-column-readers.h 3 files changed, 118 insertions(+), 49 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/15051/1 -- To view, visit http://gerrit.cloudera.org:8080/15051 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f Gerrit-Change-Number: 15051 Gerrit-PatchSet: 1 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-9226: Improve string allocations of the ORC scanner
Norbert Luksa has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/15051 ) Change subject: IMPALA-9226: Improve string allocations of the ORC scanner .. IMPALA-9226: Improve string allocations of the ORC scanner Currently the OrcColumnReader copies values from the orc::StringVectorBatch one-by-one. Since ORC 1.6, the blob which contains the pointed values is moved to the StringVectorBatch, so we can copy it. This commit beside the above improvement also enables the LazyEncoding option for the ORC reader. This way, for stripes with DICTIONARY_ENCODING[_V2], EncodedStringVectorBatch contains the data in a dictionaryBlob from which the data can be aquired with the given indices and lengths. Tests: * Run ORC scanner tests (query_tests/test_scanners.py::TestOrc) * Tested performance on tpch.lineitem table with scale=25, running queries that selects min of string columns. Some results: col_name | encoding | before | after = l_comment DIRECT 14.05s (97s) 4.11s (30s) l_shipinstruct DICTIONARY 12.36s (86s) 2.16s (18s) l_commitdateDICTIONARY 12.56s (85s) 2.54s (20s) The queries are run on my machine with 8 core, while in the brackets the durations are when MT_DOP and NUM_NODES are set to 1. Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f --- M be/src/exec/hdfs-orc-scanner.cc M be/src/exec/orc-column-readers.cc M be/src/exec/orc-column-readers.h 3 files changed, 95 insertions(+), 44 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/15051/2 -- To view, visit http://gerrit.cloudera.org:8080/15051 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f Gerrit-Change-Number: 15051 Gerrit-PatchSet: 2 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-8801: Date type support for ORC scanner
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/14982 ) Change subject: IMPALA-8801: Date type support for ORC scanner .. Patch Set 8: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/14982/8/tests/query_test/test_scanners.py File tests/query_test/test_scanners.py: http://gerrit.cloudera.org:8080/#/c/14982/8/tests/query_test/test_scanners.py@1302 PS8, Line 1302: an nit: the comments could be updated a bit, since not only one illtypes table is created -- To view, visit http://gerrit.cloudera.org:8080/14982 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956 Gerrit-Change-Number: 14982 Gerrit-PatchSet: 8 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 17 Jan 2020 12:26:44 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9291: MemTracker's TryConsume() should behave consistent with Consume()/Release()
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/15050 ) Change subject: IMPALA-9291: MemTracker's TryConsume() should behave consistent with Consume()/Release() .. Patch Set 1: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/15050 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I22e613c49e4a6bd218e7f7f7c0d6bec95b75cff5 Gerrit-Change-Number: 15050 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 16 Jan 2020 15:17:56 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8046: Support CREATE TABLE from an ORC file
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/14811 ) Change subject: IMPALA-8046: Support CREATE TABLE from an ORC file .. Patch Set 11: Rebased. -- To view, visit http://gerrit.cloudera.org:8080/14811 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I77cd84cda2ed86516937a67eb320fd41e3f1cf2d Gerrit-Change-Number: 14811 Gerrit-PatchSet: 11 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 10 Jan 2020 12:48:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8046: Support CREATE TABLE from an ORC file
Norbert Luksa has uploaded a new patch set (#11). ( http://gerrit.cloudera.org:8080/14811 ) Change subject: IMPALA-8046: Support CREATE TABLE from an ORC file .. IMPALA-8046: Support CREATE TABLE from an ORC file Impala supports creating a table using the schema of a file. However, only Parquet is supported currently. This commit adds support for creating tables from ORC files The change relies on the ORC Java API with version 1.5 or greater, because of a bug in earlier versions. Therefore, ORC is listed as an external dependency, instead of relying on Hive's ORC version (from Hive3, Hive also lists it as a dependency). Also, the commit performs a little clean-up on the ParquetHelper class, renaming it to ParquetSchemaExtractor and removing outdated comments. To create a table from an ORC file, run: CREATE TABLE tablename LIKE ORC '/path/to/file' Tests: * Added analysis tests for primitive and complex types. * Added e2e tests for creating tables from ORC files. Change-Id: I77cd84cda2ed86516937a67eb320fd41e3f1cf2d --- M bin/impala-config.sh M fe/pom.xml M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java A fe/src/main/java/org/apache/impala/analysis/OrcSchemaExtractor.java R fe/src/main/java/org/apache/impala/analysis/ParquetSchemaExtractor.java M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java A fe/src/main/java/org/apache/impala/util/FileAnalysisUtil.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M impala-parent/pom.xml M shaded-deps/pom.xml A testdata/workloads/functional-query/queries/QueryTest/create-table-like-file-orc.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test M tests/common/skip.py M tests/metadata/test_ddl.py 15 files changed, 495 insertions(+), 81 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/14811/11 -- To view, visit http://gerrit.cloudera.org:8080/14811 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I77cd84cda2ed86516937a67eb320fd41e3f1cf2d Gerrit-Change-Number: 14811 Gerrit-PatchSet: 11 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-9277: Catch exception thrown from orc::ColumnSelector::updateSelectedByTypeId
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/14994 ) Change subject: IMPALA-9277: Catch exception thrown from orc::ColumnSelector::updateSelectedByTypeId .. Patch Set 1: Code-Review+1 Lgtm, thanks for fixing! -- To view, visit http://gerrit.cloudera.org:8080/14994 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2f706bc832298cb5089e539b7a818cb86d02199f Gerrit-Change-Number: 14994 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 09 Jan 2020 18:13:09 + Gerrit-HasComments: No
[Impala-ASF-CR] [WIP]IMPALA-8778: Support Apache Hudi Read Optimized Table
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/14711 ) Change subject: [WIP]IMPALA-8778: Support Apache Hudi Read Optimized Table .. Patch Set 7: (3 comments) Hi! Thanks for working on this, left some comments there. http://gerrit.cloudera.org:8080/#/c/14711/7//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/14711/7//COMMIT_MSG@13 PS7, Line 13: Could you please list the tests you did here? http://gerrit.cloudera.org:8080/#/c/14711/7/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java File fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java: http://gerrit.cloudera.org:8080/#/c/14711/7/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java@88 PS7, Line 88: fileFormat Could you update the comments above? http://gerrit.cloudera.org:8080/#/c/14711/7/fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java File fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java: http://gerrit.cloudera.org:8080/#/c/14711/7/fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java@165 PS7, Line 165: // HUDI_PARQUET doesn't support converting from thrift Maybe it would be better to add an empty HUDI_PARQUET case over the default case, just to have it listed here. -- To view, visit http://gerrit.cloudera.org:8080/14711 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I65e146b347714df32fe968409ef2dde1f6a25cdf Gerrit-Change-Number: 14711 Gerrit-PatchSet: 7 Gerrit-Owner: Yanjia Gary Li Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Yanjia Gary Li Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 09 Jan 2020 17:18:32 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8801: Date type support for ORC scanner
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/14982 ) Change subject: IMPALA-8801: Date type support for ORC scanner .. Patch Set 3: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/14982 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956 Gerrit-Change-Number: 14982 Gerrit-PatchSet: 3 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 08 Jan 2020 13:02:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9010: Add builtin mask functions
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/14963 ) Change subject: IMPALA-9010: Add builtin mask functions .. Patch Set 2: Code-Review+1 (4 comments) That seems a lot of work, nice job! Added some small comments, otherwise lgtm. http://gerrit.cloudera.org:8080/#/c/14963/2/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: http://gerrit.cloudera.org:8080/#/c/14963/2/be/src/exprs/expr-test.cc@10449 PS2, Line 10449: // Error handling What happens when one would mask the day in 2019-02-02 to 30? Could you add a test for it here? http://gerrit.cloudera.org:8080/#/c/14963/2/be/src/exprs/mask-functions-ir.cc File be/src/exprs/mask-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/14963/2/be/src/exprs/mask-functions-ir.cc@223 PS2, Line 223: !(1 <= day_value && day_value <= 31) This considers 31 as a valid day number for eg. February. Shouldn't this be checked here? What is the behaviour of Hive? http://gerrit.cloudera.org:8080/#/c/14963/2/be/src/exprs/mask-functions-ir.cc@266 PS2, Line 266: 4 Wouldn't it be nicer if this constant were defined at the beginning of the file with the other constants? It appears here and many times below. http://gerrit.cloudera.org:8080/#/c/14963/2/be/src/exprs/mask-functions-ir.cc@697 PS2, Line 697: (void)SHA256(val.ptr, val.len, sha256_hash.ptr); nit: Wouldn't using "discard_result" be nicer here? -- To view, visit http://gerrit.cloudera.org:8080/14963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ica779a1bf63a085d51f3b533f654cbaac102a664 Gerrit-Change-Number: 14963 Gerrit-PatchSet: 2 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 07 Jan 2020 15:52:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8801: Date type support for ORC scanner
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/14982 ) Change subject: IMPALA-8801: Date type support for ORC scanner .. Patch Set 2: (5 comments) http://gerrit.cloudera.org:8080/#/c/14982/2/be/src/exec/orc-column-readers.h File be/src/exec/orc-column-readers.h: http://gerrit.cloudera.org:8080/#/c/14982/2/be/src/exec/orc-column-readers.h@224 PS2, Line 224: public: nit: space is missing? http://gerrit.cloudera.org:8080/#/c/14982/2/be/src/exec/orc-column-readers.h@235 PS2, Line 235: private: nit: space is missing? (see other classes below) http://gerrit.cloudera.org:8080/#/c/14982/2/be/src/exec/orc-column-readers.cc File be/src/exec/orc-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/14982/2/be/src/exec/orc-column-readers.cc@101 PS2, Line 101: case TYPE_DATE: { nit: the braces can be omitted http://gerrit.cloudera.org:8080/#/c/14982/2/be/src/exec/orc-column-readers.cc@213 PS2, Line 213: !dv.IsValid() I think you could add UNLIKELY here. http://gerrit.cloudera.org:8080/#/c/14982/2/fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java File fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java: http://gerrit.cloudera.org:8080/#/c/14982/2/fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java@30 PS2, Line 30: Supported HDFS file formats. Every file format specifies: Not related to this commit specifically, but would be good to have this comment updated with the sixth parameter's description. -- To view, visit http://gerrit.cloudera.org:8080/14982 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956 Gerrit-Change-Number: 14982 Gerrit-PatchSet: 2 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Comment-Date: Tue, 07 Jan 2020 13:12:11 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8046: Support CREATE TABLE from an ORC file
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/14811 ) Change subject: IMPALA-8046: Support CREATE TABLE from an ORC file .. Patch Set 10: (4 comments) http://gerrit.cloudera.org:8080/#/c/14811/8/fe/pom.xml File fe/pom.xml: http://gerrit.cloudera.org:8080/#/c/14811/8/fe/pom.xml@309 PS8, Line 309: org.apache.orc > I think we need to add ORC stuffs in shaded-deps/pom.xml so we don't use or Done http://gerrit.cloudera.org:8080/#/c/14811/8/fe/src/main/java/org/apache/impala/analysis/OrcSchemaExtractor.java File fe/src/main/java/org/apache/impala/analysis/OrcSchemaExtractor.java: http://gerrit.cloudera.org:8080/#/c/14811/8/fe/src/main/java/org/apache/impala/analysis/OrcSchemaExtractor.java@181 PS8, Line 181:* parsing the ORC schema, or if the ORC types cannot be represented in Impala. > Change Parquet to ORC Done http://gerrit.cloudera.org:8080/#/c/14811/8/fe/src/main/java/org/apache/impala/util/FileAnalysisUtil.java File fe/src/main/java/org/apache/impala/util/FileAnalysisUtil.java: http://gerrit.cloudera.org:8080/#/c/14811/8/fe/src/main/java/org/apache/impala/util/FileAnalysisUtil.java@28 PS8, Line 28: ORCSchemeExtractor and ParquetScheme > nit: OrcSchemaExtractor and ParquetSchemaExtractor Done http://gerrit.cloudera.org:8080/#/c/14811/8/fe/src/main/java/org/apache/impala/util/FileAnalysisUtil.java@35 PS8, Line 35: public static void CheckIfFile(Path pathToFile) throws AnalysisException { > What about merging this into FileSystemUtil.java? Other utilities for files Thought about that, but this file contains Analysis specific utilities and as I saw FileSystemUtil.java has nothing of the sort, so I did not want to mingle them. -- To view, visit http://gerrit.cloudera.org:8080/14811 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I77cd84cda2ed86516937a67eb320fd41e3f1cf2d Gerrit-Change-Number: 14811 Gerrit-PatchSet: 10 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 20 Dec 2019 09:00:41 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8046: Support CREATE TABLE from an ORC file
Norbert Luksa has uploaded a new patch set (#10). ( http://gerrit.cloudera.org:8080/14811 ) Change subject: IMPALA-8046: Support CREATE TABLE from an ORC file .. IMPALA-8046: Support CREATE TABLE from an ORC file Impala supports creating a table using the schema of a file. However, only Parquet is supported currently. This commit adds support for creating tables from ORC files The change relies on the ORC Java API with version 1.5 or greater, because of a bug in earlier versions. Therefore, ORC is listed as an external dependency, instead of relying on Hive's ORC version (from Hive3, Hive also lists it as a dependency). Also, the commit performs a little clean-up on the ParquetHelper class, renaming it to ParquetSchemaExtractor and removing outdated comments. To create a table from an ORC file, run: CREATE TABLE tablename LIKE ORC '/path/to/file' Tests: * Added analysis tests for primitive and complex types. * Added e2e tests for creating tables from ORC files. Change-Id: I77cd84cda2ed86516937a67eb320fd41e3f1cf2d --- M bin/impala-config.sh M fe/pom.xml M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java A fe/src/main/java/org/apache/impala/analysis/OrcSchemaExtractor.java R fe/src/main/java/org/apache/impala/analysis/ParquetSchemaExtractor.java M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java A fe/src/main/java/org/apache/impala/util/FileAnalysisUtil.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M impala-parent/pom.xml M shaded-deps/pom.xml A testdata/workloads/functional-query/queries/QueryTest/create-table-like-file-orc.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test M tests/common/skip.py M tests/metadata/test_ddl.py 15 files changed, 495 insertions(+), 81 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/14811/10 -- To view, visit http://gerrit.cloudera.org:8080/14811 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I77cd84cda2ed86516937a67eb320fd41e3f1cf2d Gerrit-Change-Number: 14811 Gerrit-PatchSet: 10 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] Fix bug in report benchmark results.py
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/14925 ) Change subject: Fix bug in report_benchmark_results.py .. Patch Set 1: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/14925 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6a163e74fefc896464e35a6a1b91ce57de592f1a Gerrit-Change-Number: 14925 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Comment-Date: Thu, 19 Dec 2019 08:35:07 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Norbert Luksa has uploaded a new patch set (#13). ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. IMPALA-8755: Backend support for Z-ordering This change depends on gerrit.cloudera.org/#/c/13955/ (Frontend support for Z-ordering) The commit adds a Comparator based on Z-ordering. See in detail: https://en.wikipedia.org/wiki/Z-order_curve The comparator instead of calculating the Z-values of the rows, looks for the column with the most significant dimension, and compares the values of this column only. The most significant dimension will be the one where the compared values have the highest different bits. The algorithm requires values of the same binary representation, therefore the values are converted into either uint32_t, uint63_t or uint128_t, the smallest in which all data fits. Comparing smaller types with bigger ones would make the bigger type much more dominant therefore the bits of these smaller types are shifted up. All primitive types (including string and floating point types) are supported. Testing: * Added unit tests. * Run manual tests, comparing 4-column values with 4-bit integers, for all possible combinations. Checked the result by calculating the Z-value for each comparison. * Tested performance on various data, getting great results for selective queries. One negative is the sorting itself, taking 4-7 more times than lexical sorting. Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab --- M be/src/exec/exchange-node.cc M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/exec/parquet/hdfs-parquet-table-writer.cc M be/src/exec/partial-sort-node.cc M be/src/exec/partial-sort-node.h M be/src/exec/sort-node.cc M be/src/exec/sort-node.h M be/src/exec/topn-node.cc M be/src/runtime/data-stream-test.cc M be/src/runtime/sorter.cc M be/src/runtime/sorter.h M be/src/util/CMakeLists.txt A be/src/util/tuple-row-compare-test.cc M be/src/util/tuple-row-compare.cc M be/src/util/tuple-row-compare.h M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java 18 files changed, 1,060 insertions(+), 95 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/13 -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 13 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Norbert Luksa has uploaded a new patch set (#14). ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. IMPALA-8755: Backend support for Z-ordering This change depends on gerrit.cloudera.org/#/c/13955/ (Frontend support for Z-ordering) The commit adds a Comparator based on Z-ordering. See in detail: https://en.wikipedia.org/wiki/Z-order_curve The comparator instead of calculating the Z-values of the rows, looks for the column with the most significant dimension, and compares the values of this column only. The most significant dimension will be the one where the compared values have the highest different bits. The algorithm requires values of the same binary representation, therefore the values are converted into either uint32_t, uint63_t or uint128_t, the smallest in which all data fits. Comparing smaller types with bigger ones would make the bigger type much more dominant therefore the bits of these smaller types are shifted up. All primitive types (including string and floating point types) are supported. Testing: * Added unit tests. * Run manual tests, comparing 4-column values with 4-bit integers, for all possible combinations. Checked the result by calculating the Z-value for each comparison. * Tested performance on various data, getting great results for selective queries. One negative is the sorting itself, taking 4-7 more times than lexical sorting. Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab --- M be/src/exec/exchange-node.cc M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/exec/parquet/hdfs-parquet-table-writer.cc M be/src/exec/partial-sort-node.cc M be/src/exec/partial-sort-node.h M be/src/exec/sort-node.cc M be/src/exec/sort-node.h M be/src/exec/topn-node.cc M be/src/runtime/data-stream-test.cc M be/src/runtime/sorter.cc M be/src/runtime/sorter.h M be/src/util/CMakeLists.txt A be/src/util/tuple-row-compare-test.cc M be/src/util/tuple-row-compare.cc M be/src/util/tuple-row-compare.h M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java 18 files changed, 1,059 insertions(+), 95 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/14 -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 14 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-9207: [DOCS] Documented the #Inst in exec summary
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/14860 ) Change subject: IMPALA-9207: [DOCS] Documented the #Inst in exec summary .. Patch Set 2: Code-Review+1 (2 comments) Found some more nits, otherwise lgtm. http://gerrit.cloudera.org:8080/#/c/14860/2/docs/topics/impala_explain_plan.xml File docs/topics/impala_explain_plan.xml: http://gerrit.cloudera.org:8080/#/c/14860/2/docs/topics/impala_explain_plan.xml@175 PS2, Line 175: nit: extra space http://gerrit.cloudera.org:8080/#/c/14860/2/docs/topics/impala_mem_limit.xml File docs/topics/impala_mem_limit.xml: http://gerrit.cloudera.org:8080/#/c/14860/2/docs/topics/impala_mem_limit.xml@209 PS2, Line 209: #Hosts nit: #Inst -- To view, visit http://gerrit.cloudera.org:8080/14860 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I938930c66144ba6bce766981d363abe4b28ba524 Gerrit-Change-Number: 14860 Gerrit-PatchSet: 2 Gerrit-Owner: Alex Rodoni Gerrit-Reviewer: Alex Rodoni Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Comment-Date: Sun, 08 Dec 2019 13:57:36 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9207: [DOCS] Documented the #Inst in exec summary
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/14860 ) Change subject: IMPALA-9207: [DOCS] Documented the #Inst in exec summary .. Patch Set 1: (2 comments) Found a few more occurrences of a summary query when I grepped on "#Hosts": * docs/topics/impala_mem_limit.xml * docs/topics/impala_live_summary.xml should also be updated. http://gerrit.cloudera.org:8080/#/c/14860/1/docs/topics/impala_explain_plan.xml File docs/topics/impala_explain_plan.xml: http://gerrit.cloudera.org:8080/#/c/14860/1/docs/topics/impala_explain_plan.xml@147 PS1, Line 147: When the MT_DOP query option is set to a value larger than : 0, #Inst column in the output shows the number of : fragment instances. Impala decomposes each query into smaller units of work that are : distributed across the cluster, and these units are referred as fragments. The #Inst column in the output will be also shown when MT_DOP=0, with the same value as #Hosts, since there will be exactly one fragment for each host. (Might be good to clarify, here or at the example.) http://gerrit.cloudera.org:8080/#/c/14860/1/docs/topics/impala_explain_plan.xml@168 PS1, Line 168: 0 This and the zeros below should be ones. -- To view, visit http://gerrit.cloudera.org:8080/14860 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I938930c66144ba6bce766981d363abe4b28ba524 Gerrit-Change-Number: 14860 Gerrit-PatchSet: 1 Gerrit-Owner: Alex Rodoni Gerrit-Reviewer: Alex Rodoni Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Comment-Date: Fri, 06 Dec 2019 12:23:36 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8046: Support CREATE TABLE from an ORC file
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/14811 ) Change subject: IMPALA-8046: Support CREATE TABLE from an ORC file .. Patch Set 8: (16 comments) http://gerrit.cloudera.org:8080/#/c/14811/7/fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java: http://gerrit.cloudera.org:8080/#/c/14811/7/fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java@74 PS7, Line 74: Schema > nit: Isn't this meant to be Schema Done http://gerrit.cloudera.org:8080/#/c/14811/7/fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java@77 PS7, Line 77: Schema > nit: same as above Done http://gerrit.cloudera.org:8080/#/c/14811/7/fe/src/main/java/org/apache/impala/analysis/ORCSchemeExtractor.java File fe/src/main/java/org/apache/impala/analysis/ORCSchemeExtractor.java: http://gerrit.cloudera.org:8080/#/c/14811/7/fe/src/main/java/org/apache/impala/analysis/ORCSchemeExtractor.java@49 PS7, Line 49: > nit: Schema Done http://gerrit.cloudera.org:8080/#/c/14811/7/fe/src/main/java/org/apache/impala/analysis/ORCSchemeExtractor.java@49 PS7, Line 49: > nit: use camel case for acronyms, e.g. OrcSchemaExtractor Done http://gerrit.cloudera.org:8080/#/c/14811/7/fe/src/main/java/org/apache/impala/analysis/ORCSchemeExtractor.java@50 PS7, Line 50: > I haven't seen any tests for this error msg. Did I miss something? Tried to create an ORC file with a non-primitive map key, but did not find an easy way to do so with the available tools. http://gerrit.cloudera.org:8080/#/c/14811/7/fe/src/main/java/org/apache/impala/analysis/ORCSchemeExtractor.java@74 PS7, Line 74: > nit: Orc Done http://gerrit.cloudera.org:8080/#/c/14811/7/fe/src/main/java/org/apache/impala/analysis/ORCSchemeExtractor.java@75 PS7, Line 75: > Can it really throw AnalysisException? Done http://gerrit.cloudera.org:8080/#/c/14811/7/fe/src/main/java/org/apache/impala/analysis/ORCSchemeExtractor.java@139 PS7, Line 139: > Might worth a DCHECK here as well that the size of fieldNames equals to the Done http://gerrit.cloudera.org:8080/#/c/14811/7/fe/src/main/java/org/apache/impala/analysis/ORCSchemeExtractor.java@148 PS7, Line 148: > nit: Orc Done http://gerrit.cloudera.org:8080/#/c/14811/7/fe/src/main/java/org/apache/impala/analysis/ORCSchemeExtractor.java@170 PS7, Line 170: > nit: Orc Done http://gerrit.cloudera.org:8080/#/c/14811/7/fe/src/main/java/org/apache/impala/analysis/ParquetSchemeExtractor.java File fe/src/main/java/org/apache/impala/analysis/ParquetSchemeExtractor.java: http://gerrit.cloudera.org:8080/#/c/14811/7/fe/src/main/java/org/apache/impala/analysis/ParquetSchemeExtractor.java@50 PS7, Line 50: > nit: Schema Done http://gerrit.cloudera.org:8080/#/c/14811/7/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: http://gerrit.cloudera.org:8080/#/c/14811/7/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@2029 PS7, Line 2029: > To be in line with TestCreateTableLikeFile() could you rename this to TestC Done http://gerrit.cloudera.org:8080/#/c/14811/7/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@2029 PS7, Line 2029: > I'd also add the "file does't exist" coverage to this function. Done http://gerrit.cloudera.org:8080/#/c/14811/7/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@2029 PS7, Line 2029: @Test > nit: put @Test into a separate line Done http://gerrit.cloudera.org:8080/#/c/14811/7/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@2030 PS7, Line 2030: leLikeFileOrc() throws AnalysisException { > I though Apache Impala has ORC support regardless of the Hive version. I ca Right, this comment is misleading. What I meant to write was that the Java API that comes with the CDH Hive ORC fails getting the schema of the ORC file, because of a bug in ORC. This is solved however by listing ORC as an external dependency (like Hive does from version 3), instead of relying on Hive's ORC version. (The bug is fixed in later versions.) http://gerrit.cloudera.org:8080/#/c/14811/7/testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test File testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test: http://gerrit.cloudera.org:8080/#/c/14811/7/testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test@254 PS7, Line 254: --- QUERY > Why was this parquet test necessary for this ORC related patch? This test is not related to ORC support, was only at the wrong place (in create-table-like-file.test) -- To view, visit http://gerrit.cloudera.org:8080/14811 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment
[Impala-ASF-CR] IMPALA-8046: Support CREATE TABLE from an ORC file
Norbert Luksa has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/14811 ) Change subject: IMPALA-8046: Support CREATE TABLE from an ORC file .. IMPALA-8046: Support CREATE TABLE from an ORC file Impala supports creating a table using the schema of a file. However, only Parquet is supported currently. This commit adds support for creating tables from ORC files The change relies on the ORC Java API with version 1.5 or greater, because of a bug in earlier versions. Therefore, ORC is listed as an external dependency, instead of relying on Hive's ORC version (from Hive3, Hive also lists it as a dependency). Also, the commit performs a little clean-up on the ParquetHelper class, renaming it to ParquetSchemaExtractor and removing outdated comments. To create a table from an ORC file, run: CREATE TABLE tablename LIKE ORC '/path/to/file' Tests: * Added analysis tests for primitive and complex types. * Added e2e tests for creating tables from ORC files. Change-Id: I77cd84cda2ed86516937a67eb320fd41e3f1cf2d --- M bin/impala-config.sh M fe/pom.xml M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java A fe/src/main/java/org/apache/impala/analysis/OrcSchemaExtractor.java R fe/src/main/java/org/apache/impala/analysis/ParquetSchemaExtractor.java M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java A fe/src/main/java/org/apache/impala/util/FileAnalysisUtil.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M impala-parent/pom.xml A testdata/workloads/functional-query/queries/QueryTest/create-table-like-file-orc.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test M tests/common/skip.py M tests/metadata/test_ddl.py 14 files changed, 460 insertions(+), 79 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/14811/8 -- To view, visit http://gerrit.cloudera.org:8080/14811 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I77cd84cda2ed86516937a67eb320fd41e3f1cf2d Gerrit-Change-Number: 14811 Gerrit-PatchSet: 8 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-9217: Adjust limits for TZH and TZM datetime tokens
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/14848 ) Change subject: IMPALA-9217: Adjust limits for TZH and TZM datetime tokens .. Patch Set 1: Code-Review+1 Had a look at other tests in the file, as I see they did not use invalid values. Lgtm. -- To view, visit http://gerrit.cloudera.org:8080/14848 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6fe2534d51396bb5652af6301866e2dd0f3282c2 Gerrit-Change-Number: 14848 Gerrit-PatchSet: 1 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Comment-Date: Thu, 05 Dec 2019 12:33:24 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8184: Add timestamp validation to Orc scanner
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/14832 ) Change subject: IMPALA-8184: Add timestamp validation to Orc scanner .. Patch Set 2: Code-Review+1 (4 comments) Found some nits, otherwise lgtm. http://gerrit.cloudera.org:8080/#/c/14832/2/common/thrift/generate_error_codes.py File common/thrift/generate_error_codes.py: http://gerrit.cloudera.org:8080/#/c/14832/2/common/thrift/generate_error_codes.py@443 PS2, Line 443: Orc nit: ORC http://gerrit.cloudera.org:8080/#/c/14832/2/testdata/data/README File testdata/data/README: http://gerrit.cloudera.org:8080/#/c/14832/2/testdata/data/README@456 PS2, Line 456: Orc nit: ORC http://gerrit.cloudera.org:8080/#/c/14832/2/testdata/workloads/functional-query/queries/DataErrorsTest/orc-out-of-range-timestamp.test File testdata/workloads/functional-query/queries/DataErrorsTest/orc-out-of-range-timestamp.test: http://gerrit.cloudera.org:8080/#/c/14832/2/testdata/workloads/functional-query/queries/DataErrorsTest/orc-out-of-range-timestamp.test@6 PS2, Line 6: Orc nit: ORC http://gerrit.cloudera.org:8080/#/c/14832/2/testdata/workloads/functional-query/queries/DataErrorsTest/orc-out-of-range-timestamp.test@16 PS2, Line 16: Orc nit: same here -- To view, visit http://gerrit.cloudera.org:8080/14832 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8ee2ba83a54f93d37e8832e064f2c8418b503490 Gerrit-Change-Number: 14832 Gerrit-PatchSet: 2 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 05 Dec 2019 10:54:48 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9215: report benchmark results.py fails with missing key
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/14830 ) Change subject: IMPALA-9215: report_benchmark_results.py fails with missing key .. Patch Set 1: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/14830 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I822c86f621f5a348b56d672c263a2cf9321767ee Gerrit-Change-Number: 14830 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Comment-Date: Thu, 05 Dec 2019 08:13:56 + Gerrit-HasComments: No
[Impala-ASF-CR] WIP: IMPALA-8046: Support CREATE TABLE from an ORC file
Norbert Luksa has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/14811 ) Change subject: WIP: IMPALA-8046: Support CREATE TABLE from an ORC file .. WIP: IMPALA-8046: Support CREATE TABLE from an ORC file Impala supports creating a table using the schema of a file. However, only Parquet is supported currently. This commit adds support for creating tables from ORC files. The change relies on ORC 1.5, therefore can only be used with USE_CDP_HIVE=true. Also, the commit performs a little clean-up on the ParquetHelper class, renaming it to ParquetSchemeExtractor and removing outdated comments. To create a table from an ORC file, run: CREATE TABLE tablename LIKE ORC '/path/to/file' Tests: * Added analysis tests for primitive and complex types. * Added e2e tests for creating tables from ORC files. TODO: these tests might fail Change-Id: I77cd84cda2ed86516937a67eb320fd41e3f1cf2d --- M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java A fe/src/main/java/org/apache/impala/analysis/ORCSchemeExtractor.java R fe/src/main/java/org/apache/impala/analysis/ParquetSchemeExtractor.java M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java A fe/src/main/java/org/apache/impala/util/FileAnalysisUtil.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java A testdata/workloads/functional-query/queries/QueryTest/create-table-like-file-orc.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test M tests/common/skip.py M tests/metadata/test_ddl.py 11 files changed, 435 insertions(+), 79 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/14811/7 -- To view, visit http://gerrit.cloudera.org:8080/14811 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I77cd84cda2ed86516937a67eb320fd41e3f1cf2d Gerrit-Change-Number: 14811 Gerrit-PatchSet: 7 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] WIP: IMPALA-8046: Support CREATE TABLE from an ORC file
Norbert Luksa has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/14811 ) Change subject: WIP: IMPALA-8046: Support CREATE TABLE from an ORC file .. WIP: IMPALA-8046: Support CREATE TABLE from an ORC file Impala supports creating a table using the schema of a file. However, only Parquet is supported currently. This commit adds support for creating tables from ORC files. The change relies on ORC 1.5, therefore can only be used with USE_CDP_HIVE=true. Also, the commit performs a little clean-up on the ParquetHelper class, renaming it to ParquetSchemeExtractor and removing outdated comments. To create a table from an ORC file, run: CREATE TABLE tablename LIKE ORC '/path/to/file' Tests: * Added analysis tests for primitive and complex types. * Added e2e tests for creating tables from ORC files. TODO: these tests might fail Change-Id: I77cd84cda2ed86516937a67eb320fd41e3f1cf2d --- M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java A fe/src/main/java/org/apache/impala/analysis/ORCSchemeExtractor.java R fe/src/main/java/org/apache/impala/analysis/ParquetSchemeExtractor.java M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java A fe/src/main/java/org/apache/impala/util/FileAnalysisUtil.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java A testdata/workloads/functional-query/queries/QueryTest/create-table-like-file-orc.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test M tests/common/skip.py M tests/metadata/test_ddl.py 11 files changed, 435 insertions(+), 79 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/14811/6 -- To view, visit http://gerrit.cloudera.org:8080/14811 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I77cd84cda2ed86516937a67eb320fd41e3f1cf2d Gerrit-Change-Number: 14811 Gerrit-PatchSet: 6 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] WIP: IMPALA-8046: Support CREATE TABLE from an ORC file
Norbert Luksa has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/14811 ) Change subject: WIP: IMPALA-8046: Support CREATE TABLE from an ORC file .. IMPALA-8046: Support CREATE TABLE from an ORC file Impala supports creating a table using the schema of a file. However, only Parquet is supported currently. This commit adds support for creating tables from ORC files. The change relies on ORC 1.5, therefore can only be used with USE_CDP_HIVE=true. Also, the commit performs a little clean-up on the ParquetHelper class, renaming it to ParquetSchemeExtractor and removing outdated comments. To create a table from an ORC file, run: CREATE TABLE tablename LIKE ORC '/path/to/file' Tests: * Added analysis tests for primitive and complex types. * Added e2e tests for creating tables from ORC files. Change-Id: I77cd84cda2ed86516937a67eb320fd41e3f1cf2d --- M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java A fe/src/main/java/org/apache/impala/analysis/ORCSchemeExtractor.java R fe/src/main/java/org/apache/impala/analysis/ParquetSchemeExtractor.java M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java A fe/src/main/java/org/apache/impala/util/FileAnalysisUtil.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java A testdata/workloads/functional-query/queries/QueryTest/create-table-like-file-orc.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test M tests/common/skip.py M tests/metadata/test_ddl.py 11 files changed, 418 insertions(+), 79 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/14811/4 -- To view, visit http://gerrit.cloudera.org:8080/14811 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I77cd84cda2ed86516937a67eb320fd41e3f1cf2d Gerrit-Change-Number: 14811 Gerrit-PatchSet: 4 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] WIP: IMPALA-8046: Support CREATE TABLE from an ORC file
Norbert Luksa has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/14811 ) Change subject: WIP: IMPALA-8046: Support CREATE TABLE from an ORC file .. WIP: IMPALA-8046: Support CREATE TABLE from an ORC file Impala supports creating a table using the schema of a file. However, only Parquet is supported currently. This commit adds support for creating tables from ORC files. The change relies on ORC 1.5, therefore can only be used with USE_CDP_HIVE=true. Also, the commit performs a little clean-up on the ParquetHelper class, renaming it to ParquetSchemeExtractor and removing outdated comments. To create a table from an ORC file, run: CREATE TABLE tablename LIKE ORC '/path/to/file' Tests: * Added analysis tests for primitive and complex types. * Added e2e tests for creating tables from ORC files. Change-Id: I77cd84cda2ed86516937a67eb320fd41e3f1cf2d --- M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java A fe/src/main/java/org/apache/impala/analysis/ORCSchemeExtractor.java R fe/src/main/java/org/apache/impala/analysis/ParquetSchemeExtractor.java M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java A fe/src/main/java/org/apache/impala/util/FileAnalysisUtil.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java A testdata/workloads/functional-query/queries/QueryTest/create-table-like-file-orc.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test M tests/common/skip.py M tests/metadata/test_ddl.py 11 files changed, 418 insertions(+), 79 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/14811/5 -- To view, visit http://gerrit.cloudera.org:8080/14811 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I77cd84cda2ed86516937a67eb320fd41e3f1cf2d Gerrit-Change-Number: 14811 Gerrit-PatchSet: 5 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-8046: Support CREATE TABLE from an ORC file
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/14811 ) Change subject: IMPALA-8046: Support CREATE TABLE from an ORC file .. Patch Set 3: (6 comments) http://gerrit.cloudera.org:8080/#/c/14811/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/14811/3//COMMIT_MSG@19 PS3, Line 19: CREATE TABLE tablename LIKE ORC '/path/to/file' > CREATE TABLE LIKE without file format name always means Parquet? It would b CREATE TABLE LIKE 'path/to/file' is not a valid query and the file format has to be added explicitly, there is no default value. http://gerrit.cloudera.org:8080/#/c/14811/3/fe/src/main/java/org/apache/impala/analysis/ORCSchemeExtractor.java File fe/src/main/java/org/apache/impala/analysis/ORCSchemeExtractor.java: http://gerrit.cloudera.org:8080/#/c/14811/3/fe/src/main/java/org/apache/impala/analysis/ORCSchemeExtractor.java@80 PS3, Line 80: case BYTE: return Type.TINYINT; > I saw different type names in https://orc.apache.org/docs/types.html, e.g. In the documentation the ORC types are listed under the enum TypeDescription.Category, where these type names were present. It looks like that in the Java API the Java type equivalents were used. https://orc.apache.org/api/orc-core/org/apache/orc/TypeDescription.Category.html http://gerrit.cloudera.org:8080/#/c/14811/3/fe/src/main/java/org/apache/impala/util/FileAnalysisUtil.java File fe/src/main/java/org/apache/impala/util/FileAnalysisUtil.java: http://gerrit.cloudera.org:8080/#/c/14811/3/fe/src/main/java/org/apache/impala/util/FileAnalysisUtil.java@1 PS3, Line 1: package org.apache.impala.util; > Missing apache license (this also led to the build failure) Done http://gerrit.cloudera.org:8080/#/c/14811/3/testdata/workloads/functional-query/queries/QueryTest/create-table-like-file-orc.test File testdata/workloads/functional-query/queries/QueryTest/create-table-like-file-orc.test: http://gerrit.cloudera.org:8080/#/c/14811/3/testdata/workloads/functional-query/queries/QueryTest/create-table-like-file-orc.test@58 PS3, Line 58: > nit: extra space Done http://gerrit.cloudera.org:8080/#/c/14811/3/testdata/workloads/functional-query/queries/QueryTest/create-table-like-file-orc.test@74 PS3, Line 74: : QUERY : drop table allcomplextypes_clone_orc : RESULTS : 'Table has been dropped.' > Why is this needed? Other tables are not dropped + it is done automatically Done http://gerrit.cloudera.org:8080/#/c/14811/3/testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test File testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test: http://gerrit.cloudera.org:8080/#/c/14811/3/testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test@255 PS3, Line 255: drop table if exists allcomplextypes_clone > We shouldn't need this - test_create_table_like_table() uses a unique_datab Done -- To view, visit http://gerrit.cloudera.org:8080/14811 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I77cd84cda2ed86516937a67eb320fd41e3f1cf2d Gerrit-Change-Number: 14811 Gerrit-PatchSet: 3 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 29 Nov 2019 15:12:06 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8046: Support CREATE TABLE from an ORC file
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/14811 ) Change subject: IMPALA-8046: Support CREATE TABLE from an ORC file .. Patch Set 3: The e2e tests did not run yet, because of the ORC dependency problem mentioned in the commit message (sorry for not mentioning before). -- To view, visit http://gerrit.cloudera.org:8080/14811 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I77cd84cda2ed86516937a67eb320fd41e3f1cf2d Gerrit-Change-Number: 14811 Gerrit-PatchSet: 3 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 29 Nov 2019 14:53:15 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8046: Support CREATE TABLE from an ORC file
Norbert Luksa has uploaded this change for review. ( http://gerrit.cloudera.org:8080/14811 Change subject: IMPALA-8046: Support CREATE TABLE from an ORC file .. IMPALA-8046: Support CREATE TABLE from an ORC file Impala supports creating a table using the schema of a file. However, only Parquet is supported currently. This commit adds support for creating tables from ORC files. The change relies on ORC 1.5, therefore can only be used with USE_CDP_HIVE=true. Also, the commit performs a little clean-up on the ParquetHelper class, renaming it to ParquetSchemeExtractor and removing outdated comments. To create a table from an ORC file, run: CREATE TABLE tablename LIKE ORC '/path/to/file' Tests: * Added analysis tests for primitive and complex types. * Added e2e tests for creating tables from ORC files. Change-Id: I77cd84cda2ed86516937a67eb320fd41e3f1cf2d --- M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java A fe/src/main/java/org/apache/impala/analysis/ORCSchemeExtractor.java R fe/src/main/java/org/apache/impala/analysis/ParquetSchemeExtractor.java M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java A fe/src/main/java/org/apache/impala/util/FileAnalysisUtil.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java A testdata/workloads/functional-query/queries/QueryTest/create-table-like-file-orc.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test M tests/common/skip.py M tests/metadata/test_ddl.py 11 files changed, 417 insertions(+), 79 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/14811/3 -- To view, visit http://gerrit.cloudera.org:8080/14811 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I77cd84cda2ed86516937a67eb320fd41e3f1cf2d Gerrit-Change-Number: 14811 Gerrit-PatchSet: 3 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy