[Impala-ASF-CR] IMPALA-9287: Fix test_kudu_table_create_without_hms in Hive3
wangsheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/15057 ) Change subject: IMPALA-9287: Fix test_kudu_table_create_without_hms in Hive3 .. Patch Set 10: (2 comments) http://gerrit.cloudera.org:8080/#/c/15057/9/bin/bootstrap_system.sh File bin/bootstrap_system.sh: http://gerrit.cloudera.org:8080/#/c/15057/9/bin/bootstrap_system.sh@315 PS9, Line 315: 500 > nit: 1500 It is 500, not 1500. In 's/\(max_connections = \)\S*/\1500/g', '\1' is a backreference that re-inserts the captured 'max_connections = ', and '500' is the new value that follows it. http://gerrit.cloudera.org:8080/#/c/15057/9/tests/custom_cluster/test_kudu_table_create_without_hms.py File tests/custom_cluster/test_kudu_table_create_without_hms.py: http://gerrit.cloudera.org:8080/#/c/15057/9/tests/custom_cluster/test_kudu_table_create_without_hms.py@32 PS9, Line 32: @pytest.mark.execute_serially > This should be removed, otherwise the test is skipped when using Hive3. Done -- To view, visit http://gerrit.cloudera.org:8080/15057 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibc7d7e30cd560d43bb707dec54f4494355809f66 Gerrit-Change-Number: 15057 Gerrit-PatchSet: 10 Gerrit-Owner: wangsheng Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Tue, 04 Feb 2020 04:07:56 + Gerrit-HasComments: Yes
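For readers unsure how the substitution quoted in the reply parses, here is a minimal sketch of the same idea in Python rather than sed. The sample config line and the helper name `set_max_connections` are assumptions made for illustration; they are not taken from bin/bootstrap_system.sh. Python's `re` needs the explicit `\g<1>` form, because a digit written directly after `\1` would be read as part of the group reference instead of as literal text.

```python
import re

def set_max_connections(line, value):
    """Replace whatever follows 'max_connections = ' with the given value.

    Hypothetical helper: the capture group keeps the 'max_connections = '
    prefix, and \\g<1> re-inserts it in front of the new value.
    """
    return re.sub(r"(max_connections = )\S*", r"\g<1>" + str(value), line)

# Assumed sample input; the default of 100 is the one mentioned in the thread.
before = "max_connections = 100"
after = set_max_connections(before, 500)
print(after)
```

The result is a value of 500, with the prefix restored by the group reference, which matches the explanation given in the reply.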
[Impala-ASF-CR] IMPALA-9287: Fix test_kudu_table_create_without_hms in Hive3
wangsheng has uploaded a new patch set (#10). ( http://gerrit.cloudera.org:8080/15057 ) Change subject: IMPALA-9287: Fix test_kudu_table_create_without_hms in Hive3 .. IMPALA-9287: Fix test_kudu_table_create_without_hms in Hive3 When Impala is built with USE_CDP_HIVE=true, the custom cluster test case test_kudu_table_create_without_hms fails because the related jars are missing. The solution is to add the related Maven dependencies to $IMPALA_HOME/fe/pom.xml and $IMPALA_HOME/shaded-deps/pom.xml. Tests: * Ran test_kudu_table_create_without_hms.py locally with USE_CDP_HIVE=true Change-Id: Ibc7d7e30cd560d43bb707dec54f4494355809f66 --- M bin/bootstrap_system.sh M fe/pom.xml M shaded-deps/pom.xml M tests/custom_cluster/test_kudu_table_create_without_hms.py 4 files changed, 31 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/57/15057/10 -- To view, visit http://gerrit.cloudera.org:8080/15057 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibc7d7e30cd560d43bb707dec54f4494355809f66 Gerrit-Change-Number: 15057 Gerrit-PatchSet: 10 Gerrit-Owner: wangsheng Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: wangsheng
[Impala-ASF-CR] IMPALA-9287: Fix test_kudu_table_create_without_hms in Hive3
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/15057 ) Change subject: IMPALA-9287: Fix test_kudu_table_create_without_hms in Hive3 .. Patch Set 9: (2 comments) http://gerrit.cloudera.org:8080/#/c/15057/9/bin/bootstrap_system.sh File bin/bootstrap_system.sh: http://gerrit.cloudera.org:8080/#/c/15057/9/bin/bootstrap_system.sh@315 PS9, Line 315: 500 nit: 1500 http://gerrit.cloudera.org:8080/#/c/15057/9/tests/custom_cluster/test_kudu_table_create_without_hms.py File tests/custom_cluster/test_kudu_table_create_without_hms.py: http://gerrit.cloudera.org:8080/#/c/15057/9/tests/custom_cluster/test_kudu_table_create_without_hms.py@32 PS9, Line 32: @SkipIfHive3.without_hms_not_supported This should be removed, otherwise the test is skipped when using Hive3. -- To view, visit http://gerrit.cloudera.org:8080/15057 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibc7d7e30cd560d43bb707dec54f4494355809f66 Gerrit-Change-Number: 15057 Gerrit-PatchSet: 9 Gerrit-Owner: wangsheng Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Tue, 04 Feb 2020 04:00:30 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9287: Fix test_kudu_table_create_without_hms in Hive3
wangsheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/15057 ) Change subject: IMPALA-9287: Fix test_kudu_table_create_without_hms in Hive3 .. Patch Set 9: > (1 comment) Sorry for my late reply, Quanlong. I had already run this test case before submitting the patch, and it passed. I checked my test environment and found that the pg max_connections had been changed to 1000 (the default is 100). So I modified max_connections in $IMPALA_HOME/bin/bootstrap_system.sh. You can test the latest code in your environment when you are free. Thanks again for your review. -- To view, visit http://gerrit.cloudera.org:8080/15057 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibc7d7e30cd560d43bb707dec54f4494355809f66 Gerrit-Change-Number: 15057 Gerrit-PatchSet: 9 Gerrit-Owner: wangsheng Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Tue, 04 Feb 2020 03:56:12 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8361: Propagate predicates of outer-joined InlineView
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/15047 ) Change subject: IMPALA-8361: Propagate predicates of outer-joined InlineView .. Patch Set 9: (7 comments) http://gerrit.cloudera.org:8080/#/c/15047/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15047/9//COMMIT_MSG@24 PS9, Line 24: , nit: need space http://gerrit.cloudera.org:8080/#/c/15047/9//COMMIT_MSG@24 PS9, Line 24: the predicates that : must be evaluted at a join node but can also be safely evaluted by the : outer-joined inline view. I think you mean "some predicates that must be evaluated at a join node can also be safely evaluated by the outer-joined inline view". http://gerrit.cloudera.org:8080/#/c/15047/9//COMMIT_MSG@30 PS9, Line 30: . nit: need space http://gerrit.cloudera.org:8080/#/c/15047/9/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java: http://gerrit.cloudera.org:8080/#/c/15047/9/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@985 PS9, Line 985: picked up by getBoundPredicates() I think this comment is stale after merging this patch. Could you update it to "picked up by getBoundPredicates() and migrateConjunctsToInlineView()"? http://gerrit.cloudera.org:8080/#/c/15047/9/testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test File testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test: http://gerrit.cloudera.org:8080/#/c/15047/9/testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test@2368 PS9, Line 2368: predicates: rand() = 12 This looks strange to me. rand() returns a random value between 0 and 1, so "rand() = 12" will always be false and all rows should be rejected by the WHERE clause. If "rand() = 12" is evaluated on only one side, the other side can still produce rows, so the outer join will still have results. However, it looks like the original planner produces the same plan. 
Could you create a JIRA for this? I think it's a bug. It's worth mentioning it in the above comments. http://gerrit.cloudera.org:8080/#/c/15047/9/testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test@2402 PS9, Line 2402: , nit: put the space after the comma http://gerrit.cloudera.org:8080/#/c/15047/9/testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test@2442 PS9, Line 2442: upper(b.string_col) Should we move the function inside the view? This can be propagated without this patch. Maybe change it to SELECT * FROM functional.alltypestiny a LEFT JOIN (SELECT upper(b.string_col) as string_col, b.id FROM functional.alltypestiny a LEFT JOIN functional.alltypestiny b ON a.id=b.id) b ON a.id=b.id WHERE b.string_col='1'; -- To view, visit http://gerrit.cloudera.org:8080/15047 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6c23a45aeb5dd1aa06a95c9aa8628ecbe37ef2c1 Gerrit-Change-Number: 15047 Gerrit-PatchSet: 9 Gerrit-Owner: Xianqing He Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Xianqing He Gerrit-Comment-Date: Tue, 04 Feb 2020 03:49:33 + Gerrit-HasComments: Yes
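The concern about "rand() = 12" in the review above can be illustrated with a small sketch. It uses SQLite from the Python standard library, not Impala, and the tables `a` and `b` plus the constant predicate `1 = 0` are assumptions standing in for the planner test's tables and for an always-false predicate. It only shows the general outer-join behaviour the reviewer describes, not Impala's actual plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (id INTEGER);
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (1), (2);
""")

# Always-false predicate applied in the WHERE clause, after the join:
# every joined row is rejected.
where_rows = conn.execute(
    "SELECT a.id, b.id FROM a LEFT JOIN b ON a.id = b.id WHERE 1 = 0"
).fetchall()

# Same predicate evaluated only inside the outer-joined side: the left
# side still produces rows, padded with NULLs for the right side.
pushed_rows = conn.execute(
    "SELECT a.id, b.id FROM a "
    "LEFT JOIN (SELECT id FROM b WHERE 1 = 0) b ON a.id = b.id "
    "ORDER BY a.id"
).fetchall()

print(where_rows)
print(pushed_rows)
conn.close()
```

The two queries return different row sets, which is why evaluating such a predicate on only one side of an outer join is not equivalent to evaluating it in the WHERE clause.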
[Impala-ASF-CR] IMPALA-9174: Emit WARNING when ORC lib leaks memory
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/15153 ) Change subject: IMPALA-9174: Emit WARNING when ORC lib leaks memory .. Patch Set 1: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/15153 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I370fb9f68734e0e555bd7224ab0f5440c4947c66 Gerrit-Change-Number: 15153 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 04 Feb 2020 02:37:07 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9279: Update the Kudu version to include VARCHAR support
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15134 ) Change subject: IMPALA-9279: Update the Kudu version to include VARCHAR support .. Patch Set 4: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/15134 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iafe56342d43cb63e35c0bbb1b4a99327dda0a44a Gerrit-Change-Number: 15134 Gerrit-PatchSet: 4 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Comment-Date: Tue, 04 Feb 2020 02:26:38 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9346: Fix TestImpalaShell.test_config_file failing issue on CentOS6/Python 2.6
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/15139 ) Change subject: IMPALA-9346: Fix TestImpalaShell.test_config_file failing issue on CentOS6/Python 2.6 .. IMPALA-9346: Fix TestImpalaShell.test_config_file failing issue on CentOS6/Python 2.6 TestImpalaShell.test_config_file failed in its negative test case, which ran Impala shell with a badly formatted config file containing a wrong option name and a wrong option value. The test code expects Impala shell to return both warning and error messages, but on CentOS6/Python 2.6 Impala shell returns only the error message. To fix it, split the test case into two test cases that run Impala shell with two different config files. Testing: - Passed all test cases in test_shell_commandline.py and test_shell_interactive.py. - Passed all core tests in pre-review-test. - Passed EE tests in impala-private-parameterized with CentOS6. Change-Id: Ief5e825aa3baead5519132d47efcf0d5300860fd Reviewed-on: http://gerrit.cloudera.org:8080/15139 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M bin/rat_exclude_files.txt M tests/shell/impalarc_with_error2 A tests/shell/impalarc_with_warnings2 M tests/shell/test_shell_commandline.py 4 files changed, 19 insertions(+), 4 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/15139 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ief5e825aa3baead5519132d47efcf0d5300860fd Gerrit-Change-Number: 15139 Gerrit-PatchSet: 7 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-9346: Fix TestImpalaShell.test_config_file failing issue on CentOS6/Python 2.6
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15139 ) Change subject: IMPALA-9346: Fix TestImpalaShell.test_config_file failing issue on CentOS6/Python 2.6 .. Patch Set 6: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/15139 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ief5e825aa3baead5519132d47efcf0d5300860fd Gerrit-Change-Number: 15139 Gerrit-PatchSet: 6 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Tue, 04 Feb 2020 00:55:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9337 [DOCS] Document new way to create external Kudu table in Impala
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15149 ) Change subject: IMPALA-9337 [DOCS] Document new way to create external Kudu table in Impala .. Patch Set 3: Verified+1 Build Successful https://jenkins.impala.io/job/gerrit-docs-auto-test/569/ : Doc tests passed. -- To view, visit http://gerrit.cloudera.org:8080/15149 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic07380fd53898dd21fbb5dacb4d9f7a84f160d4e Gerrit-Change-Number: 15149 Gerrit-PatchSet: 3 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Tue, 04 Feb 2020 00:29:41 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9337 [DOCS] Document new way to create external Kudu table in Impala
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15149 ) Change subject: IMPALA-9337 [DOCS] Document new way to create external Kudu table in Impala .. Patch Set 3: Build Started https://jenkins.impala.io/job/gerrit-docs-auto-test/569/ Testing docs change - this change appears to modify docs/ and no code. This is experimental - please report any issues to tarmstr...@cloudera.com or on this JIRA: IMPALA-7317 -- To view, visit http://gerrit.cloudera.org:8080/15149 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic07380fd53898dd21fbb5dacb4d9f7a84f160d4e Gerrit-Change-Number: 15149 Gerrit-PatchSet: 3 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Tue, 04 Feb 2020 00:24:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9337 [DOCS] Document new way to create external Kudu table in Impala
Hello Vihang Karajgaonkar, Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15149 to look at the new patch set (#3). Change subject: IMPALA-9337 [DOCS] Document new way to create external Kudu table in Impala .. IMPALA-9337 [DOCS] Document new way to create external Kudu table in Impala Summary of changes: - Changed title of "Kudu tables:" paragraph to "Managed Kudu tables:". - Added syntax block in "External Kudu tables" to show new alternative create table syntax. - Described alternative syntax and the differences between the two resulting tables. - In Kudu considerations section, added an example of creating a synchronized external Kudu table. - Covered similarity of synchronized tables to managed tables and HMS translation of external tables. Change-Id: Ic07380fd53898dd21fbb5dacb4d9f7a84f160d4e --- M docs/topics/impala_create_table.xml 1 file changed, 59 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/15149/3 -- To view, visit http://gerrit.cloudera.org:8080/15149 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic07380fd53898dd21fbb5dacb4d9f7a84f160d4e Gerrit-Change-Number: 15149 Gerrit-PatchSet: 3 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-9330: Support resolving unmasked nested columns in masked tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15108 ) Change subject: IMPALA-9330: Support resolving unmasked nested columns in masked tables .. Patch Set 9: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5598/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15108 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1cc5565c64c1a4a56445b8edde59b1168f387791 Gerrit-Change-Number: 15108 Gerrit-PatchSet: 9 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Anurag Mantripragada Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Tue, 04 Feb 2020 00:07:06 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9289: Fix flakiness in TestConcurrentKuduCreate
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15151 ) Change subject: IMPALA-9289: Fix flakiness in TestConcurrentKuduCreate .. Patch Set 2: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5495/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/15151 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idefba98ecd65efbd47b1618291330795ef13b910 Gerrit-Change-Number: 15151 Gerrit-PatchSet: 2 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Mon, 03 Feb 2020 23:38:39 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9289: Fix flakiness in TestConcurrentKuduCreate
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15151 ) Change subject: IMPALA-9289: Fix flakiness in TestConcurrentKuduCreate .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/15151 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idefba98ecd65efbd47b1618291330795ef13b910 Gerrit-Change-Number: 15151 Gerrit-PatchSet: 2 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Mon, 03 Feb 2020 23:38:38 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9289: Fix flakiness in TestConcurrentKuduCreate
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/15151 ) Change subject: IMPALA-9289: Fix flakiness in TestConcurrentKuduCreate .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/15151 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idefba98ecd65efbd47b1618291330795ef13b910 Gerrit-Change-Number: 15151 Gerrit-PatchSet: 1 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Mon, 03 Feb 2020 23:28:20 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9330: Support resolving unmasked nested columns in masked tables
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/15108 ) Change subject: IMPALA-9330: Support resolving unmasked nested columns in masked tables .. Patch Set 9: Code-Review+2 Carry on Csaba's +2. -- To view, visit http://gerrit.cloudera.org:8080/15108 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1cc5565c64c1a4a56445b8edde59b1168f387791 Gerrit-Change-Number: 15108 Gerrit-PatchSet: 9 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Anurag Mantripragada Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Mon, 03 Feb 2020 23:21:38 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9330: Support resolving unmasked nested columns in masked tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15108 ) Change subject: IMPALA-9330: Support resolving unmasked nested columns in masked tables .. Patch Set 9: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5494/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/15108 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1cc5565c64c1a4a56445b8edde59b1168f387791 Gerrit-Change-Number: 15108 Gerrit-PatchSet: 9 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Anurag Mantripragada Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Mon, 03 Feb 2020 23:22:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9330: Support resolving unmasked nested columns in masked tables
Hello Anurag Mantripragada, Fang-Yu Rao, Vihang Karajgaonkar, Kurt Deschler, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15108 to look at the new patch set (#9). Change subject: IMPALA-9330: Support resolving unmasked nested columns in masked tables .. IMPALA-9330: Support resolving unmasked nested columns in masked tables Column masking policies on primitive columns of a table that contains nested types (though the nested types won't be masked) will cause query failures. To be specific, if tableA(id int, int_array array) has a masking policy on column "id", all queries on "tableA" will fail, e.g. select id from tableA; select t.id, a.item from tableA t, t.int_array a; Column masking is implemented by wrapping the underlying table/view with a table masking view. However, as we don't support nested types in SelectList, the table masking view can't expose nested columns of the masked table, which prevents collection refs from being resolved correctly. This patch fixes the issue in 2 steps: 1) Expose nested columns of the underlying table in the output Type of the table masking view (see InlineViewRef#createTupleDescriptor()), so that nested Paths in the original query block can be resolved. 2) For such Paths, resolve them again inside the table masking view, so that they point to the underlying table as intended (see Analyzer#resolvePathWithMasking()). The TupleDescriptor of such a table masking view won't be materialized since the view is simple enough that its query plan is just a ScanNode of the underlying table. The whole query plan can be stitched together as if the table is not masked. Note that one day when we support nested columns in SelectList, we may not need these 2 hacks. This patch also adds some TRACE level logging to improve debuggability. Test changes in TestRanger.test_column_masking: - Add column masking policy on a table containing nested types. - Add queries on the masked tables. 
Some queries are borrowed from existing tests for nested types. Tests: - Run CORE tests. Change-Id: I1cc5565c64c1a4a56445b8edde59b1168f387791 --- M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CollectionTableRef.java M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java M fe/src/main/java/org/apache/impala/analysis/Path.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/SlotRef.java M fe/src/main/java/org/apache/impala/analysis/TableRef.java M fe/src/main/java/org/apache/impala/authorization/TableMask.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test M tests/authorization/test_ranger.py 12 files changed, 527 insertions(+), 18 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/15108/9 -- To view, visit http://gerrit.cloudera.org:8080/15108 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1cc5565c64c1a4a56445b8edde59b1168f387791 Gerrit-Change-Number: 15108 Gerrit-PatchSet: 9 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Anurag Mantripragada Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-9330: Support resolving unmasked nested columns in masked tables
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/15108 ) Change subject: IMPALA-9330: Support resolving unmasked nested columns in masked tables .. Patch Set 9: (2 comments) http://gerrit.cloudera.org:8080/#/c/15108/8//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15108/8//COMMIT_MSG@7 PS8, Line 7: IMPALA-9330: Support resolving unmasked nested columns in masked tables > nit: maybe add something like "in masked tables"? Done http://gerrit.cloudera.org:8080/#/c/15108/8/tests/authorization/test_ranger.py File tests/authorization/test_ranger.py: http://gerrit.cloudera.org:8080/#/c/15108/8/tests/authorization/test_ranger.py@806 PS8, Line 806: > I would prefer to give an error, but I am ok with the current status. I filed HIVE-22822 and HIVE-22823 for Hive. -- To view, visit http://gerrit.cloudera.org:8080/15108 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1cc5565c64c1a4a56445b8edde59b1168f387791 Gerrit-Change-Number: 15108 Gerrit-PatchSet: 9 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Anurag Mantripragada Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Mon, 03 Feb 2020 23:21:18 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7002: Throw AuthorizationException when user accessing non-existent table/database in CTE without any privilege.
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/15123 ) Change subject: IMPALA-7002: Throw AuthorizationException when user accessing non-existent table/database in CTE without any privilege. .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/15123/4/fe/src/main/java/org/apache/impala/analysis/WithClause.java File fe/src/main/java/org/apache/impala/analysis/WithClause.java: http://gerrit.cloudera.org:8080/#/c/15123/4/fe/src/main/java/org/apache/impala/analysis/WithClause.java@94 PS4, Line 94: } catch (AnalysisException e) { : throw e; nit: we don't need the catch block if it only rethrows the exception http://gerrit.cloudera.org:8080/#/c/15123/4/fe/src/main/java/org/apache/impala/analysis/WithClause.java@98 PS4, Line 98: // withClauseAnalyzer is local variable. The privilege requests registered : // on it have to be re-registered to the root analyzer even when analyze : // function throw AnalysisException since authorization check is required : // for non existent database/table. nit: this doesn't really tell me why they need to be re-registered. Also, you used a specific example of when we need auth checks; is that the only case where an exception can be thrown? If not, it might be worthwhile to investigate whether it is ok to register auth checks in those other cases. If you end up concluding that auth checks need to be registered in all cases, then the method comment should be generic enough to encompass all cases. 
-- To view, visit http://gerrit.cloudera.org:8080/15123 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia6b657a7147a136198a9a97a679c9131ee814577 Gerrit-Change-Number: 15123 Gerrit-PatchSet: 4 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Mon, 03 Feb 2020 23:08:09 + Gerrit-HasComments: Yes
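The first nit in the review above (a catch block that only rethrows is redundant) can be sketched as follows. This is a Python analogy, not the Java code under review: `AnalysisError`, `analyze`, and the `registered` list are hypothetical names, and the try/finally shape is only an assumption about how a step could be made to run even when analysis fails.

```python
class AnalysisError(Exception):
    """Hypothetical stand-in for an analysis failure."""

def analyze(ok):
    if not ok:
        raise AnalysisError("could not resolve table")
    return "analyzed"

def with_redundant_handler(ok):
    try:
        return analyze(ok)
    except AnalysisError:
        raise  # adds nothing: the exception would propagate anyway

registered = []  # hypothetical record of re-registered privilege requests

def with_finally_only(ok):
    try:
        return analyze(ok)
    finally:
        # Runs on success and on failure, so no except clause is needed
        # just to make this step happen before the error propagates.
        registered.append("privilege request")

def outcome(fn, ok):
    """Call fn and report either its result or the error it raised."""
    try:
        return fn(ok)
    except AnalysisError as e:
        return "error: %s" % e

print(outcome(with_redundant_handler, False))
print(outcome(with_finally_only, False))
print(len(registered))
```

Both functions surface the same error to the caller; the second also performs its bookkeeping step, without a handler that merely rethrows.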
[Impala-ASF-CR] IMPALA-8712: Make ExecQueryFInstances async
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15154 ) Change subject: IMPALA-8712: Make ExecQueryFInstances async .. Patch Set 1: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/5597/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/15154 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I33ec96e5885af094c294cd3a76c242995263ba32 Gerrit-Change-Number: 15154 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Mon, 03 Feb 2020 23:06:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9335 (part 2): Fix rebased KRPC to compile
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/15144 ) Change subject: IMPALA-9335 (part 2): Fix rebased KRPC to compile .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/15144/2/be/src/runtime/io/data-cache.cc File be/src/runtime/io/data-cache.cc: http://gerrit.cloudera.org:8080/#/c/15144/2/be/src/runtime/io/data-cache.cc@558 PS2, Line 558: handle.get() > Nit: you don't need the .get() I know. I prefer doing it this way because it makes it more obvious to readers of the code that it's a unique_ptr, but I don't feel strongly about it and can remove it if you prefer. -- To view, visit http://gerrit.cloudera.org:8080/15144 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1eb4caf927c729109426fb50a28b5e15d6ac46cb Gerrit-Change-Number: 15144 Gerrit-PatchSet: 2 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Mon, 03 Feb 2020 22:57:08 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9289: Fix flakiness in TestConcurrentKuduCreate
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/15151 ) Change subject: IMPALA-9289: Fix flakiness in TestConcurrentKuduCreate .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/15151/1/tests/custom_cluster/test_concurrent_kudu_create.py File tests/custom_cluster/test_concurrent_kudu_create.py: http://gerrit.cloudera.org:8080/#/c/15151/1/tests/custom_cluster/test_concurrent_kudu_create.py@a47 PS1, Line 47: > I am not very familiar with the usage of tls.client. Is it true that we do Replied to this at line 65. http://gerrit.cloudera.org:8080/#/c/15151/1/tests/custom_cluster/test_concurrent_kudu_create.py@51 PS1, Line 51: pool = ThreadPool(processes=3) > Just like to check whether or not my understanding is correct. The reason t Yes, we can reuse the threads and therefore reuse the client (connection) in each thread. http://gerrit.cloudera.org:8080/#/c/15151/1/tests/custom_cluster/test_concurrent_kudu_create.py@65 PS1, Line 65: pool > I looked at other usages of client and it seems that we don't need to expli Yes, when the threads are terminated, their connections are closed, so the associated sessions are closed too. I verified this in impalad.INFO. By the way, since this is a custom cluster test, the cluster will be restarted after the test, so we don't need to be afraid of potential session leaks affecting other tests. -- To view, visit http://gerrit.cloudera.org:8080/15151 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idefba98ecd65efbd47b1618291330795ef13b910 Gerrit-Change-Number: 15151 Gerrit-PatchSet: 1 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Mon, 03 Feb 2020 22:56:07 + Gerrit-HasComments: Yes
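The reply above describes one thread pool shared across loop iterations, with each worker thread keeping its own client. A minimal sketch of that pattern follows. It is not the code from the patch: `FakeClient`, `run_statement`, and `run_all` are hypothetical names, and `FakeClient` only stands in for a real Impala client so the example runs without a cluster.

```python
import threading
from multiprocessing.pool import ThreadPool


class FakeClient(object):
    """Hypothetical stand-in for an Impala client; counts how many are opened."""
    opened = 0
    _lock = threading.Lock()

    def __init__(self):
        with FakeClient._lock:
            FakeClient.opened += 1

    def execute(self, stmt):
        return "ok: " + stmt


# Thread-local storage: each worker thread sees its own 'client' attribute.
tls = threading.local()


def run_statement(stmt):
    # Create the client lazily, once per worker thread, then reuse it.
    if not hasattr(tls, "client"):
        tls.client = FakeClient()
    return tls.client.execute(stmt)


def run_all(num_rounds, stmts_per_round, num_threads=3):
    # The pool is created once, outside the loop, so worker threads
    # (and the clients they hold) survive across rounds.
    pool = ThreadPool(processes=num_threads)
    results = []
    try:
        for i in range(num_rounds):
            stmts = ["create table t_%d_%d" % (i, j)
                     for j in range(stmts_per_round)]
            results.extend(pool.map(run_statement, stmts))
    finally:
        pool.terminate()
    return results
```

Calling `run_all(4, 3)` runs twelve statements but opens at most three clients, one per worker thread, which is the reuse the reply refers to.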
[Impala-ASF-CR] IMPALA-9335 (part 2): Fix rebased KRPC to compile
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/15144 ) Change subject: IMPALA-9335 (part 2): Fix rebased KRPC to compile .. Patch Set 2: Code-Review+1 (3 comments) This is looking good to me. I compared the changes in this patch to the existing code in our KRPC. http://gerrit.cloudera.org:8080/#/c/15144/2/be/src/runtime/io/data-cache.cc File be/src/runtime/io/data-cache.cc: http://gerrit.cloudera.org:8080/#/c/15144/2/be/src/runtime/io/data-cache.cc@558 PS2, Line 558: handle.get() Nit: you don't need the .get() http://gerrit.cloudera.org:8080/#/c/15144/2/be/src/runtime/io/data-cache.cc@616 PS2, Line 616: .get() Nit: Don't need the .get() http://gerrit.cloudera.org:8080/#/c/15144/2/be/src/runtime/io/data-cache.cc@649 PS2, Line 649: .get() Nit: don't need the .get() -- To view, visit http://gerrit.cloudera.org:8080/15144 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1eb4caf927c729109426fb50a28b5e15d6ac46cb Gerrit-Change-Number: 15144 Gerrit-PatchSet: 2 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Mon, 03 Feb 2020 22:44:37 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8712: Make ExecQueryFInstances async
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15154 ) Change subject: IMPALA-8712: Make ExecQueryFInstances async .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/15154/1/tests/custom_cluster/test_rpc_exception.py File tests/custom_cluster/test_rpc_exception.py: http://gerrit.cloudera.org:8080/#/c/15154/1/tests/custom_cluster/test_rpc_exception.py@27 PS1, Line 27: def get_rpc_debug_action(rpc, action, port=KRPC_PORT): flake8: E302 expected 2 blank lines, found 1 http://gerrit.cloudera.org:8080/#/c/15154/1/tests/custom_cluster/test_rpc_exception.py@33 PS1, Line 33: def get_fail_action(rpc, error=None, port=KRPC_PORT, p=0.1): flake8: E302 expected 2 blank lines, found 1 http://gerrit.cloudera.org:8080/#/c/15154/1/tests/custom_cluster/test_rpc_exception.py@147 PS1, Line 147: r flake8: F841 local variable 'result' is assigned to but never used -- To view, visit http://gerrit.cloudera.org:8080/15154 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I33ec96e5885af094c294cd3a76c242995263ba32 Gerrit-Change-Number: 15154 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Mon, 03 Feb 2020 22:21:48 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8712: Make ExecQueryFInstances async
Thomas Tauber-Marshall has uploaded this change for review. ( http://gerrit.cloudera.org:8080/15154 ) Change subject: IMPALA-8712: Make ExecQueryFInstances async .. IMPALA-8712: Make ExecQueryFInstances async This patch refactors the ExecQueryFInstances rpc to be asynchronous. Previously, Impala would issue all the Exec()s, wait for all of them to complete, and then check if any of them resulted in an error. We now stop issuing Exec()s and cancel any that are still in flight as soon as an error occurs. It also performs some cleanup around the thread safety of Coordinator::BackendState, including adding comments and DCHECKS. === Exec RPC Thread Pool === This patch also removes the 'exec_rpc_thread_pool_' from ExecEnv. This thread pool was used to partially simulate async Exec() prior to the switch to KRPC, which provides built-in async rpc capabilities. Removing this thread pool has potential performance implications, as it means that the Exec() parameters are serialized in serial rather than in parallel (with the level of parallelism determined by the size of the thread pool, which was configurable by an Advanced flag and defaulted to 12). To ensure we don't regress query startup times, I did some performance testing. All tests were done on a 10 node cluster. The baseline used for the tests did not include IMPALA-9181, a perf optimization for query startup done to facilitate this work. I ran TPCH 100 at concurrency levels of 1, 4, and 8 and extracted the query startup times from the profiles. For each concurrency level, the average regression in query startup time was < 2ms. Because query e2e running time was much longer than this, there was no noticeable change in total query time. I also ran a 'worst case scenario' with a table with 10,000 partitions to create a very large Exec() payload to serialize (~1.21MB vs. ~10KB-30KB for TPCH 100). Again, the change in query startup time was negligible. 
TODO: once IMPALA-9335 (krpc rebase) goes in, the change in be/src/kudu/rpc/connection.cc can be removed from this patch. Testing: - Added an e2e test that verifies that a query where an Exec() fails doesn't wait for all Exec()s to complete before cancelling and returning the error to the client. Change-Id: I33ec96e5885af094c294cd3a76c242995263ba32 --- M be/src/common/global-flags.cc M be/src/kudu/rpc/connection.cc M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/coordinator-backend-state.h M be/src/runtime/coordinator.cc M be/src/runtime/coordinator.h M be/src/runtime/exec-env.cc M be/src/runtime/exec-env.h M be/src/runtime/query-state.cc M tests/custom_cluster/test_rpc_exception.py M tests/failure/test_failpoints.py 11 files changed, 382 insertions(+), 201 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/15154/1 -- To view, visit http://gerrit.cloudera.org:8080/15154 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I33ec96e5885af094c294cd3a76c242995263ba32 Gerrit-Change-Number: 15154 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Tauber-Marshall
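The behaviour change in the commit message above (return on the first failed Exec() and cancel the calls still in flight, rather than waiting for all of them) can be illustrated with a small sketch. This is not Impala's C++ coordinator code and does not use KRPC; it uses Python's `asyncio` purely as an analogy. `exec_rpc` and `exec_all` are hypothetical names, and the delays simulate network latency. It covers only the cancel-in-flight part, since all calls are issued up front.

```python
import asyncio


async def exec_rpc(backend, delay, should_fail):
    """Hypothetical Exec() call: sleeps to simulate latency, then may fail."""
    await asyncio.sleep(delay)
    if should_fail:
        raise RuntimeError("Exec() failed on %s" % backend)
    return backend


async def exec_all(specs):
    """specs is a list of (backend, delay, should_fail) tuples.

    Returns ("ok", None, 0) if every call succeeds, otherwise
    ("error", message, number_of_cancelled_calls).
    """
    tasks = [asyncio.ensure_future(exec_rpc(b, d, f)) for b, d, f in specs]
    # Wake up as soon as any call raises, instead of waiting for all of them.
    done, pending = await asyncio.wait(
        tasks, return_when=asyncio.FIRST_EXCEPTION)
    # Cancel whatever is still in flight and let the cancellations settle.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    for task in done:
        if not task.cancelled() and task.exception() is not None:
            return ("error", str(task.exception()), len(pending))
    return ("ok", None, 0)
```

With one fast-failing backend and two slow ones, for example `asyncio.run(exec_all([("a", 0.5, False), ("b", 0.01, True), ("c", 0.5, False)]))`, the error comes back after roughly the short delay and the two slow calls are cancelled.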
[Impala-ASF-CR] IMPALA-9279: Update the Kudu version to include VARCHAR support
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15134 ) Change subject: IMPALA-9279: Update the Kudu version to include VARCHAR support .. Patch Set 4: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5493/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/15134 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iafe56342d43cb63e35c0bbb1b4a99327dda0a44a Gerrit-Change-Number: 15134 Gerrit-PatchSet: 4 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Comment-Date: Mon, 03 Feb 2020 21:40:56 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9346: Fix TestImpalaShell.test config file failing issue on CentOS6/Python 2.6
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15139 ) Change subject: IMPALA-9346: Fix TestImpalaShell.test_config_file failing issue on CentOS6/Python 2.6 .. Patch Set 6: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5492/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/15139 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ief5e825aa3baead5519132d47efcf0d5300860fd Gerrit-Change-Number: 15139 Gerrit-PatchSet: 6 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Mon, 03 Feb 2020 20:00:46 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9346: Fix TestImpalaShell.test config file failing issue on CentOS6/Python 2.6
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15139 ) Change subject: IMPALA-9346: Fix TestImpalaShell.test_config_file failing issue on CentOS6/Python 2.6 .. Patch Set 6: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/15139 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ief5e825aa3baead5519132d47efcf0d5300860fd Gerrit-Change-Number: 15139 Gerrit-PatchSet: 6 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Mon, 03 Feb 2020 20:00:45 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9346: Fix TestImpalaShell.test config file failing issue on CentOS6/Python 2.6
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/15139 ) Change subject: IMPALA-9346: Fix TestImpalaShell.test_config_file failing issue on CentOS6/Python 2.6 .. Patch Set 5: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/15139 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ief5e825aa3baead5519132d47efcf0d5300860fd Gerrit-Change-Number: 15139 Gerrit-PatchSet: 5 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Mon, 03 Feb 2020 20:00:13 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9279: Update the Kudu version to include VARCHAR support
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15134 ) Change subject: IMPALA-9279: Update the Kudu version to include VARCHAR support .. Patch Set 3: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/5491/ -- To view, visit http://gerrit.cloudera.org:8080/15134 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iafe56342d43cb63e35c0bbb1b4a99327dda0a44a Gerrit-Change-Number: 15134 Gerrit-PatchSet: 3 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Comment-Date: Mon, 03 Feb 2020 19:42:43 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8587: Show inherited privileges with Ranger show grant
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15111 ) Change subject: IMPALA-8587: Show inherited privileges with Ranger show grant .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5596/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15111 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia4e679dc6fcf8d0b0e4e0fc2e9b335e2d8bc0899 Gerrit-Change-Number: 15111 Gerrit-PatchSet: 5 Gerrit-Owner: Fang-Yu Rao Gerrit-Reviewer: Austin Nobis Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 03 Feb 2020 19:27:32 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9337 [DOCS] Document new way to create external Kudu table in Impala
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/15149 ) Change subject: IMPALA-9337 [DOCS] Document new way to create external Kudu table in Impala .. Patch Set 2: (4 comments) Thanks for documenting this. http://gerrit.cloudera.org:8080/#/c/15149/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15149/2//COMMIT_MSG@10 PS2, Line 10: Internal I think the usage of the word "Internal" is confusing. Users are familiar with "managed" and "external" tables. http://gerrit.cloudera.org:8080/#/c/15149/2//COMMIT_MSG@11 PS2, Line 11: TBLPROPERTIES See my comment later, the tblproperties syntax only applies when creating the external table with explicit column spec. http://gerrit.cloudera.org:8080/#/c/15149/2/docs/topics/impala_create_table.xml File docs/topics/impala_create_table.xml: http://gerrit.cloudera.org:8080/#/c/15149/2/docs/topics/impala_create_table.xml@241 PS2, Line 241: TBLPROPERTIES [('kudu.table_name'='internal_kudu_name')] | [('external.table.purge'='true')] [,('key1'='value1', 'key2'='value2', ...)] I think this may be confusing to some users since the 'external.table.purge'='true' property must be used only when we provide the column spec, as in the case of a managed table. I think maybe we should modify the Kudu table SQL syntax as follows: CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name (col_name data_type [kudu_column_attribute ...] [COMMENT 'col_comment'] [, ...] [PRIMARY KEY (col_name[, ...])] ) [PARTITION BY kudu_partition_clause] [COMMENT 'table_comment'] STORED AS KUDU [TBLPROPERTIES ('key1'='value1', 'key2'='value2', ...)] | [('external.table.purge'='true', 'key1'='value1',...)] Also, in the section Kudu considerations, can we add a section which talks about such external tables? 
Something like: From version 3.4 and above, when Impala is integrated with Hive metastore 3, managed Kudu tables are translated by default to external Kudu tables by HMS with the 'external.table.purge' property set to true. Such synchronized tables behave similarly to managed tables. A drop table command on such a table will remove the underlying Kudu table. Similarly, an alter table rename ... command will rename the underlying Kudu table. Users can also explicitly create such external Kudu tables similar to managed Kudu tables. An example of creating such tables is given below. The table property 'external.table.purge' must be set to true. CREATE EXTERNAL TABLE myextkudutbl ( id int PRIMARY KEY, name string) PARTITION BY HASH PARTITIONS 8 STORED AS KUDU TBLPROPERTIES ('external.table.purge'='true') http://gerrit.cloudera.org:8080/#/c/15149/2/docs/topics/impala_create_table.xml@248 PS2, Line 248: you do not need to create a pre-existing schema in Kudu before : creating an external Kudu table in Impala Creating an external table on a pre-existing schema in Kudu is still a valid use-case. Maybe change the wording such that we say that an alternative way to create an external table is ... and tell some of the differences between the two. -- To view, visit http://gerrit.cloudera.org:8080/15149 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic07380fd53898dd21fbb5dacb4d9f7a84f160d4e Gerrit-Change-Number: 15149 Gerrit-PatchSet: 2 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Mon, 03 Feb 2020 18:57:37 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8587: Show inherited privileges with Ranger show grant
Fang-Yu Rao has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/15111 ) Change subject: IMPALA-8587: Show inherited privileges with Ranger show grant .. IMPALA-8587: Show inherited privileges with Ranger show grant Previously when executing a SHOW GRANT statement on a resource with Ranger authorization enabled, Impala would not show inherited privileges. For example, consider a user 'foo' with database-level privileges granted by: GRANT SELECT ON DATABASE db TO USER foo; If later on we would like to retrieve the table-level privileges associated with the user 'foo' by: SHOW GRANT USER foo ON TABLE db.table; We would not see any result before this change. After this change, the related privileges including the inherited privileges with regard to the specified resource will be shown. In our example described above, we will see the following result and therefore the result returned by SHOW GRANT statement is more informative than the case in which only the privileges on 'db'.'table' were shown. Notice that in the following returned result, we are also able to know the specified user's privileges on any other table under the database 'db'. 
+----------------+----------------+----------+-------+--------+-----+-----+-----------+--------------+---------------+
| principal_type | principal_name | database | table | column | uri | udf | privilege | grant_option | create_time   |
+----------------+----------------+----------+-------+--------+-----+-----+-----------+--------------+---------------+
| USER           | foo            | db       | *     | *      |     |     | select    | false        | 1580174954746 |
+----------------+----------------+----------+-------+--------+-----+-----+-----------+--------------+---------------+
Testing - Ran all FE tests - Ran all authorization E2E tests - Added E2E tests in test_ranger verifying functionality Change-Id: Ia4e679dc6fcf8d0b0e4e0fc2e9b335e2d8bc0899 --- M fe/src/main/java/org/apache/impala/authorization/ranger/RangerImpaladAuthorizationManager.java M tests/authorization/test_ranger.py 2 files changed, 235 insertions(+), 70 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/15111/5 -- To view, visit http://gerrit.cloudera.org:8080/15111 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia4e679dc6fcf8d0b0e4e0fc2e9b335e2d8bc0899 Gerrit-Change-Number: 15111 Gerrit-PatchSet: 5 Gerrit-Owner: Fang-Yu Rao Gerrit-Reviewer: Austin Nobis Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang
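To illustrate why a database-level grant shows up when asking about a specific table, here is a small sketch of wildcard matching over grant rows. It is an assumption-laden toy model, not the logic in RangerImpaladAuthorizationManager and not Ranger's policy engine: `covers` and `show_grant` are hypothetical helpers, and the rows only mimic the columns of the result in the commit message.

```python
def covers(pattern, value):
    """A '*' in a grant row matches any name; otherwise names must be equal."""
    return pattern == "*" or pattern == value


def show_grant(grants, user, database, table):
    """Return every grant row for 'user' whose resource covers database.table.

    This includes inherited rows, e.g. a database-level grant stored with
    table='*', in addition to rows naming the table directly.
    """
    rows = []
    for grant in grants:
        if grant["principal_name"] != user:
            continue
        if covers(grant["database"], database) and covers(grant["table"], table):
            rows.append(grant)
    return rows


# Hypothetical grant rows, shaped like the SHOW GRANT output above.
GRANTS = [
    {"principal_name": "foo", "database": "db", "table": "*",
     "column": "*", "privilege": "select"},
    {"principal_name": "foo", "database": "other_db", "table": "*",
     "column": "*", "privilege": "select"},
    {"principal_name": "bar", "database": "db", "table": "table",
     "column": "*", "privilege": "insert"},
]
```

Under this model, `show_grant(GRANTS, "foo", "db", "table")` returns the single database-level row for `foo` on `db`, matching the one-row result described in the commit message.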
[Impala-ASF-CR] IMPALA-9336: [DOCS] Primary and foreign key constraint syntax
Thomas Tauber-Marshall has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/15146 ) Change subject: IMPALA-9336: [DOCS] Primary and foreign key constraint syntax .. IMPALA-9336: [DOCS] Primary and foreign key constraint syntax CREATE TABLE syntax for primary key and foreign keys spec Change-Id: Iee12da322fbdab7c671c17ceb8436bc3ace2b820 Reviewed-on: http://gerrit.cloudera.org:8080/15146 Reviewed-by: Thomas Tauber-Marshall Tested-by: Thomas Tauber-Marshall --- M docs/topics/impala_create_table.xml 1 file changed, 33 insertions(+), 6 deletions(-) Approvals: Thomas Tauber-Marshall: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/15146 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Iee12da322fbdab7c671c17ceb8436bc3ace2b820 Gerrit-Change-Number: 15146 Gerrit-PatchSet: 6 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anurag Mantripragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-9336: [DOCS] Primary and foreign key constraint syntax
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/15146 ) Change subject: IMPALA-9336: [DOCS] Primary and foreign key constraint syntax .. Patch Set 5: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/15146 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iee12da322fbdab7c671c17ceb8436bc3ace2b820 Gerrit-Change-Number: 15146 Gerrit-PatchSet: 5 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anurag Mantripragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Mon, 03 Feb 2020 18:34:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9336: [DOCS] Primary and foreign key constraint syntax
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/15146 ) Change subject: IMPALA-9336: [DOCS] Primary and foreign key constraint syntax .. Patch Set 5: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/15146 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iee12da322fbdab7c671c17ceb8436bc3ace2b820 Gerrit-Change-Number: 15146 Gerrit-PatchSet: 5 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anurag Mantripragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Mon, 03 Feb 2020 18:31:05 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9289: Fix flakiness in TestConcurrentKuduCreate
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/15151 ) Change subject: IMPALA-9289: Fix flakiness in TestConcurrentKuduCreate .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/15151/1/tests/custom_cluster/test_concurrent_kudu_create.py File tests/custom_cluster/test_concurrent_kudu_create.py: http://gerrit.cloudera.org:8080/#/c/15151/1/tests/custom_cluster/test_concurrent_kudu_create.py@65 PS1, Line 65: pool I looked at other usages of client and it seems that we don't need to explicitly call a close on it. Is that correct? -- To view, visit http://gerrit.cloudera.org:8080/15151 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idefba98ecd65efbd47b1618291330795ef13b910 Gerrit-Change-Number: 15151 Gerrit-PatchSet: 1 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Mon, 03 Feb 2020 18:27:45 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9336: [DOCS] Primary and foreign key constraint syntax
Hello Anurag Mantripragada, Thomas Tauber-Marshall, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15146 to look at the new patch set (#5). Change subject: IMPALA-9336: [DOCS] Primary and foreign key constraint syntax .. IMPALA-9336: [DOCS] Primary and foreign key constraint syntax CREATE TABLE syntax for primary key and foreign keys spec Change-Id: Iee12da322fbdab7c671c17ceb8436bc3ace2b820 --- M docs/topics/impala_create_table.xml 1 file changed, 33 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/15146/5 -- To view, visit http://gerrit.cloudera.org:8080/15146 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iee12da322fbdab7c671c17ceb8436bc3ace2b820 Gerrit-Change-Number: 15146 Gerrit-PatchSet: 5 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anurag Mantripragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. Patch Set 23: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/5490/ -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 23 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 03 Feb 2020 18:17:41 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9346: Fix TestImpalaShell.test config file failing issue on CentOS6/Python 2.6
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15139 ) Change subject: IMPALA-9346: Fix TestImpalaShell.test_config_file failing issue on CentOS6/Python 2.6 .. Patch Set 5: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/15139 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ief5e825aa3baead5519132d47efcf0d5300860fd Gerrit-Change-Number: 15139 Gerrit-PatchSet: 5 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Mon, 03 Feb 2020 18:14:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9336: [DOCS] constraints
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/15146 ) Change subject: IMPALA-9336: [DOCS] constraints .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/15146/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15146/4//COMMIT_MSG@7 PS4, Line 7: constraints Usually we want the first line of the commit to be more descriptive, so maybe something like "primary and foreign key constraint syntax" or similar -- To view, visit http://gerrit.cloudera.org:8080/15146 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iee12da322fbdab7c671c17ceb8436bc3ace2b820 Gerrit-Change-Number: 15146 Gerrit-PatchSet: 4 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anurag Mantripragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Mon, 03 Feb 2020 18:12:07 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9336: [DOCS] constraints
Anurag Mantripragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/15146 ) Change subject: IMPALA-9336: [DOCS] constraints .. Patch Set 4: Code-Review+1 Looks good to me. -- To view, visit http://gerrit.cloudera.org:8080/15146 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iee12da322fbdab7c671c17ceb8436bc3ace2b820 Gerrit-Change-Number: 15146 Gerrit-PatchSet: 4 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anurag Mantripragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Mon, 03 Feb 2020 18:04:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9289: Fix flakiness in TestConcurrentKuduCreate
Fang-Yu Rao has posted comments on this change. ( http://gerrit.cloudera.org:8080/15151 ) Change subject: IMPALA-9289: Fix flakiness in TestConcurrentKuduCreate .. Patch Set 1: Code-Review+1 (2 comments) Thanks to Quanlong for providing a fix promptly! I only left 2 minor comments. http://gerrit.cloudera.org:8080/#/c/15151/1/tests/custom_cluster/test_concurrent_kudu_create.py File tests/custom_cluster/test_concurrent_kudu_create.py: http://gerrit.cloudera.org:8080/#/c/15151/1/tests/custom_cluster/test_concurrent_kudu_create.py@a47 PS1, Line 47: I am not very familiar with the usage of tls.client. Is it true that we do not have to explicitly call close() before we are going to terminate the thread pool after the for-loop at https://gerrit.cloudera.org/c/15151/1/tests/custom_cluster/test_concurrent_kudu_create.py#65? Thanks! http://gerrit.cloudera.org:8080/#/c/15151/1/tests/custom_cluster/test_concurrent_kudu_create.py@51 PS1, Line 51: pool = ThreadPool(processes=3) Just like to check whether or not my understanding is correct. The reason to move ThreadPool() and pool.terminate() out of the for-loop is to reduce the overhead of thread pool creation? -- To view, visit http://gerrit.cloudera.org:8080/15151 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idefba98ecd65efbd47b1618291330795ef13b910 Gerrit-Change-Number: 15151 Gerrit-PatchSet: 1 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Mon, 03 Feb 2020 17:59:44 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4224: execute separate join builds fragments
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14859 ) Change subject: IMPALA-4224: execute separate join builds fragments .. Patch Set 35: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5595/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/14859 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4403c8e62d9c13854e7830602ee613f8efc80c58 Gerrit-Change-Number: 14859 Gerrit-PatchSet: 35 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 03 Feb 2020 17:51:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9174: Emit WARNING when ORC lib leaks memory
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15153 ) Change subject: IMPALA-9174: Emit WARNING when ORC lib leaks memory .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5594/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15153 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I370fb9f68734e0e555bd7224ab0f5440c4947c66 Gerrit-Change-Number: 15153 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 03 Feb 2020 17:24:47 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9290: ORC scanner should support schema evolution between date and timestamp types
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/15152 ) Change subject: IMPALA-9290: ORC scanner should support schema evolution between date and timestamp types .. Patch Set 2: (6 comments) http://gerrit.cloudera.org:8080/#/c/15152/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15152/2//COMMIT_MSG@7 PS2, Line 7: between date and timestamp types Looking at IMPALA-6373 it seems like Impala usually supports these conversions when there is no loss of information. Timestamp and Date don't have the same range, but anyway, supporting Date -> Timestamp seems more reasonable. What is the behavior of other file formats, e.g. Parquet? I don't think we want different behavior. http://gerrit.cloudera.org:8080/#/c/15152/2//COMMIT_MSG@12 PS2, Line 12: to the same ORC file nit: "to the same set of ORC files" probably expresses the intent better http://gerrit.cloudera.org:8080/#/c/15152/2/be/src/exec/orc-column-readers.h File be/src/exec/orc-column-readers.h: http://gerrit.cloudera.org:8080/#/c/15152/2/be/src/exec/orc-column-readers.h@121 PS2, Line 121: Status HandleInvalidValue(Tuple* tuple, TErrorCode::type error_code) nit: Please add comment about what is an invalid value and how it is handled. http://gerrit.cloudera.org:8080/#/c/15152/2/be/src/exec/orc-column-readers.h@214 PS2, Line 214: ORC support schema evolution What does it mean "ORC support schema evolution"? http://gerrit.cloudera.org:8080/#/c/15152/2/be/src/exec/orc-column-readers.h@214 PS2, Line 214: Date and Timestamp tables Does Hive convert in both directions? http://gerrit.cloudera.org:8080/#/c/15152/2/be/src/exec/orc-column-readers.h@253 PS2, Line 253: (source_type_ == orc::TypeKind::TIMESTAMP && : static_cast(batch_) == : dynamic_cast(orc_batch)) || : (source_type_ == orc::TypeKind::DATE && : static_cast(batch_) == : dynamic_cast(orc_batch)) nit: complicated and duplicated in OrcTimestampReader. 
Can you put it into a function with a name like 'IsDateTime' or something like that? -- To view, visit http://gerrit.cloudera.org:8080/15152 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7979ecc61b2ab900090d01bc81e7bb7b28c99c9e Gerrit-Change-Number: 15152 Gerrit-PatchSet: 2 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 03 Feb 2020 17:18:33 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4224: execute separate join builds fragments
Hello Zoltan Borok-Nagy, Csaba Ringhofer, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/14859 to look at the new patch set (#35). Change subject: IMPALA-4224: execute separate join builds fragments .. IMPALA-4224: execute separate join builds fragments This enables parallel plans with the join build in a separate fragment and fixes all of the ensuing fallout. After this change, mt_dop plans with joins have separate build fragments. There is still a 1:1 relationship between join nodes and builders, so the builders are only accessed by the join node's thread after it is handed off. This lets us defer the work required to make PhjBuilder and NljBuilder safe to be shared between nodes. Planner changes: * Combined the parallel and distributed planning code paths. * Misc fixes to generate reasonable thrift structures in the query exec requests, i.e. containing the right nodes. * Fixes to resource calculations for the separate build plans. ** Calculate separate join/build resource consumption. ** Simplified the resource estimation by calculating resource consumption for each fragment separately, and assuming that all fragments hit their peak resource consumption at the same time. IMPALA-9255 is the follow-on to make the resource estimation more accurate. Scheduler changes: * Various fixes to handle multiple TPlanExecInfos correctly, which are generated by the planner for the different cohorts. * Add logic to colocate build fragments with parent fragments. Runtime filter changes: * Build sinks now produce runtime filters, which required planner and coordinator fixes to handle accordingly. DataSink changes: * Close the input plan tree before calling FlushFinal() to release resources. This depends on Send() not holding onto references to input batches, which was true except for NljBuilder. This invariant is documented. 
Join builder changes: * Add a common base class for PhjBuilder and NljBuilder with functions to handle synchronisation with the join node. * Close plan tree earlier in FragmentInstanceState::Exec() so that peak resource requirements are lower. * The NLJ always copies input batches, so that it can close its input tree. JoinNode changes: * Join node blocks waiting for build-side to be ready, then eventually signals that it's done, allowing the builder to be cleaned up. * NLJ and PHJ nodes handle both the integrated builder and the external builder. There is a 1:1 relationship between the node and the builder, so we don't deal with thread safety yet. * Buffer reservations are transferred between the builder and join node when running with the separate builder. This is not really necessary right now, since it is all single-threaded, but will be important for the shared broadcast. - The builder transfers memory for probe buffers to the join node at the end of each build phase. - At end of each probe phase, reservation needs to be handed back to builder (or released). ExecSummary changes: * The summary logic was modified to handle connecting fragments via join builds. The logic is an extension of what was used for exchanges. Testing: * Enable --unlock_mt_dop for end-to-end tests * Migrate some tests to run as part of end-to-end tests instead of custom cluster. * Add mt_dop dimension to various end-to-end tests to provide coverage of join queries, spill-to-disk and cancellation. * Ran a single node TPC-H and TPC-DS stress test with mt_dop=0 and mt_dop=4. Perf: * Ran TPC-H scale factor 30 locally with mt_dop=0. No significant change. 
Change-Id: I4403c8e62d9c13854e7830602ee613f8efc80c58 --- M be/src/exec/CMakeLists.txt M be/src/exec/blocking-join-node.cc M be/src/exec/blocking-join-node.h M be/src/exec/data-sink.cc M be/src/exec/data-sink.h M be/src/exec/exec-node.h A be/src/exec/join-builder.cc A be/src/exec/join-builder.h M be/src/exec/nested-loop-join-builder.cc M be/src/exec/nested-loop-join-builder.h M be/src/exec/nested-loop-join-node.cc M be/src/exec/nested-loop-join-node.h M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/partitioned-hash-join-node.h M be/src/runtime/bufferpool/buffer-pool-internal.h M be/src/runtime/bufferpool/buffer-pool-test.cc M be/src/runtime/bufferpool/buffer-pool.cc M be/src/runtime/bufferpool/buffer-pool.h M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/coordinator.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/initial-reservations.cc M be/src/runtime/row-batch.cc M be/src/runtime/runtime-state.cc M be/src/runtime/runtime-state.h M be/src/runtime/spillable-row-batch-queue.h M be/src/util/summary-util.cc M bin/run-all-tests.sh M common/thrift/DataSinks.thrift M
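The simplified resource estimation described in the commit message above (estimate each fragment separately, then assume every fragment hits its peak at the same time) amounts to summing per-fragment peaks. A minimal sketch of that arithmetic follows; the fragment names and numbers are made up for illustration and this is not the planner's code.

```python
def estimate_total_peak(fragment_peaks_mb):
    """Conservative estimate: every fragment is assumed to peak simultaneously,
    so the total is simply the sum of the per-fragment peaks."""
    return sum(fragment_peaks_mb.values())


# Hypothetical per-fragment peak memory, in MB.
peaks = {
    "F00 (join probe)": 64,
    "F01 (join build)": 128,
    "F02 (scan)": 32,
}

total_mb = estimate_total_peak(peaks)
# The largest single peak is a lower bound on the true peak; the sum can
# overestimate when fragments do not actually peak at the same moment.
largest_single_peak_mb = max(peaks.values())
```

The overestimate when fragments peak at different times is presumably why the commit message points to IMPALA-9255 as the follow-on for a more accurate estimate.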
[Impala-ASF-CR] IMPALA-4224: execute separate join builds fragments
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14859 ) Change subject: IMPALA-4224: execute separate join builds fragments .. Patch Set 34: (11 comments) http://gerrit.cloudera.org:8080/#/c/14859/28/be/src/exec/blocking-join-node.cc File be/src/exec/blocking-join-node.cc: http://gerrit.cloudera.org:8080/#/c/14859/28/be/src/exec/blocking-join-node.cc@231 PS28, Line 231: _sink) { > is it still true? The phrase was a bit weird; this was meant to be part of the "if" clause. I rewrote this comment to be clearer and explain the bigger picture. http://gerrit.cloudera.org:8080/#/c/14859/28/be/src/exec/blocking-join-node.cc@246 PS28, Line 246: seSeparateBuild(state->query_options())) { > Maybe you could add a sentence about how the build was already started. Done http://gerrit.cloudera.org:8080/#/c/14859/32/be/src/exec/nested-loop-join-builder.h File be/src/exec/nested-loop-join-builder.h: http://gerrit.cloudera.org:8080/#/c/14859/32/be/src/exec/nested-loop-join-builder.h@61 PS32, Line 61: NljBuilder( > nit: maybe it could be another factory method called CreateStandaloneBuilde Done http://gerrit.cloudera.org:8080/#/c/14859/32/be/src/exec/nested-loop-join-builder.h@90 PS32, Line 90: util > nit: until Done http://gerrit.cloudera.org:8080/#/c/14859/32/be/src/exec/nested-loop-join-builder.cc File be/src/exec/nested-loop-join-builder.cc: http://gerrit.cloudera.org:8080/#/c/14859/32/be/src/exec/nested-loop-join-builder.cc@112 PS32, Line 112: void NljBuilder::Reset() { > Based on the comment on the declaration it is not valid to be called on sep Done http://gerrit.cloudera.org:8080/#/c/14859/32/be/src/exec/partitioned-hash-join-builder.h File be/src/exec/partitioned-hash-join-builder.h: http://gerrit.cloudera.org:8080/#/c/14859/32/be/src/exec/partitioned-hash-join-builder.h@103 PS32, Line 103: TDataSink* tsink > Since now it is an output parameter should it be moved at the end of the pa Done 
http://gerrit.cloudera.org:8080/#/c/14859/32/be/src/exec/partitioned-hash-join-builder.h@322 PS32, Line 322: 'a > nit: missing space Done http://gerrit.cloudera.org:8080/#/c/14859/32/be/src/exec/partitioned-hash-join-builder.cc File be/src/exec/partitioned-hash-join-builder.cc: http://gerrit.cloudera.org:8080/#/c/14859/32/be/src/exec/partitioned-hash-join-builder.cc@370 PS32, Line 370: void PhjBuilder::Reset(RowBatch* row_batch) { > The declaration comment says it's not valid to be called on a separate buil Done http://gerrit.cloudera.org:8080/#/c/14859/32/be/src/runtime/bufferpool/buffer-pool.cc File be/src/runtime/bufferpool/buffer-pool.cc: http://gerrit.cloudera.org:8080/#/c/14859/32/be/src/runtime/bufferpool/buffer-pool.cc@28 PS32, Line 28: #include "util/debug-util.h" > nit: not in alphabetic order, should be one line below Done http://gerrit.cloudera.org:8080/#/c/14859/32/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/14859/32/be/src/runtime/coordinator.cc@322 PS32, Line 322: fragment.output_sink.type == TDataSinkType::DATA_STREAM_SINK :|| fragment.output_sink.type == TDataSinkType::HASH_JOIN_BUILDER :|| fragment.output_sink.type == TDataSinkType::NESTED_LOOP_JOIN_BUILDER > nit: I wonder if these conditions could be simplified if some parts of them Factored out an IsJoinBuildSink() function. I thought about something to capture this full condition like IsFragmentConnectingSink() but it didn't seem like it made things clearer. http://gerrit.cloudera.org:8080/#/c/14859/32/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: http://gerrit.cloudera.org:8080/#/c/14859/32/common/thrift/PlanNodes.thrift@617 PS32, Line 617: 28 > nit: Since these thrift structures only used between Impala daemons with th I did this mainly to reduce the size of the diff. I think I'd prefer to keep it this way but I can change if you feel strongly - neither option is that good in my mind. 
-- To view, visit http://gerrit.cloudera.org:8080/14859 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4403c8e62d9c13854e7830602ee613f8efc80c58 Gerrit-Change-Number: 14859 Gerrit-PatchSet: 34 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 03 Feb 2020 17:05:56 + Gerrit-HasComments: Yes
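The coordinator.cc reply above mentions factoring the repeated sink-type comparison into an IsJoinBuildSink() function. The actual change is in C++; the following is only a Python analogue of that refactoring. The `SinkType` enum here is a hypothetical mirror of the three TDataSinkType values named in the review (plus one extra member for contrast, with arbitrary numeric values), and `connects_fragments` is an illustrative name, not one from the patch.

```python
from enum import Enum


class SinkType(Enum):
    # Hypothetical mirror of the sink types mentioned in the review;
    # the numeric values are arbitrary.
    DATA_STREAM_SINK = 1
    HASH_JOIN_BUILDER = 2
    NESTED_LOOP_JOIN_BUILDER = 3
    TABLE_SINK = 4  # included only as a non-matching example


def is_join_build_sink(sink_type):
    """True for either kind of join builder sink."""
    return sink_type in (SinkType.HASH_JOIN_BUILDER,
                         SinkType.NESTED_LOOP_JOIN_BUILDER)


def connects_fragments(sink_type):
    """The full three-way condition from the review, written via the helper."""
    return (sink_type == SinkType.DATA_STREAM_SINK
            or is_join_build_sink(sink_type))
```

As the reply notes, only the join-build part was factored out in the patch; a helper for the full condition, like `connects_fragments` here, was considered and not adopted.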
[Impala-ASF-CR] IMPALA-9174: Emit WARNING when ORC lib leaks memory
Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/15153 ) Change subject: IMPALA-9174: Emit WARNING when ORC lib leaks memory .. IMPALA-9174: Emit WARNING when ORC lib leaks memory Added a check to OrcMemPool to test whether there was leaked memory by the ORC library. Impala frees this memory anyway, but it's useful to know if there is a bug in the ORC lib. Testing: * I tested manually * I couldn't add an automated test because currently we are not aware of such bugs in the lib Change-Id: I370fb9f68734e0e555bd7224ab0f5440c4947c66 --- M be/src/exec/hdfs-orc-scanner.cc 1 file changed, 4 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/15153/1 -- To view, visit http://gerrit.cloudera.org:8080/15153 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I370fb9f68734e0e555bd7224ab0f5440c4947c66 Gerrit-Change-Number: 15153 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-9290: ORC scanner should support schema evolution between date and timestamp types
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15152 ) Change subject: IMPALA-9290: ORC scanner should support schema evolution between date and timestamp types .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5593/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15152 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7979ecc61b2ab900090d01bc81e7bb7b28c99c9e Gerrit-Change-Number: 15152 Gerrit-PatchSet: 2 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Comment-Date: Mon, 03 Feb 2020 16:32:29 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4224: execute separate join builds fragments
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14859 ) Change subject: IMPALA-4224: execute separate join builds fragments .. Patch Set 34: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5592/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/14859 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4403c8e62d9c13854e7830602ee613f8efc80c58 Gerrit-Change-Number: 14859 Gerrit-PatchSet: 34 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 03 Feb 2020 16:27:47 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9226: Improve string allocations of the ORC scanner
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15051 ) Change subject: IMPALA-9226: Improve string allocations of the ORC scanner .. Patch Set 9: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/5489/ -- To view, visit http://gerrit.cloudera.org:8080/15051 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f Gerrit-Change-Number: 15051 Gerrit-PatchSet: 9 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 03 Feb 2020 15:50:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9290: ORC scanner should support schema evolution between date and timestamp types
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/15152 ) Change subject: IMPALA-9290: ORC scanner should support schema evolution between date and timestamp types .. Patch Set 2: (5 comments) http://gerrit.cloudera.org:8080/#/c/15152/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15152/1//COMMIT_MSG@20 PS1, Line 20: > nit: wrap at 72 chars Done http://gerrit.cloudera.org:8080/#/c/15152/1/be/src/exec/orc-column-readers.h File be/src/exec/orc-column-readers.h: http://gerrit.cloudera.org:8080/#/c/15152/1/be/src/exec/orc-column-readers.h@224 PS1, Line 224: tampReader(const orc::Type* node, const SlotDes > Do we need this, can't we just compare check != nullprt? I just followed the convention of the other UpdateInputBatch functions where the static casted batch is compared to the dynamic casted one. Maybe we could simplify the DCHECKs everywhere? http://gerrit.cloudera.org:8080/#/c/15152/1/be/src/exec/orc-column-readers.cc File be/src/exec/orc-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/15152/1/be/src/exec/orc-column-readers.cc@211 PS1, Line 211: if (IsNull(DCHECK_NOTNULL(batch_), row_idx)) { > nit: extra line Done http://gerrit.cloudera.org:8080/#/c/15152/1/be/src/exec/orc-column-readers.cc@226 PS1, Line 226: } : *slot = DateValue(ts.DaysSinceUnixEpoch()); : } > This is probably not speed critical, but it could be done faster by ignorin Done http://gerrit.cloudera.org:8080/#/c/15152/1/be/src/exec/orc-column-readers.cc@229 PS1, Line 229: alid())) { > This will hit a DCHECK if the timestamp is not valid, see https://github.co Done -- To view, visit http://gerrit.cloudera.org:8080/15152 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7979ecc61b2ab900090d01bc81e7bb7b28c99c9e Gerrit-Change-Number: 15152 Gerrit-PatchSet: 2 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: 
Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Comment-Date: Mon, 03 Feb 2020 15:47:41 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9290: ORC scanner should support schema evolution between date and timestamp types
Norbert Luksa has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/15152 ) Change subject: IMPALA-9290: ORC scanner should support schema evolution between date and timestamp types .. IMPALA-9290: ORC scanner should support schema evolution between date and timestamp types This feature adds support for schema evolution between date and timestamp for the ORC scanner. This means that we can have two tables, one with a date column, another with a timestamp column, and they can both point to the same ORC file. The result will be that for the first table everything will be converted to date, and for the second, everything to timestamp. In order to do that, the OrcTimestampReader and OrcDateColumnReader are modified to be able to handle batches of the two types. Their names now represent the destination Impala type. Note that the life cycle of an OrcColumnReader is within the life cycle of the HdfsOrcScanner, which only reads a split of an ORC file, and an ORC file can't have two types for one column. Tests: * Added type conversion tests. * Tested manually following the use case steps of the Jira. Change-Id: I7979ecc61b2ab900090d01bc81e7bb7b28c99c9e --- M be/src/exec/orc-column-readers.cc M be/src/exec/orc-column-readers.h M be/src/exec/orc-metadata-utils.cc M testdata/workloads/functional-query/queries/DataErrorsTest/orc-type-checks.test M tests/query_test/test_scanners.py 5 files changed, 125 insertions(+), 43 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/15152/2 -- To view, visit http://gerrit.cloudera.org:8080/15152 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7979ecc61b2ab900090d01bc81e7bb7b28c99c9e Gerrit-Change-Number: 15152 Gerrit-PatchSet: 2 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
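To illustrate the two conversion directions described in the commit message above, here is a small Python sketch using only the standard library. It is not the scanner's C++ code, and the function names are illustrative; the only detail borrowed from the review thread is the idea of deriving a date from a timestamp's days since the Unix epoch. It also shows why the directions are not symmetric: timestamp to date drops the time of day, while date to timestamp has to assume one (midnight here, which is an assumption of this sketch).

```python
from datetime import date, datetime

EPOCH_DATE = date(1970, 1, 1)


def days_since_unix_epoch(ts):
    """Whole days between 1970-01-01 and the calendar day of ts."""
    return (ts.date() - EPOCH_DATE).days


def timestamp_to_date(ts):
    """Timestamp -> date: the time-of-day component is discarded."""
    return ts.date()


def date_to_timestamp(d):
    """Date -> timestamp: midnight is assumed for the missing time of day."""
    return datetime(d.year, d.month, d.day)


ts = datetime(2020, 2, 3, 15, 47, 41)
as_date = timestamp_to_date(ts)          # 2020-02-03
round_trip = date_to_timestamp(as_date)  # 2020-02-03 00:00:00, not equal to ts
```

The lossy round trip is the kind of information loss the reviewers raise when asking which conversion directions should be supported.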
[Impala-ASF-CR] IMPALA-8778: Support Apache Hudi Read Optimized Table
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/14711 ) Change subject: IMPALA-8778: Support Apache Hudi Read Optimized Table .. Patch Set 16: (8 comments) http://gerrit.cloudera.org:8080/#/c/14711/16/be/src/exec/hdfs-scan-node-base.cc File be/src/exec/hdfs-scan-node-base.cc: http://gerrit.cloudera.org:8080/#/c/14711/16/be/src/exec/hdfs-scan-node-base.cc@379 PS16, Line 379: HUDI_PARQUET > My logic was: I see, but in the backend you just create "low-level" operators, such as scan nodes that need to process some input splits in file format X. So they don't need to be too smart, the planner will tell them what to do. That said, I don't have a too strong opinion about it. http://gerrit.cloudera.org:8080/#/c/14711/16/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java File fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java: http://gerrit.cloudera.org:8080/#/c/14711/16/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java@197 PS16, Line 197: fileFormat_ > Done If fileformat_ is null, then the equality check will just return false which is the expected behavior. http://gerrit.cloudera.org:8080/#/c/14711/16/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/14711/16/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@354 PS16, Line 354: isParquet > any suggestion? I couldn't come out a better name here maybe 'isParquetBased'? 
http://gerrit.cloudera.org:8080/#/c/14711/20/testdata/data/README File testdata/data/README: http://gerrit.cloudera.org:8080/#/c/14711/20/testdata/data/README@482 PS20, Line 482: nit: if possible, please keep the 90 chars line length limit http://gerrit.cloudera.org:8080/#/c/14711/20/testdata/datasets/functional/functional_schema_template.sql File testdata/datasets/functional/functional_schema_template.sql: http://gerrit.cloudera.org:8080/#/c/14711/20/testdata/datasets/functional/functional_schema_template.sql@2762 PS20, Line 2762: : : : : Since you are using a custom CREATE statement you'll need to define the partitions in the CREATE TABLE stmt. http://gerrit.cloudera.org:8080/#/c/14711/20/testdata/datasets/functional/schema_constraints.csv File testdata/datasets/functional/schema_constraints.csv: http://gerrit.cloudera.org:8080/#/c/14711/20/testdata/datasets/functional/schema_constraints.csv@59 PS20, Line 59: hudiparquet is not part of the test dimensions of the functional workload. Since most of the tests would fail with hudiparquet we can cheat here and create the hudi table in the functional_parquet database, i.e. switch to parquet here. http://gerrit.cloudera.org:8080/#/c/14711/20/tests/query_test/test_scanners.py File tests/query_test/test_scanners.py: http://gerrit.cloudera.org:8080/#/c/14711/20/tests/query_test/test_scanners.py@313 PS20, Line 313: un_test_cas TestHudiParquet http://gerrit.cloudera.org:8080/#/c/14711/20/tests/query_test/test_scanners.py@320 PS20, Line 320: > Thank you all for reviewing. I am able to use If in 'schema_constraints.csv' you switch to parquet then you can load the hudi tables with --table_formats=parquet/none/none. It's necessary because hudiparquet is not part of the test dimensions, so we'll just put the table in the functional_parquet database. You already only run this test when file_format == parquet. 
-- To view, visit http://gerrit.cloudera.org:8080/14711 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I65e146b347714df32fe968409ef2dde1f6a25cdf Gerrit-Change-Number: 14711 Gerrit-PatchSet: 16 Gerrit-Owner: Yanjia Gary Li Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Yanjia Gary Li Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 03 Feb 2020 15:39:05 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4224: execute separate join builds fragments
Hello Zoltan Borok-Nagy, Csaba Ringhofer, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/14859 to look at the new patch set (#34). Change subject: IMPALA-4224: execute separate join builds fragments .. IMPALA-4224: execute separate join builds fragments This enables parallel plans with the join build in a separate fragment and fixes all of the ensuing fallout. After this change, mt_dop plans with joins have separate build fragments. There is still a 1:1 relationship between join nodes and builders, so the builders are only accessed by the join node's thread after it is handed off. This lets us defer the work required to make PhjBuilder and NljBuilder safe to be shared between nodes. Planner changes: * Combined the parallel and distributed planning code paths. * Misc fixes to generate reasonable thrift structures in the query exec requests, i.e. containing the right nodes. * Fixes to resource calculations for the separate build plans. ** Calculate separate join/build resource consumption. ** Simplified the resource estimation by calculating resource consumption for each fragment separately, and assuming that all fragments hit their peak resource consumption at the same time. IMPALA-9255 is the follow-on to make the resource estimation more accurate. Scheduler changes: * Various fixes to handle multiple TPlanExecInfos correctly, which are generated by the planner for the different cohorts. * Add logic to colocate build fragments with parent fragments. Runtime filter changes: * Build sinks now produce runtime filters, which required planner and coordinator fixes to handle accordingly. DataSink changes: * Close the input plan tree before calling FlushFinal() to release resources. This depends on Send() not holding onto references to input batches, which was true except for NljBuilder. This invariant is documented. 
Join builder changes: * Add a common base class for PhjBuilder and NljBuilder with functions to handle synchronisation with the join node. * Close plan tree earlier in FragmentInstanceState::Exec() so that peak resource requirements are lower. * The NLJ always copies input batches, so that it can close its input tree. JoinNode changes: * Join node blocks waiting for build-side to be ready, then eventually signals that it's done, allowing the builder to be cleaned up. * NLJ and PHJ nodes handle both the integrated builder and the external builder. There is a 1:1 relationship between the node and the builder, so we don't deal with thread safety yet. * Buffer reservations are transferred between the builder and join node when running with the separate builder. This is not really necessary right now, since it is all single-threaded, but will be important for the shared broadcast. - The builder transfers memory for probe buffers to the join node at the end of each build phase. - At end of each probe phase, reservation needs to be handed back to builder (or released). ExecSummary changes: * The summary logic was modified to handle connecting fragments via join builds. The logic is an extension of what was used for exchanges. Testing: * Enable --unlock_mt_dop for end-to-end tests * Migrate some tests to run as part of end-to-end tests instead of custom cluster. * Add mt_dop dimension to various end-to-end tests to provide coverage of join queries, spill-to-disk and cancellation. * Ran a single node TPC-H and TPC-DS stress test with mt_dop=0 and mt_dop=4. Perf: * Ran TPC-H scale factor 30 locally with mt_dop=0. No significant change. 
Change-Id: I4403c8e62d9c13854e7830602ee613f8efc80c58 --- M be/src/exec/CMakeLists.txt M be/src/exec/blocking-join-node.cc M be/src/exec/blocking-join-node.h M be/src/exec/data-sink.cc M be/src/exec/data-sink.h M be/src/exec/exec-node.h A be/src/exec/join-builder.cc A be/src/exec/join-builder.h M be/src/exec/nested-loop-join-builder.cc M be/src/exec/nested-loop-join-builder.h M be/src/exec/nested-loop-join-node.cc M be/src/exec/nested-loop-join-node.h M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/partitioned-hash-join-node.h M be/src/runtime/bufferpool/buffer-pool-internal.h M be/src/runtime/bufferpool/buffer-pool-test.cc M be/src/runtime/bufferpool/buffer-pool.cc M be/src/runtime/bufferpool/buffer-pool.h M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/coordinator.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/initial-reservations.cc M be/src/runtime/row-batch.cc M be/src/runtime/runtime-state.cc M be/src/runtime/runtime-state.h M be/src/runtime/spillable-row-batch-queue.h M be/src/util/summary-util.cc M bin/run-all-tests.sh M common/thrift/DataSinks.thrift M
[Impala-ASF-CR] IMPALA-9279: Update the Kudu version to include VARCHAR support
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15134 ) Change subject: IMPALA-9279: Update the Kudu version to include VARCHAR support .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5591/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15134 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iafe56342d43cb63e35c0bbb1b4a99327dda0a44a Gerrit-Change-Number: 15134 Gerrit-PatchSet: 3 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Comment-Date: Mon, 03 Feb 2020 15:34:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9279: Update the Kudu version to include VARCHAR support
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15134 ) Change subject: IMPALA-9279: Update the Kudu version to include VARCHAR support .. Patch Set 3: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5491/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/15134 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iafe56342d43cb63e35c0bbb1b4a99327dda0a44a Gerrit-Change-Number: 15134 Gerrit-PatchSet: 3 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Comment-Date: Mon, 03 Feb 2020 14:55:30 + Gerrit-HasComments: No
[Impala-ASF-CR] WIP: Asynchronous code generation
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15105 ) Change subject: WIP: Asynchronous code generation .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5590/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15105 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia7cbfa7c6734dcf03641629429057d6a4194aa6b Gerrit-Change-Number: 15105 Gerrit-PatchSet: 2 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 03 Feb 2020 14:52:46 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9279: Update the Kudu version to include VARCHAR support
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/15134 ) Change subject: IMPALA-9279: Update the Kudu version to include VARCHAR support .. Patch Set 2: (2 comments) > > > Patch Set 2: > > > The verify job failed because kudu-3ba5ec5d0 (kudu-1.12.0-SNAPSHOT) > > has a new run-time dependency: libcurl.so.4 which is not > available > > in the ubuntu-16.04-configured jenkins worker label. I'm > discussing > > with laszlog the possibility of adding libcurl.so.4 to the > worker > > label. > > > > > > > If we decide to take this new Kudu version as a dependency, then > > the correct way to handle libcurl.so.4 as a new runtime > dependency > > is to add it to the list of packages we install in > > bin/bootstrap_system.sh. > > The worker image referenced above is only minimally preconfigured > > to allow fast startup times; Impala runtime/development time > > dependencies should be managed in the bootstrap scripts. > > > > Additionally, the dependency on libcurl.so.4 should be evaluated > > for all OS platforms we claim to have support for: e.g. a brief > > scan of this article[1] claims that running both libcurl.so.3 and > > libcurl.so.4 on Ubuntu 18.04 is at least non-trivial to set up. > > > > [1]: > > https://dev.to/jake/using-libcurl3-and-libcurl4-on-ubuntu-1804-bionic-184g, > > "Using libcurl3 and libcurl4 on Ubuntu 18.04 (Bionic)" > > In bin/bootstrap_system.sh, I don't see us installing curl for > ubuntu, but I see us installing it for centos. I would try adding > it and see if that helps. (We have curl installed in all the docker > images we use to build kudu for the native toolchain.) > > We can run a ubuntu-18.04-from-scratch job to see if it works. Installing curl on Ubuntu 16.04 installs libcurl-gnutls.so.4 but it doesn't install the required libcurl.so.4. "apt install libcurl3" on the other hand works for all supported Ubuntu releases, so I've added that to bin/bootstrap_system.sh.
http://gerrit.cloudera.org:8080/#/c/15134/2/bin/impala-config.sh File bin/impala-config.sh: http://gerrit.cloudera.org:8080/#/c/15134/2/bin/impala-config.sh@719 PS2, Line 719: export IMPALA_TOOLCHAIN_KUDU_MAVEN_REPOSITORY="file://${IMPALA_TOOLCHAIN}" > Since this is disabled, I think we can set it to an empty string. If that w Setting url to an empty string results in an error but I can set it to something like "file:///non/existing/repo" What do you think? http://gerrit.cloudera.org:8080/#/c/15134/2/bin/impala-config.sh@722 PS2, Line 722: export IMPALA_KUDU_VERSION="3ba5ec5d0" : export IMPALA_KUDU_JAVA_VERSION="1.12.0-SNAPSHOT" > One use case that we want to support is for someone to be able to override Done -- To view, visit http://gerrit.cloudera.org:8080/15134 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iafe56342d43cb63e35c0bbb1b4a99327dda0a44a Gerrit-Change-Number: 15134 Gerrit-PatchSet: 2 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Comment-Date: Mon, 03 Feb 2020 14:49:40 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9279: Update the Kudu version to include VARCHAR support
Attila Jeges has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/15134 ) Change subject: IMPALA-9279: Update the Kudu version to include VARCHAR support .. IMPALA-9279: Update the Kudu version to include VARCHAR support Before this change the preferred way of getting Kudu was to pull it in from the specified CDH build (even if USE_CDP_HIVE was set to true). Optionally by setting USE_CDH_KUDU to false, one could force Impala to use the native toolchain Kudu. But even then, the Kudu Java artifacts would be downloaded from CDH. Since Kudu VARCHAR support won't be backported to CDH, this behavior blocks the Impala side of the Kudu/Impala VARCHAR integration. With this change: 1. Using the native toolchain Kudu (including the Java artifacts) is the default behavior. From now on USE_CDH_KUDU will be set to false by default. Impala can be forced to fall back on using the CDH Kudu by explicitly setting USE_CDH_KUDU to true. 2. Kudu version is updated to include the VARCHAR support. Testing: Ran exhaustive tests with USE_CDH_KUDU=true and USE_CDH_KUDU=false. Change-Id: Iafe56342d43cb63e35c0bbb1b4a99327dda0a44a --- M bin/bootstrap_system.sh M bin/bootstrap_toolchain.py M bin/impala-config.sh M impala-parent/pom.xml 4 files changed, 43 insertions(+), 26 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/34/15134/3 -- To view, visit http://gerrit.cloudera.org:8080/15134 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iafe56342d43cb63e35c0bbb1b4a99327dda0a44a Gerrit-Change-Number: 15134 Gerrit-PatchSet: 3 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal
[Impala-ASF-CR] IMPALA-9287: Fix test kudu table create without hms in Hive3
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15057 ) Change subject: IMPALA-9287: Fix test_kudu_table_create_without_hms in Hive3 .. Patch Set 9: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5589/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15057 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibc7d7e30cd560d43bb707dec54f4494355809f66 Gerrit-Change-Number: 15057 Gerrit-PatchSet: 9 Gerrit-Owner: wangsheng Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Mon, 03 Feb 2020 14:39:38 + Gerrit-HasComments: No
[Impala-ASF-CR] WIP: Asynchronous code generation
Daniel Becker has uploaded this change for review. ( http://gerrit.cloudera.org:8080/15105 ) Change subject: WIP: Asynchronous code generation .. WIP: Asynchronous code generation This commit introduces optional asynchronous code generation. Asynchronous code generation means that instead of waiting for codegen to finish, the query starts in interpreted mode while codegen is done on another thread. All the function pointers that point to codegen'd functions are changed to be atomic, wrapped in a CodegenFnPtr. These are initialised to nullptr and as long as they are nullptr, the corresponding interpreted functions are used (as before). When code generation is ready, the function pointers are set by the codegen thread. No synchronisation is needed as the function pointers are atomic and it is not a problem if, at a given moment, only a subset of the codegen'd function pointers are set and the rest are interpreted. Asynchronous code generation can be turned on using the ASYNC_CODEGEN boolean query option. TODO: The default should be synchronous codegen for now. TODO: Testing. TODO: Benchmarks.
Change-Id: Ia7cbfa7c6734dcf03641629429057d6a4194aa6b --- M be/src/benchmarks/hash-benchmark.cc A be/src/codegen/codegen-fn-ptr.h M be/src/codegen/llvm-codegen-test.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/grouping-aggregator.cc M be/src/exec/grouping-aggregator.h M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-avro-scanner.h M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/hdfs-sequence-scanner.cc M be/src/exec/hdfs-text-scanner.cc M be/src/exec/non-grouping-aggregator.cc M be/src/exec/non-grouping-aggregator.h M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/parquet/hdfs-parquet-scanner.h M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/partitioned-hash-join-node.h M be/src/exec/select-node.cc M be/src/exec/select-node.h M be/src/exec/topn-node.cc M be/src/exec/topn-node.h M be/src/exec/union-node.cc M be/src/exec/union-node.h M be/src/exprs/expr-codegen-test.cc M be/src/exprs/scalar-expr.cc M be/src/exprs/scalar-expr.h M be/src/exprs/scalar-expr.inline.h M be/src/exprs/scalar-fn-call.cc M be/src/exprs/scalar-fn-call.h M be/src/runtime/fragment-instance-state.cc M be/src/runtime/krpc-data-stream-sender.cc M be/src/runtime/krpc-data-stream-sender.h M be/src/runtime/runtime-state.h M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/tuple-row-compare.cc M be/src/util/tuple-row-compare.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift 45 files changed, 453 insertions(+), 214 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/15105/2 -- To view, visit http://gerrit.cloudera.org:8080/15105 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: 
newchange Gerrit-Change-Id: Ia7cbfa7c6734dcf03641629429057d6a4194aa6b Gerrit-Change-Number: 15105 Gerrit-PatchSet: 2 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Tim Armstrong
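The fallback scheme described in the commit message above can be sketched in a few lines. This is only an illustrative Python analogue, not the patch's C++ code: the class CodegenFnSlot and the helpers interpreted_add, compiled_add and run_async_codegen are hypothetical names, None stands in for nullptr, and a single attribute assignment stands in for the atomic pointer store.

```python
import threading


class CodegenFnSlot:
    """Hypothetical stand-in for the CodegenFnPtr wrapper from the commit message."""

    def __init__(self, interpreted_fn):
        self._interpreted_fn = interpreted_fn
        self._compiled_fn = None  # plays the role of the nullptr initial value

    def publish(self, compiled_fn):
        # Called by the codegen thread once compilation has finished.
        self._compiled_fn = compiled_fn

    def __call__(self, *args):
        # Read the slot once so that a concurrent publish cannot change
        # the choice halfway through this call.
        fn = self._compiled_fn
        if fn is None:
            fn = self._interpreted_fn
        return fn(*args)


def interpreted_add(a, b):
    # Slow path that is available immediately.
    return a + b


def compiled_add(a, b):
    # Stand-in for generated code; it must return the same result.
    return a + b


def run_async_codegen(slot, compiled_fn):
    """Start a background thread that publishes compiled_fn into slot."""
    worker = threading.Thread(target=slot.publish, args=(compiled_fn,))
    worker.start()
    return worker
```

A caller would create slot = CodegenFnSlot(interpreted_add), start run_async_codegen(slot, compiled_add) and keep calling slot(...) throughout. Each call uses whichever function is available at that moment, which is why a partially published set of slots is harmless as long as both versions compute the same result.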
[Impala-ASF-CR] WIP: Asynchronous code generation
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15105 ) Change subject: WIP: Asynchronous code generation .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/15105/2/be/src/exec/hdfs-avro-scanner.cc File be/src/exec/hdfs-avro-scanner.cc: http://gerrit.cloudera.org:8080/#/c/15105/2/be/src/exec/hdfs-avro-scanner.cc@559 PS2, Line 559: line has trailing whitespace -- To view, visit http://gerrit.cloudera.org:8080/15105 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia7cbfa7c6734dcf03641629429057d6a4194aa6b Gerrit-Change-Number: 15105 Gerrit-PatchSet: 2 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 03 Feb 2020 14:05:10 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9330: Support resolving unmasked nested columns
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15108 ) Change subject: IMPALA-9330: Support resolving unmasked nested columns .. Patch Set 8: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5588/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15108 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1cc5565c64c1a4a56445b8edde59b1168f387791 Gerrit-Change-Number: 15108 Gerrit-PatchSet: 8 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Anurag Mantripragada Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Mon, 03 Feb 2020 13:56:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9287: Fix test kudu table create without hms in Hive3
wangsheng has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/15057 ) Change subject: IMPALA-9287: Fix test_kudu_table_create_without_hms in Hive3 .. IMPALA-9287: Fix test_kudu_table_create_without_hms in Hive3 When Impala is built with USE_CDP_HIVE=true, the custom cluster test case test_kudu_table_create_without_hms fails because the related jars are missing. The solution is to add the related Maven dependencies to $IMPALA_HOME/fe/pom.xml and $IMPALA_HOME/shaded-deps/pom.xml. Tests: * Ran test_kudu_table_create_without_hms.py by setting USE_CDP_HIVE=true locally Change-Id: Ibc7d7e30cd560d43bb707dec54f4494355809f66 --- M bin/bootstrap_system.sh M fe/pom.xml M shaded-deps/pom.xml 3 files changed, 31 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/57/15057/9 -- To view, visit http://gerrit.cloudera.org:8080/15057 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibc7d7e30cd560d43bb707dec54f4494355809f66 Gerrit-Change-Number: 15057 Gerrit-PatchSet: 9 Gerrit-Owner: wangsheng Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: wangsheng
[Impala-ASF-CR] IMPALA-9330: Support resolving unmasked nested columns
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/15108 ) Change subject: IMPALA-9330: Support resolving unmasked nested columns .. Patch Set 8: Code-Review+2 (2 comments) http://gerrit.cloudera.org:8080/#/c/15108/8//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15108/8//COMMIT_MSG@7 PS8, Line 7: IMPALA-9330: Support resolving unmasked nested columns nit: maybe add something like "in masked tables"? http://gerrit.cloudera.org:8080/#/c/15108/8/tests/authorization/test_ranger.py File tests/authorization/test_ranger.py: http://gerrit.cloudera.org:8080/#/c/15108/8/tests/authorization/test_ranger.py@806 PS8, Line 806: they won't be recognized (same as Hive). I would prefer to give an error, but I am ok with the current status. -- To view, visit http://gerrit.cloudera.org:8080/15108 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1cc5565c64c1a4a56445b8edde59b1168f387791 Gerrit-Change-Number: 15108 Gerrit-PatchSet: 8 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Anurag Mantripragada Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Mon, 03 Feb 2020 13:50:26 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9290: ORC scanner should support schema evolution between date and timestamp types
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/15152 ) Change subject: IMPALA-9290: ORC scanner should support schema evolution between date and timestamp types .. Patch Set 1: (5 comments) The design + code seems good to me, but I am not too enthusiastic about the feature itself. We may need to support it, if there are already tables like that (and people want to read them with Impala), but I would prefer not to do it if we don't know whether it is needed. This kind of schema evolution seems generally a bad idea to me, as both Date->Timestamp and Timestamp->Date are lossy conversions in Impala. I also expect some complex work around this code related to timezones and predicate push down in the future, and supporting these two type mappings will make it harder. http://gerrit.cloudera.org:8080/#/c/15152/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15152/1//COMMIT_MSG@20 PS1, Line 20: f nit: wrap at 72 chars http://gerrit.cloudera.org:8080/#/c/15152/1/be/src/exec/orc-column-readers.h File be/src/exec/orc-column-readers.h: http://gerrit.cloudera.org:8080/#/c/15152/1/be/src/exec/orc-column-readers.h@224 PS1, Line 224: static_cast(batch_) Do we need this? Can't we just compare check != nullptr? http://gerrit.cloudera.org:8080/#/c/15152/1/be/src/exec/orc-column-readers.cc File be/src/exec/orc-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/15152/1/be/src/exec/orc-column-readers.cc@211 PS1, Line 211: nit: extra line http://gerrit.cloudera.org:8080/#/c/15152/1/be/src/exec/orc-column-readers.cc@226 PS1, Line 226: int64_t nanos = current_batch->nanoseconds.data()[row_idx]; : TimestampValue ts = TimestampValue::FromUnixTimeNanos(secs, nanos, : scanner_->state_->local_time_zone()); This is probably not speed critical, but it could be done faster by ignoring nanoseconds and using FromUnixTime() directly.
http://gerrit.cloudera.org:8080/#/c/15152/1/be/src/exec/orc-column-readers.cc@229 PS1, Line 229: DaysSinceUnixEpoch This will hit a DCHECK if the timestamp is not valid, see https://github.com/apache/impala/blob/master/be/src/runtime/timestamp-value.inline.h#L94 , so it can only be called after checking the timestamp's validity. -- To view, visit http://gerrit.cloudera.org:8080/15152 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7979ecc61b2ab900090d01bc81e7bb7b28c99c9e Gerrit-Change-Number: 15152 Gerrit-PatchSet: 1 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 03 Feb 2020 13:41:11 + Gerrit-HasComments: Yes
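Two points from the review above, that Timestamp->Date is lossy and that validity must be checked before asking for the day count, can be illustrated with a small sketch. It is not Impala code: timestamp_to_days and days_to_date are hypothetical helpers, the bounds 1400-01-01 to 9999-12-31 are an assumption chosen for illustration, and time zones are ignored by treating every value as UTC.

```python
from datetime import date, timedelta

SECONDS_PER_DAY = 86400
EPOCH = date(1970, 1, 1)

# Assumed valid range, for illustration only.
MIN_DAY = (date(1400, 1, 1) - EPOCH).days
MAX_DAY = (date(9999, 12, 31) - EPOCH).days


def timestamp_to_days(secs):
    """Return days since the Unix epoch for a UTC timestamp, or None if out of range."""
    # Floor division rounds towards negative infinity, so values before
    # 1970 land on the correct (earlier) day. Nanoseconds are not needed.
    days = secs // SECONDS_PER_DAY
    if days < MIN_DAY or days > MAX_DAY:
        return None  # the caller must handle invalid values before converting
    return days


def days_to_date(days):
    """Convert a validated day count back to a calendar date."""
    return EPOCH + timedelta(days=days)
```

With these helpers, timestamp_to_days(0) and timestamp_to_days(86399) both yield day 0, so the time of day cannot be recovered from the resulting date.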
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. Patch Set 23: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5490/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 23 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 03 Feb 2020 13:30:36 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. Patch Set 23: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 23 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 03 Feb 2020 13:30:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. Patch Set 22: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 22 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 03 Feb 2020 13:29:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9330: Support resolving unmasked nested columns
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/15108 ) Change subject: IMPALA-9330: Support resolving unmasked nested columns .. Patch Set 8: (5 comments) Thanks for your review! http://gerrit.cloudera.org:8080/#/c/15108/7//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15108/7//COMMIT_MSG@8 PS7, Line 8: > Please add that the patch deals with nested tables which have column masks Done http://gerrit.cloudera.org:8080/#/c/15108/7//COMMIT_MSG@14 PS7, Line 14: select t.id, a.item from tableA t, t.int_array a; > These seem like hacks to me (especailly 2.) that are needed because InlineV Done http://gerrit.cloudera.org:8080/#/c/15108/7/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/15108/7/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@912 PS7, Line 912: Resolves > Nit: Did you mean Resolves? Oops, yes... http://gerrit.cloudera.org:8080/#/c/15108/7/testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test File testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test: http://gerrit.cloudera.org:8080/#/c/15108/7/testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test@403 PS7, Line 403: > Note that Impala EE tests do not need ORDER BY to make the results determin Done http://gerrit.cloudera.org:8080/#/c/15108/7/tests/authorization/test_ranger.py File tests/authorization/test_ranger.py: http://gerrit.cloudera.org:8080/#/c/15108/7/tests/authorization/test_ranger.py@805 PS7, Line 805: policy_cnt += 1 > Can you also add a mask for a nested column? 
As we discussed in chat, it is Done -- To view, visit http://gerrit.cloudera.org:8080/15108 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1cc5565c64c1a4a56445b8edde59b1168f387791 Gerrit-Change-Number: 15108 Gerrit-PatchSet: 8 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Anurag Mantripragada Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Mon, 03 Feb 2020 13:10:51 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9330: Support resolving unmasked nested columns
Hello Anurag Mantripragada, Fang-Yu Rao, Vihang Karajgaonkar, Kurt Deschler, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15108 to look at the new patch set (#8). Change subject: IMPALA-9330: Support resolving unmasked nested columns .. IMPALA-9330: Support resolving unmasked nested columns Column masking policies on primitive columns of a table which contains nested types (though they won't be masked) will cause query failures. To be specific, if tableA(id int, int_array array) has a masking policy on column "id", all queries on "tableA" will fail, e.g. select id from tableA; select t.id, a.item from tableA t, t.int_array a; Column masking is implemented by wrapping the underlying table/view with a table masking view. However, as we don't support nested types in SelectList, the table masking view can't expose nested columns of the masked table, which causes collection refs not to be resolved correctly. This patch fixes the issue in 2 steps: 1) Expose nested columns of the underlying table in the output Type of the table masking view (see InlineViewRef#createTupleDescriptor()). So nested Paths in the original query block can be resolved. 2) For such Paths, resolve them again inside the table masking view. So they can point to the underlying table as intended (see Analyzer#resolvePathWithMasking()). The TupleDescriptor of such a table masking view won't be materialized since the view is simple enough that its query plan is just a ScanNode of the underlying table. The whole query plan can be stitched as if the table is not masked. Note that one day when we support nested columns in SelectList, we may not need these 2 hacks. This patch also adds some TRACE level logging to improve debuggability. Test changes in TestRanger.test_column_masking: - Add column masking policy on a table containing nested types. - Add queries on the masked tables.
Some queries are borrowed from existing tests for nested types. Tests: - Run test_ranger.py locally. Change-Id: I1cc5565c64c1a4a56445b8edde59b1168f387791 --- M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CollectionTableRef.java M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java M fe/src/main/java/org/apache/impala/analysis/Path.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/SlotRef.java M fe/src/main/java/org/apache/impala/analysis/TableRef.java M fe/src/main/java/org/apache/impala/authorization/TableMask.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test M tests/authorization/test_ranger.py 12 files changed, 527 insertions(+), 18 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/15108/8 -- To view, visit http://gerrit.cloudera.org:8080/15108 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1cc5565c64c1a4a56445b8edde59b1168f387791 Gerrit-Change-Number: 15108 Gerrit-PatchSet: 8 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Anurag Mantripragada Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-9330: Resolve nested types of masked tables
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/15108 ) Change subject: IMPALA-9330: Resolve nested types of masked tables .. Patch Set 7: Code-Review+1 (4 comments) http://gerrit.cloudera.org:8080/#/c/15108/7//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15108/7//COMMIT_MSG@8 PS7, Line 8: Please add that the patch deals with nested tables which have column masks on top-level non-nested columns, but not with column masks on nested columns. http://gerrit.cloudera.org:8080/#/c/15108/7//COMMIT_MSG@14 PS7, Line 14: This patch fixes the issue by 2 steps: These seem like hacks to me (especially 2.) that are needed because InlineViewRef cannot represent complex columns at the moment. It could be mentioned that these won't be needed once Impala is able to represent complex types in the select list. http://gerrit.cloudera.org:8080/#/c/15108/7/testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test File testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test: http://gerrit.cloudera.org:8080/#/c/15108/7/testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test@403 PS7, Line 403: order by id Note that Impala EE tests do not need ORDER BY to make the results deterministic - the EE test framework checks if there is an ORDER BY in the query, and if not, then it sorts both expected and actual results before comparing. http://gerrit.cloudera.org:8080/#/c/15108/7/tests/authorization/test_ranger.py File tests/authorization/test_ranger.py: http://gerrit.cloudera.org:8080/#/c/15108/7/tests/authorization/test_ranger.py@805 PS7, Line 805: self.execute_query_expect_success(admin_client, "refresh authorization", Can you also add a mask for a nested column? As we discussed in chat, it is allowed by Ranger, but not supported by Hive and Impala. It would be nice to check if we get a proper error message when querying a column like that.
-- To view, visit http://gerrit.cloudera.org:8080/15108 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1cc5565c64c1a4a56445b8edde59b1168f387791 Gerrit-Change-Number: 15108 Gerrit-PatchSet: 7 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Anurag Mantripragada Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Mon, 03 Feb 2020 12:16:42 + Gerrit-HasComments: Yes
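The result comparison rule mentioned in the review above (sort both sides unless the query has an ORDER BY) can be sketched as follows. This is a minimal illustration, not the EE test framework's actual code: has_order_by and results_match are hypothetical names, and the ORDER BY detection is a naive text search that would also match an ORDER BY inside a subquery.

```python
import re


def has_order_by(query):
    """Return True if the query text contains an ORDER BY clause (naive check)."""
    return re.search(r"\border\s+by\b", query, re.IGNORECASE) is not None


def results_match(query, expected_rows, actual_rows):
    """Compare rows in order only when the query itself fixes the order."""
    if has_order_by(query):
        return list(expected_rows) == list(actual_rows)
    # Without ORDER BY the row order is unspecified, so sort both sides first.
    return sorted(expected_rows) == sorted(actual_rows)
```

For example, results_match("select id from t", ["2", "1"], ["1", "2"]) is true, while the same rows fail to match once "order by id" is appended to the query, which is why the extra ORDER BY in the test file is unnecessary for determinism.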
[Impala-ASF-CR] IMPALA-9226: Improve string allocations of the ORC scanner
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15051 ) Change subject: IMPALA-9226: Improve string allocations of the ORC scanner .. Patch Set 9: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5489/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/15051 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f Gerrit-Change-Number: 15051 Gerrit-PatchSet: 9 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 03 Feb 2020 11:33:45 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9226: Improve string allocations of the ORC scanner
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15051 ) Change subject: IMPALA-9226: Improve string allocations of the ORC scanner .. Patch Set 9: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/15051 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f Gerrit-Change-Number: 15051 Gerrit-PatchSet: 9 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 03 Feb 2020 11:33:44 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering .. Patch Set 22: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5587/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 22 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 03 Feb 2020 11:01:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9226: Improve string allocations of the ORC scanner
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15051 ) Change subject: IMPALA-9226: Improve string allocations of the ORC scanner .. Patch Set 8: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5586/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15051 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f Gerrit-Change-Number: 15051 Gerrit-PatchSet: 8 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 03 Feb 2020 10:53:47 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9290: ORC scanner should support schema evolution between date and timestamp types
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15152 ) Change subject: IMPALA-9290: ORC scanner should support schema evolution between date and timestamp types .. Patch Set 1: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/5585/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/15152 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7979ecc61b2ab900090d01bc81e7bb7b28c99c9e Gerrit-Change-Number: 15152 Gerrit-PatchSet: 1 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 03 Feb 2020 10:16:41 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8755: Backend support for Z-ordering
Norbert Luksa has uploaded a new patch set (#22). ( http://gerrit.cloudera.org:8080/14080 ) Change subject: IMPALA-8755: Backend support for Z-ordering ..

IMPALA-8755: Backend support for Z-ordering

This change depends on gerrit.cloudera.org/#/c/13955/ (Frontend support for Z-ordering).

The commit adds a Comparator based on Z-ordering. See in detail: https://en.wikipedia.org/wiki/Z-order_curve

Instead of calculating the Z-values of the rows, the comparator looks for the column with the most significant dimension and compares the values of this column only. The most significant dimension is the one where the compared values differ in the highest bit. The algorithm requires values of the same binary representation, therefore the values are converted into either uint32_t, uint64_t or uint128_t, the smallest in which all data fits. Comparing smaller types with bigger ones would make the bigger type much more dominant, therefore the bits of these smaller types are shifted up. All primitive types (including string and floating point types) are supported.

Testing:
* Added unit tests.
* Ran manual tests, comparing 4-column values with 4-bit integers, for all possible combinations. Checked the result by calculating the Z-value for each comparison.
* Tested performance on various data, getting great results for selective queries. An example: used the TPCH dataset's lineitem table with scale 25, where the sorting columns are l_partkey and l_suppkey, in that order. Ran selective queries for the value range of the two columns, for both lexical and Z-ordering, and compared the percentage of filtered pages and row groups. While queries with filters on the first column showed almost no difference, queries on the second column favour Z-ordering:

Ordering | Column | Filtered pages % | Filtered row groups %
Lex.     | 1st    | ~99%             | ~90%
Z-ord.   | 1st    | ~99%             | ~89%
Lex.     | 2nd    | ~25%             | 0%
Z-ord.   | 2nd    | ~97%             | 0%

The only drawback is the sorting itself, which takes ~4 times longer than lexical sorting (e.g. sorting for the dataset above took 14m for Lexical, and 55m for Z-ordering). Note however that this is a one-time cost: sorting happens only once, when writing the data. Also, lexical ordering is supported by codegen, while it is not implemented for Z-ordering yet.

Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab
---
M be/src/exec/exchange-node.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partial-sort-node.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/util/CMakeLists.txt
A be/src/util/tuple-row-compare-test.cc
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
18 files changed, 1,128 insertions(+), 95 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/14080/22
-- To view, visit http://gerrit.cloudera.org:8080/14080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0200748ce3e65ebc5d3530f794c0f80aa335a2ab Gerrit-Change-Number: 14080 Gerrit-PatchSet: 22 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy
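The comparison idea in the Z-ordering commit message above (pick the column whose values differ in the highest bit, then compare only that column) can be sketched in a few lines of Python. This is a minimal illustration of the general technique described on the linked Wikipedia page, not the code in tuple-row-compare.cc; the function names `less_msb`, `shift_up`, `zorder_compare` and `z_value` are hypothetical, and the rows are assumed to be already converted to unsigned integers of one common width.

```python
def less_msb(x: int, y: int) -> bool:
    """True if the highest set bit of x is lower than the highest set bit of y."""
    return x < y and x < (x ^ y)


def shift_up(value: int, bits: int, target_bits: int) -> int:
    """Left-align a `bits`-wide unsigned value in a `target_bits`-wide word,
    so that narrow columns are not dominated by wider ones (assumed scheme)."""
    return value << (target_bits - bits)


def zorder_compare(a, b) -> int:
    """Compare two rows of same-width unsigned ints in Z-order.

    Returns -1, 0 or 1 without ever computing the interleaved Z-values.
    """
    msd = 0  # index of the most significant dimension found so far
    for dim in range(1, len(a)):
        # The column whose two values differ in the highest bit wins;
        # on a tie the earlier column is kept.
        if less_msb(a[msd] ^ b[msd], a[dim] ^ b[dim]):
            msd = dim
    return (a[msd] > b[msd]) - (a[msd] < b[msd])


def z_value(row, bits: int) -> int:
    """Reference implementation: interleave the bits of all columns,
    taking the earlier column first at every bit position."""
    z = 0
    for bit in range(bits - 1, -1, -1):
        for v in row:
            z = (z << 1) | ((v >> bit) & 1)
    return z
```

As in the manual test described in the commit message, the shortcut can be checked by comparing `zorder_compare(a, b)` against the ordering of `z_value(a, bits)` and `z_value(b, bits)` for every pair of small rows.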
[Impala-ASF-CR] IMPALA-9226: Improve string allocations of the ORC scanner
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/15051 ) Change subject: IMPALA-9226: Improve string allocations of the ORC scanner .. Patch Set 8: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/15051 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f Gerrit-Change-Number: 15051 Gerrit-PatchSet: 8 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 03 Feb 2020 10:13:32 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9226: Improve string allocations of the ORC scanner
Norbert Luksa has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/15051 ) Change subject: IMPALA-9226: Improve string allocations of the ORC scanner ..

IMPALA-9226: Improve string allocations of the ORC scanner

Currently the OrcColumnReader copies values from the orc::StringVectorBatch one-by-one. Since ORC 1.6, the blob which contains the pointed values is moved to the StringVectorBatch, so we can copy the whole blob at once.

Besides the above improvement, this commit also enables the LazyEncoding option for the ORC reader. This way, for stripes with DICTIONARY_ENCODING[_V2], EncodedStringVectorBatch contains the data in a dictionaryBlob from which the data can be acquired with the given indices and lengths.

Tests:
* Ran ORC scanner tests (query_test/test_scanners.py::TestOrc) and tpch query tests.
* Tested performance on tpch.lineitem table with scale=25, running queries that select the min of string columns. Some results:

col_name       | encoding   | before | after  | speedup
l_comment      | DIRECT     | 16.42s | 14.38s | 14%
l_shipinstruct | DICTIONARY | 5.26s  | 3.80s  | 32%
l_commitdate   | DICTIONARY | 5.46s  | 5.19s  | 5%
all string col | BOTH       | 39.06s | 32.18s | 21%

The queries were run on a desktop PC with MT_DOP and NUM_NODES set to 1.
* Also ran TPC-H queries on the TPC-H benchmark, where some queries' runtime improved by around 10-15%, while there was no regression for the others.

Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f
---
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/orc-column-readers.cc
M be/src/exec/orc-column-readers.h
4 files changed, 135 insertions(+), 42 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/15051/8
-- To view, visit http://gerrit.cloudera.org:8080/15051 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f Gerrit-Change-Number: 15051 Gerrit-PatchSet: 8 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy
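The two access patterns named in the ORC commit message above (one contiguous blob for direct encoding, and a dictionary blob addressed by indices and lengths for dictionary encoding) can be illustrated with a small Python sketch. It models the data layout only and is not the ORC C++ API or Impala's reader; `decode_direct`, `decode_dictionary` and their parameters are hypothetical names, and the layout (values stored back to back, in order) is an assumption made for the example.

```python
def decode_direct(blob: bytes, lengths):
    """Slice consecutive values out of one contiguous blob.

    The blob is handed over once; each value is only a (start, length)
    window into it rather than a separately copied string.
    """
    view = memoryview(blob)
    values, pos = [], 0
    for n in lengths:
        values.append(bytes(view[pos:pos + n]))
        pos += n
    return values


def decode_dictionary(dictionary_blob: bytes, entry_lengths, indices):
    """Resolve per-row dictionary indices against a dictionary blob.

    `entry_lengths[i]` is the byte length of dictionary entry i; entries are
    assumed to be stored back to back, so start offsets are running sums.
    """
    starts, pos = [], 0
    for n in entry_lengths:
        starts.append(pos)
        pos += n
    view = memoryview(dictionary_blob)
    return [bytes(view[starts[i]:starts[i] + entry_lengths[i]]) for i in indices]


# Usage: three distinct dictionary entries serve four rows.
entries = [b"AIR", b"MAIL", b"SHIP"]
rows = decode_dictionary(b"".join(entries), [len(e) for e in entries], [2, 0, 0, 1])
# rows == [b"SHIP", b"AIR", b"AIR", b"MAIL"]
```

The point of the sketch is that a repeated value costs one dictionary entry plus a small index per row, which is consistent with the larger gain the commit message reports for a DICTIONARY-encoded column.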
[Impala-ASF-CR] IMPALA-9226: Improve string allocations of the ORC scanner
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/15051 ) Change subject: IMPALA-9226: Improve string allocations of the ORC scanner .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/15051/7/be/src/exec/orc-column-readers.h File be/src/exec/orc-column-readers.h: http://gerrit.cloudera.org:8080/#/c/15051/7/be/src/exec/orc-column-readers.h@212 PS7, Line 212: static_cast(batch_) == : dynamic_cast(orc_batch) > it will be true even if orc_batch is just a StringVectorBatch, because it w Done -- To view, visit http://gerrit.cloudera.org:8080/15051 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f Gerrit-Change-Number: 15051 Gerrit-PatchSet: 8 Gerrit-Owner: Norbert Luksa Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 03 Feb 2020 10:07:49 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9290: ORC scanner should support schema evolution between date and timestamp types
Norbert Luksa has uploaded this change for review. ( http://gerrit.cloudera.org:8080/15152 ) Change subject: IMPALA-9290: ORC scanner should support schema evolution between date and timestamp types .. IMPALA-9290: ORC scanner should support schema evolution between date and timestamp types This feature adds support for schema evolution between date and timestamp for the ORC scanner. This means that we can have two tables, one with a date column, another with a timestamp column, and they can both point to the same ORC file. The result will be that for the first table everything will be converted to date, and for the second, everything to timestamp. In order to do that, the OrcTimestampReader and OrcDateColumnReader are modified to be able to handle batches of the two types. Their names now represent the destination Impala type. Note that the life cycle of an OrcColumnReader is within the life cycle of the HdfsOrcScanner, which only reads a split of an ORC file, and an ORC file can't have two types for one column. Tests: * Added type conversion tests. * Tested manually following the use case steps of the Jira. Change-Id: I7979ecc61b2ab900090d01bc81e7bb7b28c99c9e --- M be/src/exec/orc-column-readers.cc M be/src/exec/orc-column-readers.h M be/src/exec/orc-metadata-utils.cc M testdata/workloads/functional-query/queries/DataErrorsTest/orc-type-checks.test M tests/query_test/test_scanners.py 5 files changed, 115 insertions(+), 36 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/15152/1 -- To view, visit http://gerrit.cloudera.org:8080/15152 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I7979ecc61b2ab900090d01bc81e7bb7b28c99c9e Gerrit-Change-Number: 15152 Gerrit-PatchSet: 1 Gerrit-Owner: Norbert Luksa
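The idea in the IMPALA-9290 description above, a reader named after its destination type that accepts a batch of either source type, can be sketched as follows. This is only an illustration under stated assumptions, not Impala's implementation: the functions `read_as_date` and `read_as_timestamp` and the `kind` argument are hypothetical, a date batch is assumed to hold days since 1970-01-01, a timestamp batch is assumed to hold (seconds, nanoseconds) pairs since the same epoch, and the conversion rules (midnight for date to timestamp, dropping the time of day for timestamp to date) are assumptions made for the example.

```python
from datetime import date, datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # timezone-less epoch


def read_as_timestamp(kind: str, batch):
    """Destination type TIMESTAMP: accept a date batch or a timestamp batch."""
    if kind == "date":
        # Days since epoch become midnight of that day.
        return [EPOCH + timedelta(days=d) for d in batch]
    if kind == "timestamp":
        # Python datetimes keep microseconds only, so sub-microsecond
        # digits are dropped in this sketch.
        return [EPOCH + timedelta(seconds=s, microseconds=ns // 1000)
                for s, ns in batch]
    raise ValueError(f"unsupported source type: {kind}")


def read_as_date(kind: str, batch):
    """Destination type DATE: accept a date batch or a timestamp batch."""
    if kind == "date":
        return [(EPOCH + timedelta(days=d)).date() for d in batch]
    if kind == "timestamp":
        # The time of day is discarded; the nanosecond part is always
        # non-negative and less than one second, so it never changes the day.
        return [(EPOCH + timedelta(seconds=s)).date() for s, _ns in batch]
    raise ValueError(f"unsupported source type: {kind}")
```

Because a single ORC file stores one type per column, each reader instance in such a design would only ever take one of the two branches during its lifetime, which matches the life-cycle note in the description.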