[Impala-ASF-CR] IMPALA-7893: Correctly handle Ctrl+C for cancelling a non-running query
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11990 ) Change subject: IMPALA-7893: Correctly handle Ctrl+C for cancelling a non-running query .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1451/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/11990 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I80d7b2c2350224d88d0bfeb1745d9ed76e83cf6d Gerrit-Change-Number: 11990 Gerrit-PatchSet: 4 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Wed, 28 Nov 2018 07:13:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7759: Add Levenshtein edit distance built-in function
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11793 ) Change subject: IMPALA-7759: Add Levenshtein edit distance built-in function .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1450/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/11793 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I549d33ab7cebfa10db2934461c8ec91e2cc1cdcb Gerrit-Change-Number: 11793 Gerrit-PatchSet: 3 Gerrit-Owner: Greg Rahn Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 28 Nov 2018 06:56:45 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7893: Correctly handle Ctrl+C for cancelling a non-running query
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11990 ) Change subject: IMPALA-7893: Correctly handle Ctrl+C for cancelling a non-running query .. Patch Set 5: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3502/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/11990 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I80d7b2c2350224d88d0bfeb1745d9ed76e83cf6d Gerrit-Change-Number: 11990 Gerrit-PatchSet: 5 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Wed, 28 Nov 2018 06:27:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7893: Correctly handle Ctrl+C for cancelling a non-running query
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11990 ) Change subject: IMPALA-7893: Correctly handle Ctrl+C for cancelling a non-running query .. Patch Set 5: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/11990 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I80d7b2c2350224d88d0bfeb1745d9ed76e83cf6d Gerrit-Change-Number: 11990 Gerrit-PatchSet: 5 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Wed, 28 Nov 2018 06:27:32 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7893: Correctly handle Ctrl+C for cancelling a non-running query
Fredy Wijaya has posted comments on this change. ( http://gerrit.cloudera.org:8080/11990 ) Change subject: IMPALA-7893: Correctly handle Ctrl+C for cancelling a non-running query .. Patch Set 4: Code-Review+2 Trivial test fix for for non-TTY stdin. -- To view, visit http://gerrit.cloudera.org:8080/11990 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I80d7b2c2350224d88d0bfeb1745d9ed76e83cf6d Gerrit-Change-Number: 11990 Gerrit-PatchSet: 4 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Wed, 28 Nov 2018 06:26:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7893: Correctly handle Ctrl+C for cancelling a non-running query
Fredy Wijaya has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/11990 ) Change subject: IMPALA-7893: Correctly handle Ctrl+C for cancelling a non-running query .. IMPALA-7893: Correctly handle Ctrl+C for cancelling a non-running query This patch fixes the issue with Ctrl+C handling for cancelling a non-running query to behave similar to Linux shell. Before (pressing Ctrl+C does not do anything): [localhost:21000] default> select After (pressing Ctrl+C cancels the query and starts a new prompt): [localhost:21000] default> select^C [localhost:21000] default> Testing: - Added a new cancellation test - Ran all shell E2E tests Change-Id: I80d7b2c2350224d88d0bfeb1745d9ed76e83cf6d --- M shell/impala_shell.py M tests/custom_cluster/test_client_ssl.py M tests/shell/test_shell_interactive.py 3 files changed, 8 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/11990/4 -- To view, visit http://gerrit.cloudera.org:8080/11990 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I80d7b2c2350224d88d0bfeb1745d9ed76e83cf6d Gerrit-Change-Number: 11990 Gerrit-PatchSet: 4 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger
[Impala-ASF-CR] IMPALA-7759: Add Levenshtein edit distance built-in function
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11793 ) Change subject: IMPALA-7759: Add Levenshtein edit distance built-in function .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/11793/3/be/src/exprs/string-functions-ir.cc File be/src/exprs/string-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/11793/3/be/src/exprs/string-functions-ir.cc@1127 PS3, Line 1127: // https://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance#C++ line too long (92 > 90) -- To view, visit http://gerrit.cloudera.org:8080/11793 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I549d33ab7cebfa10db2934461c8ec91e2cc1cdcb Gerrit-Change-Number: 11793 Gerrit-PatchSet: 3 Gerrit-Owner: Greg Rahn Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 28 Nov 2018 06:21:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7759: Add Levenshtein edit distance built-in function
Greg Rahn has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/11793 ) Change subject: IMPALA-7759: Add Levenshtein edit distance built-in function .. IMPALA-7759: Add Levenshtein edit distance built-in function This patch adds new built-in functions to calculate Levenshtein edit distance. Implemented as levenshtein() to match PostgreSQL in both functionality and name and also added le_dst() alias for Netezza, compatibility, but note that levenshtein() differs in functionality in that if either value is NULL or both values are NULL, levenshtein() returns NULL, where Netezza's le_dst() returns the length of the not NULL value or 0 if both values are NULL. Testing: - Added unit tests to expr-test.cc - Manual test on 966289 string pairs and results match PostgreSQL - Added changes to qgen tests for PostgreSQL comparison Change-Id: I549d33ab7cebfa10db2934461c8ec91e2cc1cdcb --- M be/src/exprs/expr-test.cc M be/src/exprs/string-functions-ir.cc M be/src/exprs/string-functions.h M common/function-registry/impala_functions.py A tests/comparison/compat.py M tests/comparison/discrepancy_searcher.py M tests/comparison/funcs.py 7 files changed, 128 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/11793/3 -- To view, visit http://gerrit.cloudera.org:8080/11793 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I549d33ab7cebfa10db2934461c8ec91e2cc1cdcb Gerrit-Change-Number: 11793 Gerrit-PatchSet: 3 Gerrit-Owner: Greg Rahn Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-7870: Increase the timeout in test v1 catalog
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/11997 ) Change subject: IMPALA-7870: Increase the timeout in test_v1_catalog .. IMPALA-7870: Increase the timeout in test_v1_catalog TestAutomaticCatalogInvalidation.test_v1_catalog need to wait for a predefined time for the invalidation to take effect. The test is flaky recently because of it. This patch increates the timeout by 2.5x. Change-Id: If7d37a6109b2e8de1473d42d699b8c7057d0b29b Reviewed-on: http://gerrit.cloudera.org:8080/11997 Reviewed-by: Todd Lipcon Tested-by: Impala Public Jenkins --- M tests/custom_cluster/test_automatic_invalidation.py 1 file changed, 2 insertions(+), 2 deletions(-) Approvals: Todd Lipcon: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/11997 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: If7d37a6109b2e8de1473d42d699b8c7057d0b29b Gerrit-Change-Number: 11997 Gerrit-PatchSet: 2 Gerrit-Owner: Tianyi Wang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon
[Impala-ASF-CR] IMPALA-7870: Increase the timeout in test v1 catalog
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11997 ) Change subject: IMPALA-7870: Increase the timeout in test_v1_catalog .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/11997 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If7d37a6109b2e8de1473d42d699b8c7057d0b29b Gerrit-Change-Number: 11997 Gerrit-PatchSet: 1 Gerrit-Owner: Tianyi Wang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Wed, 28 Nov 2018 04:25:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7790: Skip some Kudu tests if use hybrid clock�
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11851 ) Change subject: IMPALA-7790: Skip some Kudu tests if use_hybrid_clock=false .. Patch Set 5: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/11851 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4c9ed4a4ea0720760d65c98acfc394247ab2f1a2 Gerrit-Change-Number: 11851 Gerrit-PatchSet: 5 Gerrit-Owner: Thomas Marshall Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Thomas Marshall Gerrit-Comment-Date: Wed, 28 Nov 2018 02:48:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7790: Skip some Kudu tests if use hybrid clock�
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/11851 ) Change subject: IMPALA-7790: Skip some Kudu tests if use_hybrid_clock=false .. IMPALA-7790: Skip some Kudu tests if use_hybrid_clock=false Since IMPALA-6812, we've run many of our tests against Kudu at the READ_AT_SNAPSHOT scan level, which ensures consistent results. This scan level is only supported if Kudu is run with the flag --use_hybrid_clock=true (which is the default). This patch uses the Kudu master webui to detect when use_hybrid_clock is false and skips these tests. Follow up work will address allowing these tests to run regardless of the value of the flag. Testing: - Ran a full exhaustive build with use_hybrid_clock=false set in the minicluster. Change-Id: I4c9ed4a4ea0720760d65c98acfc394247ab2f1a2 Reviewed-on: http://gerrit.cloudera.org:8080/11851 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M bin/impala-config.sh M tests/common/kudu_test_suite.py M tests/common/skip.py M tests/custom_cluster/test_kudu.py M tests/metadata/test_ddl.py M tests/query_test/test_kudu.py 6 files changed, 52 insertions(+), 5 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/11851 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I4c9ed4a4ea0720760d65c98acfc394247ab2f1a2 Gerrit-Change-Number: 11851 Gerrit-PatchSet: 6 Gerrit-Owner: Thomas Marshall Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Thomas Marshall
[Impala-ASF-CR] IMPALA-7047. Refreshing partitions should not make an RPC per file
Paul Rogers has posted comments on this change. ( http://gerrit.cloudera.org:8080/11227 ) Change subject: IMPALA-7047. Refreshing partitions should not make an RPC per file .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/11227/6/fe/src/test/java/org/apache/impala/catalog/CatalogTest.java File fe/src/test/java/org/apache/impala/catalog/CatalogTest.java: http://gerrit.cloudera.org:8080/#/c/11227/6/fe/src/test/java/org/apache/impala/catalog/CatalogTest.java@392 PS6, Line 392: OpType.GET_FILE_BLOCK_LOCATIONS.getSymbol())); > is this test working as a proper regression test now that you switched to t Can't quite explain yet what the test is doing. Did trace though the code and verified the calls are working as you intended. So, this code does exercise the incremental refresh path, which is good. The mystery is that, before I fixed that flipped if/then statement, the test also passed. So, I need to figure out why the metrics come out correct either way. But, didn't want to hold up the customer patch because of that bit of curiosity. -- To view, visit http://gerrit.cloudera.org:8080/11227 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2051b96599206164aaa06ecbdf64374c46eda956 Gerrit-Change-Number: 11227 Gerrit-PatchSet: 6 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Wed, 28 Nov 2018 02:38:41 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6741: Add timestamp of fragment instance's status updates
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12000 ) Change subject: IMPALA-6741: Add timestamp of fragment instance's status updates .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/12000/1/be/src/runtime/coordinator-backend-state.cc File be/src/runtime/coordinator-backend-state.cc: http://gerrit.cloudera.org:8080/#/c/12000/1/be/src/runtime/coordinator-backend-state.cc@502 PS1, Line 502: ToStringFromUnixMillis(last_report_time_ms_) An alternate approach would be to just store 'last_report_time_ms_' as string and convert it using ToStringFromUnixMillis() when exporting the profile as Thrift object or when we are pretty printing it. That said, it's unclear whether the complication is worth it given this is not in the critical path of returning query results. -- To view, visit http://gerrit.cloudera.org:8080/12000 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iae3dcddc292d694d7003d10ed0caccfceed7d8fa Gerrit-Change-Number: 12000 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Balazs Jeszenszky Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Wed, 28 Nov 2018 01:24:49 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2343: Add lifecycle timeline to plan nodes
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/11992 ) Change subject: IMPALA-2343: Add lifecycle timeline to plan nodes .. Patch Set 7: I also noted that a lot of Open() and GetNext() implementations have similar preambles, e.g. SCOPED_TIMER(runtime_profile_->total_time_counter()); ScopedGetNextEventAdder ea(this, eos); DCHECK(row_batch != NULL); RETURN_IF_ERROR(ExecDebugAction(TExecNodePhase::GETNEXT, state)); RETURN_IF_CANCELLED(state); RETURN_IF_ERROR(QueryMaintenance(state)); Some leave parts of those out, though it's not clear whether that happens on purpose. Should we file a follow-up Jira to unify the preambles, possibly with some generic macro? -- To view, visit http://gerrit.cloudera.org:8080/11992 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I15341bdb15022bad9814882689ce5cb2939f4653 Gerrit-Change-Number: 11992 Gerrit-PatchSet: 7 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Wed, 28 Nov 2018 01:41:15 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6741: Add timestamp of fragment instance's status updates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12000 ) Change subject: IMPALA-6741: Add timestamp of fragment instance's status updates .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1449/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12000 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iae3dcddc292d694d7003d10ed0caccfceed7d8fa Gerrit-Change-Number: 12000 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Balazs Jeszenszky Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Wed, 28 Nov 2018 01:46:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2343: Add lifecycle timeline to plan nodes
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/11992 ) Change subject: IMPALA-2343: Add lifecycle timeline to plan nodes .. Patch Set 7: (6 comments) http://gerrit.cloudera.org:8080/#/c/11992/7/be/src/exec/exec-node-util.h File be/src/exec/exec-node-util.h: http://gerrit.cloudera.org:8080/#/c/11992/7/be/src/exec/exec-node-util.h@57 PS7, Line 57: Fetched nit: maybe reword to "First Batch Requested" to indicate that it is still in progress? http://gerrit.cloudera.org:8080/#/c/11992/7/be/src/exec/exec-node.h File be/src/exec/exec-node.h: http://gerrit.cloudera.org:8080/#/c/11992/7/be/src/exec/exec-node.h@260 PS7, Line 260: were nit: was http://gerrit.cloudera.org:8080/#/c/11992/7/be/src/exec/hbase-scan-node.cc File be/src/exec/hbase-scan-node.cc: http://gerrit.cloudera.org:8080/#/c/11992/7/be/src/exec/hbase-scan-node.cc@153 PS7, Line 153: // For GetNext, most of the time is spent in HBaseTableScanner::ResultScanner_next, : // but there's still some considerable time inside here. I don't think this comment helps much and would be good with dropping it altogether. http://gerrit.cloudera.org:8080/#/c/11992/7/be/src/exec/kudu-scan-node.cc File be/src/exec/kudu-scan-node.cc: http://gerrit.cloudera.org:8080/#/c/11992/7/be/src/exec/kudu-scan-node.cc@96 PS7, Line 96: SCOPED_TIMER(materialize_tuple_timer()); Can you add a brief comment why this timer gets initialized after the preamble? http://gerrit.cloudera.org:8080/#/c/11992/7/be/src/exec/singular-row-src-node.cc File be/src/exec/singular-row-src-node.cc: http://gerrit.cloudera.org:8080/#/c/11992/7/be/src/exec/singular-row-src-node.cc@33 PS7, Line 33: ScopedOpenEventAdder ea(this); >From the FE comment in SingularRowSrcNode is seems that this will always run >in a subplan: A SingularRowSrcNode returns the current row that is being processed by its containing SubplanNode. A SingularRowSrcNode can only appear in the plan tree of a SubplanNode. A SingularRowSrcNode returns its parent's smap such that substitutions are appropriately applied within the SubplanNode's second child. In that case, do we need to instrument this method at all? http://gerrit.cloudera.org:8080/#/c/11992/7/be/src/exec/unnest-node.cc File be/src/exec/unnest-node.cc: http://gerrit.cloudera.org:8080/#/c/11992/7/be/src/exec/unnest-node.cc@142 PS7, Line 142: // Avoid expensive query maintenance overhead for small collections. Did you omit the instrumentation here because it is expensive? -- To view, visit http://gerrit.cloudera.org:8080/11992 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I15341bdb15022bad9814882689ce5cb2939f4653 Gerrit-Change-Number: 11992 Gerrit-PatchSet: 7 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Wed, 28 Nov 2018 01:39:59 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7823: Clean up unused Java imports
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/11987 ) Change subject: IMPALA-7823: Clean up unused Java imports .. IMPALA-7823: Clean up unused Java imports Cleans up unused Java imports. Testing: This is only a source-level change. Verified that the FE builds correctly. Ran FE tests. Change-Id: Ic1820a7f9e55dc168510987520949026169f445e Reviewed-on: http://gerrit.cloudera.org:8080/11987 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java M fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java M fe/src/main/java/org/apache/impala/analysis/CreateFunctionStmtBase.java M fe/src/main/java/org/apache/impala/analysis/ParquetHelper.java M fe/src/main/java/org/apache/impala/analysis/PlanHint.java M fe/src/main/java/org/apache/impala/analysis/PrivilegeSpec.java M fe/src/main/java/org/apache/impala/authorization/SentryAuthProvider.java M fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java M fe/src/main/java/org/apache/impala/catalog/CatalogObjectCache.java M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartitionLocationCompressor.java M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java M fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java M fe/src/main/java/org/apache/impala/common/JniUtil.java M fe/src/main/java/org/apache/impala/planner/DataPartition.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/JoinBuildSink.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/ValueRange.java M fe/src/main/java/org/apache/impala/rewrite/FoldConstantsRule.java M fe/src/main/java/org/apache/impala/rewrite/RemoveRedundantStringCast.java M fe/src/main/java/org/apache/impala/service/FeSupport.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/main/java/org/apache/impala/service/JniCatalog.java M fe/src/main/java/org/apache/impala/service/JniFrontend.java M fe/src/main/java/org/apache/impala/util/NativeLogger.java M fe/src/test/java/org/apache/impala/analysis/StmtMetadataLoaderTest.java M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java M fe/src/test/java/org/apache/impala/datagenerator/HBaseTestDataRegionAssignment.java M fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java M fe/src/test/java/org/apache/impala/planner/S3PlannerTest.java M fe/src/test/java/org/apache/impala/testutil/CatalogServiceTestCatalog.java M fe/src/test/java/org/apache/impala/testutil/ImpaladTestCatalog.java M fe/src/test/java/org/apache/impala/testutil/TestFileParser.java M fe/src/test/java/org/apache/impala/util/JniUtilTest.java M fe/src/test/java/org/apache/impala/util/TestTopNCache.java 39 files changed, 6 insertions(+), 87 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/11987 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ic1820a7f9e55dc168510987520949026169f445e Gerrit-Change-Number: 11987 Gerrit-PatchSet: 4 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Philip Zeyliger
[Impala-ASF-CR] IMPALA-7823: Clean up unused Java imports
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11987 ) Change subject: IMPALA-7823: Clean up unused Java imports .. Patch Set 3: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/11987 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1820a7f9e55dc168510987520949026169f445e Gerrit-Change-Number: 11987 Gerrit-PatchSet: 3 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Wed, 28 Nov 2018 01:17:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6741: Add timestamp of fragment instance's status updates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12000 ) Change subject: IMPALA-6741: Add timestamp of fragment instance's status updates .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/12000/1/tests/query_test/test_observability.py File tests/query_test/test_observability.py: http://gerrit.cloudera.org:8080/#/c/12000/1/tests/query_test/test_observability.py@112 PS1, Line 112: r flake8: F841 local variable 'results' is assigned to but never used -- To view, visit http://gerrit.cloudera.org:8080/12000 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iae3dcddc292d694d7003d10ed0caccfceed7d8fa Gerrit-Change-Number: 12000 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 28 Nov 2018 01:07:35 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6741: Add timestamp of fragment instance's status updates
Michael Ho has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12000 Change subject: IMPALA-6741: Add timestamp of fragment instance's status updates .. IMPALA-6741: Add timestamp of fragment instance's status updates Currently, the profile of a running query doesn't contain any timestamps for the last updates from the fragment instances. This makes it hard to differentiate between when a fragment instance failed to send status reports to the coordinator for various reasons (e.g. IMPALA-2990) or a truly stuck fragment instance. This change adds a timestamp to a fragment instance's profile to record the time when the coordinator last received a status update from it. Note that it's possible that there is delay between when the status was created on the executor and when it arrived at the coordinator. Given that the clocks are not necessarily synchronized across all executors, the receiving time of the update at the coordinator seems easier to make sense of. Sample output: Fragment F01: Instance 494d948d3235441a:23eae1790001 (host=???):(Total: 15.099ms, non-child: 263.951us, % non-child: 1.75%) Last report time: 2018-11-27 16:57:30.014 Hdfs split stats (:<# splits>/): 0:1/1.58 KB Fragment Instance Lifecycle Event Timeline: 15.622ms - Prepare Finished: 1.026ms (1.026ms) - Open Finished: 1.137ms (110.297us) - First Batch Produced: 15.010ms (13.873ms) - First Batch Sent: 15.080ms (70.715us) - ExecInternal Finished: 15.622ms (541.181us) Change-Id: Iae3dcddc292d694d7003d10ed0caccfceed7d8fa --- M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/coordinator-backend-state.h M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M tests/query_test/test_observability.py 5 files changed, 85 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/12000/1 -- To view, visit http://gerrit.cloudera.org:8080/12000 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Iae3dcddc292d694d7003d10ed0caccfceed7d8fa Gerrit-Change-Number: 12000 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho
[Impala-ASF-CR] IMPALA-1048: show sinks in exec summary
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11967 ) Change subject: IMPALA-1048: show sinks in exec summary .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1448/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/11967 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3fdf7bacae8ff597b255da65af453e174ba53544 Gerrit-Change-Number: 11967 Gerrit-PatchSet: 6 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 28 Nov 2018 00:37:47 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6924: Add child queries to profile in compute stats
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11977 ) Change subject: IMPALA-6924: Add child queries to profile in compute stats .. Patch Set 2: (1 comment) This will be very useful. I had one high level question about the approach. Otherwise the code looked good. http://gerrit.cloudera.org:8080/#/c/11977/2/be/src/service/child-query.cc File be/src/service/child-query.cc: http://gerrit.cloudera.org:8080/#/c/11977/2/be/src/service/child-query.cc@104 PS2, Line 104: profile_->AddInfoString("Profile", get_profile_resp.profile); Did you think about splicing the profiles together with AddChild(). It feels a bit wrong putting the text version of the profile in the machine-readable profile since it mixes the two formats and prevents tools from working properly. Or is the idea that a profile analysis tool should be able to link up the queries itself if needed based on the query ids? And this is just for human convenience? If we're going down that path maybe it would make sense to add machine-readable metadata to the thrift struct with the child query ids - it would be good to have a clear story about how a tool should use this. -- To view, visit http://gerrit.cloudera.org:8080/11977 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5006c3b366d381eed4687e550cdfc463be3d1350 Gerrit-Change-Number: 11977 Gerrit-PatchSet: 2 Gerrit-Owner: Thomas Marshall Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 28 Nov 2018 00:34:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7870: Increase the timeout in test v1 catalog
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11997 ) Change subject: IMPALA-7870: Increase the timeout in test_v1_catalog .. Patch Set 1: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3501/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/11997 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If7d37a6109b2e8de1473d42d699b8c7057d0b29b Gerrit-Change-Number: 11997 Gerrit-PatchSet: 1 Gerrit-Owner: Tianyi Wang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Wed, 28 Nov 2018 00:34:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7047. Refreshing partitions should not make an RPC per file
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11227 ) Change subject: IMPALA-7047. Refreshing partitions should not make an RPC per file .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1447/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/11227 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2051b96599206164aaa06ecbdf64374c46eda956 Gerrit-Change-Number: 11227 Gerrit-PatchSet: 6 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Wed, 28 Nov 2018 00:17:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7870: Increase the timeout in test v1 catalog
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/11997 ) Change subject: IMPALA-7870: Increase the timeout in test_v1_catalog .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/11997 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If7d37a6109b2e8de1473d42d699b8c7057d0b29b Gerrit-Change-Number: 11997 Gerrit-PatchSet: 1 Gerrit-Owner: Tianyi Wang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Wed, 28 Nov 2018 00:06:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1048: show sinks in exec summary
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11967 ) Change subject: IMPALA-1048: show sinks in exec summary .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/11967/6/shell/impala_client.py File shell/impala_client.py: http://gerrit.cloudera.org:8080/#/c/11967/6/shell/impala_client.py@211 PS6, Line 211: flake8: E203 whitespace before ',' -- To view, visit http://gerrit.cloudera.org:8080/11967 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3fdf7bacae8ff597b255da65af453e174ba53544 Gerrit-Change-Number: 11967 Gerrit-PatchSet: 6 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 28 Nov 2018 00:04:14 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1048: show sinks in exec summary
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11967 ) Change subject: IMPALA-1048: show sinks in exec summary .. Patch Set 6: Ok I think this is ready for review. -- To view, visit http://gerrit.cloudera.org:8080/11967 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3fdf7bacae8ff597b255da65af453e174ba53544 Gerrit-Change-Number: 11967 Gerrit-PatchSet: 6 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 28 Nov 2018 00:03:14 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1048: show sinks in exec summary
Hello Michael Ho, Lars Volker, Philip Zeyliger, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11967 to look at the new patch set (#6). Change subject: IMPALA-1048: show sinks in exec summary .. IMPALA-1048: show sinks in exec summary The exec summary now includes the total time taken and memory consumed by the data sink at the root of each fragment. Previously the exec summary could hide where time and memory went while executing a query. The high-level changes are: * Generalising logic in the exec summary and runtime profile to handle data sinks, not just plan nodes, including adding richer metadata to runtime profile nodes. * Threading through metadata about the data sinks, like names and estimates, so that it can appear in the exec summary. The major potential downside is that the new timings reported for data stream sender can overlap with the receiver's time and potentially cause confusion. [localhost:21000] default> select count(distinct l_comment) from tpch_parquet.lineitem; summary; Query: select count(distinct l_comment) from tpch_parquet.lineitem Query submitted at: 2018-11-20 16:47:03 (Coordinator: http://tarmstrong-box:25000) Query progress can be monitored at: http://tarmstrong-box:25000/query_plan?query_id=f5464383a3bb6878:54b5252b +---+ | count(distinct l_comment) | +---+ | 4580667 | +---+ Fetched 1 row(s) in 4.13s +-++--+--+---++---+---+---+ | Operator| #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail| +-++--+--+---++---+---+---+ | F02:ROOT| 1 | 59.11ms | 59.11ms | || 0 B | 0 B | | | 06:AGGREGATE| 1 | 274.24us | 274.24us | 1 | 1 | 16.00 KB | 10.00 MB | FINALIZE | | 05:EXCHANGE | 1 | 75.16us | 75.16us | 3 | 1 | 32.00 KB | 16.00 KB | UNPARTITIONED | | F01:EXCHANGE SENDER | 3 | 119.53us | 146.28us | || 16.00 KB | 0 B | | | 02:AGGREGATE| 3 | 19.26ms | 19.89ms | 3 | 1 | 16.00 KB | 10.00 MB | | | 04:AGGREGATE| 3 | 1.06s| 1.07s| 4.58M | 4.65M | 96.02 MB | 62.63 MB | | | 03:EXCHANGE | 3 | 243.91ms | 246.44ms | 5.01M | 4.65M | 464.00 KB | 10.12 MB | HASH(l_comment) | | F00:EXCHANGE SENDER | 3 | 2.41s| 2.55s| || 337.53 KB | 0 B | | | 01:AGGREGATE| 3 | 1.05s| 1.14s| 5.01M | 4.65M | 97.20 MB | 121.17 MB | STREAMING | | 00:SCAN HDFS| 3 | 37.88ms | 41.28ms | 6.00M | 6.00M | 27.88 MB | 80.00 MB | tpch_parquet.lineitem | +-++--+--+---++---+---+---+ Testing: Added a basic observability test. Change-Id: I3fdf7bacae8ff597b255da65af453e174ba53544 --- M be/src/exec/data-sink.cc M be/src/exec/data-sink.h M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/exec/hbase-table-sink.cc M be/src/exec/hbase-table-sink.h M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/exec/kudu-table-sink.cc M be/src/exec/kudu-table-sink.h M be/src/exec/nested-loop-join-builder.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/plan-root-sink.cc M be/src/exec/plan-root-sink.h M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/coordinator.cc M be/src/runtime/coordinator.h M be/src/runtime/data-stream-test.cc M be/src/runtime/krpc-data-stream-sender.cc M be/src/runtime/krpc-data-stream-sender.h M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M be/src/util/summary-util.cc M common/thrift/DataSinks.thrift M common/thrift/ExecStats.thrift M common/thrift/RuntimeProfile.thrift M fe/src/main/java/org/apache/impala/planner/DataSink.java M fe/src/main/java/org/apache/impala/planner/DataStreamSink.java M fe/src/main/java/org/apache/impala/planner/HBaseTableSink.java M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java M fe/src/main/java/org/apache/impala/planner/JoinBuildSink.java M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java M shell/impala_client.py M tests/beeswax/impala_beeswax.py M tests/query_test/test_observability.py 36 files changed, 321 insertions(+), 122 deletions(-) git pull ssh://gerrit.cloudera
[Impala-ASF-CR] IMPALA-7047. Refreshing partitions should not make an RPC per file
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/11227 ) Change subject: IMPALA-7047. Refreshing partitions should not make an RPC per file .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/11227/6/fe/src/test/java/org/apache/impala/catalog/CatalogTest.java File fe/src/test/java/org/apache/impala/catalog/CatalogTest.java: http://gerrit.cloudera.org:8080/#/c/11227/6/fe/src/test/java/org/apache/impala/catalog/CatalogTest.java@392 PS6, Line 392: OpType.GET_FILE_BLOCK_LOCATIONS.getSymbol())); is this test working as a proper regression test now that you switched to the constants? -- To view, visit http://gerrit.cloudera.org:8080/11227 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2051b96599206164aaa06ecbdf64374c46eda956 Gerrit-Change-Number: 11227 Gerrit-PatchSet: 6 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Wed, 28 Nov 2018 00:00:30 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7047. Refreshing partitions should not make an RPC per file
Paul Rogers has posted comments on this change. ( http://gerrit.cloudera.org:8080/11227 ) Change subject: IMPALA-7047. Refreshing partitions should not make an RPC per file .. Patch Set 6: Code-Review+1 Reran the unit tests, stepped through the code and made sure that we followed the refresh path. Changed the unit test to use HDFS-provided constants rather than string literals. As far as I can tell, the fix works as intended. Would prefer to see this tested in a dynamic cluster to gain a bit more confidence. -- To view, visit http://gerrit.cloudera.org:8080/11227 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2051b96599206164aaa06ecbdf64374c46eda956 Gerrit-Change-Number: 11227 Gerrit-PatchSet: 6 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Tue, 27 Nov 2018 23:42:46 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7047. Refreshing partitions should not make an RPC per file
Paul Rogers has uploaded a new patch set (#6) to the change originally created by Todd Lipcon. ( http://gerrit.cloudera.org:8080/11227 ) Change subject: IMPALA-7047. Refreshing partitions should not make an RPC per file .. IMPALA-7047. Refreshing partitions should not make an RPC per file The code to handle REFRESH of a single partition was incorrectly ignoring the previously-known file descriptors. This meant that, instead of only calling 'getFileBlockLocations' on the files that had changed since the prior load, it was instead calling it on every file. In addition to refresh of single partitions this also affected refresh of unpartitioned tables (which is implemented as a refresh of its single "default" partition). This patch fixes the behavior by copying over the existing file descriptor list into the re-created partition before refreshing it. A new unit test uses FS statistics to verify the change. The new assertions act as a regression test and fail if I comment out the fix. Additionally, this fixes the case where the old partition had no files to use the optimized 'listLocatedStatus' call. This is triggered when 'REFRESH' picks up a new partition from the HMS added by an external system. I also tested this by pointing my dev box at a remote filesystem that was approximately 60ms away. The initial load of an unpartitioned table with approximately 45000 files takes around 23 seconds in this setup. Without the patch in place, REFRESH was taking upwards of 35 minutes (I got tired and gave up at this point). Multiplying the 60ms round trip by 45000 files estimates 45 minutes. With the fix in place, REFRESH of the same table took around 4.5 seconds. Clearly, in typical setups where catalogd and HDFS are on a shared local network, the gains won't be so dramatic. But, even with a 1ms round trip (plausible when including fixed RPC overhead and potentially congested datacenter networks) this would save 45 seconds on this example table with 45000 files. Change-Id: I2051b96599206164aaa06ecbdf64374c46eda956 --- M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java 2 files changed, 73 insertions(+), 16 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/11227/6 -- To view, visit http://gerrit.cloudera.org:8080/11227 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2051b96599206164aaa06ecbdf64374c46eda956 Gerrit-Change-Number: 11227 Gerrit-PatchSet: 6 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-7895: Incorrect expected results for spillable-buffer-sizing.test
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11993 ) Change subject: IMPALA-7895: Incorrect expected results for spillable-buffer-sizing.test .. Patch Set 2: Looks like test_show_create_table failed with a NPE -- To view, visit http://gerrit.cloudera.org:8080/11993 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I413bded920e27fe9f41f0ea989696a0c8f92fe4a Gerrit-Change-Number: 11993 Gerrit-PatchSet: 2 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 27 Nov 2018 23:26:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6964: Track stats about column and page sizes in Parquet reader
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/11575 ) Change subject: IMPALA-6964: Track stats about column and page sizes in Parquet reader .. Patch Set 12: (6 comments) I had a comment about the lock upgrading, and some nits, but I think it's getting there. http://gerrit.cloudera.org:8080/#/c/11575/12/be/src/exec/hdfs-scan-node-base.cc File be/src/exec/hdfs-scan-node-base.cc: http://gerrit.cloudera.org:8080/#/c/11575/12/be/src/exec/hdfs-scan-node-base.cc@904 PS12, Line 904: bytes_read_per_col_lock_); Please wrap the lock in a scope block to release it after the for() statement ends. http://gerrit.cloudera.org:8080/#/c/11575/12/be/src/exec/hdfs-scan-node-base.cc@962 PS12, Line 962: bytes_read_per_col_guard_read_lock.unlock(); There's a race between unlocking and re-locking here. You can use boost:upgrade_lock to achieve the same effect atomically. http://gerrit.cloudera.org:8080/#/c/11575/12/tests/infra/test_utils.py File tests/infra/test_utils.py: http://gerrit.cloudera.org:8080/#/c/11575/12/tests/infra/test_utils.py@36 PS12, Line 36: def test_get_bytes_summary_stats_counter(): Can you double-check that this actually gets executed? :) http://gerrit.cloudera.org:8080/#/c/11575/12/tests/infra/test_utils.py@37 PS12, Line 37: nit: remove this whitespace http://gerrit.cloudera.org:8080/#/c/11575/12/tests/infra/test_utils.py@40 PS12, Line 40: runtime_profile = "- ExampleCounter: (Avg: 8.00 KB (8192) ; " \ maybe pick an example that's not all the same values? http://gerrit.cloudera.org:8080/#/c/11575/12/tests/util/parse_util.py File tests/util/parse_util.py: http://gerrit.cloudera.org:8080/#/c/11575/12/tests/util/parse_util.py@156 PS12, Line 156: re.X) I think re.VERBOSE is easier to understand -- To view, visit http://gerrit.cloudera.org:8080/11575 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I322f9b324b6828df28e5caf79529085c43d7c817 Gerrit-Change-Number: 11575 Gerrit-PatchSet: 12 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Tue, 27 Nov 2018 23:18:39 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7870: Increase the timeout in test v1 catalog
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11997 ) Change subject: IMPALA-7870: Increase the timeout in test_v1_catalog .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1446/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/11997 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If7d37a6109b2e8de1473d42d699b8c7057d0b29b Gerrit-Change-Number: 11997 Gerrit-PatchSet: 1 Gerrit-Owner: Tianyi Wang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 27 Nov 2018 23:12:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7790: Skip some Kudu tests if use hybrid clock�
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11851 ) Change subject: IMPALA-7790: Skip some Kudu tests if use_hybrid_clock=false .. Patch Set 5: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3500/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/11851 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4c9ed4a4ea0720760d65c98acfc394247ab2f1a2 Gerrit-Change-Number: 11851 Gerrit-PatchSet: 5 Gerrit-Owner: Thomas Marshall Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Thomas Marshall Gerrit-Comment-Date: Tue, 27 Nov 2018 22:52:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7790: Skip some Kudu tests if use hybrid clock�
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11851 ) Change subject: IMPALA-7790: Skip some Kudu tests if use_hybrid_clock=false .. Patch Set 5: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/11851 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4c9ed4a4ea0720760d65c98acfc394247ab2f1a2 Gerrit-Change-Number: 11851 Gerrit-PatchSet: 5 Gerrit-Owner: Thomas Marshall Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Thomas Marshall Gerrit-Comment-Date: Tue, 27 Nov 2018 22:52:10 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7870: Increase the timeout in test v1 catalog
Tianyi Wang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/11997 Change subject: IMPALA-7870: Increase the timeout in test_v1_catalog .. IMPALA-7870: Increase the timeout in test_v1_catalog TestAutomaticCatalogInvalidation.test_v1_catalog need to wait for a predefined time for the invalidation to take effect. The test is flaky recently because of it. This patch increates the timeout by 2.5x. Change-Id: If7d37a6109b2e8de1473d42d699b8c7057d0b29b --- M tests/custom_cluster/test_automatic_invalidation.py 1 file changed, 2 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/11997/1 -- To view, visit http://gerrit.cloudera.org:8080/11997 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: If7d37a6109b2e8de1473d42d699b8c7057d0b29b Gerrit-Change-Number: 11997 Gerrit-PatchSet: 1 Gerrit-Owner: Tianyi Wang
[Impala-ASF-CR] IMPALA-7893: Correctly handle Ctrl+C for cancelling a non-running query
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11990 ) Change subject: IMPALA-7893: Correctly handle Ctrl+C for cancelling a non-running query .. Patch Set 3: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/3497/ -- To view, visit http://gerrit.cloudera.org:8080/11990 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I80d7b2c2350224d88d0bfeb1745d9ed76e83cf6d Gerrit-Change-Number: 11990 Gerrit-PatchSet: 3 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Tue, 27 Nov 2018 22:36:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7895: Incorrect expected results for spillable-buffer-sizing.test
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11993 ) Change subject: IMPALA-7895: Incorrect expected results for spillable-buffer-sizing.test .. Patch Set 2: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/3496/ -- To view, visit http://gerrit.cloudera.org:8080/11993 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I413bded920e27fe9f41f0ea989696a0c8f92fe4a Gerrit-Change-Number: 11993 Gerrit-PatchSet: 2 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 27 Nov 2018 22:19:38 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7823: Clean up unused Java imports
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11987 ) Change subject: IMPALA-7823: Clean up unused Java imports .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/11987 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1820a7f9e55dc168510987520949026169f445e Gerrit-Change-Number: 11987 Gerrit-PatchSet: 3 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Tue, 27 Nov 2018 21:11:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7823: Clean up unused Java imports
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11987 ) Change subject: IMPALA-7823: Clean up unused Java imports .. Patch Set 3: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3499/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/11987 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1820a7f9e55dc168510987520949026169f445e Gerrit-Change-Number: 11987 Gerrit-PatchSet: 3 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Tue, 27 Nov 2018 21:11:27 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7823: Clean up unused Java imports
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/11987 ) Change subject: IMPALA-7823: Clean up unused Java imports .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/11987 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1820a7f9e55dc168510987520949026169f445e Gerrit-Change-Number: 11987 Gerrit-PatchSet: 2 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Tue, 27 Nov 2018 20:59:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2343: Add lifecycle timeline to plan nodes
Hello Lars Volker, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11992 to look at the new patch set (#7). Change subject: IMPALA-2343: Add lifecycle timeline to plan nodes .. IMPALA-2343: Add lifecycle timeline to plan nodes Track the time when various significant events in the lifecycle of an ExecNode occur the timeline of fragment execution. The events tracked are: 'Open Started', 'Open Finished', 'First Batch Fetched', 'First Batch Returned', 'Last Batch Returned', 'Closed'. This uses the existing EventSequence infrastructure so that time will correspond to the fragment instance lifecycle timelines. It's implemented mostly using scoped objects that add the events when entering or exiting Open() and GetNext(). These times are not set inside subplans because it would be mostly redundant with the counters in the containing subplan node. Testing: Added a basic test to verify that the event sequence is present. Perf: Ran TPC-H 10 locally. There was no significant perf change. Ran TPC-H Nested locally. There was no significant perf change. Change-Id: I15341bdb15022bad9814882689ce5cb2939f4653 --- M be/src/exec/aggregation-node.cc M be/src/exec/analytic-eval-node.cc M be/src/exec/cardinality-check-node.cc M be/src/exec/data-source-scan-node.cc M be/src/exec/empty-set-node.cc M be/src/exec/empty-set-node.h M be/src/exec/exchange-node.cc A be/src/exec/exec-node-util.h M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/exec/hbase-scan-node.cc M be/src/exec/hdfs-scan-node.cc M be/src/exec/kudu-scan-node.cc M be/src/exec/nested-loop-join-node.cc M be/src/exec/partial-sort-node.cc M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/select-node.cc M be/src/exec/singular-row-src-node.cc M be/src/exec/singular-row-src-node.h M be/src/exec/sort-node.cc M be/src/exec/streaming-aggregation-node.cc M be/src/exec/subplan-node.cc M be/src/exec/topn-node.cc M be/src/exec/union-node.cc M be/src/exec/unnest-node.cc M tests/query_test/test_observability.py 26 files changed, 229 insertions(+), 49 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/11992/7 -- To view, visit http://gerrit.cloudera.org:8080/11992 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I15341bdb15022bad9814882689ce5cb2939f4653 Gerrit-Change-Number: 11992 Gerrit-PatchSet: 7 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-2343: Add lifecycle timeline to plan nodes
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11992 to look at the new patch set (#6). Change subject: IMPALA-2343: Add lifecycle timeline to plan nodes .. IMPALA-2343: Add lifecycle timeline to plan nodes Track the time when Open() and GetNext() are first and last called, to make it easier to piece together the timeline for fragment execution. This uses the existing EventSequence infrastructure so that time will correspond to the fragment instance lifecycle timelines. These times are not set inside subplans because it would be mostly redundant with the counters in the containing subplan node. Testing: Added a basic test to verify that the event sequence is present. Perf: Ran TPC-H 10 locally. There was no significant perf change. Ran TPC-H Nested locally. There was no significant perf change. Change-Id: I15341bdb15022bad9814882689ce5cb2939f4653 --- M be/src/exec/aggregation-node.cc M be/src/exec/analytic-eval-node.cc M be/src/exec/cardinality-check-node.cc M be/src/exec/data-source-scan-node.cc M be/src/exec/empty-set-node.cc M be/src/exec/empty-set-node.h M be/src/exec/exchange-node.cc A be/src/exec/exec-node-util.h M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/exec/hbase-scan-node.cc M be/src/exec/hdfs-scan-node.cc M be/src/exec/kudu-scan-node.cc M be/src/exec/nested-loop-join-node.cc M be/src/exec/partial-sort-node.cc M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/select-node.cc M be/src/exec/singular-row-src-node.cc M be/src/exec/singular-row-src-node.h M be/src/exec/sort-node.cc M be/src/exec/streaming-aggregation-node.cc M be/src/exec/subplan-node.cc M be/src/exec/topn-node.cc M be/src/exec/union-node.cc M be/src/exec/unnest-node.cc M tests/query_test/test_observability.py 26 files changed, 229 insertions(+), 49 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/11992/6 -- To view, visit http://gerrit.cloudera.org:8080/11992 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I15341bdb15022bad9814882689ce5cb2939f4653 Gerrit-Change-Number: 11992 Gerrit-PatchSet: 6 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-7790: Skip some Kudu tests if use hybrid clock�
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11851 ) Change subject: IMPALA-7790: Skip some Kudu tests if use_hybrid_clock=false .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1445/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/11851 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4c9ed4a4ea0720760d65c98acfc394247ab2f1a2 Gerrit-Change-Number: 11851 Gerrit-PatchSet: 4 Gerrit-Owner: Thomas Marshall Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Thomas Marshall Gerrit-Comment-Date: Tue, 27 Nov 2018 19:47:51 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7047. Refreshing partitions should not make an RPC per file
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/11227 ) Change subject: IMPALA-7047. Refreshing partitions should not make an RPC per file .. Patch Set 4: (5 comments) http://gerrit.cloudera.org:8080/#/c/11227/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11227/4//COMMIT_MSG@12 PS4, Line 12: since the prior load, it was instead calling it on every file. > General question. In HDFS, we can tell if a file was added or removed. Once In terms of files "changing", we do have to worry about a file being replaced with the same name as a pre-existing file. I think that's why we compare mtimes. In terms of balancing blocks, we detect on the backend if we did a remote read of a block, and if so, we issue a SQL warning which tells the user to issue an INVALIDATE METADATA for that table. Personally I think this sucks -- what we should probably do is have the backend notify the catalogd that the block locations for that file need to be refreshed, and do so automatically. That said, since we're eventually hoping to move more towards a fine-grained caching with TTL, any such incorrect locality info would fall out of the cache within a bounded period of time, so maybe not a big deal anyway. http://gerrit.cloudera.org:8080/#/c/11227/4//COMMIT_MSG@28 PS4, Line 28: I also tested this by pointing my dev box at a remote filesystem that it's interesitng that there was a "reverse logic" bug in my patch, but I did test it and saw the desired effects. We should make sure to circle back and re-run a similar test to make sure I saw what I thought I was seeing. http://gerrit.cloudera.org:8080/#/c/11227/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/11227/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@568 PS4, Line 568: stats = refreshFileMetadata(partDir, Collections.singletonList(partition)); > Question about the implementation of refresh (that code is not in this patc I think Java supports arrays up to 2B elements, but HDFS only supports really up to a few hundred million files. Even so, I think many other parts of Impala are likely to fall over way before this (probably at 100k files in a partition lots of things start to break). I seem to recall that, in terms of the underlying HDFS RPC itself, it already splits it into smaller critical sections of the NS lock on the NN, and maybe even into multiple RPCs. All that to say, I'm not too concerned about this point. http://gerrit.cloudera.org:8080/#/c/11227/4/fe/src/test/java/org/apache/impala/catalog/CatalogTest.java File fe/src/test/java/org/apache/impala/catalog/CatalogTest.java: http://gerrit.cloudera.org:8080/#/c/11227/4/fe/src/test/java/org/apache/impala/catalog/CatalogTest.java@371 PS4, Line 371: assertEquals(0L, (long)opsCounts.getLong("op_get_file_block_locations")); > Something is odd. This test did not fail for the "reverse polarity" bug tha This part of the test is for "REFRESH" on the table level -- does that trigger the code path where there was the bug? http://gerrit.cloudera.org:8080/#/c/11227/4/fe/src/test/java/org/apache/impala/catalog/CatalogTest.java@379 PS4, Line 379: assertEquals(0L, (long)opsCounts.getLong("op_get_file_block_locations")); this is the one that I'm surprised didn't fail with the bug. Perhaps we can add some log statements or use the debugger to step through the test and understand better what's happening. -- To view, visit http://gerrit.cloudera.org:8080/11227 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2051b96599206164aaa06ecbdf64374c46eda956 Gerrit-Change-Number: 11227 Gerrit-PatchSet: 4 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Tue, 27 Nov 2018 19:29:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7790: Skip some Kudu tests if use hybrid clock�
Thomas Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/11851 ) Change subject: IMPALA-7790: Skip some Kudu tests if use_hybrid_clock=false .. Patch Set 4: Code-Review+2 (2 comments) carrying forward http://gerrit.cloudera.org:8080/#/c/11851/3/tests/common/kudu_test_suite.py File tests/common/kudu_test_suite.py: http://gerrit.cloudera.org:8080/#/c/11851/3/tests/common/kudu_test_suite.py@49 PS3, Line 49: raise NotImplementedError("Multi-master not supported yet") > Nit: might be marginally nicer to raise a NotImplementedError Done http://gerrit.cloudera.org:8080/#/c/11851/3/tests/common/skip.py File tests/common/skip.py: http://gerrit.cloudera.org:8080/#/c/11851/3/tests/common/skip.py@106 PS3, Line 106: relies > Nit: misspelling Done -- To view, visit http://gerrit.cloudera.org:8080/11851 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4c9ed4a4ea0720760d65c98acfc394247ab2f1a2 Gerrit-Change-Number: 11851 Gerrit-PatchSet: 4 Gerrit-Owner: Thomas Marshall Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Thomas Marshall Gerrit-Comment-Date: Tue, 27 Nov 2018 19:24:35 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7790: Skip some Kudu tests if use hybrid clock�
Hello Michael Brown, David Knupp, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11851 to look at the new patch set (#4). Change subject: IMPALA-7790: Skip some Kudu tests if use_hybrid_clock=false .. IMPALA-7790: Skip some Kudu tests if use_hybrid_clock=false Since IMPALA-6812, we've run many of our tests against Kudu at the READ_AT_SNAPSHOT scan level, which ensures consistent results. This scan level is only supported if Kudu is run with the flag --use_hybrid_clock=true (which is the default). This patch uses the Kudu master webui to detect when use_hybrid_clock is false and skips these tests. Follow up work will address allowing these tests to run regardless of the value of the flag. Testing: - Ran a full exhaustive build with use_hybrid_clock=false set in the minicluster. Change-Id: I4c9ed4a4ea0720760d65c98acfc394247ab2f1a2 --- M bin/impala-config.sh M tests/common/kudu_test_suite.py M tests/common/skip.py M tests/custom_cluster/test_kudu.py M tests/metadata/test_ddl.py M tests/query_test/test_kudu.py 6 files changed, 52 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/11851/4 -- To view, visit http://gerrit.cloudera.org:8080/11851 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4c9ed4a4ea0720760d65c98acfc394247ab2f1a2 Gerrit-Change-Number: 11851 Gerrit-PatchSet: 4 Gerrit-Owner: Thomas Marshall Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Thomas Marshall
[Impala-ASF-CR] IMPALA-7893: Correctly handle Ctrl+C for cancelling a non-running query
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/11990 ) Change subject: IMPALA-7893: Correctly handle Ctrl+C for cancelling a non-running query .. Patch Set 3: Thanks. -- To view, visit http://gerrit.cloudera.org:8080/11990 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I80d7b2c2350224d88d0bfeb1745d9ed76e83cf6d Gerrit-Change-Number: 11990 Gerrit-PatchSet: 3 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Tue, 27 Nov 2018 19:17:43 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5973: Provide query plan in JSON format
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/11974 ) Change subject: IMPALA-5973: Provide query plan in JSON format .. Patch Set 3: It's a little obscure, but, for running queries, we already have a JSON representation of a plan that can be queried using the Impala web server. It's interesting in that it's a representation not just of the plan, but also of some of the execution statistics of said plan. The code for it is here: // Helper for PlanToJson(), processes a single list of plan nodes which are the // DFS-flattened representation of a single plan fragment. Called recursively, the // iterator parameter is updated in place so that when a recursive call returns, the // caller is pointing at the next of its children. void PlanToJsonHelper(const map& summaries, const vector& nodes, I think it'd be pretty reasonable to always spit this JSON output into the profile at query completion. -- To view, visit http://gerrit.cloudera.org:8080/11974 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f07e2223be58b7323e1ea2fa44b62626cdf4944 Gerrit-Change-Number: 11974 Gerrit-PatchSet: 3 Gerrit-Owner: Pranay Singh (320) Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Tue, 27 Nov 2018 19:16:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7893: Correctly handle Ctrl+C for cancelling a non-running query
Fredy Wijaya has posted comments on this change. ( http://gerrit.cloudera.org:8080/11990 ) Change subject: IMPALA-7893: Correctly handle Ctrl+C for cancelling a non-running query .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/11990/3/shell/impala_shell.py File shell/impala_shell.py: http://gerrit.cloudera.org:8080/#/c/11990/3/shell/impala_shell.py@1736 PS3, Line 1736: print_to_stderr('^C') > I've not had the opportunity to run this, but does this diff mean that the No, original intro = '\n' never works anyway since at L1765 in the finally block we always set it to intro = '' -- To view, visit http://gerrit.cloudera.org:8080/11990 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I80d7b2c2350224d88d0bfeb1745d9ed76e83cf6d Gerrit-Change-Number: 11990 Gerrit-PatchSet: 3 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Tue, 27 Nov 2018 19:01:23 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7893: Correctly handle Ctrl+C for cancelling a non-running query
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/11990 ) Change subject: IMPALA-7893: Correctly handle Ctrl+C for cancelling a non-running query .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/11990/3/shell/impala_shell.py File shell/impala_shell.py: http://gerrit.cloudera.org:8080/#/c/11990/3/shell/impala_shell.py@1736 PS3, Line 1736: print_to_stderr('^C') I've not had the opportunity to run this, but does this diff mean that the "intro" is re-printed at every Ctrl-C? -- To view, visit http://gerrit.cloudera.org:8080/11990 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I80d7b2c2350224d88d0bfeb1745d9ed76e83cf6d Gerrit-Change-Number: 11990 Gerrit-PatchSet: 3 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Tue, 27 Nov 2018 18:56:25 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7893: Correctly handle Ctrl+C for cancelling a non-running query
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11990 ) Change subject: IMPALA-7893: Correctly handle Ctrl+C for cancelling a non-running query .. Patch Set 3: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3497/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/11990 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I80d7b2c2350224d88d0bfeb1745d9ed76e83cf6d Gerrit-Change-Number: 11990 Gerrit-PatchSet: 3 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Comment-Date: Tue, 27 Nov 2018 18:40:10 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7893: Correctly handle Ctrl+C for cancelling a non-running query
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11990 ) Change subject: IMPALA-7893: Correctly handle Ctrl+C for cancelling a non-running query .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/11990 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I80d7b2c2350224d88d0bfeb1745d9ed76e83cf6d Gerrit-Change-Number: 11990 Gerrit-PatchSet: 3 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Comment-Date: Tue, 27 Nov 2018 18:40:09 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7895: Incorrect expected results for spillable-buffer-sizing.test
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11993 ) Change subject: IMPALA-7895: Incorrect expected results for spillable-buffer-sizing.test .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/11993 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I413bded920e27fe9f41f0ea989696a0c8f92fe4a Gerrit-Change-Number: 11993 Gerrit-PatchSet: 2 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 27 Nov 2018 18:19:55 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7895: Incorrect expected results for spillable-buffer-sizing.test
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11993 ) Change subject: IMPALA-7895: Incorrect expected results for spillable-buffer-sizing.test .. Patch Set 2: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3496/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/11993 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I413bded920e27fe9f41f0ea989696a0c8f92fe4a Gerrit-Change-Number: 11993 Gerrit-PatchSet: 2 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 27 Nov 2018 18:19:56 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7895: Incorrect expected results for spillable-buffer-sizing.test
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11993 ) Change subject: IMPALA-7895: Incorrect expected results for spillable-buffer-sizing.test .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/11993 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I413bded920e27fe9f41f0ea989696a0c8f92fe4a Gerrit-Change-Number: 11993 Gerrit-PatchSet: 1 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 27 Nov 2018 18:19:43 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7867 (Part 2): ArrayList cleanup in analyzer
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11995 ) Change subject: IMPALA-7867 (Part 2): ArrayList cleanup in analyzer .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1444/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/11995 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7c0b5f40a0504fc2d324055bd2a962e35f8e744a Gerrit-Change-Number: 11995 Gerrit-PatchSet: 1 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 27 Nov 2018 17:54:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7867 (Part 2): ArrayList cleanup in analyzer
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11995 ) Change subject: IMPALA-7867 (Part 2): ArrayList cleanup in analyzer .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/11995/1/fe/src/main/java/org/apache/impala/analysis/CreateUdfStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateUdfStmt.java: http://gerrit.cloudera.org:8080/#/c/11995/1/fe/src/main/java/org/apache/impala/analysis/CreateUdfStmt.java@133 PS1, Line 133: protected Function createFunction(FunctionName fnName, List argTypes, Type retType, line too long (91 > 90) -- To view, visit http://gerrit.cloudera.org:8080/11995 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7c0b5f40a0504fc2d324055bd2a962e35f8e744a Gerrit-Change-Number: 11995 Gerrit-PatchSet: 1 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 27 Nov 2018 17:17:05 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7867 (Part 2): ArrayList cleanup in analyzer
Paul Rogers has uploaded this change for review. ( http://gerrit.cloudera.org:8080/11995 Change subject: IMPALA-7867 (Part 2): ArrayList cleanup in analyzer .. IMPALA-7867 (Part 2): ArrayList cleanup in analyzer Follow-on to prior patch for this JIRA. Replaces all use of ArrayList in variable and method declarations with the interface List. Retains the use of ArrayList for list implementations (i.e. "new" statements.) Also replaces "List.newArrayList()" with the more modern "new ArrayList<>()". Cleaned up a few Map and Set declarations and added a few missing @Override annotations found during this process. Tests: This is purely a source-level change; no functional change. Ran all client unit tests. Change-Id: I7c0b5f40a0504fc2d324055bd2a962e35f8e744a --- M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java M fe/src/main/java/org/apache/impala/analysis/AggregateInfoBase.java M fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java M fe/src/main/java/org/apache/impala/analysis/AnalyticInfo.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/main/java/org/apache/impala/analysis/CollectionStructType.java M fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java M fe/src/main/java/org/apache/impala/analysis/CreateFunctionStmtBase.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateUdaStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateUdfStmt.java M fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java M fe/src/main/java/org/apache/impala/analysis/DropFunctionStmt.java M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/FromClause.java M fe/src/main/java/org/apache/impala/analysis/FunctionArgs.java M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java M fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java M fe/src/main/java/org/apache/impala/analysis/ParquetHelper.java M fe/src/main/java/org/apache/impala/analysis/Path.java M fe/src/main/java/org/apache/impala/analysis/PlanHint.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java M fe/src/main/java/org/apache/impala/analysis/Subquery.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java M fe/src/main/java/org/apache/impala/catalog/AggregateFunction.java M fe/src/main/java/org/apache/impala/catalog/ScalarFunction.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java 35 files changed, 253 insertions(+), 258 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/11995/1 -- To view, visit http://gerrit.cloudera.org:8080/11995 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I7c0b5f40a0504fc2d324055bd2a962e35f8e744a Gerrit-Change-Number: 11995 Gerrit-PatchSet: 1 Gerrit-Owner: Paul Rogers
[Impala-ASF-CR] IMPALA-7853: Add support to read int64 NANO timestamps from Parquet
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/11984 ) Change subject: IMPALA-7853: Add support to read int64 NANO timestamps from Parquet .. Patch Set 2: (11 comments) http://gerrit.cloudera.org:8080/#/c/11984/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11984/2//COMMIT_MSG@12 PS2, Line 12: 1677-09-21 00:12:43.145224192 .. 2262-04-11 23:47:16.854775807 UTC > Maybe you could mention that it stores the number of nanoseconds since the Done http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.h File be/src/exec/parquet-common.h: http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.h@479 PS2, Line 479: LimitedRange > Dows this "LimitedRange" part of the name add extra value? Am I right that There is already a function with nanosecond precision, and it can create the full range of timestamps: FromUnixTimeNanos(time_t unix_time, int64_t nanos, const Timezone& local_tz); That function could have been used here (after splitting nanoseconds to seconds and subseconds), but I think that it would slower than the specialized solution. http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.h@507 PS2, Line 507: /// conversion is necessary. Returns true if the schema describes a valid timestamp > Could you please comment what the parameters are for, pls? Done http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.h@509 PS2, Line 509: ProcessSchema > I feel that something like GetTimestampInfoFromSchema might be a bit more s Done http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.cc File be/src/exec/parquet-common.cc: http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.cc@102 PS2, Line 102: // Timestamps can be only encoded as INT64 or INT96. > nit: You might want to adjust this comment to cover the case when this func I am not sure if I have done what you meant. This function should be only called for TIMESTAMP SQL columns, and should return false if the Parquet column is not timestamp. http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.cc@106 PS2, Line 106: Newer solution, > nit: No need for this part. In addition, it's not guaranteed that reading t Some explanation about "new": LogicalType is often called "new logical type", as converted type was also some kind of logical type. I agree that in 2 years this will not be meaningful. http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.cc@122 PS2, Line 122: Older solution, > No need for "Older solution" part of the comment. Done http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.cc@123 PS2, Line 123: are/were never > nit: I think we shouldn't refer to what happened before this change as we s This code has to handle files written by older versions of Impala/some other tools we try to be compatible with, so I think that referring to he past makes sense here. Not specifying whether utc->local conversion is necessary was a gap in Parquet specification and was interpreted by different software differently. This will not be an issue with the new logical type, but we have to make assumptions with older files. http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.cc@148 PS2, Line 148: if (e.type == parquet::Type::INT96 && convert_int96_timestamps) needs_conversion = true; > Shouldn't this line be also included in ProcessSchema? Since you introduced I thought that doing it this way is the lesser evil - convert_int96_timestamps is not readily available in validation logic (it depends on a flag and the writer of the file). http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-metadata-utils.cc File be/src/exec/parquet-metadata-utils.cc: http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-metadata-utils.cc@71 PS2, Line 71: return ParquetTimestampDecoder::ProcessSchema(element, precision, needs_conversion); > Just thinking out loud here: This ProcessSchema() is now used for 1) checki Before this change the two functions were separate, but with the addition of logical types the code duplication became too much for me and I tried to merge the bulk of the two functions in ProcessSchema. I think that the ideal way to do this would be to do the whole processing only once per column chunk (currently it is validated once, then processed again for stat filtering if needed, then once again to initialize the column reader). Changing this to use the same processed metadata would be a non-trivial refactor, as most of the code assumes everything needed for decoding can be simply deduced from the physical + SQL types, so handling of more complex types is a bit hacky. http://gerrit.cloudera.org:8080/#/c/11984/2/fe/src/main/java/org/apache/impala/analysis/ParquetHelper.java File fe/src/main/java/org/apache
[Impala-ASF-CR] IMPALA-7853: Add support to read int64 NANO timestamps from Parquet
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11984 ) Change subject: IMPALA-7853: Add support to read int64 NANO timestamps from Parquet .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1443/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/11984 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I932396d8646f43c0b9ca4a6359f164c4d8349d8f Gerrit-Change-Number: 11984 Gerrit-PatchSet: 3 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 27 Nov 2018 16:20:32 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7853: Add support to read int64 NANO timestamps from Parquet
Hello Gabor Kaszab, Zoltan Borok-Nagy, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11984 to look at the new patch set (#3). Change subject: IMPALA-7853: Add support to read int64 NANO timestamps from Parquet .. IMPALA-7853: Add support to read int64 NANO timestamps from Parquet PARQUET-1387 added int64 timestamps with nanosecond precision that stores timestamps as nanoseconds since the Unix epoch. As 64 bits are not enough to represent the whole 1400.. range of Impala timestamps, this new type works with a limited range: 1677-09-21 00:12:43.145224192 .. 2262-04-11 23:47:16.854775807 UTC The benefit of the reduced range is that no validation is necessary during scanning, as every possible 64 bit value represents a valid timestamp in Impala. This may mean that this has the potential be the fastest way to store timestamps in Impala + Parquet. Another way NANO differs from MICRO and MILLI is that NANO can be only described with new logical types in Parquet, it has no converted type equivalent. This made implementing CREATE TABLE LIKE PARQUET less trivial than it was for MICRO/MILLI: the type conversion logic in ParquetHelper.java had to be rewritten to use LogicalTypeAnnotation instead of ConvertedType. The changes on Java side also made bumping CDH_BUILD_NUMBER necessary. Testing: - added a new testfile with int64 nano timestamps - ran core tests Change-Id: I932396d8646f43c0b9ca4a6359f164c4d8349d8f --- M be/src/exec/parquet-common.cc M be/src/exec/parquet-common.h M be/src/exec/parquet-metadata-utils.cc M be/src/runtime/timestamp-test.cc M be/src/runtime/timestamp-value.h M be/src/runtime/timestamp-value.inline.h M bin/impala-config.sh M common/thrift/parquet.thrift M fe/src/main/java/org/apache/impala/analysis/ParquetHelper.java M testdata/data/README A testdata/data/int64_timestamps_nano.parquet M testdata/workloads/functional-query/queries/QueryTest/parquet-int64-timestamps.test M tests/query_test/test_scanners.py 13 files changed, 174 insertions(+), 70 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/84/11984/3 -- To view, visit http://gerrit.cloudera.org:8080/11984 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I932396d8646f43c0b9ca4a6359f164c4d8349d8f Gerrit-Change-Number: 11984 Gerrit-PatchSet: 3 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-7853: Add support to read int64 NANO timestamps from Parquet
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/11984 ) Change subject: IMPALA-7853: Add support to read int64 NANO timestamps from Parquet .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/11984/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11984/2//COMMIT_MSG@12 PS2, Line 12: 1677-09-21 00:12:43.145224192 .. 2262-04-11 23:47:16.854775807 UTC Maybe you could mention that it stores the number of nanoseconds since the unix epoch. That's why it represents this strange time-range. http://gerrit.cloudera.org:8080/#/c/11984/2/fe/src/main/java/org/apache/impala/analysis/ParquetHelper.java File fe/src/main/java/org/apache/impala/analysis/ParquetHelper.java: http://gerrit.cloudera.org:8080/#/c/11984/2/fe/src/main/java/org/apache/impala/analysis/ParquetHelper.java@248 PS2, Line 248: LogicalTypeAnnotation logicalType = parquetType.getLogicalTypeAnnotation(); What happens when there is no "logical type" in a Parquet file, but only "converted type"? -- To view, visit http://gerrit.cloudera.org:8080/11984 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I932396d8646f43c0b9ca4a6359f164c4d8349d8f Gerrit-Change-Number: 11984 Gerrit-PatchSet: 2 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 27 Nov 2018 15:10:59 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7893: Correctly handle Ctrl+C for cancelling a non-running query
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/11990 ) Change subject: IMPALA-7893: Correctly handle Ctrl+C for cancelling a non-running query .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/11990 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I80d7b2c2350224d88d0bfeb1745d9ed76e83cf6d Gerrit-Change-Number: 11990 Gerrit-PatchSet: 2 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Comment-Date: Tue, 27 Nov 2018 14:32:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7853: Add support to read int64 NANO timestamps from Parquet
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/11984 ) Change subject: IMPALA-7853: Add support to read int64 NANO timestamps from Parquet .. Patch Set 2: (9 comments) http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.h File be/src/exec/parquet-common.h: http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.h@479 PS2, Line 479: LimitedRange Dows this "LimitedRange" part of the name add extra value? Am I right that we don't have any other kind of "Timestamp with Nano precision"? Then we can refer to this implementation simply as UnixTimeNanos in my opinion. http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.h@507 PS2, Line 507: /// conversion is necessary. Returns true if the schema describes a valid timestamp Could you please comment what the parameters are for, pls? http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.h@509 PS2, Line 509: ProcessSchema I feel that something like GetTimestampInfoFromSchema might be a bit more self-descriptive. There might be even better names :) http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.cc File be/src/exec/parquet-common.cc: http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.cc@102 PS2, Line 102: // Timestamps can be only encoded as INT64 or INT96. nit: You might want to adjust this comment to cover the case when this function is called on non-timestamp related types e.g. string. http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.cc@106 PS2, Line 106: Newer solution, nit: No need for this part. In addition, it's not guaranteed that reading this comment 2 years from now would reveal what "Newer solution" is referred to. http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.cc@122 PS2, Line 122: Older solution, No need for "Older solution" part of the comment. http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.cc@123 PS2, Line 123: are/were never nit: I think we shouldn't refer to what happened before this change as we should only talk about the current code not the previous code. http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.cc@148 PS2, Line 148: if (e.type == parquet::Type::INT96 && convert_int96_timestamps) needs_conversion = true; Shouldn't this line be also included in ProcessSchema? Since you introduced a function to set 'needs_conversion' I think it should tackle all the cases not just a subset of them. http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-metadata-utils.cc File be/src/exec/parquet-metadata-utils.cc: http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-metadata-utils.cc@71 PS2, Line 71: return ParquetTimestampDecoder::ProcessSchema(element, precision, needs_conversion); Just thinking out loud here: This ProcessSchema() is now used for 1) checking if the timestamp in the referred element is a valid timestamp and 2) to gather information such as 'populate' and 'precision'. Have you considered splitting these 2 functionalities? -- To view, visit http://gerrit.cloudera.org:8080/11984 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I932396d8646f43c0b9ca4a6359f164c4d8349d8f Gerrit-Change-Number: 11984 Gerrit-PatchSet: 2 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 27 Nov 2018 14:17:17 + Gerrit-HasComments: Yes