[Impala-ASF-CR] DRAFT PREVIEW: Prereqs for load test system testing
Harrison Sheinblatt has abandoned this change. Change subject: DRAFT PREVIEW: Prereqs for load test system testing .. Abandoned New one submitted -- To view, visit http://gerrit.cloudera.org:8080/3785 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: abandon Gerrit-Change-Id: Ibd330d8a19ae5d968b669a4258cf826d0db28cdd Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Harrison Sheinblatt Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins
[Impala-ASF-CR] IMPALA-4808: old hash join can reference invalid memory
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-4808: old hash join can reference invalid memory .. IMPALA-4808: old hash join can reference invalid memory The bug was that 'probe_rows_exist' could be true even if there was no current probe row. The node can get into this state if it takes the branch at line 390. I tried to reproduce the crash but was unable to after a few attempts. Change-Id: Ic068bbc3e029264d1ce814d286e372391639fa68 Reviewed-on: http://gerrit.cloudera.org:8080/5850 Reviewed-by: Matthew Jacobs Reviewed-by: Dan Hecht Tested-by: Impala Public Jenkins --- M be/src/exec/hash-join-node.cc 1 file changed, 1 insertion(+), 1 deletion(-) Approvals: Impala Public Jenkins: Verified Matthew Jacobs: Looks good to me, but someone else must approve Dan Hecht: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/5850 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ic068bbc3e029264d1ce814d286e372391639fa68 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4808: old hash join can reference invalid memory
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4808: old hash join can reference invalid memory .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/5850 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic068bbc3e029264d1ce814d286e372391639fa68 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4738: STDDEV SAMP should return NULL for single record input
anujphadke has posted comments on this change. Change subject: IMPALA-4738: STDDEV_SAMP should return NULL for single record input .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/5800/1/testdata/workloads/functional-query/queries/QueryTest/aggregation.test File testdata/workloads/functional-query/queries/QueryTest/aggregation.test: PS1, Line 26: SELECT variance(tinyint_col), stddev(smallint_col),stddev_samp(smallint_col), : variance_pop(int_col), > please add a col for stddev_samp as well. even though they're aliased now, Done -- To view, visit http://gerrit.cloudera.org:8080/5800 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ide8af752cd8a2e554a2cd5a1ec948967a80de1fe Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadke Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: anujphadke Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4738: STDDEV SAMP should return NULL for single record input
anujphadke has uploaded a new patch set (#2). Change subject: IMPALA-4738: STDDEV_SAMP should return NULL for single record input .. IMPALA-4738: STDDEV_SAMP should return NULL for single record input In calculating the STDDEV_SAMP of N rows a divion by N-1 rows is involved. Hence STDDEV_SAMP for a single row involves a division by 0. This change returns a NULL instead of a 0 when calculating STDDEV_SAMP for a single row. STDDEV_POP for single row will still return a 0 since this does not involve a division by 0. Change-Id: Ide8af752cd8a2e554a2cd5a1ec948967a80de1fe --- M be/src/exprs/aggregate-functions-ir.cc M testdata/workloads/functional-query/queries/QueryTest/aggregation.test 2 files changed, 7 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/5800/2 -- To view, visit http://gerrit.cloudera.org:8080/5800 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ide8af752cd8a2e554a2cd5a1ec948967a80de1fe Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadke Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: anujphadke
[Impala-ASF-CR] IMPALA-4729: Implement REPLACE()
Zach Amsden has posted comments on this change. Change subject: IMPALA-4729: Implement REPLACE() .. Patch Set 9: Michael, yeah, I will fix those. I still need to plug in the additional tests that will be required now that we do constant based stuff. Dan - replace ' ' -> '' was the initial test. This was much faster, 2x so. Regexp compilation starts to be more competitive on longer expressions because it has more advanced state. Still it can't beat a cached StringSearch - 'complicated' actually isn't that complicated, but it (due to the repeated 'c') allows the search to advance faster, better illustrating the difference on longer strings. -- To view, visit http://gerrit.cloudera.org:8080/5776 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3586 (Part 1): Implement Union Pass Through
Taras Bobrovytsky has uploaded a new patch set (#4). Change subject: IMPALA-3586 (Part 1): Implement Union Pass Through .. IMPALA-3586 (Part 1): Implement Union Pass Through The union node acts as pass through operator and forwards row batches from it's children without materializing. This is done in the case when the child's tuple layout is identical to union node tuple layout and no functions need to be applied to the child row batches. Testing: Verified that existing tests cover the case where no/some/all union children of the union node can be passed through. Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416 --- M be/src/exec/union-node.cc M be/src/exec/union-node.h M be/src/runtime/descriptors.cc M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/main/java/org/apache/impala/planner/UnionNode.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test M testdata/workloads/functional-planner/queries/PlannerTest/empty.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test M testdata/workloads/functional-planner/queries/PlannerTest/order.test M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/topn.test M testdata/workloads/functional-planner/queries/PlannerTest/union.test M testdata/workloads/functional-query/queries/QueryTest/union.test 20 files changed, 435 insertions(+), 15 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/4 -- To view, visit http://gerrit.cloudera.org:8080/5816 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4705, IMPALA-4779, IMPALA-4780: Fix some Expr bugs with codegen
Dan Hecht has posted comments on this change. Change subject: IMPALA-4705, IMPALA-4779, IMPALA-4780: Fix some Expr bugs with codegen .. Patch Set 4: (8 comments) http://gerrit.cloudera.org:8080/#/c/5732/4//COMMIT_MSG Commit Message: Line 50: defined in the boost library. is there some verification code we can add to ensure that this problem won't happen again (or isn't happening elsewhere)? http://gerrit.cloudera.org:8080/#/c/5732/4/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: PS4, Line 326: which are also defined in the main module this doesn't seem to match what the code is doing. Isn't it materializing functions that are referenced by functions of the new module? But why is this needed? why don't they get materialized when we follow the chain of references in MaterializeFunctionHelper()? http://gerrit.cloudera.org:8080/#/c/5732/4/be/src/codegen/llvm-codegen.h File be/src/codegen/llvm-codegen.h: PS4, Line 85: their callee functions what is a "callee" for a global variable? do you mean reference? If so, how about: // Map from global variables/functions to the functions that they reference. PS4, Line 86: CalleeMap again, callee sounds to me like only functions would be keys in this map. FunctionRefsMap? PS4, Line 160: f 'eager' is false, the : /// functions in the module are not materialized. this seems to contradict the class header comment that says "llvm::Function objects in the module are materialized lazily". i.e. the behavior you are documented is the expected behavior, no? So are you adding 'eager' so that the default behavior can be overridden? if so, it should say something like: If 'eager' is true, the functions are materalized. Otherwise they are materialized lazily via GetFunction(). And if that's what's going on, why not call the parameter 'materialize"? But, as a caller, how do I choose whether I'm suppose to materialize or not at this point? The need for this seems to contradict the whole explanation about lazy materialization in the header comment. Finally, rather than threading through this parameter, why not just have the caller do module->MaterializeAll() with the returned module? PS4, Line 617: which also need to be materialized if : /// the input global variable or function is materialized is this really specific to materialization? wouldn't it be enough to just say: ... all the functions referenced by it. The materialization code itself can explain (or be self documenting) about why references need to be followed when materializing. PS4, Line 625: fns_to_materialize_ why this name change? does this now store functions that need to be materialized for various reasons? don't we materialize functions that aren't in this set? http://gerrit.cloudera.org:8080/#/c/5732/4/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: since this takes so long to execute, should we either move expr-test.cc to run only in exhaustive, or run only codegen tests when exhaustive? -- To view, visit http://gerrit.cloudera.org:8080/5732 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I40fdb035a565ae2f9c9fbf4db48a548653ef7608 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3586 (Part 1): Implement Union Pass Through
Taras Bobrovytsky has uploaded a new patch set (#4). Change subject: IMPALA-3586 (Part 1): Implement Union Pass Through .. IMPALA-3586 (Part 1): Implement Union Pass Through The union node acts as pass through operator and forwards row batches from it's children without materializing. This is done in the case when the child's tuple layout is identical to union node tuple layout and no functions need to be applied to the child row batches. Testing: Verified that existing tests cover the case where no/some/all union children of the union node can be passed through. Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416 --- M be/src/exec/union-node.cc M be/src/exec/union-node.h M be/src/runtime/descriptors.cc M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/main/java/org/apache/impala/planner/UnionNode.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test M testdata/workloads/functional-planner/queries/PlannerTest/empty.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test M testdata/workloads/functional-planner/queries/PlannerTest/order.test M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/topn.test M testdata/workloads/functional-planner/queries/PlannerTest/union.test M testdata/workloads/functional-query/queries/QueryTest/union.test 20 files changed, 435 insertions(+), 15 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/4 -- To view, visit http://gerrit.cloudera.org:8080/5816 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4808: old hash join can reference invalid memory
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4808: old hash join can reference invalid memory .. Patch Set 1: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/224/ -- To view, visit http://gerrit.cloudera.org:8080/5850 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic068bbc3e029264d1ce814d286e372391639fa68 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3586 (Part 1): Implement Union Pass Through
Taras Bobrovytsky has uploaded a new patch set (#4). Change subject: IMPALA-3586 (Part 1): Implement Union Pass Through .. IMPALA-3586 (Part 1): Implement Union Pass Through The union node acts as pass through operator and forwards row batches from it's children without materializing. This is done in the case when the child's tuple layout is identical to union node tuple layout and no functions need to be applied to the child row batches. Testing: Verified that existing tests cover the case where no/some/all union children of the union node can be passed through. Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416 --- M be/src/exec/union-node.cc M be/src/exec/union-node.h M be/src/runtime/descriptors.cc M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/main/java/org/apache/impala/planner/UnionNode.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test M testdata/workloads/functional-planner/queries/PlannerTest/empty.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test M testdata/workloads/functional-planner/queries/PlannerTest/order.test M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/topn.test M testdata/workloads/functional-planner/queries/PlannerTest/union.test M testdata/workloads/functional-query/queries/QueryTest/union.test 20 files changed, 435 insertions(+), 15 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/4 -- To view, visit http://gerrit.cloudera.org:8080/5816 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4808: old hash join can reference invalid memory
Dan Hecht has posted comments on this change. Change subject: IMPALA-4808: old hash join can reference invalid memory .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/5850 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic068bbc3e029264d1ce814d286e372391639fa68 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4809: Add Stability for codegen constants
Dan Hecht has posted comments on this change. Change subject: IMPALA-4809: Add Stability for codegen constants .. Patch Set 1: IMPALA-4809 isn't about inlining an expression related value, it's just that the way the code is currently factored, Expr routines deal with doing codegen tricks involving FunctionContext's. They really don't belong in Expr in the first place. Michael is in the process of cleaning that up. So, let's not make IMPALA-4809 dependent on this change. I agree a concept like this may be useful and needed in the future, but until we finish cleaning up some surrounding code and have a need for this, I don't think we should introduce this new concept (and it seems more of a codegen related concept than an Expr related concept). As an aside, see Codegen::ReplaceCallSites() and other related code for non-expr places we do constant injection. That's another way you could do 4809, but ultimately you'll need to get at the query options through FunctionContext anyway. -- To view, visit http://gerrit.cloudera.org:8080/5848 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0431bcd84b908bcf130a6d618f57f8f7c5498428 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4729: Implement REPLACE()
Dan Hecht has posted comments on this change. Change subject: IMPALA-4729: Implement REPLACE() .. Patch Set 9: What do you mean by "complicated" queries? And how much did computing the StringSearch only once help? From your numbers below, REPLACE is not much faster than REGEX_REPLACE, but I thought earlier you were seeing a 2x speedup. -- To view, visit http://gerrit.cloudera.org:8080/5776 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4729: Implement REPLACE()
Zach Amsden has posted comments on this change. Change subject: IMPALA-4729: Implement REPLACE() .. Patch Set 9: Added support for saving StringSearcher for complicated queries. *** [localhost:21000] > select count(regexp_replace(l_comment, 'complicated', '')) from tpch.lineitem; Query: select count(regexp_replace(l_comment, 'complicated', '')) from tpch.lineitem Query submitted at: 2017-02-01 03:21:19 (Coordinator: http://impala-dev:25000) Query progress can be monitored at: http://impala-dev:25000/query_plan?query_id=a49fad015129539:afa30da +-+ | count(regexp_replace(l_comment, 'complicated', '')) | +-+ | 6001215 | +-+ Fetched 1 row(s) in 7.55s [localhost:21000] > select count(regexp_replace(l_comment, 'complicated', '')) from tpch.lineitem; Query: select count(regexp_replace(l_comment, 'complicated', '')) from tpch.lineitem Query submitted at: 2017-02-01 03:21:31 (Coordinator: http://impala-dev:25000) Query progress can be monitored at: http://impala-dev:25000/query_plan?query_id=3946a55a95c4529f:67394350 +-+ | count(regexp_replace(l_comment, 'complicated', '')) | +-+ | 6001215 | +-+ Fetched 1 row(s) in 1.54s [localhost:21000] > select count(regexp_replace(l_comment, 'complicated', '')) from tpch.lineitem; Query: select count(regexp_replace(l_comment, 'complicated', '')) from tpch.lineitem Query submitted at: 2017-02-01 03:21:35 (Coordinator: http://impala-dev:25000) Query progress can be monitored at: http://impala-dev:25000/query_plan?query_id=3f41480452d08ce1:21ebac35 +-+ | count(regexp_replace(l_comment, 'complicated', '')) | +-+ | 6001215 | +-+ Fetched 1 row(s) in 1.53s [localhost:21000] > select count(replace(l_comment, 'complicated', '')) from tpch.lineitem; Query: select count(replace(l_comment, 'complicated', '')) from tpch.lineitem Query submitted at: 2017-02-01 03:21:44 (Coordinator: http://impala-dev:25000) Query progress can be monitored at: http://impala-dev:25000/query_plan?query_id=d24d13228c8c8833:110700e3 +--+ | count(replace(l_comment, 'complicated', '')) | +--+ | 6001215 | +--+ Fetched 1 row(s) in 1.33s [localhost:21000] > select count(replace(l_comment, 'complicated', '')) from tpch.lineitem; Query: select count(replace(l_comment, 'complicated', '')) from tpch.lineitem Query submitted at: 2017-02-01 03:21:48 (Coordinator: http://impala-dev:25000) Query progress can be monitored at: http://impala-dev:25000/query_plan?query_id=ea4d6cd631e38ce2:75984bb6 -- To view, visit http://gerrit.cloudera.org:8080/5776 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4729: Implement REPLACE()
Zach Amsden has uploaded a new patch set (#9). Change subject: IMPALA-4729: Implement REPLACE() .. IMPALA-4729: Implement REPLACE() This turned out to be slightly non-trivial as REPLACE is already a keyword, and thus the parser needs to be tweaked to allow this, since function names act as bare identifiers. Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329 --- M be/src/exprs/expr-test.cc M be/src/exprs/string-functions-ir.cc M be/src/exprs/string-functions.h M be/src/gutil/bits.h M be/src/udf/udf-internal.h M be/src/udf/udf.cc M be/src/udf/udf.h M common/function-registry/impala_functions.py M fe/src/main/cup/sql-parser.cup 9 files changed, 201 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/76/5776/9 -- To view, visit http://gerrit.cloudera.org:8080/5776 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] Prereqs for load test system testing
Harrison Sheinblatt has uploaded a new change for review. http://gerrit.cloudera.org:8080/5851 Change subject: Prereqs for load test system testing .. Prereqs for load test system testing Additional py.test parameters to control load testing, include externally linked test modules in python path, add CmService methods to query timeseries and Impala query data from CM if CM is part of SUT, as well as update admission control configuration parameters. Also added a dependency for pyjavaproperties and updated the version of python_dateutil to the virtual env. Change-Id: I38b5b4fe8c30d1c5f591bbe37b6be8bb8a5398ab --- M bin/impala-python-common.sh M bin/set-pythonpath.sh M infra/python/deps/requirements.txt M tests/comparison/cluster.py M tests/conftest.py 5 files changed, 649 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/5851/1 -- To view, visit http://gerrit.cloudera.org:8080/5851 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I38b5b4fe8c30d1c5f591bbe37b6be8bb8a5398ab Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Harrison Sheinblatt
[Impala-ASF-CR] IMPALA-4809: Add Stability for codegen constants
Michael Ho has posted comments on this change. Change subject: IMPALA-4809: Add Stability for codegen constants .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/5848/1//COMMIT_MSG Commit Message: PS1, Line 16: overloaded > I think this would be useful once we want to do codegen caching and need to Good point about its usability for codegen caching but it seems a bit premature at this point IMHO. http://gerrit.cloudera.org:8080/#/c/5848/1/be/src/exprs/expr.h File be/src/exprs/expr.h: PS1, Line 266: RUNTIME_CONSTANT > I think it would make sense to move this into a thrift enum - see my other If this is done solely for codegen, it may make sense to move this to codegen related header. It appears to me the type of constant we are talking about is more like CpuInfo::IsSupported(CpuInfo::SSE4_2) in cross-compiled code. -- To view, visit http://gerrit.cloudera.org:8080/5848 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0431bcd84b908bcf130a6d618f57f8f7c5498428 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] Release note updates for Impala 2.8
Alex Behm has posted comments on this change. Change subject: Release note updates for Impala 2.8 .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/5668/7/docs/topics/impala_incompatible_changes.xml File docs/topics/impala_incompatible_changes.xml: Line 63: You can remove this. It was only a temporary bug in 2.8. We never shipped a version that had DESCRIBE FORMATTED in uppercase. -- To view, visit http://gerrit.cloudera.org:8080/5668 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7c47f422e509cec6d3eb8aaa82294b584f393aed Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell Gerrit-Reviewer: Laurel Hale Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Silvius Rus Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4822: Dynamic log level changes to Catalog and Frontend JVMs
Bharath Vissapragada has posted comments on this change. Change subject: IMPALA-4822: Dynamic log level changes to Catalog and Frontend JVMs .. Patch Set 3: @reviewers: Please don't review PS2/3. My next PS includes a bunch of new stuff. I'll send it out for review soon. -- To view, visit http://gerrit.cloudera.org:8080/5792 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I588418e9bcb0b66d33138baf96207a5a35bfbd63 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4809: Add Stability for codegen constants
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4809: Add Stability for codegen constants .. Patch Set 1: (4 comments) http://gerrit.cloudera.org:8080/#/c/5848/1//COMMIT_MSG Commit Message: PS1, Line 16: overloaded > From the codegen perspective, we only care if a value is compilation time c I think this would be useful once we want to do codegen caching and need to reason about whether codegen output is re-usable. Agree we can solve IMPALA-4809 with Expr::GetConstant() or the mechanism in this patch (if that ever makes it in): https://gerrit.cloudera.org/#/c/3843/ http://gerrit.cloudera.org:8080/#/c/5848/1/be/src/exprs/expr.h File be/src/exprs/expr.h: Line 266: RUNTIME_CONSTANT, // Does not change over an execution Concretely, does this mean that it doesn't change within the scope of a process - it can be different if you start another daemon or are on a different system? PS1, Line 266: RUNTIME_CONSTANT > This seems odd to be in Expr class if an example of this class of constants I think it would make sense to move this into a thrift enum - see my other comment about doing the analysis in the FE. Line 267: QUERY_CONSTANT, // Does not change over a single query We may also want to think about how it varies across different Impala daemons. E.g. does QUERY_CONSTANT mean it's the same on all Impala daemons for that query, or just on the current daemon? I think that is effectively a different dimension. I don't think we necessarily need to add more complexity, just that we should document the assumptions. E.g. one thing we've thought about hypothetically is compiling on one daemon then distributing it others. -- To view, visit http://gerrit.cloudera.org:8080/5848 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0431bcd84b908bcf130a6d618f57f8f7c5498428 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4792: Fix number of distinct values for a CASE with constant outputs
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4792: Fix number of distinct values for a CASE with constant outputs .. Patch Set 3: (3 comments) regarding testing, let's discuss that. http://gerrit.cloudera.org:8080/#/c/5768/3/fe/src/main/java/org/apache/impala/analysis/CaseExpr.java File fe/src/main/java/org/apache/impala/analysis/CaseExpr.java: Line 393: long numOutputConstants = 0; int is fine here, if you overflow that we have other problems. Line 402: if ((i - loopStart) % 2 == 1 || (i == children_.size() - 1 && hasElseExpr_)) { check the negation and continue (to save one indentation level). Line 409: if (constLiteralSet.add(outputLiteral)) ++numOutputConstants; nice optimization. is there a test for that? -- To view, visit http://gerrit.cloudera.org:8080/5768 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I21dbdaad8452b7e58c477612b47847dccd9d98d2 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4735: Upgrade pytest in python env to version 2.9.2.
David Knupp has posted comments on this change. Change subject: IMPALA-4735: Upgrade pytest in python env to version 2.9.2. .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/5640/7/tests/run-tests.py File tests/run-tests.py: PS7, Line 144: other_tests > Maybe use the name explicit_tests ? Done. (And, with hindsight, changed 'params' to 'config_options'.) -- To view, visit http://gerrit.cloudera.org:8080/5640 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I40d129e0e63ca5bee126bac6ac923abb3c7e0a67 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Knupp Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4735: Upgrade pytest in python env to version 2.9.2.
Hello Lars Volker, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5640 to look at the new patch set (#8). Change subject: IMPALA-4735: Upgrade pytest in python env to version 2.9.2. .. IMPALA-4735: Upgrade pytest in python env to version 2.9.2. The current version of pytest in the Impala python environment is quite old (2.7.2) and there have been bug fixes in later versions that we could benefit from. Also, since the passing of params to pytest.main() as a string will be deprecated in upcoming versions of pytest, edit run-tests.py to instead pass params as a list. (This also means we don't need to worry about esoteric bash limitations re: single quotes in strings.) While working on this file, the filtering of commandline args when running the verfier tests was made a little more robust. Tested by doing a standard (non-exhaustive) test run on centos 6.4 and ubuntu 14.04, plus an exhaustive test run on RHEL7. Change-Id: I40d129e0e63ca5bee126bac6ac923abb3c7e0a67 --- M infra/python/deps/requirements.txt M tests/run-tests.py 2 files changed, 82 insertions(+), 67 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/5640/8 -- To view, visit http://gerrit.cloudera.org:8080/5640 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I40d129e0e63ca5bee126bac6ac923abb3c7e0a67 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Knupp Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Brown
[Impala-ASF-CR] IMPALA-4809: Add Stability for codegen constants
Michael Ho has posted comments on this change. Change subject: IMPALA-4809: Add Stability for codegen constants .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/5848/1//COMMIT_MSG Commit Message: PS1, Line 16: overloaded >From the codegen perspective, we only care if a value is compilation time >constant (compilation in the LlvmCodeGen::FinalizeModule()). Not entirely >clear to me the benefit of distinguishing between these various types of >constant values ? I could be missing something. PS1, Line 17: is_constant() I believe this is a concept confined to Expr only. This is a different notion from say other runtime constants such as the constants in a hash table (e.g. stores_duplicates). http://gerrit.cloudera.org:8080/#/c/5848/1/be/src/exprs/expr.h File be/src/exprs/expr.h: PS1, Line 266: RUNTIME_CONSTANT This seems odd to be in Expr class if an example of this class of constants is CPUInfo's flags. -- To view, visit http://gerrit.cloudera.org:8080/5848 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0431bcd84b908bcf130a6d618f57f8f7c5498428 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4808: old hash join can reference invalid memory
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-4808: old hash join can reference invalid memory .. Patch Set 1: Code-Review+1 Yeah, I guess it wasn't harmless after all. I think it makes sense to remove it, thanks. -- To view, visit http://gerrit.cloudera.org:8080/5850 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic068bbc3e029264d1ce814d286e372391639fa68 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4808: old hash join can reference invalid memory
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4808: old hash join can reference invalid memory .. Patch Set 1: It looks like on the original CR there was some discussion about whether this was right/wrong https://gerrit.cloudera.org/#/c/1068/ -- To view, visit http://gerrit.cloudera.org:8080/5850 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic068bbc3e029264d1ce814d286e372391639fa68 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4808: old hash join can reference invalid memory
Tim Armstrong has uploaded a new change for review. http://gerrit.cloudera.org:8080/5850 Change subject: IMPALA-4808: old hash join can reference invalid memory .. IMPALA-4808: old hash join can reference invalid memory The bug was that 'probe_rows_exist' could be true even if there was no current probe row. The node can get into this state if it takes the branch at line 390. I tried to reproduce the crash but was unable to after a few attempts. Change-Id: Ic068bbc3e029264d1ce814d286e372391639fa68 --- M be/src/exec/hash-join-node.cc 1 file changed, 1 insertion(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/5850/1 -- To view, visit http://gerrit.cloudera.org:8080/5850 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ic068bbc3e029264d1ce814d286e372391639fa68 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong
[Impala-ASF-CR] IMPALA-4809: Add Stability for codegen constants
Zach Amsden has uploaded a new change for review. http://gerrit.cloudera.org:8080/5848 Change subject: IMPALA-4809: Add Stability for codegen constants .. IMPALA-4809: Add Stability for codegen constants Having two boolean parameters to a constructor is error prone, so I thought instead we could express the 'constance' of values by using an enum class. I chose 'Stability' as a name since one can imagine extending this to other semi-stable things in a spectrum between Literal expressed constant, runtime constant, query-local constant, memoizable constexpr, dynamic variable. Right now those terms may be overloaded into a single notion in the sense of is_constant() but they are quite distinct concepts. For example, CPU flags are RUNTIME_CONSTANT, while query options are QUERY_CONSTANT. Testing: Going to run standard test suite. There should be no side effects at all from this change. Change-Id: I0431bcd84b908bcf130a6d618f57f8f7c5498428 --- M be/src/exprs/expr.cc M be/src/exprs/expr.h M be/src/exprs/literal.cc M be/src/exprs/null-literal.h M be/src/exprs/slot-ref.cc 5 files changed, 38 insertions(+), 24 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/5848/1 -- To view, visit http://gerrit.cloudera.org:8080/5848 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I0431bcd84b908bcf130a6d618f57f8f7c5498428 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden
[Impala-ASF-CR] IMPALA-4729: Implement REPLACE()
Michael Ho has posted comments on this change. Change subject: IMPALA-4729: Implement REPLACE() .. Patch Set 8: (9 comments) http://gerrit.cloudera.org:8080/#/c/5776/8/be/src/exprs/string-functions-ir.cc File be/src/exprs/string-functions-ir.cc: PS8, Line 236: (UNLIKELY(result.is_null)) May help to add a comment stating that result can be null if buffer_space exceeds StringVal::MAX_LENGTH at this point. PS8, Line 242: bytes may be this can use a more meaningful name such as unmatched_len. PS8, Line 257: check_space) We want to skip this doubling behavior if bytes_remaining == 0. The code could be slightly simpler if we avoid having check_space and directly check the size of the buffer_space (like line 260). The branch should be pretty predictable for the case in which the replacement string is shorter than the pattern string. PS8, Line 260: replace.len nit: replace.len - pattern.len ? PS8, Line 265: std::numeric_limits::max() / 2); nit: indent 4. PS8, Line 267: ptr - result.ptr Isn't this bytes_produced ? PS8, Line 268: if (UNLIKELY(!result.Resize(context, buffer_space))) { A one line comment about this already handles case in which the buffer_space exceeds StringVal::MAX_LENGTH ? PS8, Line 282: remaining_bytes 'bytes_remaining' for consistency with the code above. PS8, Line 289: context->Reallocate As discussed in person, this will not shrink the buffer given the way FreePool::Reallocate() is implemented. -- To view, visit http://gerrit.cloudera.org:8080/5776 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4359: qgen: add UPSERT support
Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-4359: qgen: add UPSERT support .. Patch Set 1: (4 comments) http://gerrit.cloudera.org:8080/#/c/5795/1/tests/comparison/model_translator.py File tests/comparison/model_translator.py: PS1, Line 534: 1) Why 1 here? Are we expecting there to be several "INSERT" keywords in the query sql? http://gerrit.cloudera.org:8080/#/c/5795/1/tests/comparison/query.py File tests/comparison/query.py: Line 705: (CONFLICT_ACTION_DEFAULT, I see that you explained what these mean below, but it's not to perfectly clear to me. Maybe explain what these mean exactly here too? Conflict is when there is a row with the same primary key as you're trying to insert, right? Ignore means insert anyways (so duplicate the primary key?). What is default exactly? PS1, Line 727: ng a K inserting into a Kudu table http://gerrit.cloudera.org:8080/#/c/5795/1/tests/comparison/statement_generator.py File tests/comparison/statement_generator.py: Line 70: if dml_table.primary_keys: Add a comment that if the table has primary keys then it's a Kudu table and Kudu supports UPSERT. (Otherwise upsert is not generated because it's not supported) -- To view, visit http://gerrit.cloudera.org:8080/5795 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6382f6ab22ba29c117e39a5d90592d3637df4b25 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Brown Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] Improvements to "Unknown disk-ID" warning
Alex Behm has posted comments on this change. Change subject: Improvements to "Unknown disk-ID" warning .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/5828/1//COMMIT_MSG Commit Message: Line 7: Improvements to "Unknown disk-ID" warning IMPALA-1427? Line 9: This commit, This patch seems fine. Another idea to consider: I'm not really sure how useful it is to return these missing disk id warnings from scans. An alternative might be to report the number of missing disk ids in the explain plan of a query and throw away this runtime checking and warning through the existing path. -- To view, visit http://gerrit.cloudera.org:8080/5828 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iddb132ff7ad66f3291b93bf9d8061bd0525ef1b2 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Alex Behm Gerrit-HasComments: Yes
[Impala-ASF-CR] [DOCS] Major update to Impala + Kudu page
John Russell has posted comments on this change. Change subject: [DOCS] Major update to Impala + Kudu page .. Patch Set 13: (8 comments) New patch set coming any second. http://gerrit.cloudera.org:8080/#/c/5649/13/docs/topics/impala_alter_table.xml File docs/topics/impala_alter_table.xml: PS13, Line 64: ADD [IF NOT EXISTS] | DROP [IF EXISTS] } PARTITION Add another line here for ADD RANGE PARTITION. (The ADD/DROP lines are already getting split apart as part of a different code review. That will happen here as part of resolving a merge conflict.) PS13, Line 86: kudu_partition_spec Take this out of there because it doesn't apply to the basic ADD PARTITION and DROP PARTITION clauses. http://gerrit.cloudera.org:8080/#/c/5649/13/docs/topics/impala_create_table.xml File docs/topics/impala_create_table.xml: PS13, Line 85: [PRIMARY KEY (col_name[, ...]] > This doesn't belong here. Done PS13, Line 234: codeblock Sprinkle some rev="kudu" eyecatchers on these Kudu-only codeblocks. PS13, Line 244: CREATE TABLE [IF NOT EXISTS] db_name.]table_name : [COMMENT 'table_comment'] : STORED AS KUDU : [TBLPROPERTIES ('key1'='value1', 'key2'='value2', ...)] : AS : select_statement > This is not correct. You're missing the PRIMARY KEY and PARTITION BY clause Done PS13, Line 411: impala::username.kudu_t1 > nit: I don't think this is a good example. I would be really surprised if s Done http://gerrit.cloudera.org:8080/#/c/5649/13/docs/topics/impala_kudu.xml File docs/topics/impala_kudu.xml: PS13, Line 93: kudu.master_addresses > That's the name of the table property. Can you verify that name of the impa Done. Right, the config option is -kudu_master_hosts PS13, Line 101: kudu.master_addresses Same change to -kudu_master_hosts as in previous paragraph. Clean up the TBLPROPERTIES wording in this paragraph to reflect better the kudu.master_addresses param for that clause. -- To view, visit http://gerrit.cloudera.org:8080/5649 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c Gerrit-PatchSet: 13 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: John Russell Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read
Matthew Jacobs has uploaded a new change for review. http://gerrit.cloudera.org:8080/5840 Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read .. IMPALA-4828: Alter Kudu schema outside Impala may crash on read Creating a table in Impala, changing the column schema outside of Impala, and then reading again in Impala may result in a crash. Neither Impala nor the Kudu client validates the schema immediately before reading, so Impala may attempt to dereference pointers that aren't there. This happens if a string column is dropped and then a new, non string column is added with the old string column's name. This change adds validation after opening a scan token to ensure the projection schema matches the expected schema. The scan is guaranteed to be valid while the KuduScanner is open, even if the schema changes concurrently. Testing: Adds a test case that previously would crash Impala. Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a --- M be/src/exec/kudu-scanner.cc M be/src/exec/kudu-scanner.h M be/src/exec/kudu-util.cc M be/src/exec/kudu-util.h M be/src/runtime/descriptors.h M common/thrift/generate_error_codes.py M tests/query_test/test_kudu.py 7 files changed, 99 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/5840/1 -- To view, visit http://gerrit.cloudera.org:8080/5840 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs
[Impala-ASF-CR](asf-site) Remove "Git repository" from navigation bar.
Jim Apple has submitted this change and it was merged. Change subject: Remove "Git repository" from navigation bar. .. Remove "Git repository" from navigation bar. It is not in the navigation bar of any other page; it remained here only by an oversight. Change-Id: I6e6f612fdd49d8f826efeb01fb05f9334da2fbf6 Reviewed-on: http://gerrit.cloudera.org:8080/5785 Reviewed-by: Jim Apple Tested-by: Jim Apple --- M downloads.html 1 file changed, 0 insertions(+), 1 deletion(-) Approvals: Jim Apple: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/5785 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I6e6f612fdd49d8f826efeb01fb05f9334da2fbf6 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: asf-site Gerrit-Owner: Jim Apple Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Sailesh Mukil
[Impala-ASF-CR](asf-site) Remove "Git repository" from navigation bar.
Jim Apple has posted comments on this change. Change subject: Remove "Git repository" from navigation bar. .. Patch Set 2: Code-Review+2 Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/5785 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6e6f612fdd49d8f826efeb01fb05f9334da2fbf6 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: asf-site Gerrit-Owner: Jim Apple Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3586 (Part 1): Implement Union Pass Through
Taras Bobrovytsky has uploaded a new patch set (#3). Change subject: IMPALA-3586 (Part 1): Implement Union Pass Through .. IMPALA-3586 (Part 1): Implement Union Pass Through The union node acts as pass through operator and forwards row batches from it's children without materializing. This is done in the case when the child's tuple layout is identical to union node tuple layout and no functions need to be applied to the child row batches. Testing: Verified that existing tests cover the case where no/some/all union children of the union node can be passed through. Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416 --- M be/src/exec/union-node.cc M be/src/exec/union-node.h M be/src/runtime/descriptors.cc M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/main/java/org/apache/impala/planner/UnionNode.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test M testdata/workloads/functional-planner/queries/PlannerTest/empty.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test M testdata/workloads/functional-planner/queries/PlannerTest/order.test M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/topn.test M testdata/workloads/functional-planner/queries/PlannerTest/union.test M testdata/workloads/functional-query/queries/QueryTest/union.test 19 files changed, 425 insertions(+), 15 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/3 -- To view, visit http://gerrit.cloudera.org:8080/5816 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-3586 (Part 1): Implement Union Pass Through
Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-3586 (Part 1): Implement Union Pass Through .. Patch Set 3: (6 comments) http://gerrit.cloudera.org:8080/#/c/5816/2/be/src/exec/union-node.cc File be/src/exec/union-node.cc: PS2, Line 122: IsInSubplan() > I think we would either pass child_idx_ into isPassThrough() or call it som Done Line 123: > Move this into the Open() branch so we don't execute it unnecessarily. Done Line 126: DCHECK(!IsInSubplan()); > We need to think about the implications of this. This could increase resour Done. Disabled passthrough in subplans. The last row batch of each child is marked with the DeepCopy flag. http://gerrit.cloudera.org:8080/#/c/5816/2/be/src/runtime/descriptors.cc File be/src/runtime/descriptors.cc: PS2, Line 438: Verify > Verify (caps) Done PS2, Line 442: > > Make these references with & to avoid copying the vector. Done PS2, Line 442: const > Don't need std::, we automatically import it in common/names.h. Done -- To view, visit http://gerrit.cloudera.org:8080/5816 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3586 (Part 1): Implement Union Pass Through
Taras Bobrovytsky has uploaded a new patch set (#3). Change subject: IMPALA-3586 (Part 1): Implement Union Pass Through .. IMPALA-3586 (Part 1): Implement Union Pass Through The union node acts as pass through operator and forwards row batches from it's children without materializing. This is done in the case when the child's tuple layout is identical to union node tuple layout and no functions need to be applied to the child row batches. Testing: Verified that existing tests cover the case where no/some/all union children of the union node can be passed through. Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416 --- M be/src/exec/union-node.cc M be/src/exec/union-node.h M be/src/runtime/descriptors.cc M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/main/java/org/apache/impala/planner/UnionNode.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test M testdata/workloads/functional-planner/queries/PlannerTest/empty.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test M testdata/workloads/functional-planner/queries/PlannerTest/order.test M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/topn.test M testdata/workloads/functional-planner/queries/PlannerTest/union.test M testdata/workloads/functional-query/queries/QueryTest/union.test 19 files changed, 425 insertions(+), 15 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/3 -- To view, visit http://gerrit.cloudera.org:8080/5816 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR](asf-site) Remove "Git repository" from navigation bar.
Sailesh Mukil has posted comments on this change. Change subject: Remove "Git repository" from navigation bar. .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/5785 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6e6f612fdd49d8f826efeb01fb05f9334da2fbf6 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: asf-site Gerrit-Owner: Jim Apple Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3989: Display skew warning for poorly formatted Parquet files
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-3989: Display skew warning for poorly formatted Parquet files .. IMPALA-3989: Display skew warning for poorly formatted Parquet files Parquet files are scanned in the granularity of row groups. Each row group belongs to one or more splits and each split is scanned by a separate parquet scanner. If some row groups span multiple splits, then we will most likely end up seeing some scanners having remote reads and some scanners not performing scans at all. This will attribute to skew across the cluster where distribution of scans is uneven. This change adds a counter (NumScannersWithNoReads) to the scan node's runtime profile to track the number of parquet scanners that end up doing no reads becuse of poor formatting. Change-Id: Ibf48d978383d73efdade733a892e795ebd53c76a Reviewed-on: http://gerrit.cloudera.org:8080/5400 Reviewed-by: Dan Hecht Tested-by: Impala Public Jenkins --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M tests/query_test/test_scanners.py 3 files changed, 107 insertions(+), 9 deletions(-) Approvals: Impala Public Jenkins: Verified Dan Hecht: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/5400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ibf48d978383d73efdade733a892e795ebd53c76a Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-3989: Display skew warning for poorly formatted Parquet files
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-3989: Display skew warning for poorly formatted Parquet files .. Patch Set 11: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/5400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibf48d978383d73efdade733a892e795ebd53c76a Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] Patch references to Cloudera and CDH in Impala tutorial
John Russell has posted comments on this change. Change subject: Patch references to Cloudera and CDH in Impala tutorial .. Patch Set 2: (4 comments) http://gerrit.cloudera.org:8080/#/c/5663/1//COMMIT_MSG Commit Message: Line 22: I marked with rev="upstream" some tags > I don't see upstream in any ditaval file. Where is it defined? It's a purely user-defined attribute that we use for bookkeeping purposes, e.g. we use rev="IMPALA-abcd" to indicate the relevant JIRA associated with a change, and rev="upstream" to mark spots requiring future cleanup as part of the Apache cleanup. http://gerrit.cloudera.org:8080/#/c/5663/1/docs/topics/impala_tutorial.xml File docs/topics/impala_tutorial.xml: Line 60: If you already have a environment set up and just need to add Impala to it, follow the installation > This doesn't seem to work. It's showing up as "CDH" in the build. Done PS1, Line 698: from the sample TPC-DS kit for Impala. > The only one of these is Cloudera's and it was very broken last I checked. It may be that this section has to get removed due to bitrot on the TPC-DS stuff. If so, I'll suggest we do that in a subsequent CR and restrict ourselves to getting rid of explicit Cloudera / CDH references in this one. PS1, Line 1700: (Currently, this technique only works for Parquet files.) > Parentheses not necessary. In this case, the parens are like a short form ot the "Note: this technique doesn't work for non-Parquet files" as we might do it in another section. I relax some of the rules a little bit (allowing royal 'we' etc.) in tutorial-style content. -- To view, visit http://gerrit.cloudera.org:8080/5663 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I44245b65ce6f247ae8771f582f4b33c3712145ae Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell Gerrit-Reviewer: Laurel Hale Gerrit-HasComments: Yes
[Impala-ASF-CR] Patch references to Cloudera and CDH in Impala tutorial
Hello Laurel Hale, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5663 to look at the new patch set (#2). Change subject: Patch references to Cloudera and CDH in Impala tutorial .. Patch references to Cloudera and CDH in Impala tutorial There was one tutorial that actually ran under the 'cloudera' user and so repeated that name over and over in directory and HDFS paths. I switched that to 'username'. I suppressed some and tags with Cloudera Manager-specific details. Will physically remove those from the source in a subsequent iteration. I left several instances of audience="Cloudera" because those will be changed to audience="hidden" as part of a separate change request. I marked with rev="upstream" some tags containing impala-shell banners with a Cloudera copyright statement. Will decide on a convention to handle those (elide those lines, or use a conref to consistently substitute the generic equivalent) and do that in a followup patch set. Change-Id: I44245b65ce6f247ae8771f582f4b33c3712145ae --- M docs/impala_keydefs.ditamap M docs/topics/impala_tutorial.xml 2 files changed, 26 insertions(+), 26 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/5663/2 -- To view, visit http://gerrit.cloudera.org:8080/5663 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I44245b65ce6f247ae8771f582f4b33c3712145ae Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Laurel Hale
[Impala-ASF-CR] [DOCS] List SCHEDULE RANDOM REPLICA in alphabetical order.
Impala Public Jenkins has posted comments on this change. Change subject: [DOCS] List SCHEDULE_RANDOM_REPLICA in alphabetical order. .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/5835 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1e5ad55b588b668c4b4dfc700f375e39b2453e28 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: John Russell Gerrit-Reviewer: Lars Volker Gerrit-HasComments: No
[Impala-ASF-CR] [DOCS] List SCHEDULE RANDOM REPLICA in alphabetical order.
Impala Public Jenkins has submitted this change and it was merged. Change subject: [DOCS] List SCHEDULE_RANDOM_REPLICA in alphabetical order. .. [DOCS] List SCHEDULE_RANDOM_REPLICA in alphabetical order. Change-Id: I1e5ad55b588b668c4b4dfc700f375e39b2453e28 Reviewed-on: http://gerrit.cloudera.org:8080/5835 Reviewed-by: Lars Volker Reviewed-by: John Russell Tested-by: Impala Public Jenkins --- M docs/impala.ditamap 1 file changed, 1 insertion(+), 1 deletion(-) Approvals: Impala Public Jenkins: Verified Lars Volker: Looks good to me, but someone else must approve John Russell: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/5835 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I1e5ad55b588b668c4b4dfc700f375e39b2453e28 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: John Russell Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] [DOCS] List SCHEDULE RANDOM REPLICA in alphabetical order.
Impala Public Jenkins has posted comments on this change. Change subject: [DOCS] List SCHEDULE_RANDOM_REPLICA in alphabetical order. .. Patch Set 1: Build started: http://jenkins.impala.io:8080/job/gerrit-docs-submit/33/ -- To view, visit http://gerrit.cloudera.org:8080/5835 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1e5ad55b588b668c4b4dfc700f375e39b2453e28 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: John Russell Gerrit-Reviewer: Lars Volker Gerrit-HasComments: No
[Impala-ASF-CR] [DOCS] List SCHEDULE RANDOM REPLICA in alphabetical order.
John Russell has posted comments on this change. Change subject: [DOCS] List SCHEDULE_RANDOM_REPLICA in alphabetical order. .. Patch Set 1: Code-Review+2 I'll finalize this minor change. -- To view, visit http://gerrit.cloudera.org:8080/5835 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1e5ad55b588b668c4b4dfc700f375e39b2453e28 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: John Russell Gerrit-Reviewer: Lars Volker Gerrit-HasComments: No
[Impala-ASF-CR] Release note updates for Impala 2.8
Greg Rahn has posted comments on this change. Change subject: Release note updates for Impala 2.8 .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/5668/3/docs/topics/impala_new_features.xml File docs/topics/impala_new_features.xml: PS3, Line 100: IMPALA-2522 This should be IMPALA-2521. IMPALA-2522 is an Epic that covers much more than just this. -- To view, visit http://gerrit.cloudera.org:8080/5668 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7c47f422e509cec6d3eb8aaa82294b584f393aed Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell Gerrit-Reviewer: Laurel Hale Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Silvius Rus Gerrit-HasComments: Yes
[Impala-ASF-CR] [DOCS] List SCHEDULE RANDOM REPLICA in alphabetical order.
Lars Volker has posted comments on this change. Change subject: [DOCS] List SCHEDULE_RANDOM_REPLICA in alphabetical order. .. Patch Set 1: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/5835 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1e5ad55b588b668c4b4dfc700f375e39b2453e28 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Lars Volker Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4729: Implement REPLACE()
Zach Amsden has uploaded a new patch set (#8). Change subject: IMPALA-4729: Implement REPLACE() .. IMPALA-4729: Implement REPLACE() This turned out to be slightly non-trivial as REPLACE is already a keyword, and thus the parser needs to be tweaked to allow this, since function names act as bare identifiers. Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329 --- M be/src/exprs/expr-test.cc M be/src/exprs/string-functions-ir.cc M be/src/exprs/string-functions.h M be/src/gutil/bits.h M be/src/udf/udf-internal.h M be/src/udf/udf.cc M be/src/udf/udf.h M common/function-registry/impala_functions.py M fe/src/main/cup/sql-parser.cup 9 files changed, 181 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/76/5776/8 -- To view, visit http://gerrit.cloudera.org:8080/5776 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-4729: Implement REPLACE()
Zach Amsden has uploaded a new patch set (#7). Change subject: IMPALA-4729: Implement REPLACE() .. IMPALA-4729: Implement REPLACE() This turned out to be slightly non-trivial as REPLACE is already a keyword, and thus the parser needs to be tweaked to allow this, since function names act as bare identifiers. Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329 --- M be/src/exprs/expr-test.cc M be/src/exprs/string-functions-ir.cc M be/src/exprs/string-functions.h M be/src/gutil/bits.h M be/src/udf/udf-internal.h M be/src/udf/udf.cc M be/src/udf/udf.h M common/function-registry/impala_functions.py M fe/src/main/cup/sql-parser.cup 9 files changed, 180 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/76/5776/7 -- To view, visit http://gerrit.cloudera.org:8080/5776 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-4822: Dynamic log level changes to Catalog and Frontend JVMs
Bharath Vissapragada has uploaded a new patch set (#3). Change subject: IMPALA-4822: Dynamic log level changes to Catalog and Frontend JVMs .. IMPALA-4822: Dynamic log level changes to Catalog and Frontend JVMs Very often we have to change the log4j logging levels of Impalads and Catalog server for debugging purposes. Currently, there is no way to do that without a restart of the daemons, which is not a viable option in production deployments. This patch addresses this supportability gap by exposing the ability to set dynamic log4j logging levels using a simple web endpoint on Impalad and Catalog web pages. Change-Id: I588418e9bcb0b66d33138baf96207a5a35bfbd63 --- M be/src/catalog/catalog-server.cc M be/src/service/impala-http-handler.cc M be/src/util/jni-util.cc M be/src/util/jni-util.h M be/src/util/logging-support.cc M be/src/util/logging-support.h M common/thrift/Logging.thrift M fe/src/main/java/org/apache/impala/util/GlogAppender.java A www/log_level.tmpl 9 files changed, 226 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/5792/3 -- To view, visit http://gerrit.cloudera.org:8080/5792 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I588418e9bcb0b66d33138baf96207a5a35bfbd63 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson
[Impala-ASF-CR] IMPALA-4829: Change default Kudu read behavior for "RYW"
Dan Hecht has posted comments on this change. Change subject: IMPALA-4829: Change default Kudu read behavior for "RYW" .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/5802 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4011f8277083982aee2c6c2bfca2f4ae2f8cb31e Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3909: Populate min/max statistics in Parquet writer
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-3909: Populate min/max statistics in Parquet writer .. Patch Set 11: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/5611 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8368ee58daa50c07a3b8ef65be70203eb941f619 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Ivanfi Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4822: Dynamic log level changes to Catalog and Frontend JVMs
Bharath Vissapragada has uploaded a new patch set (#2). Change subject: IMPALA-4822: Dynamic log level changes to Catalog and Frontend JVMs .. IMPALA-4822: Dynamic log level changes to Catalog and Frontend JVMs Very often we have to change the log4j logging levels of Impalads and Catalog server for debugging purposes. Currently, there is no way to do that without a restart of the daemons, which is not a viable option in production deployments. This patch addresses this supportability gap by exposing the ability to set dynamic log4j logging levels using a simple web endpoint on Impalad and Catalog web pages. Change-Id: I588418e9bcb0b66d33138baf96207a5a35bfbd63 --- M be/src/catalog/catalog-server.cc M be/src/service/impala-http-handler.cc M be/src/util/jni-util.cc M be/src/util/jni-util.h M be/src/util/logging-support.cc M be/src/util/logging-support.h M common/thrift/Logging.thrift M fe/src/main/java/org/apache/impala/util/GlogAppender.java A www/log_level.tmpl 9 files changed, 228 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/5792/2 -- To view, visit http://gerrit.cloudera.org:8080/5792 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I588418e9bcb0b66d33138baf96207a5a35bfbd63 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson
[Impala-ASF-CR] IMPALA-4822: Dynamic log level changes to Catalog and Frontend JVMs
Bharath Vissapragada has posted comments on this change. Change subject: IMPALA-4822: Dynamic log level changes to Catalog and Frontend JVMs .. Patch Set 1: (6 comments) Thanks Alex for the reviews. Very helpful. I agree this needs some discussion on usability aspect. PS2 made some changes. 1. Changed the title to "Java log level (log4j)" to make it look more non-technical. 2. The log level input is now a drop down to make things more clearer to the end user. 3. It now has an example in the text box itself. Looks like this [1] 4. Regarding point (6), I'm not totally sure if we should something about it, I've added a comment in the CR. May be we can discuss it further. 5. Still working on a single button to restore log levels. @Henry: Still looking into the glog dynamic changes. Looks like it is possible and is as simple as FLAGS_v = , but I'm still digging into it. [1] http://www.dumpt.com/img/viewer.php?file=8undlklgs7wt9q939hq9.png http://gerrit.cloudera.org:8080/#/c/5792/1/be/src/util/logging-support.cc File be/src/util/logging-support.cc: Line 26: #include "rpc/jni-thrift-util.h" > move before common/names.h Moved. For my understanding, whats special about names.h include? http://gerrit.cloudera.org:8080/#/c/5792/1/be/src/util/logging-support.h File be/src/util/logging-support.h: Line 21: #include "rapidjson/rapidjson.h" > I think we should be able to forward declare instead of include. Done. This one has some nested typedefs with some dependencies. Let me know if you think we should still retain it. http://gerrit.cloudera.org:8080/#/c/5792/1/common/thrift/Logging.thrift File common/thrift/Logging.thrift: Line 34: // Helper structs for GetLogLevel and SetLogLevel methods > GetLogLevel() and SetLogLevel() Done Line 35: struct TGetLogLevelParams { > Instead of this GetLogLevel() I think we should instead list all custom log Per my understanding log4j doesn't provide any easy way to get that list. So we need to manually track all the classes that were overriden via web UI. IMO that is some additional book keeping overhead on the Catalog server and some additional lines of code for a feature we don't often use unless we are debugging something. Thoughts? Line 41: > remove blank line Done http://gerrit.cloudera.org:8080/#/c/5792/1/fe/src/main/java/org/apache/impala/util/GlogAppender.java File fe/src/main/java/org/apache/impala/util/GlogAppender.java: Line 155:* Get the log4j log level corresponding to a seriazlied TGetLogLevelParams. > typo: serialized Done -- To view, visit http://gerrit.cloudera.org:8080/5792 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I588418e9bcb0b66d33138baf96207a5a35bfbd63 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4729: Implement REPLACE()
Michael Ho has posted comments on this change. Change subject: IMPALA-4729: Implement REPLACE() .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/5776/5/be/src/exprs/string-functions-ir.cc File be/src/exprs/string-functions-ir.cc: PS5, Line 266: UNLIKELY(check_space)) > Well it was unlikely when I over-allocated the space. Now it is still prob Yes. I usually leave UNLIKELY() to error cases only. The general consensus (from my past experience) is that we use LIKELY/UNLIKELY on critical paths only. -- To view, visit http://gerrit.cloudera.org:8080/5776 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1670,IMPALA-4141: Support multiple partitions in ALTER TABLE ADD PARTITION
Alex Behm has posted comments on this change. Change subject: IMPALA-1670,IMPALA-4141: Support multiple partitions in ALTER TABLE ADD PARTITION .. Patch Set 23: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/4144 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iddbc951f2931f488f7048c9780260f6b49100750 Gerrit-PatchSet: 23 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4729: Implement REPLACE()
Zach Amsden has posted comments on this change. Change subject: IMPALA-4729: Implement REPLACE() .. Patch Set 5: (2 comments) http://gerrit.cloudera.org:8080/#/c/5776/5/be/src/exprs/string-functions-ir.cc File be/src/exprs/string-functions-ir.cc: PS5, Line 266: UNLIKELY(check_space)) > Not sure if UNLIKELY() is really unlikely here. Isn't it dependent on the i Well it was unlikely when I over-allocated the space. Now it is still probably unlikely, since many replaces either will be stripping characters or replacing equal size strings. Doesn't really matter much I suppose. Should I remove it? PS5, Line 273: DCHECK( > DCHECK_EQ(). will fix. originally was using math on the ptr val and DCHECK_EQ didn't like the types involved. -- To view, visit http://gerrit.cloudera.org:8080/5776 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4729: Implement REPLACE()
Zach Amsden has uploaded a new patch set (#6). Change subject: IMPALA-4729: Implement REPLACE() .. IMPALA-4729: Implement REPLACE() This turned out to be slightly non-trivial as REPLACE is already a keyword, and thus the parser needs to be tweaked to allow this, since function names act as bare identifiers. Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329 --- M be/src/exprs/expr-test.cc M be/src/exprs/string-functions-ir.cc M be/src/exprs/string-functions.h M be/src/gutil/bits.h M be/src/udf/udf-internal.h M be/src/udf/udf.cc M be/src/udf/udf.h M common/function-registry/impala_functions.py M fe/src/main/cup/sql-parser.cup 9 files changed, 171 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/76/5776/6 -- To view, visit http://gerrit.cloudera.org:8080/5776 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-4729: Implement REPLACE()
Michael Ho has posted comments on this change. Change subject: IMPALA-4729: Implement REPLACE() .. Patch Set 5: (6 comments) http://gerrit.cloudera.org:8080/#/c/5776/5/be/src/exprs/string-functions-ir.cc File be/src/exprs/string-functions-ir.cc: PS5, Line 232: if (UNLIKELY(out_max > 1ULL << 18)) { > This is why I was asking earlier about what is considered a small / large m Sounds good. PS5, Line 236: haystack.len + replace.len - pattern.len > max string size is 1 << 30 Oops. Bad example. I suppose line 244 below will catch the case in which the buffer space exceeds StringVal::MAX_LENGTH. PS5, Line 266: UNLIKELY(check_space)) Not sure if UNLIKELY() is really unlikely here. Isn't it dependent on the inputs ? PS5, Line 272: buffer_space <<= 1; Please see comments in StringVal::Resize(). I think this can potentially go to 1 << 31 but it should be fine as long as StringVal::Resize() catches it. May want to DCHECK_LE(buffer_space, 1ULL << 30); PS5, Line 273: DCHECK( DCHECK_EQ(). http://gerrit.cloudera.org:8080/#/c/5776/5/be/src/udf/udf.cc File be/src/udf/udf.cc: PS5, Line 519: auto* new_ptr = ctx->impl()->ReallocateLocal(ptr, new_len); This should return StringVal::null() if new_len exceeds StringVal::MAX_LENGTH. -- To view, visit http://gerrit.cloudera.org:8080/5776 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] [DOCS] Major update to Impala + Kudu page
Dimitris Tsirogiannis has posted comments on this change. Change subject: [DOCS] Major update to Impala + Kudu page .. Patch Set 13: John, when do you plan to post a new patch that addresses the last comments? I think after that, we should be able to +2 it. -- To view, visit http://gerrit.cloudera.org:8080/5649 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c Gerrit-PatchSet: 13 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: John Russell Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[Impala-ASF-CR] [DOCS] List SCHEDULE RANDOM REPLICA in alphabetical order.
John Russell has uploaded a new change for review. http://gerrit.cloudera.org:8080/5835 Change subject: [DOCS] List SCHEDULE_RANDOM_REPLICA in alphabetical order. .. [DOCS] List SCHEDULE_RANDOM_REPLICA in alphabetical order. Change-Id: I1e5ad55b588b668c4b4dfc700f375e39b2453e28 --- M docs/impala.ditamap 1 file changed, 1 insertion(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/5835/1 -- To view, visit http://gerrit.cloudera.org:8080/5835 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I1e5ad55b588b668c4b4dfc700f375e39b2453e28 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell
[Impala-ASF-CR] IMPALA-3909: Populate min/max statistics in Parquet writer
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3909: Populate min/max statistics in Parquet writer .. Patch Set 9: (1 comment) http://gerrit.cloudera.org:8080/#/c/5611/9/be/src/exec/parquet-column-stats.h File be/src/exec/parquet-column-stats.h: Line 127: // statistics behavior from any implicit behavior of the types? > If Parquet's and Impala's ordering were "roughly the same", then we would n My concern is that min()/max() use overloaded operators, e.g. like the ones in timestamp-value.h - so any changes or additions get automatically picked up here via an implicit mechanism. It seems like it would make it easy for someone to accidentally break our Parquet implementation by changing a seemingly-unrelated bit of code. I think the static check above helps a bit but my instinct is to be as explicit as possible. Another way I was thinking about it is that the conceptual distinction between Parquet's ordering and ours is important and the code should reflect that in some way. We'll have to see what happens with parquet-format. I agree that it makes things tricky if the ordering diverges at all. -- To view, visit http://gerrit.cloudera.org:8080/5611 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8368ee58daa50c07a3b8ef65be70203eb941f619 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Ivanfi Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4729: Implement REPLACE()
Zach Amsden has posted comments on this change. Change subject: IMPALA-4729: Implement REPLACE() .. Patch Set 5: (3 comments) http://gerrit.cloudera.org:8080/#/c/5776/5/be/src/exprs/string-functions-ir.cc File be/src/exprs/string-functions-ir.cc: PS5, Line 232: if (UNLIKELY(out_max > 1ULL << 18)) { > Did you measure the performance gain for that ? Any hardcoded bound still l This is why I was asking earlier about what is considered a small / large memory size in Impala. How about we just start with the next power of two above something that can fit one replacement? PS5, Line 236: haystack.len + replace.len - pattern.len > Wouldn't this be negative if haystack.len == (1 << 31) - 4 and replace.len max string size is 1 << 30 PS5, Line 240: buffer_space = out_max; > an excellent idea But makes the test -- To view, visit http://gerrit.cloudera.org:8080/5776 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] Release note updates for Impala 2.8
Jim Apple has posted comments on this change. Change subject: Release note updates for Impala 2.8 .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/5668/6/docs/topics/impala_incompatible_changes.xml File docs/topics/impala_incompatible_changes.xml: Line 71: and query options (such as V_CPU_CORES) remain but do not have any effect. > OK. I wasn't sure in the race condition between "Reply" and "upload new pat They are usually done within seconds of each other. -- To view, visit http://gerrit.cloudera.org:8080/5668 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7c47f422e509cec6d3eb8aaa82294b584f393aed Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell Gerrit-Reviewer: Laurel Hale Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Silvius Rus Gerrit-HasComments: Yes
[Impala-ASF-CR] Release note updates for Impala 2.8
Jim Apple has posted comments on this change. Change subject: Release note updates for Impala 2.8 .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/5668/7/docs/shared/impala_common.xml File docs/shared/impala_common.xml: PS7, Line 3297: CDH Please do not add new CDH references. http://gerrit.cloudera.org:8080/#/c/5668/7/docs/topics/impala_new_features.xml File docs/topics/impala_new_features.xml: PS7, Line 724: New spaces at end of lines -- To view, visit http://gerrit.cloudera.org:8080/5668 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7c47f422e509cec6d3eb8aaa82294b584f393aed Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell Gerrit-Reviewer: Laurel Hale Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Silvius Rus Gerrit-HasComments: Yes
[Impala-ASF-CR] Release note updates for Impala 2.8
John Russell has posted comments on this change. Change subject: Release note updates for Impala 2.8 .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/5668/6/docs/topics/impala_incompatible_changes.xml File docs/topics/impala_incompatible_changes.xml: Line 71: and query options (such as V_CPU_CORES) remain but do not have any effect. > It is difficult to review changes when the author replies "Done" before the OK. I wasn't sure in the race condition between "Reply" and "upload new patch set" which should come first. In a different review I got a request not to upload the patch set before the responses. -- To view, visit http://gerrit.cloudera.org:8080/5668 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7c47f422e509cec6d3eb8aaa82294b584f393aed Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell Gerrit-Reviewer: Laurel Hale Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Silvius Rus Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4729: Implement REPLACE()
Michael Ho has posted comments on this change. Change subject: IMPALA-4729: Implement REPLACE() .. Patch Set 5: (2 comments) http://gerrit.cloudera.org:8080/#/c/5776/5/be/src/exprs/string-functions-ir.cc File be/src/exprs/string-functions-ir.cc: PS5, Line 232: if (UNLIKELY(out_max > 1ULL << 18)) { > I dropped this down to 64k. Seems like the benefit for not re-allocating i Did you measure the performance gain for that ? Any hardcoded bound still looks a bit arbitrary unless there is justification for choosing it. It'd be nice to keep the decision simple while still getting reasonable speedup over regex_replace(). Starting with the next power of 2 of the current string length will work well with how FreePool allocates memory internally. PS5, Line 236: haystack.len + replace.len - pattern.len > should not be able to overflow. preferred indent is Wouldn't this be negative if haystack.len == (1 << 31) - 4 and replace.len = 6 and pattern.len = 2 ? Correct. Indentation is 4 for line continuation. -- To view, visit http://gerrit.cloudera.org:8080/5776 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool .. Patch Set 4: (5 comments) http://gerrit.cloudera.org:8080/#/c/5811/4/be/src/runtime/buffered-tuple-stream-v2.cc File be/src/runtime/buffered-tuple-stream-v2.cc: Line 326: LOG(INFO) << "Failed to get reservation for " << pages_.front().len(); I need to remove this log message. http://gerrit.cloudera.org:8080/#/c/5811/4/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: Line 100: if (ExecEnv::GetInstance()->buffer_pool() != nullptr) { Use exec-env directly. http://gerrit.cloudera.org:8080/#/c/5811/4/be/src/runtime/query-state.h File be/src/runtime/query-state.h: Line 29: #include "runtime/tmp-file-mgr.h" // TODO: can we avoid this? Remove TODO Line 121: Status CheckSpillingEnabled(); This is currently dead code - can remove it. http://gerrit.cloudera.org:8080/#/c/5811/4/be/src/runtime/row-batch.h File be/src/runtime/row-batch.h: Line 442: /// TODO: ownership semantics Need to fix this TODO -- To view, visit http://gerrit.cloudera.org:8080/5811 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7bb47a818564ac19a6be4ca88db0578f4ea0b709 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] Release note updates for Impala 2.8
John Russell has uploaded a new patch set (#7). Change subject: Release note updates for Impala 2.8 .. Release note updates for Impala 2.8 First cut at 'new features' topic. Includes the Incompatible Changes subtopic for Impala 2.8. Also did some cleanup throughout the Incompatible Changes page: - Took out references to Cloudera release numbers from titles. - Suppressed the display of ancient subtopics from the Impala beta days, which are intertwined with things like what version of Cloudera Manager was supported. Patch set 3: More on MT_DOP for COMPUTE STATS. Address comments from Greg and MJ. Change-Id: I7c47f422e509cec6d3eb8aaa82294b584f393aed --- M docs/topics/impala_incompatible_changes.xml M docs/topics/impala_new_features.xml 2 files changed, 457 insertions(+), 177 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/5668/7 -- To view, visit http://gerrit.cloudera.org:8080/5668 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7c47f422e509cec6d3eb8aaa82294b584f393aed Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell Gerrit-Reviewer: Laurel Hale Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Silvius Rus
[Impala-ASF-CR] [DOCS] Apply conditionalization to tags.
John Russell has uploaded a new change for review. http://gerrit.cloudera.org:8080/5834 Change subject: [DOCS] Apply conditionalization to tags. .. [DOCS] Apply conditionalization to tags. Housekeeping change. Apply same conditional settings for as for corresponding for some pages that are hidden or otherwise conditionalized. Doesn't actually have an effect on the DITA-OT HTML output, but provides some future-proofing if we did more elaborate linking in the future. Change-Id: Ia9f0517599b6cab77eaca95b9c7ce7262583dfb9 --- M docs/impala_keydefs.ditamap 1 file changed, 3 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/34/5834/1 -- To view, visit http://gerrit.cloudera.org:8080/5834 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ia9f0517599b6cab77eaca95b9c7ce7262583dfb9 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell
[Impala-ASF-CR] Add .pep8rc for Impala's Python style
Jim Apple has posted comments on this change. Change subject: Add .pep8rc for Impala's Python style .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/5829/1/.pep8rc File .pep8rc: PS1, Line 5: # E251 - Remove whitespace around parameter '=' sign. : # E301 - Add missing blank line. > Yes, that was my intention, too. Such a tool sounds very useful. Is it in a https://github.com/jbapple-cloudera/clang-format-infer It uses hill-climbing with random restarts to search the state space. The config is pretty brittle, I must admit, but I think that's mainly a clang problem, so you might be able to repurpose it. -- To view, visit http://gerrit.cloudera.org:8080/5829 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I65a99fc6e364eb1e49e21d0ba97546bc25f65a27 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3909: Populate min/max statistics in Parquet writer
Lars Volker has posted comments on this change. Change subject: IMPALA-3909: Populate min/max statistics in Parquet writer .. Patch Set 9: (8 comments) Thanks for the review. Please see my inline comments and PS11. http://gerrit.cloudera.org:8080/#/c/5611/9/be/src/exec/hdfs-parquet-table-writer.cc File be/src/exec/hdfs-parquet-table-writer.cc: Line 339: int64_t encoded_value_size_; > sounds good Done http://gerrit.cloudera.org:8080/#/c/5611/10/be/src/exec/hdfs-parquet-table-writer.cc File be/src/exec/hdfs-parquet-table-writer.cc: Line 139: } > why not just pass in metadata->statistics and then set the __isset flag by Done Line 652: // Add the size of the data page header > avoid copy by passing in header.data_page_header.statistics Done http://gerrit.cloudera.org:8080/#/c/5611/10/be/src/exec/hdfs-parquet-table-writer.h File be/src/exec/hdfs-parquet-table-writer.h: Line 103: /// Maximum statistics size. If the combined size of the min and max values of > qualify as 'parquet.Statistics' so it's clearer Done. I used :: since that is the class name in its namespace. Do you prefer "."? http://gerrit.cloudera.org:8080/#/c/5611/9/be/src/exec/parquet-column-stats.h File be/src/exec/parquet-column-stats.h: Line 127: // statistics behavior from any implicit behavior of the types? > i understand how that may not be the case today, but in order for them to b If Parquet's and Impala's ordering were "roughly the same", then we would need some translation between our min values and the ones in Parquet. For our current types, I don't see that as a problem either, but I think Tim was concerned about adding types in the future and preventing potential bugs. I'll let Tim add his thoughts to the discussion, personally I'm good with using min/max for now. The comment was there to facilitate this discussion, since it came up in reviews of previous patch sets. I will remove it. http://gerrit.cloudera.org:8080/#/c/5611/10/be/src/exec/parquet-column-stats.h File be/src/exec/parquet-column-stats.h: Line 84: > remove Without these, clang-format will undo all manual changes to the style on lines modified by this change. I added it as a TODO to the commit message to remove those once the change has a +2, when I will have to rebase it anyways. Line 87: class ColumnStats : public ColumnStatsBase { > indent the subsequent lines belonging to the logical expr two more spaces ( Done Line 157: /// Returns the number of bytes needed to encode value 'v'. > this is very verbose. why needed? See my previous comment. -- To view, visit http://gerrit.cloudera.org:8080/5611 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8368ee58daa50c07a3b8ef65be70203eb941f619 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Ivanfi Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3909: Populate min/max statistics in Parquet writer
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5611 to look at the new patch set (#11). Change subject: IMPALA-3909: Populate min/max statistics in Parquet writer .. IMPALA-3909: Populate min/max statistics in Parquet writer TODO after +2 and before merging/rebasing: - Remove clang-format markers in the code - Replace NULL with nullptr in modified cc files Change-Id: I8368ee58daa50c07a3b8ef65be70203eb941f619 --- M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/hdfs-parquet-table-writer.h A be/src/exec/parquet-column-stats.h M be/src/exec/parquet-common.h M be/src/runtime/string-value.h M be/src/util/dict-test.cc M tests/query_test/test_insert_parquet.py M tests/util/get_parquet_metadata.py 8 files changed, 760 insertions(+), 79 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/5611/11 -- To view, visit http://gerrit.cloudera.org:8080/5611 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8368ee58daa50c07a3b8ef65be70203eb941f619 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Ivanfi
[Impala-ASF-CR] IMPALA-3989: Display skew warning for poorly formatted Parquet files
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-3989: Display skew warning for poorly formatted Parquet files .. Patch Set 11: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/223/ -- To view, visit http://gerrit.cloudera.org:8080/5400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibf48d978383d73efdade733a892e795ebd53c76a Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] Add .pep8rc for Impala's Python style
Lars Volker has posted comments on this change. Change subject: Add .pep8rc for Impala's Python style .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/5829/1/.pep8rc File .pep8rc: PS1, Line 5: # E251 - Remove whitespace around parameter '=' sign. : # E301 - Add missing blank line. > I can't speak for Lars, but I worked on our initial .clang-format, and I tr Yes, that was my intention, too. Such a tool sounds very useful. Is it in any way re-usable and would you be able to share it? -- To view, visit http://gerrit.cloudera.org:8080/5829 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I65a99fc6e364eb1e49e21d0ba97546bc25f65a27 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] Add .pep8rc for Impala's Python style
Lars Volker has posted comments on this change. Change subject: Add .pep8rc for Impala's Python style .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/5829/1/.pep8rc File .pep8rc: PS1, Line 2: # E101 - Reindent all lines. > I moderately disagree with this being ignored. Can you talk about your reas Without E101 and E111, autopep8 will re-indent the whole file to use 4 spaces instead of 2, and I couldn't find a way around this. https://github.com/hhatto/autopep8/blob/master/autopep8.py#L3095 PS1, Line 5: # E251 - Remove whitespace around parameter '=' sign. : # E301 - Add missing blank line. > I strongly disagree with these being ignored. Can you talk about your reaso I tried to make this match the style that I found, but I don't feel strongly about it and would be glad if we could get as close to PEP8 as possible. I'll comment more on this on dev@. http://gerrit.cloudera.org:8080/#/c/5829/1//COMMIT_MSG Commit Message: PS1, Line 7: Add .pep8rc for Impala's Python style > What tools get affected by the presence of this? Can it stay in the top-lev No tools are automatically affected by this, but several can use it if pointed to it with a command line flag. Should we point this out here or would the wiki page be sufficient? -- To view, visit http://gerrit.cloudera.org:8080/5829 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I65a99fc6e364eb1e49e21d0ba97546bc25f65a27 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4729: Implement REPLACE()
Zach Amsden has posted comments on this change. Change subject: IMPALA-4729: Implement REPLACE() .. Patch Set 5: (5 comments) http://gerrit.cloudera.org:8080/#/c/5776/5/be/src/exprs/string-functions-ir.cc File be/src/exprs/string-functions-ir.cc: PS5, Line 211: pattern.is_null > I believe pattern.is_null should also return StringVal::null(). pattern.is_ Will change this semantic. PS5, Line 232: if (UNLIKELY(out_max > 1ULL << 18)) { > It seems a bit arbitrary to choose 256K. Why not just start with a buffer o I dropped this down to 64k. Seems like the benefit for not re-allocating is pretty high and we only want to pay costs on very large replaces, which is what we'll get. PS5, Line 236: haystack.len + replace.len - pattern.len > I believe this can still overflow a 32-bit value, right ? Btw, please keep should not be able to overflow. preferred indent is in 4 and continuing, correct? PS5, Line 240: buffer_space = out_max; > May help to have a test case in which the result string is longer than Stri an excellent idea PS5, Line 249: while (match_pos + needle.len <= haystack.len) { : // Copy in original string : const int bytes = match_pos - consumed; : memcpy(ptr, &haystack.ptr[consumed], bytes); : DCHECK_LE(ptr - result.ptr + bytes, buffer_space); : ptr += bytes; : : // Copy in replacement - always safe since we always leave room for one more replace : DCHECK_LE(ptr - result.ptr + replace.len, buffer_space); : memcpy(ptr, replace.ptr, replace.len); : ptr += replace.len; : : // Don't want to re-match within already replaced pattern : match_pos += needle.len; : consumed = match_pos; : : // If we had an enlarging pattern, we may need more space : if (UNLIKELY(check_space)) { : const int bytes_produced = ptr - result.ptr; : const int bytes_remaining = haystack.len - consumed; : if (UNLIKELY(bytes_produced + bytes_remaining + replace.len > buffer_space)) { : // Ran out of space, double the size. See the note above regarding the choice : // of a power of two sized buffer. : buffer_space <<= 1; : DCHECK((buffer_space & (buffer_space - 1)) == 0); : const auto ofs = ptr - result.ptr; : if (UNLIKELY(!result.Resize(context, buffer_space))) { : return StringVal::null(); : } : ptr = result.ptr + ofs; : } : } : : StringValue haystack_substring = haystack.Substring(match_pos); : int match_pos_in_substring = search.Search(&haystack_substring); : if (match_pos_in_substring < 0) break; : match_pos += match_pos_in_substring; : } > Please ignore my previous comment. We need to keep the memory allocations i I looked at StringBuffer but it uses MemPool directly. I'd rather keep this change locally confined to using the current allocator rather than adapting a new interface. -- To view, visit http://gerrit.cloudera.org:8080/5776 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4729: Implement REPLACE()
Michael Ho has posted comments on this change. Change subject: IMPALA-4729: Implement REPLACE() .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/5776/5/be/src/exprs/string-functions-ir.cc File be/src/exprs/string-functions-ir.cc: PS5, Line 249: while (match_pos + needle.len <= haystack.len) { : // Copy in original string : const int bytes = match_pos - consumed; : memcpy(ptr, &haystack.ptr[consumed], bytes); : DCHECK_LE(ptr - result.ptr + bytes, buffer_space); : ptr += bytes; : : // Copy in replacement - always safe since we always leave room for one more replace : DCHECK_LE(ptr - result.ptr + replace.len, buffer_space); : memcpy(ptr, replace.ptr, replace.len); : ptr += replace.len; : : // Don't want to re-match within already replaced pattern : match_pos += needle.len; : consumed = match_pos; : : // If we had an enlarging pattern, we may need more space : if (UNLIKELY(check_space)) { : const int bytes_produced = ptr - result.ptr; : const int bytes_remaining = haystack.len - consumed; : if (UNLIKELY(bytes_produced + bytes_remaining + replace.len > buffer_space)) { : // Ran out of space, double the size. See the note above regarding the choice : // of a power of two sized buffer. : buffer_space <<= 1; : DCHECK((buffer_space & (buffer_space - 1)) == 0); : const auto ofs = ptr - result.ptr; : if (UNLIKELY(!result.Resize(context, buffer_space))) { : return StringVal::null(); : } : ptr = result.ptr + ofs; : } : } : : StringValue haystack_substring = haystack.Substring(match_pos); : int match_pos_in_substring = search.Search(&haystack_substring); : if (match_pos_in_substring < 0) break; : match_pos += match_pos_in_substring; : } > Is this more performant than simply using string::append() or string += ope Please ignore my previous comment. We need to keep the memory allocations in a freepool. Ideally, we should use StringBuffer if possible but that interface works with MemPool only. May be we can consider adapting its interface to take a Resize() function pointer instead so it can be shared by both FreePool and MemPool. -- To view, visit http://gerrit.cloudera.org:8080/5776 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] Add .pep8rc for Impala's Python style
Jim Apple has posted comments on this change. Change subject: Add .pep8rc for Impala's Python style .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/5829/1/.pep8rc File .pep8rc: PS1, Line 5: # E251 - Remove whitespace around parameter '=' sign. : # E301 - Add missing blank line. > I strongly disagree with these being ignored. Can you talk about your reaso I can't speak for Lars, but I worked on our initial .clang-format, and I tried to make it match our existing style as closely as possible. I even wrote a little tool to explore the state space of config options and find the ones that match our codebase with as small a diff as possible. As a result of this matching, the options are a bit odd, but it will hopefully encourage new code to match the old style as closely as possible, which might be easier to read than having new code be in the accepted pep8 style and old code in the idiosyncratic Impala style. -- To view, visit http://gerrit.cloudera.org:8080/5829 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I65a99fc6e364eb1e49e21d0ba97546bc25f65a27 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-{4670,4672,4784}: Add RpcMgr and port Statestore services to KRPC
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-{4670,4672,4784}: Add RpcMgr and port Statestore services to KRPC .. Patch Set 4: (3 comments) looks good, but let's talk about the nscd requirement today. http://gerrit.cloudera.org:8080/#/c/5720/4/be/src/rpc/rpc-mgr-test.cc File be/src/rpc/rpc-mgr-test.cc: Line 166:.ok()); > Not that gets propagated into our Status object - there is room for improve is there a todo somewhere? http://gerrit.cloudera.org:8080/#/c/5720/5/be/src/rpc/rpc.h File be/src/rpc/rpc.h: Line 49: template move them into the class so they're naturally qualified? http://gerrit.cloudera.org:8080/#/c/5720/4/be/src/statestore/statestore-test.cc File be/src/statestore/statestore-test.cc: Line 379: subscribers_[0]->WaitForUpdates(1); ? -- To view, visit http://gerrit.cloudera.org:8080/5720 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8dbf12b9ecd71d26d239d31c19b487175194c766 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Juan Yu Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] Add .pep8rc for Impala's Python style
Michael Brown has posted comments on this change. Change subject: Add .pep8rc for Impala's Python style .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/5829/1/.pep8rc File .pep8rc: PS1, Line 2: # E101 - Reindent all lines. I moderately disagree with this being ignored. Can you talk about your reasonings for wanting to ignore it? PS1, Line 5: # E251 - Remove whitespace around parameter '=' sign. : # E301 - Add missing blank line. I strongly disagree with these being ignored. Can you talk about your reasonings for wanting to ignore them? http://gerrit.cloudera.org:8080/#/c/5829/1//COMMIT_MSG Commit Message: PS1, Line 7: Add .pep8rc for Impala's Python style What tools get affected by the presence of this? Can it stay in the top-level Impala directory, or do I need to copy it to my homedir? -- To view, visit http://gerrit.cloudera.org:8080/5829 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I65a99fc6e364eb1e49e21d0ba97546bc25f65a27 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3909: Populate min/max statistics in Parquet writer
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-3909: Populate min/max statistics in Parquet writer .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/5611/6/be/src/exec/hdfs-parquet-table-writer.cc File be/src/exec/hdfs-parquet-table-writer.cc: PS6, Line 178: ProcessValue > I tend to favor ProcessValue, since Append (like Encode) leaves out the sta either way. i don't think consume communicates more than process. -- To view, visit http://gerrit.cloudera.org:8080/5611 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8368ee58daa50c07a3b8ef65be70203eb941f619 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Ivanfi Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3909: Populate min/max statistics in Parquet writer
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-3909: Populate min/max statistics in Parquet writer .. Patch Set 10: (9 comments) http://gerrit.cloudera.org:8080/#/c/5611/9/be/src/exec/hdfs-parquet-table-writer.cc File be/src/exec/hdfs-parquet-table-writer.cc: Line 136: if (row_group_stats_base_->has_values() oops, i missed that bug Line 339: > Yes, the dictionary encoders use it as the size of the encoded values. Shou sounds good http://gerrit.cloudera.org:8080/#/c/5611/10/be/src/exec/hdfs-parquet-table-writer.cc File be/src/exec/hdfs-parquet-table-writer.cc: Line 139: row_group_stats_base_->EncodeToThrift(&encoded); why not just pass in metadata->statistics and then set the __isset flag by hand? Line 652: page_stats_base_->EncodeToThrift(&encoded); avoid copy by passing in header.data_page_header.statistics http://gerrit.cloudera.org:8080/#/c/5611/10/be/src/exec/hdfs-parquet-table-writer.h File be/src/exec/hdfs-parquet-table-writer.h: Line 103: /// Maximum statistics size. If the size of a single thrift Statistics struct for a page qualify as 'parquet.Statistics' so it's clearer http://gerrit.cloudera.org:8080/#/c/5611/9/be/src/exec/parquet-column-stats.h File be/src/exec/parquet-column-stats.h: Line 127: } > The stats should follow the ordering semantics defined in parquet-format. T i understand how that may not be the case today, but in order for them to be useful, they need to mirror the ordering semantics of the underlying type. meaning i still don't see why there should be stats-specific comparisons and why we want this todo. http://gerrit.cloudera.org:8080/#/c/5611/10/be/src/exec/parquet-column-stats.h File be/src/exec/parquet-column-stats.h: Line 84: // clang-format off remove Line 87: || std::is_same::value indent the subsequent lines belonging to the logical expr two more spaces (style guide) Line 157: // clang-format off this is very verbose. why needed? -- To view, visit http://gerrit.cloudera.org:8080/5611 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8368ee58daa50c07a3b8ef65be70203eb941f619 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Ivanfi Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1670,IMPALA-4141: Support multiple partitions in ALTER TABLE ADD PARTITION
Attila Jeges has uploaded a new patch set (#23). Change subject: IMPALA-1670,IMPALA-4141: Support multiple partitions in ALTER TABLE ADD PARTITION .. IMPALA-1670,IMPALA-4141: Support multiple partitions in ALTER TABLE ADD PARTITION Just like Hive, Implala should support multiple partitions in ALTER TABLE ADD PARTITION statements. The syntax is as follows: ALTER TABLE table_name ADD [IF NOT EXISTS] PARTITION partition_spec1 [location_spec1] [cache_spec1] PARTITION partition_spec2 [location_spec2] [cache_spec2] ... Grammar was modified to handle the new syntax. Introduced PartitionDef class to capture the repeatable part of the statement. TPartitionDef is the name of the corresponding thrift class. AlterTableAddPartitionStmt and CatalogOpExecutor classes were also modified to work with a list of partitions. Duplicate partition specs are rejected in AlterTableAddPartitionStmt.analyze(). Added FE, E2E and integration tests. Change-Id: Iddbc951f2931f488f7048c9780260f6b49100750 --- M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AlterTableAddPartitionStmt.java A fe/src/main/java/org/apache/impala/analysis/PartitionDef.java M fe/src/main/java/org/apache/impala/analysis/PartitionKeyValue.java M fe/src/main/java/org/apache/impala/analysis/PartitionSpec.java M fe/src/main/java/org/apache/impala/analysis/PartitionSpecBase.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M testdata/workloads/functional-query/queries/QueryTest/alter-table.test M testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test M tests/common/impala_test_suite.py M tests/metadata/test_hms_integration.py M tests/metadata/test_refresh_partition.py 17 files changed, 858 insertions(+), 207 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/4144/23 -- To view, visit http://gerrit.cloudera.org:8080/4144 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iddbc951f2931f488f7048c9780260f6b49100750 Gerrit-PatchSet: 23 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho
[Impala-ASF-CR] IMPALA-1670,IMPALA-4141: Support multiple partitions in ALTER TABLE ADD PARTITION
Hello Dimitris Tsirogiannis, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4144 to look at the new patch set (#23). Change subject: IMPALA-1670,IMPALA-4141: Support multiple partitions in ALTER TABLE ADD PARTITION .. IMPALA-1670,IMPALA-4141: Support multiple partitions in ALTER TABLE ADD PARTITION Just like Hive, Implala should support multiple partitions in ALTER TABLE ADD PARTITION statements. The syntax is as follows: ALTER TABLE table_name ADD [IF NOT EXISTS] PARTITION partition_spec1 [location_spec1] [cache_spec1] PARTITION partition_spec2 [location_spec2] [cache_spec2] ... Grammar was modified to handle the new syntax. Introduced PartitionDef class to capture the repeatable part of the statement. TPartitionDef is the name of the corresponding thrift class. AlterTableAddPartitionStmt and CatalogOpExecutor classes were also modified to work with a list of partitions. Duplicate partition specs are rejected in AlterTableAddPartitionStmt.analyze(). Added FE, E2E and integration tests. Change-Id: Iddbc951f2931f488f7048c9780260f6b49100750 --- M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AlterTableAddPartitionStmt.java A fe/src/main/java/org/apache/impala/analysis/PartitionDef.java M fe/src/main/java/org/apache/impala/analysis/PartitionKeyValue.java M fe/src/main/java/org/apache/impala/analysis/PartitionSpec.java M fe/src/main/java/org/apache/impala/analysis/PartitionSpecBase.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M testdata/workloads/functional-query/queries/QueryTest/alter-table.test M testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test M tests/common/impala_test_suite.py M tests/metadata/test_hms_integration.py M tests/metadata/test_refresh_partition.py 17 files changed, 858 insertions(+), 207 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/4144/23 -- To view, visit http://gerrit.cloudera.org:8080/4144 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iddbc951f2931f488f7048c9780260f6b49100750 Gerrit-PatchSet: 23 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho
[Impala-ASF-CR] IMPALA-1670,IMPALA-4141: Support multiple partitions in ALTER TABLE ADD PARTITION
Attila Jeges has posted comments on this change. Change subject: IMPALA-1670,IMPALA-4141: Support multiple partitions in ALTER TABLE ADD PARTITION .. Patch Set 22: (10 comments) http://gerrit.cloudera.org:8080/#/c/4144/22/fe/src/main/java/org/apache/impala/analysis/AlterTableAddPartitionStmt.java File fe/src/main/java/org/apache/impala/analysis/AlterTableAddPartitionStmt.java: Line 92: throw new AnalysisException( > Add a test for this in AnalyzeDDLTest Done http://gerrit.cloudera.org:8080/#/c/4144/22/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: PS22, Line 221: Used when persisting the results of COMPUTE STATS statements. > Update the comment (now it is used for the add partition statement) Done Line 1859:* 2. If a partition exists in HMS but not in catalog cache, reload partition from HMS. > mention that caching directives are only applied to new partitions that wer Done Line 1861: private Table alterTableAddPartitions(Table tbl, > Since there are some interesting details with existing partitions (in cache Filed IMPALA-4844 http://gerrit.cloudera.org:8080/#/c/4144/22/testdata/workloads/functional-query/queries/QueryTest/alter-table.test File testdata/workloads/functional-query/queries/QueryTest/alter-table.test: Line 1023: # IMPALA-1670: Cannot add duplicate partition specs. > remove, this is covered by an analyzer test Done Line 1033: # IMPALA-1670: Cannot add duplicate partition specs, not even if 'IF NOT EXISTS' is used. > remove, this is covered by an analyzer test Done Line 1056: # IMPALA-1670: If 'IF NOT EXISTS' is not used, ALTER TABLE ADD PARTITION cannot add a > This should be covered by an analyzer test instead Done Line 1066: # IMPALA-1670: If adding one partition fails, the entire statement fails. > Not really accurate. There are also modes of partial failure/success. If an Removed this section. The test case between L1056-1063 was moved to AnalyzeDDLTest. Line 1150: # IMPALA-1670: After REFRESH Impala can access previously added partitions and partition > Seems a little excessive to me, not sure if we need this test. I agree, we don't need to test the same scenario both with INVALIDATE METADATA and REFRESH. Removed this section. http://gerrit.cloudera.org:8080/#/c/4144/22/testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test File testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test: Line 242: # IMPALA-1670: Revoke privileges, so that user would not have privileges to run ALTER > I agree. Not all of them are needed. I'd like to keep the ones where we gra Done -- To view, visit http://gerrit.cloudera.org:8080/4144 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iddbc951f2931f488f7048c9780260f6b49100750 Gerrit-PatchSet: 22 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4617: remove IsConstant() analysis from be
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-4617: remove IsConstant() analysis from be .. IMPALA-4617: remove IsConstant() analysis from be This change avoids the need to duplicate the logic in Expr.getConstant() in the frontend and Expr::GetConstant() in the backend. Instead it is plumbed through from the frontend. To make it easier to reason about the state of isAnalyzed_ and isConstant_, made isAnalyzed_ private and refactored analyze() so that isAnalyzed_ is only set at the end of analysis, not in the middle of it. I considered going further and only allowing isConstant() to be called only on analyzed expressions, but there are multiple places in the code that depend on this working in non-trivial ways, so that would be a lot more invasive. There should be no behavioural changes as a result of this patch, aside from a minor fix for "limit count(*)" where an internal error message was produced instead of the expected one about constant expressions. Testing: Ran exhaustive tests. Added a regression test for "limit count(*)". Change-Id: I24b4c9d6e8e57fd73d906a9c36200e1a84497b90 Reviewed-on: http://gerrit.cloudera.org:8080/5415 Reviewed-by: Tim Armstrong Tested-by: Impala Public Jenkins --- M be/src/exec/hdfs-table-sink.cc M be/src/exprs/expr.cc M be/src/exprs/expr.h M be/src/exprs/literal.cc M be/src/exprs/null-literal.h M be/src/exprs/scalar-fn-call.cc M be/src/exprs/scalar-fn-call.h M be/src/exprs/slot-ref.cc M be/src/exprs/slot-ref.h M be/src/exprs/tuple-is-null-predicate.h M common/thrift/Exprs.thrift M fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java M fe/src/main/java/org/apache/impala/analysis/AnalyticInfo.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java M fe/src/main/java/org/apache/impala/analysis/BetweenPredicate.java M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/main/java/org/apache/impala/analysis/CaseExpr.java M fe/src/main/java/org/apache/impala/analysis/CastExpr.java M fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java M fe/src/main/java/org/apache/impala/analysis/ExtractFromExpr.java M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java M fe/src/main/java/org/apache/impala/analysis/InPredicate.java M fe/src/main/java/org/apache/impala/analysis/IsNotEmptyPredicate.java M fe/src/main/java/org/apache/impala/analysis/IsNullPredicate.java M fe/src/main/java/org/apache/impala/analysis/LikePredicate.java M fe/src/main/java/org/apache/impala/analysis/LimitElement.java M fe/src/main/java/org/apache/impala/analysis/LiteralExpr.java M fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java M fe/src/main/java/org/apache/impala/analysis/Predicate.java M fe/src/main/java/org/apache/impala/analysis/SlotRef.java M fe/src/main/java/org/apache/impala/analysis/Subquery.java M fe/src/main/java/org/apache/impala/analysis/TimestampArithmeticExpr.java M fe/src/main/java/org/apache/impala/analysis/TupleIsNullPredicate.java M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java 38 files changed, 202 insertions(+), 218 deletions(-) Approvals: Impala Public Jenkins: Verified Tim Armstrong: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/5415 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I24b4c9d6e8e57fd73d906a9c36200e1a84497b90 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4617: remove IsConstant() analysis from be
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4617: remove IsConstant() analysis from be .. Patch Set 10: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/5415 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I24b4c9d6e8e57fd73d906a9c36200e1a84497b90 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4729: Implement REPLACE()
Michael Ho has posted comments on this change. Change subject: IMPALA-4729: Implement REPLACE() .. Patch Set 5: (6 comments) http://gerrit.cloudera.org:8080/#/c/5776/5/be/src/exprs/string-functions-ir.cc File be/src/exprs/string-functions-ir.cc: PS5, Line 211: pattern.is_null I believe pattern.is_null should also return StringVal::null(). pattern.is_null == true means pattern is "undefined". PS5, Line 231: int64_t out_max = match_pos + replace.len * ((haystack.len - match_pos) / pattern.len); This may potentially result in a huge overestimation if there are few matches and the replacement string is a lot longer than pattern string. PS5, Line 232: if (UNLIKELY(out_max > 1ULL << 18)) { It seems a bit arbitrary to choose 256K. Why not just start with a buffer of about the same length as the original string and keep appending to it as we find matches ? PS5, Line 236: haystack.len + replace.len - pattern.len I believe this can still overflow a 32-bit value, right ? Btw, please keep the line to be within 90 characters. PS5, Line 240: buffer_space = out_max; May help to have a test case in which the result string is longer than StringVal::MAX_LENGTH and verifies that we actually get StringVal::null() back. PS5, Line 249: while (match_pos + needle.len <= haystack.len) { : // Copy in original string : const int bytes = match_pos - consumed; : memcpy(ptr, &haystack.ptr[consumed], bytes); : DCHECK_LE(ptr - result.ptr + bytes, buffer_space); : ptr += bytes; : : // Copy in replacement - always safe since we always leave room for one more replace : DCHECK_LE(ptr - result.ptr + replace.len, buffer_space); : memcpy(ptr, replace.ptr, replace.len); : ptr += replace.len; : : // Don't want to re-match within already replaced pattern : match_pos += needle.len; : consumed = match_pos; : : // If we had an enlarging pattern, we may need more space : if (UNLIKELY(check_space)) { : const int bytes_produced = ptr - result.ptr; : const int bytes_remaining = haystack.len - consumed; : if (UNLIKELY(bytes_produced + bytes_remaining + replace.len > buffer_space)) { : // Ran out of space, double the size. See the note above regarding the choice : // of a power of two sized buffer. : buffer_space <<= 1; : DCHECK((buffer_space & (buffer_space - 1)) == 0); : const auto ofs = ptr - result.ptr; : if (UNLIKELY(!result.Resize(context, buffer_space))) { : return StringVal::null(); : } : ptr = result.ptr + ofs; : } : } : : StringValue haystack_substring = haystack.Substring(match_pos); : int match_pos_in_substring = search.Search(&haystack_substring); : if (match_pos_in_substring < 0) break; : match_pos += match_pos_in_substring; : } Is this more performant than simply using string::append() or string += operator ? -- To view, visit http://gerrit.cloudera.org:8080/5776 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes