[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.
Alex Behm has posted comments on this change. Change subject: IMPALA-1286: Extract common conjuncts from disjunctions. .. Patch Set 4: I added a new query option ENABLE_EXPR_REWRITES and set it to false for expr-test.cc I investigated the NULL type change and it turns out there is no easy way to preserve the type with out current approach because after rewriting we wipe all analysis state and re-analyze fresh. At that point, there is no way for us to know what type a NullLiteral should have. I don't see any harm in it. -- To view, visit http://gerrit.cloudera.org:8080/4877 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.
Hello Marcel Kornacker, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4877 to look at the new patch set (#5). Change subject: IMPALA-1286: Extract common conjuncts from disjunctions. .. IMPALA-1286: Extract common conjuncts from disjunctions. Adds a new ExprRewriteRule to extract common conjuncts from disjunctions. Examples: (a AND b AND c) OR (b AND d) ==> b AND ((a AND c) OR (d)) (a AND b) OR (a AND b) ==> a AND b (a AND b AND c) OR (c) ==> c Adds a new query option ENABLE_EXPR_REWRITES to enable/disable non-essential expr rewrites in the FE. Note that some rewrites are required, e.g., BetweenToCompoundRule. Disabling the rewrites is useful for testing, in particular, to make sure that the exprs specified in expr-test.cc are executed as written. Testing: Added a new unit test in ExprRewriteRulesTest. Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad --- M be/src/exprs/expr-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/BetweenPredicate.java M fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java M fe/src/main/java/org/apache/impala/rewrite/ExprRewriteRule.java A fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java M fe/src/test/java/org/apache/impala/analysis/ExprTest.java M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-views.test 17 files changed, 281 insertions(+), 22 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/4877/5 -- To view, visit http://gerrit.cloudera.org:8080/4877 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-3552: Make incremental stats max serialized size configurable
Alex Behm has posted comments on this change. Change subject: IMPALA-3552: Make incremental stats max serialized size configurable .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/4867/7/be/src/catalog/catalog.cc File be/src/catalog/catalog.cc: Line 151: Status Catalog::GetCatalogConfigBytes(JNIEnv* jni_env, jbyteArray* cfg_bytes) const { > Makes sense, but since impalad/catalogd are the same binary they can see al Or maybe even add a new file for this like backend-config.h in util. -- To view, visit http://gerrit.cloudera.org:8080/4867 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I33684725a61eabc67237503e61178305d37d3cb5 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Yonghyun HwangGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Huaisi Xu Gerrit-Reviewer: Yonghyun Hwang Gerrit-Reviewer: Yonghyun Hwang Gerrit-HasComments: Yes
[Impala-ASF-CR] Preview: IMPALA-4363: Add timestamp validation
Alex Behm has posted comments on this change. Change subject: Preview: IMPALA-4363: Add timestamp validation .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/4968/2/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: Line 473: if (NeedsValidation() && !ValidateSlot(val_ptr).ok()) { > single line if it fits I don't think this will behave quite right with abort_on_error=false. Returning false here will terminate the scan immediately. Line 599: return Status::OK(); > It's set to NULL in the implementation of Setting a slot to NULL means flipping the null bit in the containing tuple. I don't think you're doing that from within TimestampValue::Validate(), and it wouldn't make sense to do it there. Looking a little closer at the code, I think we may have to move this check into ReadValue() instead of ReadSlot(). We need to do the following if validation fails: tuple->SetNull(null_indicator_offset_); -- To view, visit http://gerrit.cloudera.org:8080/4968 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] Improve Kudu UPSERT test coverage
Matthew Jacobs has posted comments on this change. Change subject: Improve Kudu UPSERT test coverage .. Patch Set 2: (7 comments) Thanks! This is great. I didn't look as closely as I did in revision 1, but I'll check again tomorrow and but can probably give a +1. I'll see if anyone wants to take a look at the python code. Otherwise I'll give it a +2 after I look again. http://gerrit.cloudera.org:8080/#/c/4953/2//COMMIT_MSG Commit Message: PS2, Line 9: lease release ... though I guess it seems odd to _re_ lease the Impala Kudu integration which has never been leased before (at least not GA). PS2, Line 10: 5.10 2.8 PS2, Line 9: In preparation for the public lease of Kudu integration in : 5.10, we need to make sure that we've covered as much of the : Kudu related functionality with tests as possible. This patch : covers UPSERT. Not really needed in the commit. PS2, Line 14: It also introduces a new test section 'DML_RESULTS', which : takes the name of a table as a comment and the contents of the : table as its body and then verifies that the body matches the : actual contents of the table. This makes it easy to check that a : DML operation has the desired effect on the contents of a table, : rather than always having to add another test case that runs a : select on the table. nice http://gerrit.cloudera.org:8080/#/c/4953/1/testdata/workloads/functional-query/queries/QueryTest/kudu_crud.test File testdata/workloads/functional-query/queries/QueryTest/kudu_crud.test: Line 433 > How about adding the 'LABELS' section? That's great, I didn't know it existed. PS1, Line 680: : : : > Good point - there appears to only actually be 10 distinct values of int_co Ah, makes sense. Thanks for looking into that, I think it's fine. We can think about changing the wording in the future (not something to pile on now). http://gerrit.cloudera.org:8080/#/c/4953/2/tests/common/impala_test_suite.py File tests/common/impala_test_suite.py: PS2, Line 356: select * from %s might be good to put a limit on here for safety, e.g. the number of rows specified in the "expected rows" section, or even just some big constant like 1000. Even if it's kinda hacky/random, it'd be crazy to have a test case where the DML_RESULTS section even got close but would prevent the test from trying to do something dumb. -- To view, visit http://gerrit.cloudera.org:8080/4953 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib9e7afbef60186edb00a9d11fbe5a8c64931add6 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4266: Java udf returning string can give incorrect results
Internal Jenkins has submitted this change and it was merged. Change subject: IMPALA-4266: Java udf returning string can give incorrect results .. IMPALA-4266: Java udf returning string can give incorrect results The memory management of string results was wrong: strings returned from Exprs must live until the next time FreeLocalAllocations() is called. Otherwise the buffer holding the string is freed or reused by the next UDF call. The fix is to copy string values into a buffer with the right lifetime. Testing: Added a regression test based on Bharath's example that reproduced the bug reliably. Change-Id: I705d271814cb1143f67d8a12f4fd87bab7a8e161 Reviewed-on: http://gerrit.cloudera.org:8080/4941 Reviewed-by: Tim ArmstrongTested-by: Internal Jenkins --- M be/src/exprs/hive-udf-call.cc M testdata/workloads/functional-query/queries/QueryTest/java-udf.test M testdata/workloads/functional-query/queries/QueryTest/load-java-udfs.test A tests/test-hive-udfs/src/main/java/org/apache/impala/ReplaceStringUdf.java 4 files changed, 70 insertions(+), 1 deletion(-) Approvals: Internal Jenkins: Verified Tim Armstrong: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/4941 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I705d271814cb1143f67d8a12f4fd87bab7a8e161 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4266: Java udf returning string can give incorrect results
Internal Jenkins has posted comments on this change. Change subject: IMPALA-4266: Java udf returning string can give incorrect results .. Patch Set 5: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/4941 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I705d271814cb1143f67d8a12f4fd87bab7a8e161 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4302,IMPALA-2379: constant expr arg fixes
Internal Jenkins has posted comments on this change. Change subject: IMPALA-4302,IMPALA-2379: constant expr arg fixes .. Patch Set 10: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/4838 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I45c3ed8c9d7a61e94a9b9d6c316e8a53d9ff6c24 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4365: Enabling end-to-end tests on a remote cluster
Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-4365: Enabling end-to-end tests on a remote cluster .. Patch Set 14: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/4769 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1f443a1728a1d28168090c6f54e82dec2cb073e9 Gerrit-PatchSet: 14 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David KnuppGerrit-Reviewer: David Knupp Gerrit-Reviewer: Harrison Sheinblatt Gerrit-Reviewer: Martin Grund Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3552: Make incremental stats max serialized size configurable
Bharath Vissapragada has uploaded a new patch set (#7). Change subject: IMPALA-3552: Make incremental stats max serialized size configurable .. IMPALA-3552: Make incremental stats max serialized size configurable The fix "IMPALA-2648/IMPALA-2664" introduced a conservative limitation on the maximum serialized size of incremental stats. As a side effect, some users with very large tables are experiencing regressions especially when they upgrade impala and the serialized size goes beyond 200MB. To mitigate the issue, the change introduces a new gflag, 'inc_stats_size_limit_bytes' to make the max serialized size configurable, which allows impala users to specify their own maximum serialized size. Default value for inc_stats_size_limit_bytes is 200MB. The change amends the TBackendConfig class to pass the gflags from backend to the Frontend and the Catalog via thrift. This also revamps existing query options to use the TBackendConfig. Change-Id: I33684725a61eabc67237503e61178305d37d3cb5 --- M be/src/catalog/catalog.cc M be/src/catalog/catalog.h M be/src/common/global-flags.cc M be/src/service/fe-support.cc M be/src/service/frontend.cc M be/src/service/frontend.h M common/thrift/Frontend.thrift M common/thrift/Types.thrift M fe/src/main/java/org/apache/impala/analysis/AlterViewStmt.java M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateViewStmt.java M fe/src/main/java/org/apache/impala/authorization/User.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/common/RuntimeEnv.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/FeSupport.java M fe/src/main/java/org/apache/impala/service/JniCatalog.java M fe/src/main/java/org/apache/impala/service/JniFrontend.java M fe/src/main/java/org/apache/impala/util/KuduUtil.java 21 files changed, 195 insertions(+), 163 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/4867/7 -- To view, visit http://gerrit.cloudera.org:8080/4867 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I33684725a61eabc67237503e61178305d37d3cb5 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Yonghyun HwangGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Huaisi Xu Gerrit-Reviewer: Yonghyun Hwang Gerrit-Reviewer: Yonghyun Hwang
[Impala-ASF-CR] IMPALA-3872: allow providing PyPi mirror for python packages
Internal Jenkins has posted comments on this change. Change subject: IMPALA-3872: allow providing PyPi mirror for python packages .. Patch Set 6: Verified-1 Build failed: http://sandbox.jenkins.cloudera.com/job/impala-external-gerrit-verify-merge-ASF/442/ -- To view, visit http://gerrit.cloudera.org:8080/4770 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibc11f010332c0225121c86c9930e35c7ac01409c Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3552: Make incremental stats max serialized size configurable
Bharath Vissapragada has uploaded a new patch set (#6). Change subject: IMPALA-3552: Make incremental stats max serialized size configurable .. IMPALA-3552: Make incremental stats max serialized size configurable The fix "IMPALA-2648/IMPALA-2664" introduced a conservative limitation on the maximum serialized size of incremental stats. As a side effect, some users with very large tables are experiencing regressions especially when they upgrade impala and the serialized size goes beyond 200MB. To mitigate the issue, the change introduces a new gflag, 'inc_stats_size_limit_bytes' to make the max serialized size configurable, which allows impala users to specify their own maximum serialized size. Default value for inc_stats_size_limit_bytes is 200MB. The change amends the TBackendConfig class to pass the gflags from backend to the Frontend and the Catalog via thrift. This also revamps existing query options to use the TBackendConfig. Change-Id: I33684725a61eabc67237503e61178305d37d3cb5 --- M be/src/catalog/catalog.cc M be/src/catalog/catalog.h M be/src/common/global-flags.cc M be/src/service/fe-support.cc M be/src/service/frontend.cc M be/src/service/frontend.h M common/thrift/Frontend.thrift M common/thrift/Types.thrift M fe/src/main/java/org/apache/impala/analysis/AlterViewStmt.java M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateViewStmt.java M fe/src/main/java/org/apache/impala/authorization/User.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/common/RuntimeEnv.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/FeSupport.java M fe/src/main/java/org/apache/impala/service/JniCatalog.java M fe/src/main/java/org/apache/impala/service/JniFrontend.java M fe/src/main/java/org/apache/impala/util/KuduUtil.java 21 files changed, 196 insertions(+), 160 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/4867/6 -- To view, visit http://gerrit.cloudera.org:8080/4867 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I33684725a61eabc67237503e61178305d37d3cb5 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Yonghyun HwangGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Huaisi Xu Gerrit-Reviewer: Yonghyun Hwang Gerrit-Reviewer: Yonghyun Hwang
[Impala-ASF-CR] IMPALA-3202: refactor scratch file management into TmpFileMgr
Dan Hecht has posted comments on this change. Change subject: IMPALA-3202: refactor scratch file management into TmpFileMgr .. Patch Set 4: (3 comments) http://gerrit.cloudera.org:8080/#/c/4898/4/be/src/runtime/tmp-file-mgr.cc File be/src/runtime/tmp-file-mgr.cc: Line 292: " or see log for previous errors that prevented use of provided directories"); does the log contain the errors, or do we need to log the error status at line 286? Line 336: } else if (status.code() == TErrorCode::SCRATCH_LIMIT_EXCEEDED) { seems like dead code now that the check is handled at the file group level, no? http://gerrit.cloudera.org:8080/#/c/4898/4/be/src/runtime/tmp-file-mgr.h File be/src/runtime/tmp-file-mgr.h: PS4, Line 72: allocation would exceed the allocation limit of its associated FileGrou this check doesn't seem to be done by this function anymore. -- To view, visit http://gerrit.cloudera.org:8080/4898 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0c56c195f3f28d520034f8c384494e566635fc62 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions
Tim Armstrong has uploaded a new patch set (#13). Change subject: IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions .. IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions This change enables codegen for all builtin aggregate functions, e.g. timestamp functions and group_concat. There are several parts to the change: * Adding support for generic UDAs. Previous the codegen code did not handle multiple input arguments or NULL return values. * Defaulting to using the UDA interface when there is not a special codegen path (we have implementations of all builtin aggregate functions for the interpreted path). * Remove all the logic to disable codegen for the special cases that now are supported. Also fix the generation of code to get/set NULL bits since I needed to add functionality there anyway. Testing: Add tests that check that codegen was enabled for builtin aggregate functions. Also fix some gaps in the preexisting tests. Also add tests for UDAs that check input/output nulls are handled correctly, in anticipation of enabling codegen for arbitrary UDAs. The tests are run with both codegen enabled and disabled. To avoid flaky tests, we switch the UDF tests to use "unique_database". Perf: Ran local TPC-H and targeted perf. Spent a lot of time on TPC-H Q1, since my original approach regressed it ~5%. In the end the problem was to do with the ordering of loads/stores to the slot and null bit in the generated code: the previous version of the code exploited some properties of the particular aggregate function. I ended up replicating this behaviour to avoid regressing perf. Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260 --- M be/src/benchmarks/hash-benchmark.cc M be/src/codegen/codegen-anyval.cc M be/src/codegen/codegen-anyval.h M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/llvm-codegen-test.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/aggregation-node.cc M be/src/exec/exec-node.cc M be/src/exec/hash-join-node.cc M be/src/exec/hash-table.cc M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/old-hash-table.cc M be/src/exec/partitioned-aggregation-node-ir.cc M be/src/exec/partitioned-aggregation-node.cc M be/src/exec/partitioned-aggregation-node.h M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/text-converter.cc M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/case-expr.cc M be/src/exprs/compound-predicates.cc M be/src/exprs/expr-codegen-test.cc M be/src/exprs/expr.cc M be/src/exprs/literal.cc M be/src/exprs/null-literal.cc M be/src/exprs/scalar-fn-call.cc M be/src/exprs/slot-ref.cc M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M be/src/runtime/tuple.cc M be/src/runtime/types.h M be/src/testutil/test-udas.cc M be/src/testutil/test-udfs.cc M be/src/util/tuple-row-compare.cc M testdata/workloads/functional-query/queries/QueryTest/java-udf.test M testdata/workloads/functional-query/queries/QueryTest/libs_with_same_filenames.test M testdata/workloads/functional-query/queries/QueryTest/load-java-udfs.test M testdata/workloads/functional-query/queries/QueryTest/uda-mem-limit.test M testdata/workloads/functional-query/queries/QueryTest/uda.test A testdata/workloads/functional-query/queries/QueryTest/udf-codegen-required.test M testdata/workloads/functional-query/queries/QueryTest/udf-errors.test M testdata/workloads/functional-query/queries/QueryTest/udf-mem-limit.test M testdata/workloads/functional-query/queries/QueryTest/udf.test M tests/common/test_result_verifier.py M tests/query_test/test_aggregation.py M tests/query_test/test_udfs.py 47 files changed, 1,003 insertions(+), 762 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/4655/13 -- To view, visit http://gerrit.cloudera.org:8080/4655 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260 Gerrit-PatchSet: 13 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions
Tim Armstrong has posted comments on this change. Change subject: IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions .. Patch Set 12: (11 comments) http://gerrit.cloudera.org:8080/#/c/4655/12/be/src/codegen/codegen-anyval.cc File be/src/codegen/codegen-anyval.cc: Line 103: DCHECK(false) << "NYI:" << type.DebugString(); > why do we not need these still with the removal of the bail outs for CHAR? The char bailout was still there, but more implicit. Line 462: void CodegenAnyVal::SetFromRawPtr(Value* raw_ptr) { > this looks unused. remove? Done http://gerrit.cloudera.org:8080/#/c/4655/12/be/src/codegen/codegen-anyval.h File be/src/codegen/codegen-anyval.h: Line 53: /// TYPE_TIMESTAMP/TimestampVal: { i64, i64 } > decimal? Done http://gerrit.cloudera.org:8080/#/c/4655/12/be/src/exec/partitioned-aggregation-node.cc File be/src/exec/partitioned-aggregation-node.cc: Line 1071: // eliminate a branch per value. > what code is this paragraph referring to? should it precede the Init() call It probably makes sense to move it up, since it's implemented by both Init() and the below code. Line 1079: && evaluator->intermediate_type().type != TYPE_TIMESTAMP) { > IsTimestampType() for consistency now that most types have that? Done PS12, Line 1454: : > maybe explain that we hand-generate a code sequence in this case. Done PS12, Line 1670: .type != TYPE_TIMESTAMP > IsTimestampType()? Done PS12, Line 1742: We must use the unlowered type > would be good to briefly explain why rather than just the what. Done Line 1811: "intermediate tuple desc"); > when would this happen? If there's a char field in the tuple. I think this is a step back in terms of error messages, so I added back an explicit check for CHAR fields in the intermediate tuple. I actually noticed there are a couple of places in the codebase that don't check for a null result from this function, so I fixed those. http://gerrit.cloudera.org:8080/#/c/4655/12/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: PS12, Line 630: ; > remove Done http://gerrit.cloudera.org:8080/#/c/4655/12/be/src/runtime/descriptors.h File be/src/runtime/descriptors.h: PS12, Line 143: the > "a boolean value represented ..." ? like above comment. Done -- To view, visit http://gerrit.cloudera.org:8080/4655 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260 Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4441: Divide-by-zero in RuntimeProfile::SummaryStatsCounter::SetStats
Internal Jenkins has submitted this change and it was merged. Change subject: IMPALA-4441: Divide-by-zero in RuntimeProfile::SummaryStatsCounter::SetStats .. IMPALA-4441: Divide-by-zero in RuntimeProfile::SummaryStatsCounter::SetStats This patch anticipates the case where total_num_values_ can be 0 and makes sure a divide-by-zero is not possible. Change-Id: I33f1e6fb45505dce7d79497d1632c5f63a409151 Reviewed-on: http://gerrit.cloudera.org:8080/4975 Reviewed-by: Henry RobinsonTested-by: Internal Jenkins --- M be/src/util/runtime-profile.cc 1 file changed, 3 insertions(+), 1 deletion(-) Approvals: Henry Robinson: Looks good to me, approved Internal Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/4975 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I33f1e6fb45505dce7d79497d1632c5f63a409151 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Sailesh Mukil
[Impala-ASF-CR] IMPALA-4441: Divide-by-zero in RuntimeProfile::SummaryStatsCounter::SetStats
Internal Jenkins has posted comments on this change. Change subject: IMPALA-4441: Divide-by-zero in RuntimeProfile::SummaryStatsCounter::SetStats .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/4975 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I33f1e6fb45505dce7d79497d1632c5f63a409151 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: No
[Impala-ASF-CR] Preview: IMPALA-4363: Add timestamp validation
Taras Bobrovytsky has posted comments on this change. Change subject: Preview: IMPALA-4363: Add timestamp validation .. Patch Set 1: (5 comments) http://gerrit.cloudera.org:8080/#/c/4968/1/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: Line 44: > ? Done Line 491: inline bool NeedsValidation() const { > Can you add a brief comment? Done Line 502: /// Verifies that data is acceptable > Can you elaborate what "acceptable" means? Rewrote the comment http://gerrit.cloudera.org:8080/#/c/4968/1/be/src/runtime/timestamp-value.cc File be/src/runtime/timestamp-value.cc: Line 101: void TimestampValue::Validate() { > Can you move the exception handling into DebugString()? This way it would b I rewrote this so that it does not rely on catching an exception as Alex suggested. PS1, Line 103: > Nit: tab. git-clang-format should fix this. Done -- To view, visit http://gerrit.cloudera.org:8080/4968 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] Preview: IMPALA-4363: Add timestamp validation
Taras Bobrovytsky has uploaded a new patch set (#2). Change subject: Preview: IMPALA-4363: Add timestamp validation .. Preview: IMPALA-4363: Add timestamp validation Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf --- M be/src/exec/parquet-column-readers.cc M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M common/thrift/generate_error_codes.py 4 files changed, 60 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/4968/2 -- To view, visit http://gerrit.cloudera.org:8080/4968 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-4442: Fix FE ParserTests UnsatisfiedLinkError
Internal Jenkins has submitted this change and it was merged. Change subject: IMPALA-4442: Fix FE ParserTests UnsatisfiedLinkError .. IMPALA-4442: Fix FE ParserTests UnsatisfiedLinkError In some development environments, the ParserTests may always fail with an java.lang.UnsatisfiedLinkError: org.apache.impala.service.FeSupport.NativeGetStartupOptions()[B at o.a.i.service.FeSupport.NativeGetStartupOptions(Native Method) at o.a.i.service.FeSupport.GetStartupOptions(FeSupport.java:268) at o.a.i.common.RuntimeEnv.(RuntimeEnv.java:47) at o.a.i.common.RuntimeEnv.(RuntimeEnv.java:34) at o.a.i.testutil.TestUtils.assumeKuduIsSupported(TestUtils.java:288) at o.a.i.analysis.ParserTest.TestKuduUpdate(ParserTest.java:1697) I believe the issue is related to some static loading of classes and/or libraries in Java because changing the ParserTest to initialize the Frontend makes the error go away. I haven't been able to pin-point the exact issue with loading, but it makes sense that the ParserTest should initialize the Frontend static state if it will be called by libfesupport later since it seems to be an issue affecting some environments and not others, i.e. subject to environmental factors. This fixes the issue by changing ParserTest to extend FrontendTestBase which initializes the Frontend class statically. Change-Id: I1828504f79c51679f9ca07176bffbe248d450e87 Reviewed-on: http://gerrit.cloudera.org:8080/4976 Reviewed-by: Alex BehmTested-by: Internal Jenkins --- M fe/src/test/java/org/apache/impala/analysis/ParserTest.java 1 file changed, 2 insertions(+), 20 deletions(-) Approvals: Internal Jenkins: Verified Alex Behm: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/4976 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I1828504f79c51679f9ca07176bffbe248d450e87 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Internal Jenkins
[Impala-ASF-CR] IMPALA-4442: Fix FE ParserTests UnsatisfiedLinkError
Internal Jenkins has posted comments on this change. Change subject: IMPALA-4442: Fix FE ParserTests UnsatisfiedLinkError .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/4976 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1828504f79c51679f9ca07176bffbe248d450e87 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Internal Jenkins Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3552: Make incremental stats max serialized size configurable
Bharath Vissapragada has posted comments on this change. Change subject: IMPALA-3552: Make incremental stats max serialized size configurable .. Patch Set 5: (15 comments) I've taken over this. Please check the next PS. http://gerrit.cloudera.org:8080/#/c/4867/5//COMMIT_MSG Commit Message: PS5, Line 21: set of : gflags, to frontend, > nit: replace with "to the" Done http://gerrit.cloudera.org:8080/#/c/4867/5/be/src/catalog/catalog.cc File be/src/catalog/catalog.cc: Line 150: const { > move into line above (90 chars per line max) Done Line 152: cfg.load_catalog_in_background = FLAGS_load_catalog_in_background; > use the thrift setter functions Done http://gerrit.cloudera.org:8080/#/c/4867/5/be/src/catalog/catalog.h File be/src/catalog/catalog.h: Line 104: Status GetCatalogConfigBytes(JNIEnv* jni_env, jbyteArray* cfg_bytes) const; > Document a line or two on what this does. Done http://gerrit.cloudera.org:8080/#/c/4867/5/be/src/service/frontend.h File be/src/service/frontend.h: Line 176: Status GetFrontendConfigBytes(JNIEnv* jni_env, jbyteArray* cfg_bytes) const; > Document. Done http://gerrit.cloudera.org:8080/#/c/4867/5/common/thrift/Types.thrift File common/thrift/Types.thrift: Line 236: 1: required string sentry_config > Why are some of these required and some optional? I couldn't make out a pat Made everything optional. Line 242: 3: required i32 other_log_lvl > non_impala_java_vlog Done Line 246: 5: required i64 inc_stats_size_limit_bytes > your gflag is a uint64, I suggest making the gflag signed as well Done Line 249: 6: required bool compute_lineage > if possible, I'd prefer to pass the gflags directly, i.e., instead of a boo Isn't it better if we let the backend drive the decision here and just pass the flags the fe? I'm fine either way but if you feel strongly about this, I'll make the change. Please let me know. http://gerrit.cloudera.org:8080/#/c/4867/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: Line 1601: (statsSizeEstimate < BackendConfig.INSTANCE.getIncStatMaxSize()); > getIncStatsMaxSize() (Stats vs. Stat) Done http://gerrit.cloudera.org:8080/#/c/4867/5/fe/src/main/java/org/apache/impala/common/RuntimeEnv.java File fe/src/main/java/org/apache/impala/common/RuntimeEnv.java: Line 30: public class RuntimeEnv { > much better! :) http://gerrit.cloudera.org:8080/#/c/4867/5/fe/src/main/java/org/apache/impala/service/BackendConfig.java File fe/src/main/java/org/apache/impala/service/BackendConfig.java: Line 41: } > How about adding initializing the singleton this way: Done Line 43: public void setBackendConfig(TBackendConfig cfg) { > indentation Done http://gerrit.cloudera.org:8080/#/c/4867/5/fe/src/main/java/org/apache/impala/service/JniCatalog.java File fe/src/main/java/org/apache/impala/service/JniCatalog.java: Line 83: public JniCatalog(byte[] thriftBEConfig) throws InternalException, > thriftBackendConfig or thriftBeConfig Done http://gerrit.cloudera.org:8080/#/c/4867/5/fe/src/main/java/org/apache/impala/service/JniFrontend.java File fe/src/main/java/org/apache/impala/service/JniFrontend.java: Line 120: public JniFrontend(byte[] thriftBEConfig) throws InternalException, > remove InternalException because that's already covered by ImpalaException Done -- To view, visit http://gerrit.cloudera.org:8080/4867 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I33684725a61eabc67237503e61178305d37d3cb5 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Yonghyun HwangGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Huaisi Xu Gerrit-Reviewer: Yonghyun Hwang Gerrit-Reviewer: Yonghyun Hwang Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4172: Switch to BlockLocation methods for disk IDs
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4172: Switch to BlockLocation methods for disk IDs .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/4914/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: Line 379: private static int getDiskId(String storageId) { > How can we separate them though? The new API returns a UUID. oops, i didn't see that. in that case, shipping those around would cause an immediate size regression. so we need to translate them into ordinals. -- To view, visit http://gerrit.cloudera.org:8080/4914 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibbff94cef9a9db7b3945f8e7b0286866d2cc3b61 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Laszlo GaalGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3126: Conservative assignment of inner-join On-clause predicates.
Alex Behm has uploaded a new change for review. http://gerrit.cloudera.org:8080/4982 Change subject: IMPALA-3126: Conservative assignment of inner-join On-clause predicates. .. IMPALA-3126: Conservative assignment of inner-join On-clause predicates. Implements the following conservative but correct policy for assigning predicates from the On-clause of an inner join: If the predicate references an outer-joined tuple, then evaluate it at the inner join that the On-clause belongs to. Cleans up Analyzer.canEvalPredicate(). Change-Id: Idf45323ed9102ffb45c9d94a130ea3692286f215 --- M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test 2 files changed, 143 insertions(+), 61 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/4982/1 -- To view, visit http://gerrit.cloudera.org:8080/4982 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Idf45323ed9102ffb45c9d94a130ea3692286f215 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm
[Impala-ASF-CR] IMPALA-1169: Admission control info on the queries debug webpage
Internal Jenkins has posted comments on this change. Change subject: IMPALA-1169: Admission control info on the queries debug webpage .. Patch Set 9: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/4756 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I504e3c829a14318721e3a42de6281bcc578f7283 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4266: Java udf returning string can give incorrect results
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4266: Java udf returning string can give incorrect results .. Patch Set 5: Code-Review+2 Rebase -- To view, visit http://gerrit.cloudera.org:8080/4941 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I705d271814cb1143f67d8a12f4fd87bab7a8e161 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3872: allow providing PyPi mirror for python packages
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3872: allow providing PyPi mirror for python packages .. Patch Set 6: Code-Review+2 carry +2 -- To view, visit http://gerrit.cloudera.org:8080/4770 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibc11f010332c0225121c86c9930e35c7ac01409c Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3872: allow providing PyPi mirror for python packages
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3872: allow providing PyPi mirror for python packages .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/4770/4//COMMIT_MSG Commit Message: Line 11: the mirrors). > Commit should perhaps say how the mirror is provided to the script. Done -- To view, visit http://gerrit.cloudera.org:8080/4770 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibc11f010332c0225121c86c9930e35c7ac01409c Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3872: allow providing PyPi mirror for python packages
Hello Michael Brown, Henry Robinson, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4770 to look at the new patch set (#5). Change subject: IMPALA-3872: allow providing PyPi mirror for python packages .. IMPALA-3872: allow providing PyPi mirror for python packages We still rely on the python.org json API, which doesn't seem to be mirrored (instead there's a html-based index format implemented by the mirrors). The mirror can be provided by setting the PYPI_MIRROR environment variable. The default is "https://pypi.python.org;. Change-Id: Ibc11f010332c0225121c86c9930e35c7ac01409c --- M infra/python/deps/pip_download.py 1 file changed, 8 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/4770/5 -- To view, visit http://gerrit.cloudera.org:8080/4770 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibc11f010332c0225121c86c9930e35c7ac01409c Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-3872: allow providing PyPi mirror for python packages
Henry Robinson has posted comments on this change. Change subject: IMPALA-3872: allow providing PyPi mirror for python packages .. Patch Set 4: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/4770/4//COMMIT_MSG Commit Message: Line 11: the mirrors). Commit should perhaps say how the mirror is provided to the script. -- To view, visit http://gerrit.cloudera.org:8080/4770 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibc11f010332c0225121c86c9930e35c7ac01409c Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4365: Enabling end-to-end tests on a remote cluster
David Knupp has posted comments on this change. Change subject: IMPALA-4365: Enabling end-to-end tests on a remote cluster .. Patch Set 13: (3 comments) http://gerrit.cloudera.org:8080/#/c/4769/13//COMMIT_MSG Commit Message: Line 51: still problems to work out with the remote data load script itself. > Did you try loading data on a remote cluster and running tests on in with t Yes, many times. I should update this sentence to be more clear. This is mainly a references to the several "clean up" changes that Harrison suggested earlier, for which JIRA's have been opened. We can address those when there's time. More pressing than cleaning up all those things is fact that we need to have this checked in order to validate Impala running against a remote CDH 5.10 cluster, and time is getting short. We have less than two weeks now. There were some other actual problems that were mysterious to me initially. E.g., Kudu related failures started appearing once recent Kudu changes were submitted -- until I realized that this issue was breaking things: https://jira.cloudera.com/browse/OPSAPS-37322 But after tweaking the cluster, data loading works, and tests run -- though many tests may need to be tweaked to work remotely. http://gerrit.cloudera.org:8080/#/c/4769/13/bin/remote_data_load.py File bin/remote_data_load.py: Line 534: sys.exit(1) > In general, I think it's a bad practice to call sys.exit inside functions. OK, I'll move this. I'd seen this pattern used here in other scripts here (e.g., load-data.py that we use for local data loading), so didn't know it was a frowned upon practice. http://gerrit.cloudera.org:8080/#/c/4769/13/testdata/datasets/functional/schema_constraints.csv File testdata/datasets/functional/schema_constraints.csv: PS13, Line 120: Wide tables fail due to the SERDEPROPERTIES limits > is this a new issue? Is it specific to remote data loading? For our mini-cluster, we work around this problem here: https://github.com/apache/incubator-impala/blob/master/bin/create-test-configuration.sh#L99 However, create-test-configuration.sh is part of our local build process. It doesn't get called when CDH is deployed to a remote cluster. Besides, that script assumes that the metastore database will always be postgres, which is not the case when testing against a remote cluster. Before the change to this file, I had been using another hand-rolled script to configure the property separately after deployment. With this, I can drop that step. I've also tested the local data load after this change, and it's unaffected. -- To view, visit http://gerrit.cloudera.org:8080/4769 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1f443a1728a1d28168090c6f54e82dec2cb073e9 Gerrit-PatchSet: 13 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David KnuppGerrit-Reviewer: David Knupp Gerrit-Reviewer: Harrison Sheinblatt Gerrit-Reviewer: Martin Grund Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4302,IMPALA-2379: constant expr arg fixes
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4302,IMPALA-2379: constant expr arg fixes .. Patch Set 10: Hit IMPALA-3040 -- To view, visit http://gerrit.cloudera.org:8080/4838 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I45c3ed8c9d7a61e94a9b9d6c316e8a53d9ff6c24 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4437: fix crash in disk-io-mgr
Tim Armstrong has uploaded a new change for review. http://gerrit.cloudera.org:8080/4979 Change subject: IMPALA-4437: fix crash in disk-io-mgr .. IMPALA-4437: fix crash in disk-io-mgr This fixes another issue where the 'buffer_' field was not set to NULL on an error, triggering a DCHECK. Testing: Added a unit test that triggers the bug on the two different codepaths that I fixed. Change-Id: Ib76cf5ba8d368b2b37bdc1d2133b8ddcb39f9e00 --- M be/src/runtime/disk-io-mgr-test.cc M be/src/runtime/disk-io-mgr.cc 2 files changed, 40 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/4979/1 -- To view, visit http://gerrit.cloudera.org:8080/4979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ib76cf5ba8d368b2b37bdc1d2133b8ddcb39f9e00 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong
[Impala-ASF-CR] Minor fixes to remove more "cloudera"s from the code.
Internal Jenkins has submitted this change and it was merged. Change subject: Minor fixes to remove more "cloudera"s from the code. .. Minor fixes to remove more "cloudera"s from the code. Change-Id: I27687a3b677def72f1867df54c8e50c93d5cab3a Reviewed-on: http://gerrit.cloudera.org:8080/4774 Reviewed-by: Jim AppleTested-by: Internal Jenkins --- M tests/benchmark/report_benchmark_results.py M tests/test-hive-udfs/src/main/java/org/apache/impala/UnresolvedUdf.java 2 files changed, 15 insertions(+), 12 deletions(-) Approvals: Jim Apple: Looks good to me, approved Internal Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/4774 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I27687a3b677def72f1867df54c8e50c93d5cab3a Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4260: Alter table add column drops all the column stats
Internal Jenkins has submitted this change and it was merged. Change subject: IMPALA-4260: Alter table add column drops all the column stats .. IMPALA-4260: Alter table add column drops all the column stats Hive expects types for column stats to be specified as all lower case. For some reason, it doesn't check this when the stats are first written, but it does check when performing an 'alter table'. This causes it to drop stats that Impala wrote because we specify type names in upper case. This patch converts the types that Impala sends to Hive for the column stats to all lower case and adds a regression test. I also filed HIVE-15061 to track the issue from the Hive end. Change-Id: Ia373ec917efa7ab9f2a59b8a870b7ebc30175dda Reviewed-on: http://gerrit.cloudera.org:8080/4845 Reviewed-by: Matthew JacobsTested-by: Internal Jenkins --- M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M testdata/workloads/functional-query/queries/QueryTest/compute-stats.test 2 files changed, 83 insertions(+), 1 deletion(-) Approvals: Matthew Jacobs: Looks good to me, approved Internal Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/4845 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ia373ec917efa7ab9f2a59b8a870b7ebc30175dda Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-4260: Alter table add column drops all the column stats
Internal Jenkins has posted comments on this change. Change subject: IMPALA-4260: Alter table add column drops all the column stats .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/4845 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia373ec917efa7ab9f2a59b8a870b7ebc30175dda Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4392: restore PeakMemoryUsage to DataSink profiles
Tim Armstrong has uploaded a new change for review. http://gerrit.cloudera.org:8080/4969 Change subject: IMPALA-4392: restore PeakMemoryUsage to DataSink profiles .. IMPALA-4392: restore PeakMemoryUsage to DataSink profiles The join build sink patches refactored the DataSink interface and inadvertently removed this counter from the profile. The problem was that the sink MemTracker was not initialized with the sink's profile. The fix is to replumb things so that the profile is created in the constructor and can be used when constructing the MemTracker. Testing: Ran core tests. Manually checked profile to make sure the counter appeared in HdfsTableSink, DataStreamSender, etc. Change-Id: Iaa5db623a84c47d5904033ec26aece74f500a2c9 --- M be/src/exec/data-sink.cc M be/src/exec/data-sink.h M be/src/exec/hbase-table-sink.cc M be/src/exec/hbase-table-sink.h M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/exec/kudu-table-sink.cc M be/src/exec/kudu-table-sink.h M be/src/exec/nested-loop-join-builder.cc M be/src/exec/nested-loop-join-builder.h M be/src/exec/nested-loop-join-node.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/plan-root-sink.cc M be/src/exec/plan-root-sink.h M be/src/runtime/data-stream-sender.cc M be/src/runtime/data-stream-sender.h M be/src/runtime/plan-fragment-executor.cc 19 files changed, 85 insertions(+), 98 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/4969/1 -- To view, visit http://gerrit.cloudera.org:8080/4969 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Iaa5db623a84c47d5904033ec26aece74f500a2c9 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong
[Impala-ASF-CR] IMPALA-2523: Make HdfsTableSink aware of clustered input
Alex Behm has posted comments on this change. Change subject: IMPALA-2523: Make HdfsTableSink aware of clustered input .. Patch Set 11: Code-Review+1 Assuming the S3 run passes. -- To view, visit http://gerrit.cloudera.org:8080/4863 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibeda0bdabbfe44c8ac95bf7c982a75649e1b82d0 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3710: Kudu DML should ignore conflicts by default
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-3710: Kudu DML should ignore conflicts by default .. Patch Set 4: Updated patch to resolve merge conflicts with: 1) UPSERT patch 2) Support for non-covering range partitioning -- To view, visit http://gerrit.cloudera.org:8080/4911 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I83b5beaa982d006da4997a2af061ef7c22cad3f1 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3710: Kudu DML should ignore conflicts by default
Hello Internal Jenkins, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4911 to look at the new patch set (#4). Change subject: IMPALA-3710: Kudu DML should ignore conflicts by default .. IMPALA-3710: Kudu DML should ignore conflicts by default Removes the non-standard IGNORE syntax that was allowed for DML into Kudu tables to indicate that certain errors should be ignored, i.e. not fail the query and continue. However, because there is no way to 'roll back' mutations that occurred before an error occurs, tables are left in an inconsistent state and it's difficult to know what rows were successfully modified vs which rows were not. Instead, this change makes it so that we always 'ignore' these conflicts, i.e. a 'best effort'. In the future, when Kudu will provide the mechanisms Impala needs to provide a notion of isolation levels, then Impala will be able to provide options for more traditional semantics. After this change, the following errors are ignored: * INSERT where the PK already exists * UPDATE/DELETE where the PK doesn't exist Another follow-up patch will change other violations to be handled in this way as well, e.g. nulls inserted in non-nullable cols. Reporting: The number of rows inserted is reported to the coordinator, which makes the aggregate available to the shell and via the profile. TODO: Return rows modified for INSERT via HS2 (IMPALA-1789). TODO: Return rows modified for other CRUD (beeswax+hs2) (IMPALA-3713). TODO: Return error counts for specific warnings (IMPALA-4416). Testing: Updated tests. Ran all functional tests. More tests will be needed when other conflicts are handled in the same way. Change-Id: I83b5beaa982d006da4997a2af061ef7c22cad3f1 --- M be/src/exec/kudu-table-sink.cc M be/src/exec/kudu-table-sink.h M common/thrift/DataSinks.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/main/java/org/apache/impala/analysis/DeleteStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java M fe/src/main/java/org/apache/impala/analysis/UpdateStmt.java M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java M fe/src/main/java/org/apache/impala/planner/TableSink.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeModifyStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M testdata/workloads/functional-planner/queries/PlannerTest/kudu-delete.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-query/queries/QueryTest/kudu_crud.test 17 files changed, 66 insertions(+), 200 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/4911/4 -- To view, visit http://gerrit.cloudera.org:8080/4911 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I83b5beaa982d006da4997a2af061ef7c22cad3f1 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Matthew Jacobs
[Impala-ASF-CR] IMPALA-2523: Make HdfsTableSink aware of clustered input
Lars Volker has posted comments on this change. Change subject: IMPALA-2523: Make HdfsTableSink aware of clustered input .. Patch Set 9: (3 comments) Thanks for the reviews. I'm running another private job to test the change on S3 and will update the Jira once it's done. http://gerrit.cloudera.org:8080/#/c/4863/9/tests/query_test/test_insert_behaviour.py File tests/query_test/test_insert_behaviour.py: Line 488: import os > why? Sry, leftover from trying to get the tests to work. Line 491: table_location = DEFAULT_FS + "/" + table_path > I'm confused. Doesn't get_fs_path() do the right thing? Why do we have to a It does. I couldn't figure out why FILESYSTEM_PREFIX works the way it does on the Jenkins host I ran this on, but not on my local dev machine. Sailesh helped me, now I'm convinced it should work. I'm running another private build and will update this change again once I have results. Line 550: table, table_location) > indent 4 Done -- To view, visit http://gerrit.cloudera.org:8080/4863 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibeda0bdabbfe44c8ac95bf7c982a75649e1b82d0 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4048: Misc. improvements to /sessions
Henry Robinson has submitted this change and it was merged. Change subject: IMPALA-4048: Misc. improvements to /sessions .. IMPALA-4048: Misc. improvements to /sessions * Make table searchable and sortable * Fix 'last accessed time' being printed in UTC * Made table contents more compact so that it mostly fits on screen * Clarify summary text re: active and inactive sessions * Include fix in mustache-cpp required to print 64-bit integers correctly (see https://github.com/henryr/cpp-mustache/commit/29768bf0e84f5a1e95e006fc64996d375499dbda) Testing: Visual inspection and manual sorting, searching etc. Change-Id: I14edcb6d60cf031a62c5a20b2d2b4d23248633a3 Reviewed-on: http://gerrit.cloudera.org:8080/4880 Reviewed-by: Thomas Tauber-MarshallTested-by: Internal Jenkins Reviewed-by: Tim Armstrong --- M be/src/service/impala-hs2-server.cc M be/src/service/impala-http-handler.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h M be/src/thirdparty/mustache/mustache.cc M www/sessions.tmpl 6 files changed, 78 insertions(+), 46 deletions(-) Approvals: Internal Jenkins: Verified Thomas Tauber-Marshall: Looks good to me, but someone else must approve Tim Armstrong: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/4880 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I14edcb6d60cf031a62c5a20b2d2b4d23248633a3 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-2523: Make HdfsTableSink aware of clustered input
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4863 to look at the new patch set (#11). Change subject: IMPALA-2523: Make HdfsTableSink aware of clustered input .. IMPALA-2523: Make HdfsTableSink aware of clustered input IMPALA-2521 introduced clustering for insert statements. This change makes the HdfsTableSink aware of clustered inputs, so that partitions are opened, written, and closed one by one. This change also adds/modifies tests in several ways: - clustered insert tests switch from selecting all rows from alltypessmall to alltypes. Together with varying settings for batch_size, this results in a larger number of row batches being written. - clustered insert tests select from alltypes instead of functional.alltypes to make sure we also select from various input formats. - clustered insert tests have been added to select from alltypestiny to create inserts with 1 and 2 rows per partition respectively. - exhaustive insert tests now use different values for batch_size: 1, 16, 0 (meaning default, 1024). This is limited to uncompressed parquet files, to maintain a reasonable runtime. On my machine execution of test.insert took 1778 seconds, compared to 1002 seconds with the just default row batch size. - There is additional testing in test_insert_behaviour.py to make sure that insertion over several row batches only creates one file per partition. - It renames the test_insert method to make it unique in the file and allow for effective filtering with -k. Change-Id: Ibeda0bdabbfe44c8ac95bf7c982a75649e1b82d0 --- M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/exec/hdfs-table-writer.cc M bin/impala-config.sh M common/thrift/DataSinks.thrift M fe/src/main/java/org/apache/impala/analysis/DeleteStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/UpdateStmt.java M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java M fe/src/main/java/org/apache/impala/planner/TableSink.java M testdata/workloads/functional-query/queries/QueryTest/insert.test M tests/query_test/test_insert.py M tests/query_test/test_insert_behaviour.py M tests/util/filesystem_utils.py 14 files changed, 344 insertions(+), 99 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/4863/11 -- To view, visit http://gerrit.cloudera.org:8080/4863 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibeda0bdabbfe44c8ac95bf7c982a75649e1b82d0 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4441: Divide-by-zero in RuntimeProfile::SummaryStatsCounter::SetStats
Henry Robinson has posted comments on this change. Change subject: IMPALA-4441: Divide-by-zero in RuntimeProfile::SummaryStatsCounter::SetStats .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/4975 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I33f1e6fb45505dce7d79497d1632c5f63a409151 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4365: Enabling end-to-end tests on a remote cluster
Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-4365: Enabling end-to-end tests on a remote cluster .. Patch Set 13: (3 comments) http://gerrit.cloudera.org:8080/#/c/4769/13//COMMIT_MSG Commit Message: Line 51: still problems to work out with the remote data load script itself. Did you try loading data on a remote cluster and running tests on in with the scripts in this patch? http://gerrit.cloudera.org:8080/#/c/4769/13/bin/remote_data_load.py File bin/remote_data_load.py: Line 534: sys.exit(1) In general, I think it's a bad practice to call sys.exit inside functions. It's better to raise an exception and if it does not get caught on the outside, then exit will be triggered. http://gerrit.cloudera.org:8080/#/c/4769/13/testdata/datasets/functional/schema_constraints.csv File testdata/datasets/functional/schema_constraints.csv: PS13, Line 120: Wide tables fail due to the SERDEPROPERTIES limits is this a new issue? Is it specific to remote data loading? I looked at HIVE-1364 it says it got fixed a few years ago. -- To view, visit http://gerrit.cloudera.org:8080/4769 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1f443a1728a1d28168090c6f54e82dec2cb073e9 Gerrit-PatchSet: 13 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David KnuppGerrit-Reviewer: David Knupp Gerrit-Reviewer: Harrison Sheinblatt Gerrit-Reviewer: Martin Grund Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3202: refactor scratch file management into TmpFileMgr
Lars Volker has posted comments on this change. Change subject: IMPALA-3202: refactor scratch file management into TmpFileMgr .. Patch Set 3: Code-Review+1 (4 comments) http://gerrit.cloudera.org:8080/#/c/4898/3/be/src/runtime/buffered-block-mgr.h File be/src/runtime/buffered-block-mgr.h: Line 508: /// is nit: line wrapping http://gerrit.cloudera.org:8080/#/c/4898/1/be/src/runtime/tmp-file-mgr.cc File be/src/runtime/tmp-file-mgr.cc: PS1, Line 296: x_ = ran > Good point, users have been confused by this before when the mixed terminol I agree that replacing Tmp with Scratch might not be worth it. Having those two works for me. Line 336: } else if (status.code() == TErrorCode::SCRATCH_LIMIT_EXCEEDED) { > I prefer it this way since it's more obvious that it's an "successful" retu Good point. http://gerrit.cloudera.org:8080/#/c/4898/1/be/src/runtime/tmp-file-mgr.h File be/src/runtime/tmp-file-mgr.h: PS1, Line 136: unique_i > I don't think it will ever be anything asides from the query id, but maybe Thanks. When I first saw query_id I assumed there might be some side effects that require query id semantics, so unique_id seems clearer to me. -- To view, visit http://gerrit.cloudera.org:8080/4898 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0c56c195f3f28d520034f8c384494e566635fc62 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] Preview: IMPALA-4363: Add timestamp validation
Alex Behm has posted comments on this change. Change subject: Preview: IMPALA-4363: Add timestamp validation .. Patch Set 1: Overall structure looks good ok to me, minus Lars' comments. Ideally, we'd have a cheaper way to validate other than doing DebugString() with a try-catch. It should be possible for us to determine the acceptable integer value range and check that. -- To view, visit http://gerrit.cloudera.org:8080/4968 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3202: refactor scratch file management into TmpFileMgr
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3202: refactor scratch file management into TmpFileMgr .. Patch Set 3: PS3 is a rebase -- To view, visit http://gerrit.cloudera.org:8080/4898 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0c56c195f3f28d520034f8c384494e566635fc62 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3202: refactor scratch file management into TmpFileMgr
Tim Armstrong has uploaded a new patch set (#3). Change subject: IMPALA-3202: refactor scratch file management into TmpFileMgr .. IMPALA-3202: refactor scratch file management into TmpFileMgr This is a pure refactoring patch that moves all of the logic for allocating scratch file ranges into TmpFileMgr in anticipation of this logic being used by the new BufferPool. There should be no behavioural changes. Also remove a bunch of TODOs that we're not going to fix. Change-Id: I0c56c195f3f28d520034f8c384494e566635fc62 --- M be/src/runtime/buffered-block-mgr-test.cc M be/src/runtime/buffered-block-mgr.cc M be/src/runtime/buffered-block-mgr.h M be/src/runtime/tmp-file-mgr-test.cc M be/src/runtime/tmp-file-mgr.cc M be/src/runtime/tmp-file-mgr.h 6 files changed, 439 insertions(+), 432 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/4898/3 -- To view, visit http://gerrit.cloudera.org:8080/4898 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0c56c195f3f28d520034f8c384494e566635fc62 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4442: Fix FE ParserTests UnsatisfiedLinkError
Alex Behm has posted comments on this change. Change subject: IMPALA-4442: Fix FE ParserTests UnsatisfiedLinkError .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/4976 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1828504f79c51679f9ca07176bffbe248d450e87 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Alex Behm Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2523: Make HdfsTableSink aware of clustered input
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4863 to look at the new patch set (#10). Change subject: IMPALA-2523: Make HdfsTableSink aware of clustered input .. IMPALA-2523: Make HdfsTableSink aware of clustered input IMPALA-2521 introduced clustering for insert statements. This change makes the HdfsTableSink aware of clustered inputs, so that partitions are opened, written, and closed one by one. This change also adds/modifies tests in several ways: - clustered insert tests switch from selecting all rows from alltypessmall to alltypes. Together with varying settings for batch_size, this results in a larger number of row batches being written. - clustered insert tests select from alltypes instead of functional.alltypes to make sure we also select from various input formats. - clustered insert tests have been added to select from alltypestiny to create inserts with 1 and 2 rows per partition respectively. - exhaustive insert tests now use different values for batch_size: 1, 16, 0 (meaning default, 1024). This is limited to uncompressed parquet files, to maintain a reasonable runtime. On my machine execution of test.insert took 1778 seconds, compared to 1002 seconds with the just default row batch size. - There is additional testing in test_insert_behaviour.py to make sure that insertion over several row batches only creates one file per partition. - It renames the test_insert method to make it unique in the file and allow for effective filtering with -k. Change-Id: Ibeda0bdabbfe44c8ac95bf7c982a75649e1b82d0 --- M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/exec/hdfs-table-writer.cc M bin/impala-config.sh M common/thrift/DataSinks.thrift M fe/src/main/java/org/apache/impala/analysis/DeleteStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/UpdateStmt.java M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java M fe/src/main/java/org/apache/impala/planner/TableSink.java M testdata/workloads/functional-query/queries/QueryTest/insert.test M tests/query_test/test_insert.py M tests/query_test/test_insert_behaviour.py M tests/util/filesystem_utils.py 14 files changed, 344 insertions(+), 99 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/4863/10 -- To view, visit http://gerrit.cloudera.org:8080/4863 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibeda0bdabbfe44c8ac95bf7c982a75649e1b82d0 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-3202: refactor scratch file management into TmpFileMgr
Tim Armstrong has uploaded a new patch set (#2). Change subject: IMPALA-3202: refactor scratch file management into TmpFileMgr .. IMPALA-3202: refactor scratch file management into TmpFileMgr This is a pure refactoring patch that moves all of the logic for allocating scratch file ranges into TmpFileMgr in anticipation of this logic being used by the new BufferPool. There should be no behavioural changes. Also remove a bunch of TODOs that we're not going to fix. Change-Id: I0c56c195f3f28d520034f8c384494e566635fc62 --- M be/src/runtime/buffered-block-mgr-test.cc M be/src/runtime/buffered-block-mgr.cc M be/src/runtime/buffered-block-mgr.h M be/src/runtime/tmp-file-mgr-test.cc M be/src/runtime/tmp-file-mgr.cc M be/src/runtime/tmp-file-mgr.h 6 files changed, 441 insertions(+), 434 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/4898/2 -- To view, visit http://gerrit.cloudera.org:8080/4898 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0c56c195f3f28d520034f8c384494e566635fc62 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-3202: refactor scratch file management into TmpFileMgr
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3202: refactor scratch file management into TmpFileMgr .. Patch Set 1: (10 comments) http://gerrit.cloudera.org:8080/#/c/4898/1/be/src/runtime/buffered-block-mgr-test.cc File be/src/runtime/buffered-block-mgr-test.cc: PS1, Line 308: EXPECT_TRUE > ASSERT_TRUE? Otherwise we will hit a NPE in DeleteBlocks(). Makes sense. This pattern of always using EXPECT is unfortunately everywhere in the unit tests. Mostly I think ASSERT makes the most sense. I just did a search-and-replace on this file to change it everywhere in this file where I could (it's only allowed in functions that return void). http://gerrit.cloudera.org:8080/#/c/4898/1/be/src/runtime/buffered-block-mgr.cc File be/src/runtime/buffered-block-mgr.cc: Line 246: ->Init(state->io_mgr(), tmp_file_mgr, profile, parent, mem_limit, scratch_limit); > This line break looks odd. If it was done by clang-format I'd keep it, but I agree that it looks weird. clang-format did it. http://gerrit.cloudera.org:8080/#/c/4898/1/be/src/runtime/tmp-file-mgr.cc File be/src/runtime/tmp-file-mgr.cc: Line 282: DCHECK_EQ(tmp_files_.size(), 0); > nit: DCHECK(tmp_files_.empty()); Done PS1, Line 290: by ignoring the return status of : // NewFile(). > we don't seem to actually ignore it, the comment looks wrong. True, made the comment more accurate. PS1, Line 296: spilling > "No scratch directories..."? We seem to use 'tmp', 'scratch', and 'spilling Good point, users have been confused by this before when the mixed terminology leaked into error messages: IMPALA-3866. We need to keep scratch as the user-facing term to match the command-line option, so I fixed the "spilling" term here. We could rename TmpFileMgr, etc to ScratchFileMgr, I'm unsure if it's worth the code churn though - what do you think? Line 335: scratch_space_bytes_used_counter_->Add(num_bytes); > can we handle current_bytes_allocated_ here, too? That's also the only dire That works out cleaner, thanks. Line 336: return Status::OK(); > nit: this could just be "return status". I prefer it this way since it's more obvious that it's an "successful" return path. I don't feel that strongly but I find this easier to parse. Line 351: err_status.MergeStatus(errs[i]); > nit: single line Done http://gerrit.cloudera.org:8080/#/c/4898/1/be/src/runtime/tmp-file-mgr.h File be/src/runtime/tmp-file-mgr.h: PS1, Line 133: > nit: double space Done PS1, Line 136: query_id > does this actually have to be the query_id or could we rename it to somethi I don't think it will ever be anything asides from the query id, but maybe "unique_id" makes more sense. -- To view, visit http://gerrit.cloudera.org:8080/4898 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0c56c195f3f28d520034f8c384494e566635fc62 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4441: Divide-by-zero in RuntimeProfile::SummaryStatsCounter::SetStats
Henry Robinson has posted comments on this change. Change subject: IMPALA-4441: Divide-by-zero in RuntimeProfile::SummaryStatsCounter::SetStats .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/4975/1/be/src/util/runtime-profile.cc File be/src/util/runtime-profile.cc: PS1, Line 1034: DCHECK(total_num_values_ > 0); I don't quite see the reason for this dcheck. If total_num_values_ becomes < 0, we should catch that at the point it happens (rather than here). So in SetStats(), I think you should check for total_num_values_ >= 0, and set it to 0 - or just drop the update completely - if it's not. (Rather than DCHECK - because here we're dealing with input from an RPC I think, and it's better to be robust where possible in those situations). -- To view, visit http://gerrit.cloudera.org:8080/4975 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I33f1e6fb45505dce7d79497d1632c5f63a409151 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Henry Robinson Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.
Tim Armstrong has posted comments on this change. Change subject: IMPALA-1286: Extract common conjuncts from disjunctions. .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/4877/4/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: Line 1539: // TODO: Add option to disable expr rewrites for this test. This disjunction gets > Agree. I think we really want a query option to disable these kind of "opti How about we add a query test that tests || on a column with some NULLs in it so we're exercising the runtime code path? I don't really like the fact that the rewrite changes the type of the expression but it seems mostly harmless - how hard is it to retain the original type of the expression though? -- To view, visit http://gerrit.cloudera.org:8080/4877 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.
Alex Behm has posted comments on this change. Change subject: IMPALA-1286: Extract common conjuncts from disjunctions. .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/4877/4/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: Line 1539: // TODO: Add option to disable expr rewrites for this test. This disjunction gets > We're losing real test coverage here. Can you rewrite the test in a way tha Agree. I think we really want a query option to disable these kind of "optimizing" rewrites. Adding a test that circumvents the rewrite is going to be hard, i.e. we won't be able to go through SQL at all. -- To view, visit http://gerrit.cloudera.org:8080/4877 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4278: Don't abort Catalog startup quickly if HMS is not present
Henry Robinson has posted comments on this change. Change subject: IMPALA-4278: Don't abort Catalog startup quickly if HMS is not present .. Patch Set 4: (9 comments) Looks pretty close to me. http://gerrit.cloudera.org:8080/#/c/4842/4/fe/src/main/java/org/apache/impala/catalog/Catalog.java File fe/src/main/java/org/apache/impala/catalog/Catalog.java: PS4, Line 105: if (numClients > 0) { : metaStoreClientPool_.addClients(1, initialCnxnTimeoutSec); : metaStoreClientPool_.addClients(numClients - 1, 0); : } I think you can make this logic a method of MetaStoreClientPool - it's already called from MSCP's constructor, so a chance for sharing one method between both palces. http://gerrit.cloudera.org:8080/#/c/4842/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: Line 110: private static final int META_STORE_CLIENT_POOL_SIZE = 10; is this the max size, or the initial size, or both? http://gerrit.cloudera.org:8080/#/c/4842/4/fe/src/main/java/org/apache/impala/catalog/MetaStoreClientPool.java File fe/src/main/java/org/apache/impala/catalog/MetaStoreClientPool.java: PS4, Line 76: private Add a comment to this method describing the behaviour. PS4, Line 83: IMetaStoreClient hiveClient = null; What's the point of this variable - why not assign to hiveClient_ directly? PS4, Line 99: catch (InterruptedException ignore) {} on this path you might end up not sleeping for long enough if you get interrupted. Instead, suggest you have something roughly like: long delayUntil = System.currentTimeMillis() + retryDelayMillis; if (delayUntil > endTimeMillis) throw... while (delayUntil > System.currentTimeMillis()) { try { Thread.sleep(delayUntil - System.currentTimeMillis()); } catch (...) { } } PS4, Line 161: initialCnxnTimeoutSec this is not the 'initial' timeout in the same sense that it is elsewhere, I think. Elsewhere we use it to mean "first client", but here it's used to mean "first connection". I think it's better just to call it cnxnTimeoutSec here. http://gerrit.cloudera.org:8080/#/c/4842/4/tests/experiments/test_catalog_hms_failures.py File tests/experiments/test_catalog_hms_failures.py: PS4, Line 81: 10 just from experience, suggest you make this 30s. Timeouts are surprisingly flaky in EC2-based build infrastructure. Line 114: Perhaps wait 5s or so here to be sure that the catalog is in the 'trying to connect' phase of its startup. Line 122 How about a test that confirms that if the HMS does not start within initial_hms_cnxn_timeout_s, then the catalogd fails? -- To view, visit http://gerrit.cloudera.org:8080/4842 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I83c70f939429e1d0d20284a1307f3ee1278ae047 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Henry Robinson Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2523: Make HdfsTableSink aware of clustered input
Alex Behm has posted comments on this change. Change subject: IMPALA-2523: Make HdfsTableSink aware of clustered input .. Patch Set 9: (3 comments) http://gerrit.cloudera.org:8080/#/c/4863/9/tests/query_test/test_insert_behaviour.py File tests/query_test/test_insert_behaviour.py: Line 488: import os why? Line 491: table_location = DEFAULT_FS + "/" + table_path I'm confused. Doesn't get_fs_path() do the right thing? Why do we have to add this DEFAULT_FS prefix? Line 550: table, table_location) indent 4 -- To view, visit http://gerrit.cloudera.org:8080/4863 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibeda0bdabbfe44c8ac95bf7c982a75649e1b82d0 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1169: Admission control info on the queries debug webpage
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-1169: Admission control info on the queries debug webpage .. Patch Set 9: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/4756 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I504e3c829a14318721e3a42de6281bcc578f7283 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3167: Fix assignment of WHERE-clause predicate through grouping agg + outer join.
Alex Behm has posted comments on this change. Change subject: IMPALA-3167: Fix assignment of WHERE-clause predicate through grouping agg + outer join. .. Patch Set 1: (4 comments) http://gerrit.cloudera.org:8080/#/c/4960/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: PS1, Line 1426: hasOjTest > Where is this function used? Sorry, this was an experiment. Not needed. Removed. PS1, Line 1558: evalByJoin > This variable name is a bit confusing to me. Is "evalAfterJoin" better? Done http://gerrit.cloudera.org:8080/#/c/4960/1/fe/src/main/java/org/apache/impala/analysis/Expr.java File fe/src/main/java/org/apache/impala/analysis/Expr.java: PS1, Line 851: public static void removeDuplicates(List l) > Suggest to refactor this function a bit like below (if possible): Exprs are not hashable, i.e. do not implement hashCode(). Fixing that is beyond the scope of this change and also dangerous (wrong results, if not done properly). I'd rather not change the API of this function, because that will force changes in many parts of the code which could introduce new bugs and make backporting more difficult. We don't really need thread safety here, so I don't think we need to go out of out way to add it. PS1, Line 853: List origList = Lists.newArrayList(l); : l.clear(); : for (C expr: origList) if (!l.contains(expr)) l.add(expr); > Would it be better to use a Set instead of List? See comment above. Not hashable. Also a Set is not good because it does not preserve insertion order and may lead to tests with non-deterministic results. -- To view, visit http://gerrit.cloudera.org:8080/4960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I774d13a13ad1e8fe82512df98dc29983bdd232eb Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Anonymous Coward #27 Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3167: Fix assignment of WHERE-clause predicate through grouping agg + outer join.
Alex Behm has uploaded a new patch set (#2). Change subject: IMPALA-3167: Fix assignment of WHERE-clause predicate through grouping agg + outer join. .. IMPALA-3167: Fix assignment of WHERE-clause predicate through grouping agg + outer join. Background: We generally allow the assignment of predicates below the nullable side of a left/right outer join, explained as follows using an example: SELECT * FROM t1 LEFT OUTER JOIN t2 ON t1.id = t2.id WHERE t2.int_col < 10 The scan of 't2' will pick up 't2.int_col < 10' via Analyzer.getBoundPredicates() and recognizes that the predicate must also be evaluated by a join later, so the predicate is not marked as assigned. The join then picks up the unassigned predicate via Analyzer.getUnassignedConjuncts(). The bug was that our logic for detecting whether a bound predicate must also be evaluated at a join node was flawed because it only considered whether the tuples of the source or destination predicate were outer joined (plus other conditions). The underlying assumption is that either the source or destination tuple are bound by a tuple produced by a TableRef, but in the buggy query the source predicate is bound by an aggregation tuple, so we incorrectly marked the bound predicate as assigned in Analyzer.getBoundPredicates(). The fix is to conservatively not mark bound predicates as assigned if there exists equivalent tuples that are outer joined. This conservative fix leads to some duplicate assignments of predicates. Those are simply deduped now. Change-Id: I774d13a13ad1e8fe82512df98dc29983bdd232eb --- M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test 6 files changed, 48 insertions(+), 27 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/4960/2 -- To view, visit http://gerrit.cloudera.org:8080/4960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I774d13a13ad1e8fe82512df98dc29983bdd232eb Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Anonymous Coward #27
[Impala-ASF-CR] IMPALA-4172: Switch to BlockLocation methods for disk IDs
Alex Behm has posted comments on this change. Change subject: IMPALA-4172: Switch to BlockLocation methods for disk IDs .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/4914/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: Line 379: private static int getDiskId(String storageId) { > in this patch, let's focus on using the new api, rather than shrinking the How can we separate them though? The new API returns a UUID. -- To view, visit http://gerrit.cloudera.org:8080/4914 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibbff94cef9a9db7b3945f8e7b0286866d2cc3b61 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Laszlo GaalGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1169: Admission control info on the queries debug webpage
Thomas Tauber-Marshall has posted comments on this change. Change subject: IMPALA-1169: Admission control info on the queries debug webpage .. Patch Set 9: The reason the GVO failed was the change in this patch that always calls QuerySchedule::set_is_admitted(true) for query schedules that don't go through AdmissionController::AdmitQuery. This was causing AdmissionController::ReleaseQuery to try to release queries that had never been subject to admission control, resulting in a crash. Originally, this was fine because the only condition where AdmitQuery wouldn't be called was when FLAG_disable_admission_control was false, which would also prevent ReleaseQuery from being called. However, a recent change added 'is_mt_execution' as a condition for skipping AdmitQuery but not as a condition for skipping ReleaseQuery, leading to the crash. Also originally, setting is_admitted to true when FLAG_disable_admission_control to true was necessary as we were relying on QuerySchedule::is_admitted to differentiate queries that were not scheduled/queued/etc, even when admission control is off. With the change to just using the query event log for differentiating not scheduled/queued/etc. states, this is no longer necessary, so to fix the crash I just removed it. -- To view, visit http://gerrit.cloudera.org:8080/4756 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I504e3c829a14318721e3a42de6281bcc578f7283 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] Minor fixes to remove more "cloudera"s from the code.
Jim Apple has posted comments on this change. Change subject: Minor fixes to remove more "cloudera"s from the code. .. Patch Set 2: Code-Review+2 Carry Tim's +2 -- To view, visit http://gerrit.cloudera.org:8080/4774 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I27687a3b677def72f1867df54c8e50c93d5cab3a Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim AppleGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] Minor fixes to remove more "cloudera"s from the code.
Tim Armstrong has posted comments on this change. Change subject: Minor fixes to remove more "cloudera"s from the code. .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/4774 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I27687a3b677def72f1867df54c8e50c93d5cab3a Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim AppleGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4302,IMPALA-2379: constant expr arg fixes
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4302,IMPALA-2379: constant expr arg fixes .. Patch Set 10: Code-Review+2 (1 comment) Carry +2 http://gerrit.cloudera.org:8080/#/c/4838/8/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: Line 1030: // significant bits overlap in memory. > I guess these should really have the LITTLE_ENDIAN checks too, right? Reworded the comment to be less little-endian specific. I think if we were to port this to big-endian, we'd add padding to DecimalVal so that the least significant bits lined up (which means the field offsets would be different). I don't think this code would have to change. -- To view, visit http://gerrit.cloudera.org:8080/4838 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I45c3ed8c9d7a61e94a9b9d6c316e8a53d9ff6c24 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3200 (buffer pool): warn if Status is ignored
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3200 (buffer pool): warn if Status is ignored .. Patch Set 2: I was going to poll a few people but this got bumped down my priority list a bit - will come back to it probably in a week or so. -- To view, visit http://gerrit.cloudera.org:8080/4878 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8c26e7532b5f2c7fe167accc73179e8b72b192bc Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3200 (buffer pool): warn if Status is ignored
Jim Apple has posted comments on this change. Change subject: IMPALA-3200 (buffer pool): warn if Status is ignored .. Patch Set 2: > (1 comment) Any updates no this? -- To view, visit http://gerrit.cloudera.org:8080/4878 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8c26e7532b5f2c7fe167accc73179e8b72b192bc Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4302,IMPALA-2379: constant expr arg fixes
Dan Hecht has posted comments on this change. Change subject: IMPALA-4302,IMPALA-2379: constant expr arg fixes .. Patch Set 8: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/4838/8/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: Line 1030: // of the val16 should be initialized to all 0's. I guess these should really have the LITTLE_ENDIAN checks too, right? -- To view, visit http://gerrit.cloudera.org:8080/4838 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I45c3ed8c9d7a61e94a9b9d6c316e8a53d9ff6c24 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4302,IMPALA-2379: constant expr arg fixes
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4302,IMPALA-2379: constant expr arg fixes .. Patch Set 8: Ran into a bug in the patch where histogram was interpreting a val4 or val8 as a val16, and seeing uninitialised bits. This seems to be an expected way to use DecimalVal in the UDF interface. The fix is to memset() the whole value to 0 when allocating it. Documented this behaviour a bit better. -- To view, visit http://gerrit.cloudera.org:8080/4838 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I45c3ed8c9d7a61e94a9b9d6c316e8a53d9ff6c24 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4302,IMPALA-2379: constant expr arg fixes
Hello Internal Jenkins, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4838 to look at the new patch set (#7). Change subject: IMPALA-4302,IMPALA-2379: constant expr arg fixes .. IMPALA-4302,IMPALA-2379: constant expr arg fixes This patch fixes two issues around handling of constant expr args. The patches are combined because they touch some of the same code and depend on some of the same memory management cleanup. First, it fixes IMPALA-2379, where constant expr args were not visible to UDAFs. The issue is that the input exprs need to be opened before calling the UDAF Init() function. Second, it avoids overhead from repeated evaluation of constant arguments for ScalarFnCall expressions on both the codegen'd and interpreted paths. A common example is an IN predicate with a long list of constant values. The interpreted path was inefficient because it always evaluated all children expressions. Instead in this patch constant args are evaluated once and cached. The memory management of the AnyVal* objects was somewhat nebulous - adjusted it so that they're allocated from ExprContext::mem_pool_, which has the correct lifetime. The codegen'd path was inefficient only with varargs - with fixed arguments the LLVM optimiser is able to infer after inlining that the expressions are constant and remove all evaluation. However, for varargs it stores the vararg values into a heap-allocated buffer. The LLVM optimiser is unable to remove these stores because they have a side-effect that is visible to code outside the function. The codegen'd path is improved by evaluating varargs into an automatic buffer that can be optimised out. We also make a small related change to bake the string constants into the codegen'd code. Testing: Ran exhaustive build. Added regression test for IMPALA-2379 and MemPool test for aligned allocation. Added a test for in predicates with constant strings. Perf: Added a targeted query that demonstrates the improvement. Also manually validated the non-codegend perf. Also ran TPC-H and targeted perf queries locally - didn't see any significant changes. ++---+---++-++---++-+---+ | Workload | Query | File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Num Clients | Iters | ++---+---++-++---++-+---+ | TARGETED-PERF(_20) | primitive_filter_in_predicate | parquet / none / none | 1.19 | 9.82| I -87.85% | 3.82% | 0.71%| 1 | 10| ++---+---++-++---++-+---+ (I) Improvement: TARGETED-PERF(_20) primitive_filter_in_predicate [parquet / none / none] (9.82s -> 1.19s [-87.85%]) +--++--+--++---+--+--++++---+ | Operator | % of Query | Avg | Base Avg | Delta(Avg) | StdDev(%) | Max | Base Max | Delta(Max) | #Hosts | #Rows | Est #Rows | +--++--+--++---+--+--++++---+ | 01:AGGREGATE | 14.39% | 155.88ms | 214.61ms | -27.37%| 2.68% | 163.38ms | 227.53ms | -28.19%| 1 | 1 | 1 | | 00:SCAN HDFS | 85.60% | 927.46ms | 9.43s| -90.16%| 4.49% | 1.01s| 9.50s| -89.42%| 1 | 13.77K | 14.05K| +--++--+--++---+--+--++++---+ Change-Id: I45c3ed8c9d7a61e94a9b9d6c316e8a53d9ff6c24 --- M be/src/benchmarks/in-predicate-benchmark.cc M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/partitioned-aggregation-node.cc M be/src/exprs/agg-fn-evaluator.cc M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/anyval-util.cc M be/src/exprs/anyval-util.h M be/src/exprs/case-expr.cc M be/src/exprs/expr-context.h M be/src/exprs/expr.cc M be/src/exprs/expr.h M be/src/exprs/hive-udf-call.cc M be/src/exprs/in-predicate.h M be/src/exprs/literal.cc M be/src/exprs/scalar-fn-call.cc M be/src/exprs/scalar-fn-call.h M be/src/exprs/utility-functions-ir.cc M be/src/runtime/free-pool-test.cc M be/src/runtime/free-pool.h M be/src/runtime/mem-pool-test.cc M be/src/runtime/mem-pool.cc M be/src/runtime/mem-pool.h M be/src/testutil/test-udas.cc M be/src/udf/udf-internal.h M be/src/udf/udf-test-harness.cc M be/src/udf/udf.cc M
[Impala-ASF-CR](asf-site) Initial commit of the blog section of the Impala ASF website.
Jim Apple has posted comments on this change. Change subject: Initial commit of the blog section of the Impala ASF website. .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/4944/1/pelicanconf.py File pelicanconf.py: Line 1: #!/usr/bin/env python > I wasn't sure what to do about this. I noticed that while some of our third The licenses in the master branch are spelled out in https://github.com/apache/incubator-impala/blob/master/LICENSE.txt I suspect AGPL would prevent us form using Pelican unless we author all of the templates ourselves: https://www.apache.org/legal/resolved We had to take to comments out of our Doxygen file, for instance, because they were included in Doxygen source files and not given an Apache-compatible license. Does Pelican license the templates differently? -- To view, visit http://gerrit.cloudera.org:8080/4944 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iaa578a70237fcc97589c667c17a70d3d6dad5ae1 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: asf-site Gerrit-Owner: David KnuppGerrit-Reviewer: David Knupp Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Jim Apple Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3167: Fix assignment of WHERE-clause predicate through grouping agg + outer join.
Anonymous Coward #27 has posted comments on this change. Change subject: IMPALA-3167: Fix assignment of WHERE-clause predicate through grouping agg + outer join. .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/4960/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: PS1, Line 1426: hasOjTest Where is this function used? http://gerrit.cloudera.org:8080/#/c/4960/1/fe/src/main/java/org/apache/impala/analysis/Expr.java File fe/src/main/java/org/apache/impala/analysis/Expr.java: PS1, Line 851: public static void removeDuplicates(List l) Suggest to refactor this function a bit like below (if possible): public static List removeDuplicates(List l) { if (l == null) { return l; } else { return Lists.newArrayList(Sets.newHashSet(l)); } } This makes this function thread-safe. The passed in list won't be modified, and thus can be safely shared among threads. PS1, Line 853: List origList = Lists.newArrayList(l); : l.clear(); : for (C expr: origList) if (!l.contains(expr)) l.add(expr); Would it be better to use a Set instead of List? Set s = Sets.newHashSet(l); l.clear(); l.addAll(s); -- To view, visit http://gerrit.cloudera.org:8080/4960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I774d13a13ad1e8fe82512df98dc29983bdd232eb Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Anonymous Coward #27 Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4172: Switch to BlockLocation methods for disk IDs
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4172: Switch to BlockLocation methods for disk IDs .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/4914/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: Line 379: private static int getDiskId(String storageId) { > Agree, that's even better. We need to group the UUIDs by host and then assi in this patch, let's focus on using the new api, rather than shrinking the disk ids down to their minimum size. that's probably best handled in a follow-on. -- To view, visit http://gerrit.cloudera.org:8080/4914 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibbff94cef9a9db7b3945f8e7b0286866d2cc3b61 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Laszlo GaalGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4437: hit DCHECK in buffered-block-mgr-test
Internal Jenkins has submitted this change and it was merged. Change subject: IMPALA-4437: hit DCHECK in buffered-block-mgr-test .. IMPALA-4437: hit DCHECK in buffered-block-mgr-test The issue was introduced by "IMPALA-3202: DiskIoMgr improvements for new buffer pool". The bug is pretty straightforward - I forgot to update some error/cancellation paths to handle client-provided buffers. Testing: Was able to reproduce locally before the fix, have run it in a loop for a while with the fix and not seen any crashes. Change-Id: Ib7e8ffdbec341f6ffbc310fb29df63ab45fed545 Reviewed-on: http://gerrit.cloudera.org:8080/4973 Reviewed-by: Henry RobinsonTested-by: Internal Jenkins --- M be/src/runtime/disk-io-mgr.cc 1 file changed, 2 insertions(+), 2 deletions(-) Approvals: Henry Robinson: Looks good to me, approved Internal Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/4973 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ib7e8ffdbec341f6ffbc310fb29df63ab45fed545 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Internal Jenkins
[Impala-ASF-CR] IMPALA-4437: hit DCHECK in buffered-block-mgr-test
Internal Jenkins has posted comments on this change. Change subject: IMPALA-4437: hit DCHECK in buffered-block-mgr-test .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/4973 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib7e8ffdbec341f6ffbc310fb29df63ab45fed545 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Internal Jenkins Gerrit-HasComments: No