[Impala-ASF-CR] IMPALA-2521: Add clustered hint to insert statements
Internal Jenkins has posted comments on this change. Change subject: IMPALA-2521: Add clustered hint to insert statements .. Patch Set 10: Verified-1 Build failed: http://sandbox.jenkins.cloudera.com/job/impala-external-gerrit-verify-merge-ASF/355/ -- To view, visit http://gerrit.cloudera.org:8080/4745 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3884: Support TYPE TIMESTAMP for HashTableCtx::CodegenAssignNullValue()
Michael Ho has submitted this change and it was merged. Change subject: IMPALA-3884: Support TYPE_TIMESTAMP for HashTableCtx::CodegenAssignNullValue() .. IMPALA-3884: Support TYPE_TIMESTAMP for HashTableCtx::CodegenAssignNullValue() This change implements support for TYPE_TIMESTAMP for HashTableCtx::CodegenAssignNullValue(). TimestampValue itself is 16 bytes in size. To match RawValue::Write() in the interpreted path, CodegenAssignNullValue() emits code to assign HashUtil::FNV_SEED to both the upper and lower 64-bit of the destination value. This change also fixes the handling of 128-bit Decimal16Value in CodegenAssignNullValue() so the emitted code matches the behavior of the interpreted path. Change-Id: I0211d38cbef46331e0006fa5ed0680e6e0867bc8 Reviewed-on: http://gerrit.cloudera.org:8080/4794 Reviewed-by: Michael Ho Tested-by: Michael Ho --- M be/src/codegen/codegen-anyval.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/hash-table.cc M be/src/exec/old-hash-table.cc M be/src/util/bit-util.h M testdata/workloads/functional-query/queries/QueryTest/joins.test 7 files changed, 72 insertions(+), 37 deletions(-) Approvals: Michael Ho: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/4794 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I0211d38cbef46331e0006fa5ed0680e6e0867bc8 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-3211: provide toolchain build id for bootstrapping
Internal Jenkins has submitted this change and it was merged. Change subject: IMPALA-3211: provide toolchain build id for bootstrapping .. IMPALA-3211: provide toolchain build id for bootstrapping Testing: Ran a private build, which succeeded. Change-Id: Ibcc25ae82511713d0ff05ded37ef162925f2f0fb Reviewed-on: http://gerrit.cloudera.org:8080/4771 Reviewed-by: Tim Armstrong Tested-by: Internal Jenkins --- M bin/bootstrap_toolchain.py M bin/impala-config.sh 2 files changed, 14 insertions(+), 2 deletions(-) Approvals: Internal Jenkins: Verified Tim Armstrong: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/4771 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ibcc25ae82511713d0ff05ded37ef162925f2f0fb Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-3211: provide toolchain build id for bootstrapping
Internal Jenkins has posted comments on this change. Change subject: IMPALA-3211: provide toolchain build id for bootstrapping .. Patch Set 4: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/4771 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibcc25ae82511713d0ff05ded37ef162925f2f0fb Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4339: ensure coredumps end up in IMPALA HOME
Internal Jenkins has posted comments on this change. Change subject: IMPALA-4339: ensure coredumps end up in IMPALA_HOME .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/4785 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibc34d152139653374f940dc3edbca08e749bf55e Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4339: ensure coredumps end up in IMPALA HOME
Internal Jenkins has submitted this change and it was merged. Change subject: IMPALA-4339: ensure coredumps end up in IMPALA_HOME .. IMPALA-4339: ensure coredumps end up in IMPALA_HOME Change-Id: Ibc34d152139653374f940dc3edbca08e749bf55e Reviewed-on: http://gerrit.cloudera.org:8080/4785 Reviewed-by: Tim Armstrong Tested-by: Internal Jenkins --- M buildall.sh 1 file changed, 3 insertions(+), 0 deletions(-) Approvals: Internal Jenkins: Verified Tim Armstrong: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/4785 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ibc34d152139653374f940dc3edbca08e749bf55e Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4223: Handle truncated file read from HDFS cache
Lars Volker has posted comments on this change. Change subject: IMPALA-4223: Handle truncated file read from HDFS cache .. Patch Set 1: > (1 comment) > > The change makes sense and we should get it in regardless, but > could this have caused the original crash we saw? I don't think this caused what we saw. I removed the DCHECK to see what would happen, since the original stack was from a Release build, but ran into another DCHECk. After removing that one (and a third one), too, the queries just started to "work", reporting a different warning about not being able to read from the file. The same also happened with a Release build, which I was unable to crash. -- To view, visit http://gerrit.cloudera.org:8080/4828 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id1e1fdb0211819c5938956abb13b512350a46f1a Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3884: Support TYPE TIMESTAMP for HashTableCtx::CodegenAssignNullValue()
Michael Ho has posted comments on this change. Change subject: IMPALA-3884: Support TYPE_TIMESTAMP for HashTableCtx::CodegenAssignNullValue() .. Patch Set 3: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/4794 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0211d38cbef46331e0006fa5ed0680e6e0867bc8 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3884: Support TYPE TIMESTAMP for HashTableCtx::CodegenAssignNullValue()
Michael Ho has posted comments on this change. Change subject: IMPALA-3884: Support TYPE_TIMESTAMP for HashTableCtx::CodegenAssignNullValue() .. Patch Set 3: http://sandbox.jenkins.cloudera.com/job/impala-external-gerrit-verify-merge-ASF/344/ -- To view, visit http://gerrit.cloudera.org:8080/4794 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0211d38cbef46331e0006fa5ed0680e6e0867bc8 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3884: Support TYPE TIMESTAMP for HashTableCtx::CodegenAssignNullValue()
Michael Ho has uploaded a new patch set (#3). Change subject: IMPALA-3884: Support TYPE_TIMESTAMP for HashTableCtx::CodegenAssignNullValue() .. IMPALA-3884: Support TYPE_TIMESTAMP for HashTableCtx::CodegenAssignNullValue() This change implements support for TYPE_TIMESTAMP for HashTableCtx::CodegenAssignNullValue(). TimestampValue itself is 16 bytes in size. To match RawValue::Write() in the interpreted path, CodegenAssignNullValue() emits code to assign HashUtil::FNV_SEED to both the upper and lower 64-bit of the destination value. This change also fixes the handling of 128-bit Decimal16Value in CodegenAssignNullValue() so the emitted code matches the behavior of the interpreted path. Change-Id: I0211d38cbef46331e0006fa5ed0680e6e0867bc8 --- M be/src/codegen/codegen-anyval.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/hash-table.cc M be/src/exec/old-hash-table.cc M be/src/util/bit-util.h M testdata/workloads/functional-query/queries/QueryTest/joins.test 7 files changed, 72 insertions(+), 37 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/4794/3 -- To view, visit http://gerrit.cloudera.org:8080/4794 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0211d38cbef46331e0006fa5ed0680e6e0867bc8 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] Removed dead join inversion code from Analyzer.
Internal Jenkins has posted comments on this change. Change subject: Removed dead join inversion code from Analyzer. .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/4827 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9d8fff29c0f6b239796561c877acc709a178c108 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4309: Introduce Expr rewrite phase and supporting classes.
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4309: Introduce Expr rewrite phase and supporting classes. .. Patch Set 6: (4 comments) http://gerrit.cloudera.org:8080/#/c/4746/6/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java File fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java: Line 117: // canEvalUsingPartitionMd() to fold constant expressions in-place. comment why the cloning is necessary. http://gerrit.cloudera.org:8080/#/c/4746/5/fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java File fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java: Line 38: if (!(expr instanceof BetweenPredicate)) return expr; > 1. Sure, the driver can keep track of changes instead, but it seems like we i think your example illustrates my point about making multiple passes. a remove-redundant-conjuncts rule would look for a compound predicate and return a new compound predicate. a normalization rule would look for nested compound predicates and unnest them. (i agree that a rule needs to be able to look into the child exprs, but that can easily be done with tree predicates.) you'd want to apply the normalization rule repeatedly, until the tree stops changing, and then the redundancy removal rule. http://gerrit.cloudera.org:8080/#/c/4746/6/fe/src/main/java/org/apache/impala/rewrite/ExprRewriter.java File fe/src/main/java/org/apache/impala/rewrite/ExprRewriter.java: Line 40: public Expr rewrite(Expr expr, Analyzer analyzer) throws AnalysisException { i would have expected this to apply a single rule to the entire expr tree until there are no more changes, then switch to the next rule, etc. i guess it doesn't matter at the moment, since both approaches are equivalent for the between rule. probably best to see some more rules before tweaking the traversal patterns. Line 52: Expr origExpr = rewrittenExpr.clone(); why is this necessary? if the convention is that apply() always returns the original expr if no transformation was done, then a pointer comparison should be sufficient. -- To view, visit http://gerrit.cloudera.org:8080/4746 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2279dc984bcf7742db4fa3b1aa67283ecbb05e6e Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4134,IMPALA-3704: Kudu INSERT improvements
Internal Jenkins has submitted this change and it was merged. Change subject: IMPALA-4134,IMPALA-3704: Kudu INSERT improvements .. IMPALA-4134,IMPALA-3704: Kudu INSERT improvements 1.) IMPALA-4134: Use Kudu AUTO FLUSH Improves performance of writes to Kudu up to 4.2x in bulk data loading tests (load 200 million rows from lineitem). 2.) IMPALA-3704: Improve errors on PK conflicts The Kudu client reports an error for every PK conflict, and all errors were being returned in the error status. As a result, inserts/updates/deletes could return errors with thousands errors reported. This changes the error handling to log all reported errors as warnings and return only the first error in the query error status. 3.) Improve the DataSink reporting of the insert stats. The per-partition stats returned by the data sink weren't useful for Kudu sinks. Firstly, the number of appended rows was not being displayed in the profile. Secondly, the 'stats' field isn't populated for Kudu tables and thus was confusing in the profile, so it is no longer printed if it is not set in the thrift struct. Testing: Ran local tests, including new tests to verify the query profile insert stats. Manual cluster testing was conducted of the AUTO FLUSH functionality, and that testing informed the default mutation buffer value of 100MB which was found to provide good results. Change-Id: I5542b9a061b01c543a139e8722560b1365f06595 Reviewed-on: http://gerrit.cloudera.org:8080/4728 Reviewed-by: Matthew Jacobs Tested-by: Internal Jenkins --- M be/src/exec/data-sink.cc M be/src/exec/hbase-table-sink.cc M be/src/exec/hdfs-table-sink.cc M be/src/exec/kudu-table-sink.cc M be/src/exec/kudu-table-sink.h M be/src/runtime/coordinator.cc M be/src/service/impala-beeswax-server.cc M be/src/service/query-exec-state.cc M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/generate_error_codes.py M shell/impala_client.py M testdata/workloads/functional-query/queries/QueryTest/kudu_crud.test M tests/beeswax/impala_beeswax.py 14 files changed, 248 insertions(+), 97 deletions(-) Approvals: Matthew Jacobs: Looks good to me, approved Internal Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/4728 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I5542b9a061b01c543a139e8722560b1365f06595 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Todd Lipcon
[Impala-ASF-CR] IMPALA-4134,IMPALA-3704: Kudu INSERT improvements
Internal Jenkins has posted comments on this change. Change subject: IMPALA-4134,IMPALA-3704: Kudu INSERT improvements .. Patch Set 8: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/4728 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5542b9a061b01c543a139e8722560b1365f06595 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2521: Add clustered hint to insert statements
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-2521: Add clustered hint to insert statements .. Patch Set 10: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/4745 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: No
[Impala-ASF-CR] Add distcc infrastructure.
Internal Jenkins has submitted this change and it was merged. Change subject: Add distcc infrastructure. .. Add distcc infrastructure. This has been working for several months, and it it was written mainly by Casey Ching while he was at Cloudera working on Impala. Change-Id: Ia4bc78ad46dda13e4533183195af632f46377cae Reviewed-on: http://gerrit.cloudera.org:8080/4820 Reviewed-by: Jim Apple Tested-by: Internal Jenkins --- M .gitignore A bin/distcc/.gitignore A bin/distcc/README.md A bin/distcc/distcc.sh A bin/distcc/distcc_env.sh 5 files changed, 330 insertions(+), 1 deletion(-) Approvals: Jim Apple: Looks good to me, approved Internal Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/4820 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ia4bc78ad46dda13e4533183195af632f46377cae Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] Add distcc infrastructure.
Internal Jenkins has posted comments on this change. Change subject: Add distcc infrastructure. .. Patch Set 3: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/4820 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia4bc78ad46dda13e4533183195af632f46377cae Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4223: Handle truncated file read from HDFS cache
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4223: Handle truncated file read from HDFS cache .. Patch Set 1: (1 comment) The change makes sense and we should get it in regardless, but could this have caused the original crash we saw? http://gerrit.cloudera.org:8080/#/c/4828/1/be/src/runtime/disk-io-mgr-scan-range.cc File be/src/runtime/disk-io-mgr-scan-range.cc: Line 445 This is definitely suspicious. Sometimes I feel like some of the DCHECKS were expressions of hope rather than assertions. -- To view, visit http://gerrit.cloudera.org:8080/4828 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id1e1fdb0211819c5938956abb13b512350a46f1a Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3211: provide toolchain build id for bootstrapping
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3211: provide toolchain build id for bootstrapping .. Patch Set 4: Code-Review+2 rebase -- To view, visit http://gerrit.cloudera.org:8080/4771 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibcc25ae82511713d0ff05ded37ef162925f2f0fb Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4350: Crash with vlog level 2 in hash join node
Tim Armstrong has uploaded a new change for review. http://gerrit.cloudera.org:8080/4830 Change subject: IMPALA-4350: Crash with vlog level 2 in hash join node .. IMPALA-4350: Crash with vlog level 2 in hash join node Testing: Reproduced by running test_mem_usage_scaling against an Impala with -v=2. Confirmed the fix avoided the crash. We don't have any routine testing of high log levels so there isn't a clear way to get good coverage of all code paths where there might be similar bugs. Change-Id: Ieedd49d8a17709177a622ddc15b78a3f48e12d3f --- M be/src/exec/partitioned-hash-join-node.cc 1 file changed, 13 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/4830/1 -- To view, visit http://gerrit.cloudera.org:8080/4830 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ieedd49d8a17709177a622ddc15b78a3f48e12d3f Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong
[Impala-ASF-CR] IMPALA-4362: Misc. fixes for PFE counters
Henry Robinson has uploaded a new change for review. http://gerrit.cloudera.org:8080/4829 Change subject: IMPALA-4362: Misc. fixes for PFE counters .. IMPALA-4362: Misc. fixes for PFE counters * ExecTime was always 0, because it wasn't updated before the profile was reported to the coordinator * Made ExecTree* counters children of their respective parents * Moved ExecTreePrepareTime into the right profile * Renamed timings profile to something a bit more obvious. Change-Id: Iaa9dccdcd91297a11a093b921f8d84d32f32be84 --- M be/src/runtime/plan-fragment-executor.cc 1 file changed, 23 insertions(+), 11 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/4829/1 -- To view, visit http://gerrit.cloudera.org:8080/4829 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Iaa9dccdcd91297a11a093b921f8d84d32f32be84 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson
[Impala-ASF-CR] Minor compute stats script fixes
Internal Jenkins has posted comments on this change. Change subject: Minor compute stats script fixes .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/4825 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I326f4c370fda8d5e388af8e2395623185c06bc07 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Internal Jenkins Gerrit-HasComments: No
[Impala-ASF-CR] Minor compute stats script fixes
Internal Jenkins has submitted this change and it was merged. Change subject: Minor compute stats script fixes .. Minor compute stats script fixes * Change run-step to output full log path * Change text to say "Computing table stats" rather than "Computing HBase stats" when running compute-table-stats.sh Change-Id: I326f4c370fda8d5e388af8e2395623185c06bc07 Reviewed-on: http://gerrit.cloudera.org:8080/4825 Reviewed-by: Alex Behm Tested-by: Internal Jenkins --- M testdata/bin/create-load-data.sh M testdata/bin/run-step.sh 2 files changed, 3 insertions(+), 4 deletions(-) Approvals: Internal Jenkins: Verified Alex Behm: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/4825 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I326f4c370fda8d5e388af8e2395623185c06bc07 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Internal Jenkins
[Impala-ASF-CR] IMPALA-4223: Handle truncated file read from HDFS cache
Lars Volker has uploaded a new change for review. http://gerrit.cloudera.org:8080/4828 Change subject: IMPALA-4223: Handle truncated file read from HDFS cache .. IMPALA-4223: Handle truncated file read from HDFS cache While overwriting files on HDFS via Hive it can happen that Impala sees a partially written, cached file. In these cases we did not correctly handle the partial cached read. This change adds a check and triggers a fall back to disk reads for such errors. If the file is partially written to disk, too, then the query will report a file corruption warning through the disk read path. Change-Id: Id1e1fdb0211819c5938956abb13b512350a46f1a --- M be/src/runtime/disk-io-mgr-scan-range.cc 1 file changed, 13 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/4828/1 -- To view, visit http://gerrit.cloudera.org:8080/4828 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Id1e1fdb0211819c5938956abb13b512350a46f1a Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker
[Impala-ASF-CR] IMPALA-1169: Admission control info on the queries debug webpage
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-1169: Admission control info on the queries debug webpage .. Patch Set 4: Also I had a previous question about how COMPUTE STATS behaves. -- To view, visit http://gerrit.cloudera.org:8080/4756 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I504e3c829a14318721e3a42de6281bcc578f7283 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1169: Admission control info on the queries debug webpage
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-1169: Admission control info on the queries debug webpage .. Patch Set 4: (3 comments) Thanks, almost there. You should be able to test the AC tests well w/ exhaustive. I think it should run w/ something like: cd tests impala-py.test --exploration_strategy=exhuastive custom_cluster/test_admission_controller.py That may take a long time. http://gerrit.cloudera.org:8080/#/c/4756/4/tests/custom_cluster/test_admission_controller.py File tests/custom_cluster/test_admission_controller.py: PS4, Line 609: print(queries_json) remove PS4, Line 613: if query['last_event'] != 'Registered' and \ : query['last_event'] != 'Planning finished': : assert query['resource_pool'] == self.pool_name : else: : assert query['resource_pool'] == '' or \ : query['resource_pool'] == self.pool_name This else clause seems a bit weird. The logic would probably be easier if flipped, and I think we'd want to split these cases to be more precise about when is it empty vs eq to pool_name. Don't we know it will be empty if it's registered? I guess we don't know if it's in Planning Finished-though based on the points at which this is called, maybe the states are actually stable? e.g. if Registered: # should be empty elif Planning Finished: # TODO - are we sure queries will be in this state when this fn is called in the code below? the states may be stable. if we can get here, then this would be the more specific place we don't know if its empty or set yet else: # now we know it'll be set PS4, Line 620: assert query['stmt_type'] == 'DDL' or query['stmt_type'] == 'SET' In my comment before I meant to give an example rather than something we can assert. There are other types too, i think you can leave out this assertion. The test has SET statements but checking the other cases may be misleading for readers. -- To view, visit http://gerrit.cloudera.org:8080/4756 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I504e3c829a14318721e3a42de6281bcc578f7283 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2521: Add clustered hint to insert statements
Alex Behm has posted comments on this change. Change subject: IMPALA-2521: Add clustered hint to insert statements .. Patch Set 10: Code-Review+1 Thanks! -- To view, visit http://gerrit.cloudera.org:8080/4745 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4339: ensure coredumps end up in IMPALA HOME
Internal Jenkins has posted comments on this change. Change subject: IMPALA-4339: ensure coredumps end up in IMPALA_HOME .. Patch Set 2: Build failed: http://sandbox.jenkins.cloudera.com/job/impala-external-gerrit-verify-merge-ASF/352/ -- To view, visit http://gerrit.cloudera.org:8080/4785 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibc34d152139653374f940dc3edbca08e749bf55e Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2521: Add clustered hint to insert statements
Lars Volker has posted comments on this change. Change subject: IMPALA-2521: Add clustered hint to insert statements .. Patch Set 10: (1 comment) I added a test. I also had to rebase onto Dimitris' change, so now the diff between PS 9 and 10 is a bit messy. Sorry for that. http://gerrit.cloudera.org:8080/#/c/4745/8/testdata/workloads/functional-planner/queries/PlannerTest/insert.test File testdata/workloads/functional-planner/queries/PlannerTest/insert.test: Line 635: |--02:AGGREGATE [FINALIZE] > Sorry just thought of this. Can you also add a test like this that has a WH Done -- To view, visit http://gerrit.cloudera.org:8080/4745 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2521: Add clustered hint to insert statements
Hello Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4745 to look at the new patch set (#10). Change subject: IMPALA-2521: Add clustered hint to insert statements .. IMPALA-2521: Add clustered hint to insert statements This change introduces a clustered/noclustered hint for insert statements. Specifying this hint adds an additional sort node to the plan, just before the table sink. This has the effect that data will be clustered by its partition prior to writing partitions, which therefore can be written sequentially. Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 --- M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/SortInfo.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M testdata/workloads/functional-planner/queries/PlannerTest/insert.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-query/queries/QueryTest/insert.test M testdata/workloads/functional-query/queries/QueryTest/kudu_crud.test 10 files changed, 417 insertions(+), 43 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/4745/10 -- To view, visit http://gerrit.cloudera.org:8080/4745 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker
[Impala-ASF-CR] Removed dead join inversion code from Analyzer.
Bharath Vissapragada has posted comments on this change. Change subject: Removed dead join inversion code from Analyzer. .. Patch Set 2: Sure I can take over. Most of these can be fixed using "Quick Fix" by the IDE. I'll take a look. -- To view, visit http://gerrit.cloudera.org:8080/4827 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9d8fff29c0f6b239796561c877acc709a178c108 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] Removed dead join inversion code from Analyzer.
Alex Behm has posted comments on this change. Change subject: Removed dead join inversion code from Analyzer. .. Patch Set 2: Code-Review-1 Do you want to take over the patch Bharath? Please keep in mind that some code (even if unused) is still useful, for example, validateValueTransferGraph(). -- To view, visit http://gerrit.cloudera.org:8080/4827 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9d8fff29c0f6b239796561c877acc709a178c108 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] Removed dead join inversion code from Analyzer.
Bharath Vissapragada has posted comments on this change. Change subject: Removed dead join inversion code from Analyzer. .. Patch Set 1: The above ones I pasted are apart from invertOuterJoinState() you removed here. -- To view, visit http://gerrit.cloudera.org:8080/4827 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9d8fff29c0f6b239796561c877acc709a178c108 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] Removed dead join inversion code from Analyzer.
Bharath Vissapragada has posted comments on this change. Change subject: Removed dead join inversion code from Analyzer. .. Patch Set 1: I ran the code analysis tool from my IDE and found huge amount of un-used code in the frontend. I haven't verified each item but Analyzer class shows the following ones. - isSubquery() - resetSubquery() - containsOuterjoinedTid(List) - getEquivSlots(SlotId) - removeRedundantExprs(List) - isConjunctAssigned(Expr) - hasUnassignedConjuncts() - getTargetDbName(FunctionName) - getConjunct(ExprId) - getEqJoinConjuncts() - validateValueTransferGraph) As per the analysis there are 4533 such warnings across the code :) -- To view, visit http://gerrit.cloudera.org:8080/4827 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9d8fff29c0f6b239796561c877acc709a178c108 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4339: ensure coredumps end up in IMPALA HOME
Internal Jenkins has posted comments on this change. Change subject: IMPALA-4339: ensure coredumps end up in IMPALA_HOME .. Patch Set 2: Build failed: http://sandbox.jenkins.cloudera.com/job/impala-external-gerrit-verify-merge-ASF/348/ -- To view, visit http://gerrit.cloudera.org:8080/4785 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibc34d152139653374f940dc3edbca08e749bf55e Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] Removed dead join inversion code from Analyzer.
Alex Behm has posted comments on this change. Change subject: Removed dead join inversion code from Analyzer. .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/4827 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9d8fff29c0f6b239796561c877acc709a178c108 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4134,IMPALA-3704: Kudu INSERT improvements
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-4134,IMPALA-3704: Kudu INSERT improvements .. Patch Set 8: Code-Review+2 (1 comment) rebased and addressed dan's comment. carrying his +2. http://gerrit.cloudera.org:8080/#/c/4728/7/be/src/exec/kudu-table-sink.cc File be/src/exec/kudu-table-sink.cc: PS7, Line 145: the > is this talking about the "Kudu client" or "Kudu server"? Done -- To view, visit http://gerrit.cloudera.org:8080/4728 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5542b9a061b01c543a139e8722560b1365f06595 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3788: Support for Kudu 'read-your-writes' consistency
Matthew Jacobs has uploaded a new patch set (#3). Change subject: IMPALA-3788: Support for Kudu 'read-your-writes' consistency .. IMPALA-3788: Support for Kudu 'read-your-writes' consistency Kudu provides an API to get/set a 'latest observed timestamp' on clients to allow a client which inserts to capture and send this timestamp to another client before a read to ensure that data as of that timestamp is visible. This adds support for this feature _for reads within a session_ by capturing the latest observed timestamp when the KuduTableSink is sending its last update to the coordinator. The timestamp is sent with other post-write information, and is aggregated (i.e. taking the max) at the coordinator. The max is stored in the session, and that value is then set in the Kudu client on future scans. This is being tested by running the Kudu tests after removing delays that were introduced to work around the issue that reads might not be visible after a write. Before this change, if there were no delay, inconsistent results could be returned. Change-Id: I6bcb5fc218ad4ab935343a55b2188441d8c7dfbd --- M be/src/exec/kudu-scan-node.cc M be/src/exec/kudu-table-sink.cc M be/src/runtime/coordinator.cc M be/src/runtime/coordinator.h M be/src/service/impala-hs2-server.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h M be/src/service/query-exec-state.cc M common/thrift/ImpalaInternalService.thrift M tests/common/impala_test_suite.py M tests/query_test/test_kudu.py 11 files changed, 55 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/4779/3 -- To view, visit http://gerrit.cloudera.org:8080/4779 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6bcb5fc218ad4ab935343a55b2188441d8c7dfbd Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs
[Impala-ASF-CR] IMPALA-4134,IMPALA-3704: Kudu INSERT improvements
Hello Alex Behm, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4728 to look at the new patch set (#8). Change subject: IMPALA-4134,IMPALA-3704: Kudu INSERT improvements .. IMPALA-4134,IMPALA-3704: Kudu INSERT improvements 1.) IMPALA-4134: Use Kudu AUTO FLUSH Improves performance of writes to Kudu up to 4.2x in bulk data loading tests (load 200 million rows from lineitem). 2.) IMPALA-3704: Improve errors on PK conflicts The Kudu client reports an error for every PK conflict, and all errors were being returned in the error status. As a result, inserts/updates/deletes could return errors with thousands errors reported. This changes the error handling to log all reported errors as warnings and return only the first error in the query error status. 3.) Improve the DataSink reporting of the insert stats. The per-partition stats returned by the data sink weren't useful for Kudu sinks. Firstly, the number of appended rows was not being displayed in the profile. Secondly, the 'stats' field isn't populated for Kudu tables and thus was confusing in the profile, so it is no longer printed if it is not set in the thrift struct. Testing: Ran local tests, including new tests to verify the query profile insert stats. Manual cluster testing was conducted of the AUTO FLUSH functionality, and that testing informed the default mutation buffer value of 100MB which was found to provide good results. Change-Id: I5542b9a061b01c543a139e8722560b1365f06595 --- M be/src/exec/data-sink.cc M be/src/exec/hbase-table-sink.cc M be/src/exec/hdfs-table-sink.cc M be/src/exec/kudu-table-sink.cc M be/src/exec/kudu-table-sink.h M be/src/runtime/coordinator.cc M be/src/service/impala-beeswax-server.cc M be/src/service/query-exec-state.cc M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/generate_error_codes.py M shell/impala_client.py M testdata/workloads/functional-query/queries/QueryTest/kudu_crud.test M tests/beeswax/impala_beeswax.py 14 files changed, 248 insertions(+), 97 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/4728/8 -- To view, visit http://gerrit.cloudera.org:8080/4728 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I5542b9a061b01c543a139e8722560b1365f06595 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Todd Lipcon
[Impala-ASF-CR] Removed dead join inversion code from Analyzer.
Tim Armstrong has posted comments on this change. Change subject: Removed dead join inversion code from Analyzer. .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/4827 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9d8fff29c0f6b239796561c877acc709a178c108 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3586: Clean up union-node.h/cc to enable improvements.
Alex Behm has posted comments on this change. Change subject: IMPALA-3586: Clean up union-node.h/cc to enable improvements. .. Patch Set 2: This patch requires the fix for IMPALA-4336. See: https://gerrit.cloudera.org/#/c/4815/ I'll hold off until that one gets reviewed/in. -- To view, visit http://gerrit.cloudera.org:8080/4817 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia5fc23985e8d51acb8a6920717ce4e2f0254fe70 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4134,IMPALA-3704: Kudu INSERT improvements
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-4134,IMPALA-3704: Kudu INSERT improvements .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/4728/6/be/src/exec/kudu-table-sink.cc File be/src/exec/kudu-table-sink.cc: PS6, Line 330: kudu_error_counter_->value() > Good point, but this kinda means that IGNORE is buggy (at least theoretical True. I think we just need to make sure the Kudu folks don't change the behavior without telling us so we can work with them to make sure that won't happen in practice, e.g. I could imagine a few things to consider if they want to set error buffer limits so we can make sure the buffer is big enough to handle at least all conflicts. Or they offer a way to report errors different for different types. Ideally we could have a test case that ensures we don't hit any unexpected code paths, I'm not sure we can guarantee any test would overflow errors if they change the behavior in the future. I do have a test case that will result in key conflicts for >7000 rows, but depending on how they could change the code that might or might not result in errors overflowing. I'll start a conversation with them about this. -- To view, visit http://gerrit.cloudera.org:8080/4728 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5542b9a061b01c543a139e8722560b1365f06595 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3586: Clean up union-node.h/cc to enable improvements.
Dan Hecht has posted comments on this change. Change subject: IMPALA-3586: Clean up union-node.h/cc to enable improvements. .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/4817 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia5fc23985e8d51acb8a6920717ce4e2f0254fe70 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1169: Admission control info on the queries debug webpage
Thomas Tauber-Marshall has posted comments on this change. Change subject: IMPALA-1169: Admission control info on the queries debug webpage .. Patch Set 4: (7 comments) http://gerrit.cloudera.org:8080/#/c/4756/3/be/src/scheduling/query-schedule.cc File be/src/scheduling/query-schedule.cc: Line 56: is_admitted_(false) { > was this intentional? i don't see why this is necessary Done http://gerrit.cloudera.org:8080/#/c/4756/3/be/src/scheduling/simple-scheduler.cc File be/src/scheduling/simple-scheduler.cc: Line 900: // TODO-MT: call AdmitQuery() rather than always setting is_admitted > call AdmitQuery() rather than always setting is_admitted Done http://gerrit.cloudera.org:8080/#/c/4756/3/be/src/scheduling/simple-scheduler.h File be/src/scheduling/simple-scheduler.h: Line 92: virtual Status Schedule(QuerySchedule* schedule); > Sorry, on second thought this doesn't add much. There's a comment in the ba Done http://gerrit.cloudera.org:8080/#/c/4756/3/be/src/service/query-exec-state.h File be/src/service/query-exec-state.h: PS3, Line 158: and had the pool set yet, > , not had the pool set yet, or this is StmtType doesn't go through admissio Done http://gerrit.cloudera.org:8080/#/c/4756/3/tests/common/impala_service.py File tests/common/impala_service.py: PS3, Line 83: sleep(interval) : assert 0, 'Metric value %s did not reach value %s in %ss' %\ : (metric_name, expected_value, timeout) > can you move this up to be under read_debug_webpage and call it get_debug_w Done http://gerrit.cloudera.org:8080/#/c/4756/3/tests/custom_cluster/test_admission_controller.py File tests/custom_cluster/test_admission_controller.py: PS3, Line 603: > outdated Done PS3, Line 658: > counts from the queries page Done -- To view, visit http://gerrit.cloudera.org:8080/4756 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I504e3c829a14318721e3a42de6281bcc578f7283 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1169: Admission control info on the queries debug webpage
Thomas Tauber-Marshall has uploaded a new patch set (#4). Change subject: IMPALA-1169: Admission control info on the queries debug webpage .. IMPALA-1169: Admission control info on the queries debug webpage This patch adds a new event, 'Queued', to the query event log to indicate when a query is queued by the admission controller. This means that queries on the '/queries' page that are currently queued will display this as their 'Last Event', making it possible to see which queries are current queued. It also adds a column to show the resource pool associated with the queries. Change-Id: I504e3c829a14318721e3a42de6281bcc578f7283 --- M be/src/scheduling/admission-controller.cc M be/src/scheduling/simple-scheduler.cc M be/src/service/impala-http-handler.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h M be/src/service/query-exec-state.h M tests/common/impala_service.py M tests/custom_cluster/test_admission_controller.py M www/queries.tmpl 9 files changed, 75 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/4756/4 -- To view, visit http://gerrit.cloudera.org:8080/4756 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I504e3c829a14318721e3a42de6281bcc578f7283 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-3211: provide toolchain build id for bootstrapping
Jim Apple has posted comments on this change. Change subject: IMPALA-3211: provide toolchain build id for bootstrapping .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/4771 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibcc25ae82511713d0ff05ded37ef162925f2f0fb Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4134,IMPALA-3704: Kudu INSERT improvements
Dan Hecht has posted comments on this change. Change subject: IMPALA-4134,IMPALA-3704: Kudu INSERT improvements .. Patch Set 7: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/4728/6/be/src/exec/kudu-table-sink.cc File be/src/exec/kudu-table-sink.cc: PS6, Line 330: > Yes, we still fail the query if the error buffer overflows since we can't b Good point, but this kinda means that IGNORE is buggy (at least theoretically). i.e. a user could get into a situation where they can't run a query because the number of to be ignored errors is too large, and that seems bad. -- To view, visit http://gerrit.cloudera.org:8080/4728 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5542b9a061b01c543a139e8722560b1365f06595 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[Impala-ASF-CR] Removed dead join inversion code from Analyzer.
Alex Behm has uploaded a new change for review. http://gerrit.cloudera.org:8080/4827 Change subject: Removed dead join inversion code from Analyzer. .. Removed dead join inversion code from Analyzer. Change-Id: I9d8fff29c0f6b239796561c877acc709a178c108 --- M fe/src/main/java/org/apache/impala/analysis/Analyzer.java 1 file changed, 0 insertions(+), 29 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/4827/1 -- To view, visit http://gerrit.cloudera.org:8080/4827 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I9d8fff29c0f6b239796561c877acc709a178c108 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm
[Impala-ASF-CR] IMPALA-3586: Clean up union-node.h/cc to enable improvements.
Alex Behm has posted comments on this change. Change subject: IMPALA-3586: Clean up union-node.h/cc to enable improvements. .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/4817/1/be/src/exec/union-node.cc File be/src/exec/union-node.cc: PS1, Line 72: for (int i = 0; i < const_expr_lists_.size(); ++i) { : RETURN_IF_ERROR(Expr::Prepare( : const_expr_lists_[i], state, row_desc(), expr_mem_tracker())); : AddExprCtxsToFree(const_expr_lists_[i]); : DCHECK_EQ(const_expr_lists_[i].size(), tuple_desc_->slots().size()); : } : : // Prepare result expr lists. : for (int i = 0; i < child_expr_lists_.size(); ++i) { : RETURN_IF_ERROR(Expr::Prepare( : child_expr_lists_[i], state, child(i)->row_desc(), expr_mem_tracker())); : AddExprCtxsToFree(child_expr_lists_[i]); : DCHECK_EQ(child_expr_lists_[i].size(), tuple_desc_->slots().size()); : } > ranged-for loops as well? Done for the first loop. The second loop needs child(i)->row_desc(). PS1, Line 220: for (int i = 0; i < const_expr_lists_.size(); ++i) { : Expr::Close(const_expr_lists_[i], state); : } : for (int i = 0; i < child_expr_lists_.size(); ++i) { : Expr::Close(child_expr_lists_[i], state); : } > While you're here, can you rewrite as two ranged-for loops? Done http://gerrit.cloudera.org:8080/#/c/4817/1/be/src/exec/union-node.h File be/src/exec/union-node.h: Line 58: /// Const exprs materialized by this node. These exprs don't refer to any children. > Add that these are only materialized by the first fragment instance to avoi Done -- To view, visit http://gerrit.cloudera.org:8080/4817 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia5fc23985e8d51acb8a6920717ce4e2f0254fe70 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Henry Robinson Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3586: Clean up union-node.h/cc to enable improvements.
Alex Behm has posted comments on this change. Change subject: IMPALA-3586: Clean up union-node.h/cc to enable improvements. .. Patch Set 1: Code-Review+1 Carry Henry's +1 -- To view, visit http://gerrit.cloudera.org:8080/4817 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia5fc23985e8d51acb8a6920717ce4e2f0254fe70 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Henry Robinson Gerrit-HasComments: No
[Impala-ASF-CR] Add all build targets to CMake and speed up builds
Jim Apple has posted comments on this change. Change subject: Add all build targets to CMake and speed up builds .. Patch Set 2: Code-Review+1 Not +2 because I think it could use a review from someone who knows Cmake much better than I do -- To view, visit http://gerrit.cloudera.org:8080/4790 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I23617adf13bdeb034c24f6bba14b5ae480e8dd26 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3586: Clean up union-node.h/cc to enable improvements.
Hello Henry Robinson, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4817 to look at the new patch set (#2). Change subject: IMPALA-3586: Clean up union-node.h/cc to enable improvements. .. IMPALA-3586: Clean up union-node.h/cc to enable improvements. This patch does not address IMPALA-3586, but it cleans up the code in union-node.h/cc to make it easier to implement those perf improvements. The major simplification is to remove conjunct evaluation since the planner does not assigns conjuncts to a union-node anymore. Conjuncts are always pushed to the union operands. Change-Id: Ia5fc23985e8d51acb8a6920717ce4e2f0254fe70 --- M be/src/exec/union-node.cc M be/src/exec/union-node.h 2 files changed, 95 insertions(+), 133 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/4817/2 -- To view, visit http://gerrit.cloudera.org:8080/4817 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia5fc23985e8d51acb8a6920717ce4e2f0254fe70 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Henry Robinson
[Impala-ASF-CR] IMPALA-4134,IMPALA-3704: Kudu INSERT improvements
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-4134,IMPALA-3704: Kudu INSERT improvements .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/4728/6/be/src/exec/kudu-table-sink.cc File be/src/exec/kudu-table-sink.cc: PS6, Line 330: kudu_error_counter_->value() > Does that reasoning hold up even if all errors are IGNORE type errors and w Yes, we still fail the query if the error buffer overflows since we can't be sure that there weren't more serious errors missed. Regardless, I looked at the client code and it never even fails right now, it just keeps a vector of errors that can keep growing, not bounded. -- To view, visit http://gerrit.cloudera.org:8080/4728 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5542b9a061b01c543a139e8722560b1365f06595 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3676: Use clang as a static analysis tool
Jim Apple has posted comments on this change. Change subject: IMPALA-3676: Use clang as a static analysis tool .. Patch Set 7: (8 comments) > (8 comments) > > The commit message doesn't really tell the full story here. In > particular, you should say something about AlignmentNew and what it > is for. Done > > The rearrangement of class members to group by type seems like it > sacrifices readability for performance in some strange cases - the > statestore is a good example, where there is only ever one > statestore and it is not (as far as we know) highly sensitive to > its in-memory layout. What criteria did you use to decide that a > class' members should be grouped-by-type? I rearranged them all. Tim convinced me to roll some back. http://gerrit.cloudera.org:8080/#/c/4758/6/be/src/common/init.cc File be/src/common/init.cc: PS6, Line 100: [[noreturn]] > can this go on the previous line? It might be easier to read that way (simi While it can, clang-format "fixes" it to this right away. http://gerrit.cloudera.org:8080/#/c/4758/6/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: PS6, Line 173: : > Why remove these comments? Accident. PS6, Line 176: > why remove this? Put it back in the definition in the class. http://gerrit.cloudera.org:8080/#/c/4758/6/be/src/exprs/aggregate-functions.h File be/src/exprs/aggregate-functions.h: PS6, Line 182: ec > can you replace that with HLL_PRECISION? Done http://gerrit.cloudera.org:8080/#/c/4758/6/be/src/runtime/tmp-file-mgr.cc File be/src/runtime/tmp-file-mgr.cc: PS6, Line 140: uniqu > remove std:: in cc files Removed here and elsewhere, except for std::move. I see that usually referred to with the prefix by C++ people. Thoughts? http://gerrit.cloudera.org:8080/#/c/4758/6/be/src/statestore/statestore.h File be/src/statestore/statestore.h: PS6, Line 82: CacheLineAligned > why? subscriber_heartbeat_threadpool_ is a ThreadPool, and ThreadPool::work_queue_ is a BlockingQueue, and BlcokingQueue is marked CACHE_ALIGNED, presumably to avoid false sharing. I made CacheLineAligned use the C++11 alignas specifier and deleted CACHE_ALIGNED. PS6, Line 253: boost::mutex exit_flag_lock_; > This is a good example of where logical grouping is broken by grouping-by-t rolled back http://gerrit.cloudera.org:8080/#/c/4758/6/be/src/util/aligned-new.h File be/src/util/aligned-new.h: PS6, Line 27: Objects that should be allocate > What does 'must be allocated' mean - for correctness or performance or both either - performance for BlockingQueue and correctness for SIMD values when using aligned loads or stores. -- To view, visit http://gerrit.cloudera.org:8080/4758 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4ed168488cb30ddeccd0087f3840541d858f9c06 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4134,IMPALA-3704: Kudu INSERT improvements
Dan Hecht has posted comments on this change. Change subject: IMPALA-4134,IMPALA-3704: Kudu INSERT improvements .. Patch Set 7: (2 comments) Code looks fine, but want to close on the question at line 330 before +2ing. http://gerrit.cloudera.org:8080/#/c/4728/6/be/src/exec/kudu-table-sink.cc File be/src/exec/kudu-table-sink.cc: PS6, Line 330: > I think so: If that happens we'll fail the query. If the query fails, the ' Does that reasoning hold up even if all errors are IGNORE type errors and we overflow the client error buffer? http://gerrit.cloudera.org:8080/#/c/4728/7/be/src/exec/kudu-table-sink.cc File be/src/exec/kudu-table-sink.cc: PS7, Line 145: Kudu is this talking about the "Kudu client" or "Kudu server"? -- To view, visit http://gerrit.cloudera.org:8080/4728 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5542b9a061b01c543a139e8722560b1365f06595 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3676: Use clang as a static analysis tool
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4758 to look at the new patch set (#7). Change subject: IMPALA-3676: Use clang as a static analysis tool .. IMPALA-3676: Use clang as a static analysis tool This patch adds a script to run clang-tidy over the whole code base. It is a first step towards running clang-tidy over patches as a tool to help users spot bugs before code review. Because of the number of clang-tidy checks, this patch only addresses some of them. In particular, only checks starting with 'clang' are considered. Many of them which are flaky or not part of our style are excluded from the analysis. This patch also exlcudes some checks which are part of our current style but which would be too laborious to fix over the entire codebase, like using nullptr rather than NULL. This patch also fixes a number of small bugs found by clang-tidy. Finally, this patch adds the class AlignedNew, the purpose of which is to provide correct alignment on heap-allocated data. The global new operator only guarantees 16-byte alignment. A class that includes a member variable that must be aligned on a k-byte boundary for k>16 can inherit from AlignedNew to ensure correct alignment on the heap, quieting clang's -Wover-aligned warning. (Static and stack allocation are required by the standard to respect the alignment of the type and its member variables, so no extra code is needed for allocation in those places.) Change-Id: I4ed168488cb30ddeccd0087f3840541d858f9c06 --- A .clang-tidy M CMakeLists.txt M be/CMakeLists.txt M be/src/benchmarks/bloom-filter-benchmark.cc M be/src/benchmarks/in-predicate-benchmark.cc M be/src/benchmarks/parse-timestamp-benchmark.cc M be/src/catalog/catalog-server.cc M be/src/catalog/catalog-server.h M be/src/codegen/codegen-anyval.cc M be/src/codegen/impala-ir.cc M be/src/codegen/llvm-codegen.cc M be/src/common/compiler-util.h M be/src/common/init.cc M be/src/common/logging.h M be/src/common/status.h M be/src/exec/delimited-text-parser.cc M be/src/exec/delimited-text-parser.h M be/src/exec/exec-node.h M be/src/exec/external-data-source-executor.h M be/src/exec/hash-table-test.cc M be/src/exec/hbase-scan-node.h M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scan-node.h M be/src/exec/hdfs-table-writer.h M be/src/exec/parquet-column-readers.h M be/src/exec/partitioned-hash-join-builder-ir.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/scanner-context.inline.h M be/src/exec/write-stream.h M be/src/experiments/bit-stream-utils.8byte.inline.h M be/src/experiments/tuple-splitter-test.cc M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M be/src/exprs/compound-predicates.h M be/src/exprs/expr-test.cc M be/src/exprs/slot-ref.cc M be/src/exprs/string-functions-ir.cc M be/src/rpc/TAcceptQueueServer.h M be/src/rpc/authentication.cc M be/src/rpc/authentication.h M be/src/rpc/thrift-server.h M be/src/runtime/buffered-block-mgr.h M be/src/runtime/buffered-tuple-stream.cc M be/src/runtime/buffered-tuple-stream.h M be/src/runtime/coordinator.cc M be/src/runtime/coordinator.h M be/src/runtime/data-stream-sender.cc M be/src/runtime/data-stream-sender.h M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M be/src/runtime/exec-env.h M be/src/runtime/free-pool-test.cc M be/src/runtime/hbase-table.cc M be/src/runtime/hdfs-fs-cache.h M be/src/runtime/multi-precision-test.cc M be/src/runtime/runtime-filter-bank.h M be/src/runtime/runtime-filter.h M be/src/runtime/scoped-buffer.h M be/src/runtime/string-buffer.h M be/src/runtime/timestamp-parse-util.cc M be/src/runtime/timestamp-value.cc M be/src/runtime/tmp-file-mgr.cc M be/src/runtime/tmp-file-mgr.h M be/src/scheduling/query-schedule.cc M be/src/scheduling/query-schedule.h M be/src/scheduling/simple-scheduler-test.cc M be/src/service/fe-support.cc M be/src/service/impala-beeswax-server.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h M be/src/service/query-exec-state.cc M be/src/service/query-exec-state.h M be/src/service/query-result-set.cc M be/src/statestore/failure-detector.h M be/src/statestore/statestore-subscriber.h M be/src/statestore/statestore.cc M be/src/statestore/statestore.h M be/src/testutil/test-udas.cc M be/src/testutil/test-udas.h M be/src/transport/TSasl.h M be/src/transport/TSaslClientTransport.h M be/src/transport/TSaslServerTransport.h M be/src/transport/TSaslTransport.h M be/src/udf/uda-test-harness.h M be/src/udf/uda-test.cc M be/src/udf/udf-ir.cc M be/src/udf/udf.cc M be/src/udf/udf.h M be/src/udf_samples/uda-sample.cc A be/src/util/aligned-new.h M be/src/util/benchmark-test.cc M be/src/util/bit-stream-utils.inline.h M be/src/util/bit-util-test.cc M be/src/util/blocking-queue-test.cc M be/src/util/blocking-queue.h M be/src/util/bloom-filter-test.cc M be
[Impala-ASF-CR] IMPALA-4134,IMPALA-3704: Kudu INSERT improvements
Matthew Jacobs has uploaded a new patch set (#7). Change subject: IMPALA-4134,IMPALA-3704: Kudu INSERT improvements .. IMPALA-4134,IMPALA-3704: Kudu INSERT improvements 1.) IMPALA-4134: Use Kudu AUTO FLUSH Improves performance of writes to Kudu up to 4.2x in bulk data loading tests (load 200 million rows from lineitem). 2.) IMPALA-3704: Improve errors on PK conflicts The Kudu client reports an error for every PK conflict, and all errors were being returned in the error status. As a result, inserts/updates/deletes could return errors with thousands errors reported. This changes the error handling to log all reported errors as warnings and return only the first error in the query error status. 3.) Improve the DataSink reporting of the insert stats. The per-partition stats returned by the data sink weren't useful for Kudu sinks. Firstly, the number of appended rows was not being displayed in the profile. Secondly, the 'stats' field isn't populated for Kudu tables and thus was confusing in the profile, so it is no longer printed if it is not set in the thrift struct. Testing: Ran local tests, including new tests to verify the query profile insert stats. Manual cluster testing was conducted of the AUTO FLUSH functionality, and that testing informed the default mutation buffer value of 100MB which was found to provide good results. Change-Id: I5542b9a061b01c543a139e8722560b1365f06595 --- M be/src/exec/data-sink.cc M be/src/exec/hbase-table-sink.cc M be/src/exec/hdfs-table-sink.cc M be/src/exec/kudu-table-sink.cc M be/src/exec/kudu-table-sink.h M be/src/runtime/coordinator.cc M be/src/service/impala-beeswax-server.cc M be/src/service/query-exec-state.cc M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/generate_error_codes.py M shell/impala_client.py M testdata/workloads/functional-query/queries/QueryTest/kudu_crud.test M tests/beeswax/impala_beeswax.py 14 files changed, 248 insertions(+), 97 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/4728/7 -- To view, visit http://gerrit.cloudera.org:8080/4728 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I5542b9a061b01c543a139e8722560b1365f06595 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Todd Lipcon
[Impala-ASF-CR] Add distcc infrastructure.
Jim Apple has posted comments on this change. Change subject: Add distcc infrastructure. .. Patch Set 3: Code-Review+2 Carry Tim's +2 -- To view, visit http://gerrit.cloudera.org:8080/4820 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia4bc78ad46dda13e4533183195af632f46377cae Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1702: Enforce single-table consistency in query analysis.
Huaisi Xu has uploaded a new patch set (#17). Change subject: IMPALA-1702: Enforce single-table consistency in query analysis. .. IMPALA-1702: Enforce single-table consistency in query analysis. Catalogd managed table id generation causes problems when new updates arrive at frontend while queries are being analyzed: references to the same table may suddenly refer to a different version, and different tables may share the same table id. This causes problems when one table overrides another one in the thrift descriptor table sent to backend. This commit removes the table id from the catalog Table object; instead frontend assigns a unique id to each table in DescriptorTable.toThrift(). It also implements a referencedTables_ cache in Analyzer::globalState_ so that calling Analyzer::getTable() on the same table returns the same reference for the same query. Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2 --- M common/thrift/CatalogObjects.thrift M common/thrift/Descriptors.thrift M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java D fe/src/main/java/org/apache/impala/catalog/TableId.java M fe/src/main/java/org/apache/impala/catalog/TableLoader.java M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java M fe/src/main/java/org/apache/impala/catalog/View.java M fe/src/main/java/org/apache/impala/common/Id.java M fe/src/main/java/org/apache/impala/planner/HBaseTableSink.java M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java M fe/src/main/java/org/apache/impala/planner/JoinTableId.java M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java 27 files changed, 204 insertions(+), 219 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/4349/17 -- To view, visit http://gerrit.cloudera.org:8080/4349 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2 Gerrit-PatchSet: 17 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Huaisi Xu Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Huaisi Xu Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-1702: Enforce single-table consistency in query analysis.
Hello Marcel Kornacker, Bharath Vissapragada, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4349 to look at the new patch set (#17). Change subject: IMPALA-1702: Enforce single-table consistency in query analysis. .. IMPALA-1702: Enforce single-table consistency in query analysis. Catalogd managed table id generation causes problems when new updates arrive at frontend while queries are being analyzed: references to the same table may suddenly refer to a different version, and different tables may share the same table id. This causes problems when one table overrides another one in the thrift descriptor table sent to backend. This commit removes the table id from the catalog Table object; instead frontend assigns a unique id to each table in DescriptorTable.toThrift(). It also implements a referencedTables_ cache in Analyzer::globalState_ so that calling Analyzer::getTable() on the same table returns the same reference for the same query. Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2 --- M common/thrift/CatalogObjects.thrift M common/thrift/Descriptors.thrift M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java D fe/src/main/java/org/apache/impala/catalog/TableId.java M fe/src/main/java/org/apache/impala/catalog/TableLoader.java M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java M fe/src/main/java/org/apache/impala/catalog/View.java M fe/src/main/java/org/apache/impala/common/Id.java M fe/src/main/java/org/apache/impala/planner/HBaseTableSink.java M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java M fe/src/main/java/org/apache/impala/planner/JoinTableId.java M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java 27 files changed, 204 insertions(+), 219 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/4349/17 -- To view, visit http://gerrit.cloudera.org:8080/4349 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2 Gerrit-PatchSet: 17 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Huaisi Xu Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Huaisi Xu Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-1702: Enforce single-table consistency in query analysis.
Huaisi Xu has posted comments on this change. Change subject: IMPALA-1702: Enforce single-table consistency in query analysis. .. Patch Set 16: (10 comments) Thanks Marcel. http://gerrit.cloudera.org:8080/#/c/4349/15/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: Line 294: // The Impalad Catalog has the latest tables from the statestore. In order to use the > add todo that we need to investigate whether to do this for other catalog o Done http://gerrit.cloudera.org:8080/#/c/4349/15/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java File fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java: Line 43: * them unique ids. > "may be null" is a better phrasing. Done Line 44: */ > "Set in QueryStmt.analyze()." Done Line 45: public class DescriptorTable { > remove last sentence, that goes without saying. Done Line 179: // inline view of a non-constant select has a non-materialized tuple descriptor > "checking that" Done Line 180: // in the descriptor table just for type checking, which we need to skip > "same Table instance" is even more specific (because we're talking about ob Done Line 202: for (SlotDescriptor slotD: tupleDesc.getMaterializedSlots()) { > single line Done http://gerrit.cloudera.org:8080/#/c/4349/15/fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java File fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java: Line 201: public void materializeSlots() { > single line done. but not sure if this is what you meant. http://gerrit.cloudera.org:8080/#/c/4349/15/fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java File fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java: Line 438: if (descTbl.isSetTupleDescriptors()) { > precede w/ blank line Done Line 447: if (execRequest.isSetFragments() && !execRequest.fragments.isEmpty()) { > same here Done -- To view, visit http://gerrit.cloudera.org:8080/4349 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2 Gerrit-PatchSet: 16 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Huaisi Xu Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Huaisi Xu Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4342: Use sync ddl=1 for test udf errors
Alex Behm has posted comments on this change. Change subject: IMPALA-4342: Use sync_ddl=1 for test_udf_errors .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/4792/1/tests/query_test/test_udfs.py File tests/query_test/test_udfs.py: Line 83: self.run_test_case('QueryTest/udf-errors', vector) > Does it mean that the 2 DDL statements will go to the same coordinator with Yes. As discussed, this looks like a product bug similar to IMPALA-3888. -- To view, visit http://gerrit.cloudera.org:8080/4792 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9063e9c80833904d26dd06e694bc3158a04acee2 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Michael Ho Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4309: Introduce Expr rewrite phase and supporting classes.
Alex Behm has posted comments on this change. Change subject: IMPALA-4309: Introduce Expr rewrite phase and supporting classes. .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/4746/5/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java File fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java: Line 37 I removed this test until we agree on the approach first. I'll see how we can resurrect this test later (if possible). -- To view, visit http://gerrit.cloudera.org:8080/4746 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2279dc984bcf7742db4fa3b1aa67283ecbb05e6e Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4309: Introduce Expr rewrite phase and supporting classes.
Alex Behm has uploaded a new patch set (#6). Change subject: IMPALA-4309: Introduce Expr rewrite phase and supporting classes. .. IMPALA-4309: Introduce Expr rewrite phase and supporting classes. Introduces a new phase for rewriting Exprs after analysis and before subquery rewriting. The transformed Exprs replace the original ones in analyzed statements. If Exprs were changed, the whole statement is reset() and re-analyzed, similar to how subqueries are rewritten. If both Exprs and subqueries are rewritten there is only one re-analysis of the changed statement. The following new classes work together to perform transformations: 1. ExprRewriteRule - base class for Expr transformation rules 2. ExprRewriter - drives the transformation of Exprs using a list of ExprRewriteRules Statements that have exprs to be rewritten need to implement a new method rewriteExprs() that accepts an ExprRewriter. As an example, this patch adds a rule for converting BetweenPredicates into their equivalent CompoundPredicates. The BetweenPredicate has been notoriously buggy due to a lack of such a separate rewrite phase and is now cleaned up. Testing: 1. Added a new test for checking that the rewrite framework covers all relevant statements, clauses and can properly handle nested statements and subqueries. 2. There are many existing tests for BetweePredicates and they all exercise the new rewrite rule/phase. 3. Ran a private core/hdfs run and it passed. Change-Id: I2279dc984bcf7742db4fa3b1aa67283ecbb05e6e --- M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/BetweenPredicate.java M fe/src/main/java/org/apache/impala/analysis/CreateOrAlterViewStmtBase.java M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java M fe/src/main/java/org/apache/impala/analysis/SelectList.java M fe/src/main/java/org/apache/impala/analysis/SelectListItem.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/StatementBase.java M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java M fe/src/main/java/org/apache/impala/analysis/TableRef.java M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java M fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java A fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java A fe/src/main/java/org/apache/impala/rewrite/ExprRewriteRule.java A fe/src/main/java/org/apache/impala/rewrite/ExprRewriter.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java A fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-views.test 29 files changed, 513 insertions(+), 216 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/4746/6 -- To view, visit http://gerrit.cloudera.org:8080/4746 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2279dc984bcf7742db4fa3b1aa67283ecbb05e6e Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] Add all build targets to CMake and speed up builds
Tim Armstrong has uploaded a new patch set (#2). Change subject: Add all build targets to CMake and speed up builds .. Add all build targets to CMake and speed up builds Use CMake's dependency resolution always instead of serial execution of targets via shell scripts. This improves parallelism by building fe, be, and other targets at the same time and avoid some overhead from invoking "make" multiple times. This reduces the time taken for an incremental compilation of fe and be from 56s to 24s with this command: ./buildall.sh -debug -noclean -notests -skiptests -ninja Also use Impala-lzo's build script. This depends on the IMPALA-4277 fixes to the Impala-lzo build script. Also make sure that "make" builds all the same artifacts as buildall.sh when run with no args. Testing: Ran a jenkins core job, also experimented locally. Change-Id: I23617adf13bdeb034c24f6bba14b5ae480e8dd26 --- M CMakeLists.txt M be/src/benchmarks/CMakeLists.txt M be/src/service/CMakeLists.txt M bin/make_impala.sh M buildall.sh M ext-data-source/CMakeLists.txt M fe/CMakeLists.txt 7 files changed, 65 insertions(+), 52 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/4790/2 -- To view, visit http://gerrit.cloudera.org:8080/4790 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I23617adf13bdeb034c24f6bba14b5ae480e8dd26 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4309: Introduce Expr rewrite phase and supporting classes.
Alex Behm has posted comments on this change. Change subject: IMPALA-4309: Introduce Expr rewrite phase and supporting classes. .. Patch Set 5: (11 comments) http://gerrit.cloudera.org:8080/#/c/4746/5/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java File fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java: Line 355 > what was going on with this? This was stale code from the first version of subquery rewriting which was later cleaned up. It's not needed. Line 379: analysisResult_.stmt_.rewriteExprs(rewriter_); > for the between transformation it's not necessary, but in general i'd expec I was thinking that we'd cross that bridge when our set of rewrite rules becomes complex enough to warrant that, but easy enough to do now. Done. http://gerrit.cloudera.org:8080/#/c/4746/5/fe/src/main/java/org/apache/impala/analysis/BetweenPredicate.java File fe/src/main/java/org/apache/impala/analysis/BetweenPredicate.java: Line 56: "supported in a between predicate: " + toSqlImpl()); > capitalize between Done http://gerrit.cloudera.org:8080/#/c/4746/5/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java: Line 59: protected Expr havingClause_; // original having clause > why move it? Reverted. Line 874: // Also rewrite exprs in the statements of subqueries. > the problem you have is that the parent/child expr tree doesn't give you ac Expr.getContainedExprs() does not help because we also need to set the rewritten exprs in the appropriate clauses of the contained QueryStmt. Sure, we can get the exprs and rewrite them, but then we don't know where to set them. We could make everything Reference. Let me know if you want to go down the "everything is a ParseNode path" and I'll abandon this patch. That will be much more involved change and I personally don't think it's worth it. That change will save us these < 50 LOC to traverse the stmts and I don't see us rewriting non-Expr ParseNodes with that new infra (we'll likely keep our existing StmtRewriter). http://gerrit.cloudera.org:8080/#/c/4746/5/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java: Line 69 > huh? Moved to caller to avoid several reset() and reanalyze() passes. http://gerrit.cloudera.org:8080/#/c/4746/5/fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java File fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java: Line 38: numChanges_ = 0; > should rules really have state? 1. Sure, the driver can keep track of changes instead, but it seems like we'll need excessive object creation: for (Rule rule: rules) { Expr clone = expr.clone(); clone = rule.apply(clone); changed = clone.equals(expr); } I made those changes. 2. I think there are some rewrite rules that are way easier to express if the rule itself can traverse the entire Expr tree. For example, removing redundant conjuncts or disjuncts seems a little cumbersome in your proposed approach. Seems like we'd need to have a normalization pass first to get the redundant exprs next to each other so a single rule application can detect the redundancy. Line 46: if (expr instanceof BetweenPredicate) { > if (!(expr instanceof ...)) return expr; Done http://gerrit.cloudera.org:8080/#/c/4746/5/fe/src/main/java/org/apache/impala/rewrite/ExprRewriteRule.java File fe/src/main/java/org/apache/impala/rewrite/ExprRewriteRule.java: Line 40: public abstract Expr rewrite(Expr expr, Analyzer analyzer) throws AnalysisException; > call it apply then? Done http://gerrit.cloudera.org:8080/#/c/4746/5/fe/src/main/java/org/apache/impala/rewrite/ExprRewriter.java File fe/src/main/java/org/apache/impala/rewrite/ExprRewriter.java: Line 31: public class ExprRewriter extends ExprRewriteRule { > what's the point of making it a subclass of ExprRewriteRule? 1. No need to make subclass ExprRewriteRule. Done. 2. I don't think we want to hardcode the rules in here because we will want to do different sets of rewrites in different phases of planning. For example, the BETWEEN rewrite is a logical rewrite that we can do on the parse "tree". But constant folding will need to be done on the fully substituted Exprs, i.e., during planning. Of course, we can just brute-force it and always apply everything but I think it's easier to understand if we have a more nuanced application of rules. 3. I made the other changed you requested here: * Keep applying rules until there no more changes. * Have the driver traverse the expr tree and detect changes. http://gerrit.cloudera.org:8080/#/c/4746/5/fe/src/test/java/org/apache/impala/common/FrontendTestBase.java File fe/src/test/java/org/apache/impala/common/FrontendTestBase.java: Line 186: // Do not analyze the stmt to avoid applying rewrit
[Impala-ASF-CR] IMPALA-4339: ensure coredumps end up in IMPALA HOME
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4339: ensure coredumps end up in IMPALA_HOME .. Patch Set 2: Code-Review+2 Rebase, carry +2 -- To view, visit http://gerrit.cloudera.org:8080/4785 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibc34d152139653374f940dc3edbca08e749bf55e Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4339: ensure coredumps end up in IMPALA HOME
Tim Armstrong has uploaded a new patch set (#2). Change subject: IMPALA-4339: ensure coredumps end up in IMPALA_HOME .. IMPALA-4339: ensure coredumps end up in IMPALA_HOME Change-Id: Ibc34d152139653374f940dc3edbca08e749bf55e --- M buildall.sh 1 file changed, 3 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/85/4785/2 -- To view, visit http://gerrit.cloudera.org:8080/4785 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibc34d152139653374f940dc3edbca08e749bf55e Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins
[Impala-ASF-CR] Add distcc infrastructure.
Internal Jenkins has posted comments on this change. Change subject: Add distcc infrastructure. .. Patch Set 2: Verified-1 Build failed: http://sandbox.jenkins.cloudera.com/job/impala-external-gerrit-verify-merge-ASF/346/ -- To view, visit http://gerrit.cloudera.org:8080/4820 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia4bc78ad46dda13e4533183195af632f46377cae Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4339: ensure coredumps end up in IMPALA HOME
Internal Jenkins has posted comments on this change. Change subject: IMPALA-4339: ensure coredumps end up in IMPALA_HOME .. Patch Set 2: Verified-1 Build failed: http://sandbox.jenkins.cloudera.com/job/impala-external-gerrit-verify-merge-ASF/347/ -- To view, visit http://gerrit.cloudera.org:8080/4785 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibc34d152139653374f940dc3edbca08e749bf55e Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4134,IMPALA-3704: Kudu INSERT improvements
Hello Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4728 to look at the new patch set (#7). Change subject: IMPALA-4134,IMPALA-3704: Kudu INSERT improvements .. IMPALA-4134,IMPALA-3704: Kudu INSERT improvements 1.) IMPALA-4134: Use Kudu AUTO FLUSH Improves performance of writes to Kudu up to 4.2x in bulk data loading tests (load 200 million rows from lineitem). 2.) IMPALA-3704: Improve errors on PK conflicts The Kudu client reports an error for every PK conflict, and all errors were being returned in the error status. As a result, inserts/updates/deletes could return errors with thousands errors reported. This changes the error handling to log all reported errors as warnings and return only the first error in the query error status. 3.) Improve the DataSink reporting of the insert stats. The per-partition stats returned by the data sink weren't useful for Kudu sinks. Firstly, the number of appended rows was not being displayed in the profile. Secondly, the 'stats' field isn't populated for Kudu tables and thus was confusing in the profile, so it is no longer printed if it is not set in the thrift struct. Testing: Ran local tests, including new tests to verify the query profile insert stats. Manual cluster testing was conducted of the AUTO FLUSH functionality, and that testing informed the default mutation buffer value of 100MB which was found to provide good results. Change-Id: I5542b9a061b01c543a139e8722560b1365f06595 --- M be/src/exec/data-sink.cc M be/src/exec/hbase-table-sink.cc M be/src/exec/hdfs-table-sink.cc M be/src/exec/kudu-table-sink.cc M be/src/exec/kudu-table-sink.h M be/src/runtime/coordinator.cc M be/src/service/impala-beeswax-server.cc M be/src/service/query-exec-state.cc M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/generate_error_codes.py M shell/impala_client.py M testdata/workloads/functional-query/queries/QueryTest/kudu_crud.test M tests/beeswax/impala_beeswax.py 14 files changed, 248 insertions(+), 97 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/4728/7 -- To view, visit http://gerrit.cloudera.org:8080/4728 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I5542b9a061b01c543a139e8722560b1365f06595 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Todd Lipcon
[Impala-ASF-CR] IMPALA-3211: provide toolchain build id for bootstrapping
Tim Armstrong has uploaded a new patch set (#3). Change subject: IMPALA-3211: provide toolchain build id for bootstrapping .. IMPALA-3211: provide toolchain build id for bootstrapping Testing: Ran a private build, which succeeded. Change-Id: Ibcc25ae82511713d0ff05ded37ef162925f2f0fb --- M bin/bootstrap_toolchain.py M bin/impala-config.sh 2 files changed, 14 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/4771/3 -- To view, visit http://gerrit.cloudera.org:8080/4771 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibcc25ae82511713d0ff05ded37ef162925f2f0fb Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-3211: provide toolchain build id for bootstrapping
Hello Michael Brown, Matthew Jacobs, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4771 to look at the new patch set (#3). Change subject: IMPALA-3211: provide toolchain build id for bootstrapping .. IMPALA-3211: provide toolchain build id for bootstrapping Testing: Ran a private build, which succeeded. Change-Id: Ibcc25ae82511713d0ff05ded37ef162925f2f0fb --- M bin/bootstrap_toolchain.py M bin/impala-config.sh 2 files changed, 14 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/4771/3 -- To view, visit http://gerrit.cloudera.org:8080/4771 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibcc25ae82511713d0ff05ded37ef162925f2f0fb Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4339: ensure coredumps end up in IMPALA HOME
Internal Jenkins has posted comments on this change. Change subject: IMPALA-4339: ensure coredumps end up in IMPALA_HOME .. Patch Set 1: Verified-1 Build failed: http://sandbox.jenkins.cloudera.com/job/impala-external-gerrit-verify-merge-ASF/343/ -- To view, visit http://gerrit.cloudera.org:8080/4785 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibc34d152139653374f940dc3edbca08e749bf55e Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3211: provide toolchain build id for bootstrapping
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3211: provide toolchain build id for bootstrapping .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/4771/2/bin/impala-config.sh File bin/impala-config.sh: Line 56: : ${IMPALA_TOOLCHAIN_BUILD_ID=249-2267164200} > This could use a bit more explanation. How should a person discover this va Done -- To view, visit http://gerrit.cloudera.org:8080/4771 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibcc25ae82511713d0ff05ded37ef162925f2f0fb Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1702: Enforce single-table consistency in query analysis.
Huaisi Xu has posted comments on this change. Change subject: IMPALA-1702: Enforce single-table consistency in query analysis. .. Patch Set 16: solved conflict first. addressing comment next -- To view, visit http://gerrit.cloudera.org:8080/4349 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2 Gerrit-PatchSet: 16 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Huaisi Xu Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Huaisi Xu Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1702: Enforce single-table consistency in query analysis.
Huaisi Xu has uploaded a new patch set (#16). Change subject: IMPALA-1702: Enforce single-table consistency in query analysis. .. IMPALA-1702: Enforce single-table consistency in query analysis. Catalogd managed table id generation causes problems when new updates arrive at frontend while queries are being analyzed: references to the same table may suddenly refer to a different version, and different tables may share the same table id. This causes problems when one table overrides another one in the thrift descriptor table sent to backend. This commit removes the table id from the catalog Table object; instead frontend assigns a unique id to each table in DescriptorTable.toThrift(). It also implements a referencedTables_ cache in Analyzer::globalState_ so that calling Analyzer::getTable() on the same table returns the same reference for the same query. Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2 --- M common/thrift/CatalogObjects.thrift M common/thrift/Descriptors.thrift M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java D fe/src/main/java/org/apache/impala/catalog/TableId.java M fe/src/main/java/org/apache/impala/catalog/TableLoader.java M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java M fe/src/main/java/org/apache/impala/catalog/View.java M fe/src/main/java/org/apache/impala/common/Id.java M fe/src/main/java/org/apache/impala/planner/HBaseTableSink.java M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java M fe/src/main/java/org/apache/impala/planner/JoinTableId.java M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java 27 files changed, 199 insertions(+), 213 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/4349/16 -- To view, visit http://gerrit.cloudera.org:8080/4349 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2 Gerrit-PatchSet: 16 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Huaisi Xu Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Huaisi Xu Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-1702: Enforce single-table consistency in query analysis.
Hello Marcel Kornacker, Bharath Vissapragada, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4349 to look at the new patch set (#16). Change subject: IMPALA-1702: Enforce single-table consistency in query analysis. .. IMPALA-1702: Enforce single-table consistency in query analysis. Catalogd managed table id generation causes problems when new updates arrive at frontend while queries are being analyzed: references to the same table may suddenly refer to a different version, and different tables may share the same table id. This causes problems when one table overrides another one in the thrift descriptor table sent to backend. This commit removes the table id from the catalog Table object; instead frontend assigns a unique id to each table in DescriptorTable.toThrift(). It also implements a referencedTables_ cache in Analyzer::globalState_ so that calling Analyzer::getTable() on the same table returns the same reference for the same query. Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2 --- M common/thrift/CatalogObjects.thrift M common/thrift/Descriptors.thrift M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java D fe/src/main/java/org/apache/impala/catalog/TableId.java M fe/src/main/java/org/apache/impala/catalog/TableLoader.java M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java M fe/src/main/java/org/apache/impala/catalog/View.java M fe/src/main/java/org/apache/impala/common/Id.java M fe/src/main/java/org/apache/impala/planner/HBaseTableSink.java M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java M fe/src/main/java/org/apache/impala/planner/JoinTableId.java M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java 27 files changed, 199 insertions(+), 213 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/4349/16 -- To view, visit http://gerrit.cloudera.org:8080/4349 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2 Gerrit-PatchSet: 16 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Huaisi Xu Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Huaisi Xu Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] Add all build targets to CMake and speed up builds
Tim Armstrong has posted comments on this change. Change subject: Add all build targets to CMake and speed up builds .. Patch Set 1: (5 comments) http://gerrit.cloudera.org:8080/#/c/4790/1/bin/make_impala.sh File bin/make_impala.sh: Line 90: echo "[-fe_only] : Builds fe only." > The reason for this flag is so that make_impala.sh still builds the backend A lot of people depend on make_impala.sh only building the backend unfortunately (I know of several jenkins jobs). Doesn't seem worth the pain, especially since buildall.sh already was a standard way to build everything. PS1, Line 92: ." > cand benchmarks Done PS1, Line 96: like > What else? There used to be a test tarball. We might want to produce a source tarball at some point. Mainly just didn't want to add a new flag every time we added some new build artifact. PS1, Line 174: exit > This line can be omitted Done http://gerrit.cloudera.org:8080/#/c/4790/1/buildall.sh File buildall.sh: PS1, Line 305: all > Should the meaning of "all" be put into make_impala.sh? I think it makes more sense to do that at the CMake level. It already builds most of these things when run with no args, but I added in the remaining ones that were not part of ALL (cscope and the tarballs). -- To view, visit http://gerrit.cloudera.org:8080/4790 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I23617adf13bdeb034c24f6bba14b5ae480e8dd26 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4153: Fix count(*) on all blank('') columns - test
Internal Jenkins has posted comments on this change. Change subject: IMPALA-4153: Fix count(*) on all blank('') columns - test .. Patch Set 1: Build failed: http://sandbox.jenkins.cloudera.com/job/impala-external-gerrit-verify-merge-ASF/342/ -- To view, visit http://gerrit.cloudera.org:8080/4755 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I23923f95f43d67977ee1520a1fc09ce297548b3f Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Laszlo Gaal Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Michael Ho Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4134,IMPALA-3704: Kudu INSERT improvements
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-4134,IMPALA-3704: Kudu INSERT improvements .. Patch Set 6: (8 comments) http://gerrit.cloudera.org:8080/#/c/4728/6/be/src/exec/kudu-table-sink.cc File be/src/exec/kudu-table-sink.cc: PS6, Line 145: . > why? (that's the more interesting part for the code reader since the code We want to end up with the ability to have multiple buffers where we start flushing one or more of them and continuing to write into new ones. This is because a buffer that is flushing must block and cannot be freed/re-used again until all pending ops in it complete. The issue of why set 100mb total mutation space w/ a low threshold took a while for me to get enough info, but it turns out the API is just confusing. Unfortunately there are a few ways to specify the same behavior. Since we set the total buffer space and the flush watermark pct, that implies a number of max buffers (100/7). We could also set the max number of buffers and pick different values for the total buffer space / flush watermark pct to accomplish the same thing. This is a known issue and they're looking to improve it moving forward. I'll update the comment. Line 282: << "previous write operation might be inconsistent.\n"; > can't the write operation be inconsistent even if there was no overflow? i. Yeah I agree this is poorly worded, I think it'd just be better to take out the 2nd part. Line 298: error_msg_buffer << "Kudu error(s) reported, first error: " << e.ToString(); > why not just create the Status here since we no longer will append to it la Done PS6, Line 330: kudu_error_counter_->value() > this doesn't really work in case of GetPendingErrors() returning overflow, I think so: If that happens we'll fail the query. If the query fails, the 'num_modified_rows' never gets put in the profile (Coordinator::Wait()) or returned via beeswax (impala-beeswax-server CloseInsertInternal()) -- both ignore the insert stats when query status is not ok. http://gerrit.cloudera.org:8080/#/c/4728/6/be/src/exec/kudu-table-sink.h File be/src/exec/kudu-table-sink.h: PS6, Line 48: due to a > can't it return an error for other reasons? Done PS6, Line 49: be reported as warnings. > isn't the first Kudu error also reported as the error status? Done PS6, Line 113: if the Kudu client may be : /// getting behind. > what does this mean? does the code use it, or is this saying a user could u It's meant to be used as a profile counter, not for the code. I tried to clarify. http://gerrit.cloudera.org:8080/#/c/4728/6/common/thrift/generate_error_codes.py File common/thrift/generate_error_codes.py: PS6, Line 300: Error Kudu table > should this say "Error in Kudu table ..." or something? Done -- To view, visit http://gerrit.cloudera.org:8080/4728 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5542b9a061b01c543a139e8722560b1365f06595 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions
Tim Armstrong has uploaded a new patch set (#7). Change subject: IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions .. IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions This change enables codegen for all builtin aggregate functions, e.g. timestamp functions and group_concat. There are several parts to the change: * Adding support for generic UDAs. Previous the codegen code did not handle multiple input arguments or NULL return values. * Defaulting to using the UDA interface when there is not a special codegen path (we have implementations of all builtin aggregate functions for the interpreted path). * Remove all the logic to disable codegen for the special cases that now are supported. Also fix the generation of code to get/set NULL bits since I needed to add functionality there anyway. Testing: Add tests that check that codegen was enabled for builtin aggregate functions. Also fix some gaps in the preexisting tests. Also add tests for UDAs that check input/output nulls are handled correctly, in anticipation of enabling codegen for arbitrary UDAs. Perf: Ran local TPC-H and targeted perf. Spent a lot of time on TPC-H Q1, since my original approach regressed it ~5%. In the end the problem was to do with the ordering of loads/stores to the slot and null bit in the generated code: the previous version of the code exploited some properties of the particular aggregate function. I ended up replicating this behaviour to avoid regressing perf. Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260 --- M be/src/benchmarks/hash-benchmark.cc M be/src/codegen/codegen-anyval.cc M be/src/codegen/codegen-anyval.h M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/llvm-codegen-test.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/aggregation-node.cc M be/src/exec/exec-node.cc M be/src/exec/hash-join-node.cc M be/src/exec/hash-table.cc M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/old-hash-table.cc M be/src/exec/partitioned-aggregation-node-ir.cc M be/src/exec/partitioned-aggregation-node.cc M be/src/exec/partitioned-aggregation-node.h M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/text-converter.cc M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/case-expr.cc M be/src/exprs/compound-predicates.cc M be/src/exprs/expr-codegen-test.cc M be/src/exprs/expr.cc M be/src/exprs/literal.cc M be/src/exprs/null-literal.cc M be/src/exprs/scalar-fn-call.cc M be/src/exprs/slot-ref.cc M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M be/src/runtime/tuple.cc M be/src/runtime/types.h M be/src/testutil/test-udas.cc M be/src/testutil/test-udfs.cc M be/src/util/tuple-row-compare.cc M testdata/workloads/functional-query/queries/QueryTest/uda.test M tests/common/test_result_verifier.py M tests/query_test/test_aggregation.py M tests/query_test/test_udfs.py 39 files changed, 845 insertions(+), 561 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/4655/7 -- To view, visit http://gerrit.cloudera.org:8080/4655 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions
Tim Armstrong has uploaded a new patch set (#7). Change subject: IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions .. IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions This change enables codegen for all builtin aggregate functions, e.g. timestamp functions and group_concat. There are several parts to the change: * Adding support for generic UDAs. Previous the codegen code did not handle multiple input arguments or NULL return values. * Defaulting to using the UDA interface when there is not a special codegen path (we have implementations of all builtin aggregate functions for the interpreted path). * Remove all the logic to disable codegen for the special cases that now are supported. Also fix the generation of code to get/set NULL bits since I needed to add functionality there anyway. Testing: Add tests that check that codegen was enabled for builtin aggregate functions. Also fix some gaps in the preexisting tests. Also add tests for UDAs that check input/output nulls are handled correctly, in anticipation of enabling codegen for arbitrary UDAs. Perf: Ran local TPC-H and targeted perf. Spent a lot of time on TPC-H Q1, since my original approach regressed it ~5%. In the end the problem was to do with the ordering of loads/stores to the slot and null bit in the generated code: the previous version of the code exploited some properties of the particular aggregate function. I ended up replicating this behaviour to avoid regressing perf. Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260 --- M be/src/benchmarks/hash-benchmark.cc M be/src/codegen/codegen-anyval.cc M be/src/codegen/codegen-anyval.h M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/llvm-codegen-test.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/aggregation-node.cc M be/src/exec/exec-node.cc M be/src/exec/hash-join-node.cc M be/src/exec/hash-table.cc M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/old-hash-table.cc M be/src/exec/partitioned-aggregation-node-ir.cc M be/src/exec/partitioned-aggregation-node.cc M be/src/exec/partitioned-aggregation-node.h M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/text-converter.cc M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/case-expr.cc M be/src/exprs/compound-predicates.cc M be/src/exprs/expr-codegen-test.cc M be/src/exprs/expr.cc M be/src/exprs/literal.cc M be/src/exprs/null-literal.cc M be/src/exprs/scalar-fn-call.cc M be/src/exprs/slot-ref.cc M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M be/src/runtime/tuple.cc M be/src/runtime/types.h M be/src/testutil/test-udas.cc M be/src/testutil/test-udfs.cc M be/src/util/tuple-row-compare.cc M testdata/workloads/functional-query/queries/QueryTest/uda.test M tests/common/test_result_verifier.py M tests/query_test/test_aggregation.py M tests/query_test/test_udfs.py 39 files changed, 845 insertions(+), 561 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/4655/7 -- To view, visit http://gerrit.cloudera.org:8080/4655 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions
Tim Armstrong has posted comments on this change. Change subject: IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions .. Patch Set 6: (7 comments) http://gerrit.cloudera.org:8080/#/c/4655/6/be/src/exec/partitioned-aggregation-node.cc File be/src/exec/partitioned-aggregation-node.cc: PS6, Line 1063: if we initialize the destination value to 0 (with the NULL bit set) > If I understand it correctly, this initialization is done at the initFn() o We could do that but it doesn't seem worth it (the interpreted path has enough additional overhead that I doubt it makes a measurable difference). PS6, Line 1617: Value* dst_ptr > May help to explain what dst_ptr actually is ? It seems to be pointer to th Done PS6, Line 1631: (agg_op == AggFnEvaluator::MIN || agg_op == AggFnEvaluator::MAX)) > nit: I find it easier to read to check the agg_op first: Done PS6, Line 1639: slot_desc->CodegenSetNullIndicator > Can this be skipped if (!slot_desc->is_nullable()) ? Same for below. Min/Max should always be nullable (if there are no input values, the output is NULL). Added a DCHECK to assert this. PS6, Line 1663: agg_op == AggFnEvaluator::MIN > agg_op != OTHER ? or you wanna keep this list for clarity ? Yeah I prefer this way, also since it won't break if someone adds a new aggregate op PS6, Line 1718: false); > Why not just pass true and skip calling CloneFunction() below ? I somehow missed that we called Clonefunction() below. PS6, Line 1742: Value* dst_lowered_ptr = dst.GetLoweredPtr("dst_lowered_ptr"); : Type* dst_unlowered_ptr_type = CodegenAnyVal::GetUnloweredPtrType(codegen, dst_type); : Value* dst_unlowered_ptr = builder->CreateBitCast( : dst_lowered_ptr, dst_unlowered_ptr_type, "dst_unlowered_ptr"); > Why not dst.GetUnloweredPtr() ? We use both dst_unlowered_ptr and dst_lowered_ptr so we need both variables. We can't just call both GetLoweredPtr() and GetUnloweredPtr() here because dst_lowered_ptr and dst_unlowered_ptr need to point to the same alloca(). -- To view, visit http://gerrit.cloudera.org:8080/4655 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3884: Support TYPE TIMESTAMP for HashTableCtx::CodegenAssignNullValue()
Michael Ho has posted comments on this change. Change subject: IMPALA-3884: Support TYPE_TIMESTAMP for HashTableCtx::CodegenAssignNullValue() .. Patch Set 3: Code-Review+2 Carry Tim's +2 forward. -- To view, visit http://gerrit.cloudera.org:8080/4794 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0211d38cbef46331e0006fa5ed0680e6e0867bc8 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3884: Support TYPE TIMESTAMP for HashTableCtx::CodegenAssignNullValue()
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4794 to look at the new patch set (#3). Change subject: IMPALA-3884: Support TYPE_TIMESTAMP for HashTableCtx::CodegenAssignNullValue() .. IMPALA-3884: Support TYPE_TIMESTAMP for HashTableCtx::CodegenAssignNullValue() This change implements support for TYPE_TIMESTAMP for HashTableCtx::CodegenAssignNullValue(). TimestampValue itself is 16 bytes in size. To match RawValue::Write() in the interpreted path, CodegenAssignNullValue() emits code to assign HashUtil::FNV_SEED to both the upper and lower 64-bit of the destination value. This change also fixes the handling of 128-bit Decimal16Value in CodegenAssignNullValue() so the emitted code matches the behavior of the interpreted path. Change-Id: I0211d38cbef46331e0006fa5ed0680e6e0867bc8 --- M be/src/codegen/codegen-anyval.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/hash-table.cc M be/src/exec/old-hash-table.cc M be/src/util/bit-util.h M testdata/workloads/functional-query/queries/QueryTest/joins.test 7 files changed, 72 insertions(+), 37 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/4794/3 -- To view, visit http://gerrit.cloudera.org:8080/4794 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0211d38cbef46331e0006fa5ed0680e6e0867bc8 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] Minor compute stats script fixes
Alex Behm has posted comments on this change. Change subject: Minor compute stats script fixes .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/4825 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I326f4c370fda8d5e388af8e2395623185c06bc07 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Alex Behm Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3884: Support TYPE TIMESTAMP for HashTableCtx::CodegenAssignNullValue()
Michael Ho has posted comments on this change. Change subject: IMPALA-3884: Support TYPE_TIMESTAMP for HashTableCtx::CodegenAssignNullValue() .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/4794/2/be/src/util/bit-util.h File be/src/util/bit-util.h: PS2, Line 85: conste > constexpr - that way we can use it in static_asserts() etc in future. Done -- To view, visit http://gerrit.cloudera.org:8080/4794 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0211d38cbef46331e0006fa5ed0680e6e0867bc8 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4339: ensure coredumps end up in IMPALA HOME
Dan Hecht has posted comments on this change. Change subject: IMPALA-4339: ensure coredumps end up in IMPALA_HOME .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/4785 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibc34d152139653374f940dc3edbca08e749bf55e Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-HasComments: No
[Impala-ASF-CR] Minor compute stats script fixes
Henry Robinson has uploaded a new change for review. http://gerrit.cloudera.org:8080/4825 Change subject: Minor compute stats script fixes .. Minor compute stats script fixes * Change run-step to output full log path * Change text to say "Computing table stats" rather than "Computing HBase stats" when running compute-table-stats.sh Change-Id: I326f4c370fda8d5e388af8e2395623185c06bc07 --- M testdata/bin/create-load-data.sh M testdata/bin/run-step.sh 2 files changed, 3 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/4825/1 -- To view, visit http://gerrit.cloudera.org:8080/4825 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I326f4c370fda8d5e388af8e2395623185c06bc07 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson
[Impala-ASF-CR] IMPALA-2521: Add clustered hint to insert statements
Alex Behm has posted comments on this change. Change subject: IMPALA-2521: Add clustered hint to insert statements .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/4745/8/testdata/workloads/functional-planner/queries/PlannerTest/insert.test File testdata/workloads/functional-planner/queries/PlannerTest/insert.test: Line 635: # IMPALA-2521: clustered insert into non-partitioned table does not add sort node. Sorry just thought of this. Can you also add a test like this that has a WHERE-clause subquery? We should test the reset() + analyze() path when subqueries are rewritten. -- To view, visit http://gerrit.cloudera.org:8080/4745 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2521: Add clustered hint to insert statements
Lars Volker has posted comments on this change. Change subject: IMPALA-2521: Add clustered hint to insert statements .. Patch Set 9: (6 comments) Thanks for the comments, please see PS9. http://gerrit.cloudera.org:8080/#/c/4745/5/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java: Line 602: } > Thanks. The reason why it works without is that SortNode.init() during plan Ok Line 604: if (matchFound && table_ instanceof KuduTable) { > I vote for keep Ok http://gerrit.cloudera.org:8080/#/c/4745/8/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java: Line 603: // Store exprs for Kudu key columns. > add line breaks between if blocks, it's harder to navigate without those (' Done http://gerrit.cloudera.org:8080/#/c/4745/8/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java File fe/src/main/java/org/apache/impala/analysis/QueryStmt.java: Line 237:* outputs of aggregation. > something missing there. (i know, not your fault.) Yes, I couldn't figure out in git blame what that sentence originally meant. Since I don't understand enough of the code to complete it, I removed it. Let me know if I should replace it with something else. http://gerrit.cloudera.org:8080/#/c/4745/8/testdata/workloads/functional-planner/queries/PlannerTest/kudu.test File testdata/workloads/functional-planner/queries/PlannerTest/kudu.test: Line 247: # IMPALA-2521: clustered insert into table adds sort node, correctly substituting exprs. > you can't always see the correct substitution until you run the query (beca Done Line 249: select id, name, maxzip as zip > use fewer columns, add an analytic function in the inline view Done -- To view, visit http://gerrit.cloudera.org:8080/4745 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2521: Add clustered hint to insert statements
Hello Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4745 to look at the new patch set (#9). Change subject: IMPALA-2521: Add clustered hint to insert statements .. IMPALA-2521: Add clustered hint to insert statements This change introduces a clustered/noclustered hint for insert statements. Specifying this hint adds an additional sort node to the plan, just before the table sink. This has the effect that data will be clustered by its partition prior to writing partitions, which therefore can be written sequentially. Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 --- M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/SortInfo.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M testdata/workloads/functional-planner/queries/PlannerTest/insert.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-query/queries/QueryTest/insert.test M testdata/workloads/functional-query/queries/QueryTest/kudu_crud.test 10 files changed, 367 insertions(+), 43 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/4745/9 -- To view, visit http://gerrit.cloudera.org:8080/4745 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker
[Impala-ASF-CR] Add distcc infrastructure.
Tim Armstrong has posted comments on this change. Change subject: Add distcc infrastructure. .. Patch Set 2: Would be good to send out an email to dev@ once this goes in -- To view, visit http://gerrit.cloudera.org:8080/4820 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia4bc78ad46dda13e4533183195af632f46377cae Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] Add distcc infrastructure.
Tim Armstrong has posted comments on this change. Change subject: Add distcc infrastructure. .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/4820 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia4bc78ad46dda13e4533183195af632f46377cae Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4300: Speed up BloomFilter::Or with SIMD
Internal Jenkins has posted comments on this change. Change subject: IMPALA-4300: Speed up BloomFilter::Or with SIMD .. Patch Set 4: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/4813 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I840799d9cfb81285c796e2abfe2029bb869b0f67 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4300: Speed up BloomFilter::Or with SIMD
Internal Jenkins has submitted this change and it was merged. Change subject: IMPALA-4300: Speed up BloomFilter::Or with SIMD .. IMPALA-4300: Speed up BloomFilter::Or with SIMD The previous code was not written in a way that GCC could auto-vectorize it. Manually vectorizing speeds up BloomFilter::Or by up to 184x. Change-Id: I840799d9cfb81285c796e2abfe2029bb869b0f67 Reviewed-on: http://gerrit.cloudera.org:8080/4813 Reviewed-by: Jim Apple Tested-by: Internal Jenkins --- M be/src/benchmarks/bloom-filter-benchmark.cc M be/src/testutil/mem-util.h M be/src/util/bloom-filter.cc M be/src/util/cpu-info.cc M be/src/util/cpu-info.h 5 files changed, 178 insertions(+), 75 deletions(-) Approvals: Jim Apple: Looks good to me, approved Internal Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/4813 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I840799d9cfb81285c796e2abfe2029bb869b0f67 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions
Michael Ho has posted comments on this change. Change subject: IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions .. Patch Set 6: (7 comments) Looking good. Some more comments for now. Still doing another pass. http://gerrit.cloudera.org:8080/#/c/4655/6/be/src/exec/partitioned-aggregation-node.cc File be/src/exec/partitioned-aggregation-node.cc: PS6, Line 1063: if we initialize the destination value to 0 (with the NULL bit set) If I understand it correctly, this initialization is done at the initFn() of various aggregate operators (e.g. initNull() and initZero()) so this is even true for the interpreted path, right ? Not sure why we cannot do that even for the interpreted path too ? PS6, Line 1617: Value* dst_ptr May help to explain what dst_ptr actually is ? It seems to be pointer to the slot for this agg_fn_evaluator in the intermediate tuple PS6, Line 1631: (agg_op == AggFnEvaluator::MIN || agg_op == AggFnEvaluator::MAX)) nit: I find it easier to read to check the agg_op first: if ((agg_op == ..) && dst_is_numeric_or_bool)) { ... } PS6, Line 1639: slot_desc->CodegenSetNullIndicator Can this be skipped if (!slot_desc->is_nullable()) ? Same for below. PS6, Line 1663: agg_op == AggFnEvaluator::MIN agg_op != OTHER ? or you wanna keep this list for clarity ? PS6, Line 1718: false); Why not just pass true and skip calling CloneFunction() below ? PS6, Line 1742: Value* dst_lowered_ptr = dst.GetLoweredPtr("dst_lowered_ptr"); : Type* dst_unlowered_ptr_type = CodegenAnyVal::GetUnloweredPtrType(codegen, dst_type); : Value* dst_unlowered_ptr = builder->CreateBitCast( : dst_lowered_ptr, dst_unlowered_ptr_type, "dst_unlowered_ptr"); Why not dst.GetUnloweredPtr() ? -- To view, visit http://gerrit.cloudera.org:8080/4655 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3676: Use clang as a static analysis tool
Henry Robinson has posted comments on this change. Change subject: IMPALA-3676: Use clang as a static analysis tool .. Patch Set 6: (8 comments) The commit message doesn't really tell the full story here. In particular, you should say something about AlignmentNew and what it is for. The rearrangement of class members to group by type seems like it sacrifices readability for performance in some strange cases - the statestore is a good example, where there is only ever one statestore and it is not (as far as we know) highly sensitive to its in-memory layout. What criteria did you use to decide that a class' members should be grouped-by-type? http://gerrit.cloudera.org:8080/#/c/4758/6/be/src/common/init.cc File be/src/common/init.cc: PS6, Line 100: [[noreturn]] can this go on the previous line? It might be easier to read that way (similar to how we do template<> declarations). http://gerrit.cloudera.org:8080/#/c/4758/6/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: PS6, Line 173: : Why remove these comments? PS6, Line 176: why remove this? http://gerrit.cloudera.org:8080/#/c/4758/6/be/src/exprs/aggregate-functions.h File be/src/exprs/aggregate-functions.h: PS6, Line 182: 10 can you replace that with HLL_PRECISION? http://gerrit.cloudera.org:8080/#/c/4758/6/be/src/runtime/tmp-file-mgr.cc File be/src/runtime/tmp-file-mgr.cc: PS6, Line 140: std:: remove std:: in cc files http://gerrit.cloudera.org:8080/#/c/4758/6/be/src/statestore/statestore.h File be/src/statestore/statestore.h: PS6, Line 82: CacheLineAligned why? PS6, Line 253: boost::mutex exit_flag_lock_; This is a good example of where logical grouping is broken by grouping-by-type. The lock and the flag are related, but are not next to each other. http://gerrit.cloudera.org:8080/#/c/4758/6/be/src/util/aligned-new.h File be/src/util/aligned-new.h: PS6, Line 27: Objects that must be allocated What does 'must be allocated' mean - for correctness or performance or both? -- To view, visit http://gerrit.cloudera.org:8080/4758 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4ed168488cb30ddeccd0087f3840541d858f9c06 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3211: provide toolchain build id for bootstrapping
Jim Apple has posted comments on this change. Change subject: IMPALA-3211: provide toolchain build id for bootstrapping .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/4771/2//COMMIT_MSG Commit Message: PS2, Line 10: cceeded I think the reason Gerrit displays these in blue while the first two letters are black is because "cceeded" is a 7-character git hash :-) http://gerrit.cloudera.org:8080/#/c/4771/2/bin/impala-config.sh File bin/impala-config.sh: Line 56: : ${IMPALA_TOOLCHAIN_BUILD_ID=249-2267164200} This could use a bit more explanation. How should a person discover this value? When should it be changed? -- To view, visit http://gerrit.cloudera.org:8080/4771 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibcc25ae82511713d0ff05ded37ef162925f2f0fb Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes