[Impala-ASF-CR] IMPALA-5553: Fix expr-test in release builds
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-5553: Fix expr-test in release builds .. IMPALA-5553: Fix expr-test in release builds expr-test fails in release build as it uses Literal::CreateLiteral() to create literal expressions. Literal::CreateLiteral() wraps ParseString() with DCHECK() so it becomes a no-op in release builds. This change fixes the issue by moving Literal::CreateLiteral() from literal.cc to expr-test.cc as it's only used by the test code. Also replaces DCHECK() wrapped around ParseString() with EXPECT_TRUE(). Verified the fix by building and running a release build of expr-test. Change-Id: Id1257da350cb6cb69886e93f6615cdadd17c187d Reviewed-on: http://gerrit.cloudera.org:8080/7255 Reviewed-by: Tim Armstrong Tested-by: Impala Public Jenkins --- M be/src/exprs/expr-test.cc M be/src/exprs/literal.cc M be/src/exprs/literal.h 3 files changed, 114 insertions(+), 102 deletions(-) Approvals: Impala Public Jenkins: Verified Tim Armstrong: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/7255 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Id1257da350cb6cb69886e93f6615cdadd17c187d Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5553: Fix expr-test in release builds
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5553: Fix expr-test in release builds .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/7255 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id1257da350cb6cb69886e93f6615cdadd17c187d Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5553: Fix expr-test in release builds
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5553: Fix expr-test in release builds .. Patch Set 1: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/778/ -- To view, visit http://gerrit.cloudera.org:8080/7255 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id1257da350cb6cb69886e93f6615cdadd17c187d Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5548 Fix some minor nits with HDFS parquet column readers
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-5548 Fix some minor nits with HDFS parquet column readers .. IMPALA-5548 Fix some minor nits with HDFS parquet column readers Replace some std::map instances with unordered maps. std::map is unnecessary in many cases where an unordered map should suffice, and in almost all circumstances, perform better. Also discovered was a slightly dangerous initialization of an empty NullOffsetIndicator, which could conceivably result in undefined behavior by writing arr[ofs] |= b, where ofs was mistakenly initialized to -1 (and b to 0, so such behavior may not be detected, but could cause a crash by underrunning a buffer). Also add warn unused to status results while we are changing this. Testing: Running exhaustive tests against a Jenkins build. Change-Id: I08e653cae6f2188599f4a23e4f44692166d9c119 Reviewed-on: http://gerrit.cloudera.org:8080/7240 Reviewed-by: Dan Hecht Tested-by: Impala Public Jenkins --- M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scanner.h M be/src/exec/parquet-column-readers.h M be/src/runtime/descriptors.h 5 files changed, 22 insertions(+), 19 deletions(-) Approvals: Impala Public Jenkins: Verified Dan Hecht: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/7240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I08e653cae6f2188599f4a23e4f44692166d9c119 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-4703: reservation denial debug action
Dan Hecht has posted comments on this change. Change subject: IMPALA-4703: reservation denial debug action .. Patch Set 5: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7022 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ied39bb091b12156e5dc61b528c6c0cd8de3fe657 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5548 Fix some minor nits with HDFS parquet column readers
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5548 Fix some minor nits with HDFS parquet column readers .. Patch Set 4: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/7240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08e653cae6f2188599f4a23e4f44692166d9c119 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4856: Port data stream service to KRPC
Michael Ho has posted comments on this change. Change subject: IMPALA-4856: Port data stream service to KRPC .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/7103/3/be/src/runtime/data-stream-sender.cc File be/src/runtime/data-stream-sender.cc: PS3, Line 252: batch->compressed_tuple_data > The ownership is shared with the batch object. AddSidecar() internally move I see. I am still getting used to this subtly of passing shared_ptr as argument. -- To view, visit http://gerrit.cloudera.org:8080/7103 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia66704be7a0a8162bb85556d07b583ec756c584b Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5547: Rework FK/PK join detection.
Zach Amsden has posted comments on this change. Change subject: IMPALA-5547: Rework FK/PK join detection. .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/7257/1/fe/src/main/java/org/apache/impala/planner/JoinNode.java File fe/src/main/java/org/apache/impala/planner/JoinNode.java: Line 91: protected List> fkPkEqJoinConjuncts_; Class member is only used in one function in the base class. Maybe document that this is preserved just for printing values and wrap it with a getter so it can't be modified? Line 253: return getFkPkJoinCardinality(eqJoinConjunctSlots, lhsCard, rhsCard); This computes estimates based on both PK conjuncts and non-PK conjuncts using the same formula, instead of filtering out non-PK joins. If we have a RHS which is an extremely selective non-PK, this could bias the cardinality estimate to a very low value, since we take the minimum computed cardinality from these assumed PK values. Is there any value in preserving the >> construction, or should getFkPkEqJoinConjuncts consider the conjuncts by tid order and just return a flattened list (which can be passed to getFkPkJoinCardinality) Line 275: for (List fkPkCandi: scanSlotsByTids.values()) { fkPkCandi - this could use a better name, it isn't clear to me what this is trying to describe. -- To view, visit http://gerrit.cloudera.org:8080/7257 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I49074fe743a28573cff541ef7dbd0edd88892067 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5517: Allow IMPALA LOGS DIR to be overridden
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-5517: Allow IMPALA_LOGS_DIR to be overridden .. IMPALA-5517: Allow IMPALA_LOGS_DIR to be overridden Tested by exporting a different IMPALA_LOGS_DIR path, then confirming that it was not changed by sourcing impala-config.sh. Change-Id: I4028813bd4f53815139225abd57845bb304ae3e4 Reviewed-on: http://gerrit.cloudera.org:8080/7197 Reviewed-by: David Knupp Tested-by: Impala Public Jenkins --- M bin/impala-config.sh 1 file changed, 2 insertions(+), 1 deletion(-) Approvals: Impala Public Jenkins: Verified David Knupp: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/7197 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I4028813bd4f53815139225abd57845bb304ae3e4 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Knupp Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown
[Impala-ASF-CR] IMPALA-5517: Allow IMPALA LOGS DIR to be overridden
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5517: Allow IMPALA_LOGS_DIR to be overridden .. Patch Set 3: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/7197 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4028813bd4f53815139225abd57845bb304ae3e4 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Knupp Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5532: Stack-allocate compressors in RowBatch (de)serialization
Henry Robinson has abandoned this change. Change subject: IMPALA-5532: Stack-allocate compressors in RowBatch (de)serialization .. Abandoned Bad merge - will restore when I fix it. -- To view, visit http://gerrit.cloudera.org:8080/7226 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: abandon Gerrit-Change-Id: Ia4b5a8d2cc315db50e5d70b1191702206de3450d Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5532: Stack-allocate compressors in RowBatch (de)serialization
Hello Impala Public Jenkins, Michael Ho, Sailesh Mukil, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7226 to look at the new patch set (#5). Change subject: IMPALA-5532: Stack-allocate compressors in RowBatch (de)serialization .. IMPALA-5532: Stack-allocate compressors in RowBatch (de)serialization Change allocation pattern for Codec objects in RowBatch to be stack-allocated. Make c'tors and Init() methods of codec implementations publicly visible in order to do so. Refactor HdfsParquetWriter etc. to deal properly with Status returned in all cases. Change-Id: Ia4b5a8d2cc315db50e5d70b1191702206de3450d --- M be/src/exec/hdfs-avro-table-writer.cc M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/hdfs-sequence-table-writer.cc M be/src/exec/parquet-column-stats.cc M be/src/exec/parquet-column-stats.h M be/src/exec/parquet-column-stats.inline.h M be/src/experiments/compression-test.cc M be/src/runtime/row-batch.cc M be/src/util/codec.h M be/src/util/compress.h M be/src/util/decompress-test.cc M be/src/util/decompress.h 12 files changed, 160 insertions(+), 151 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/7226/5 -- To view, visit http://gerrit.cloudera.org:8080/7226 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia4b5a8d2cc315db50e5d70b1191702206de3450d Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5549: Remove deprecated fields from CatalogService API
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-5549: Remove deprecated fields from CatalogService API .. IMPALA-5549: Remove deprecated fields from CatalogService API Remove from TCatalogUpdateResult the deprecated fields that are no longer used and modify the catalog to always populate the 'updated_catalog_objects' and 'removed_catalog_object' fields with the updated/removed catalog objects. Testing: Run core tests. Change-Id: Ibc80e43392cdc85a841e670030e0988692bdf884 Reviewed-on: http://gerrit.cloudera.org:8080/7248 Reviewed-by: Dimitris Tsirogiannis Tested-by: Impala Public Jenkins --- M be/src/service/impala-server.cc M common/thrift/CatalogService.thrift M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java 3 files changed, 22 insertions(+), 103 deletions(-) Approvals: Impala Public Jenkins: Verified Dimitris Tsirogiannis: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/7248 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ibc80e43392cdc85a841e670030e0988692bdf884 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-5549: Remove deprecated fields from CatalogService API
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5549: Remove deprecated fields from CatalogService API .. Patch Set 3: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/7248 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibc80e43392cdc85a841e670030e0988692bdf884 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4862: make resource profile consistent with backend behaviour
Tim Armstrong has uploaded a new patch set (#6). Change subject: IMPALA-4862: make resource profile consistent with backend behaviour .. IMPALA-4862: make resource profile consistent with backend behaviour This moves away from the PipelinedPlanNodeSet approach of enumerating sets of concurrently-executing nodes because unions would force creating many overlapping sets of nodes. The new approach computes the peak resources during Open() and the peak resources between Open() and Close() (i.e. while calling GetNext()) bottom-up for each plan node in a fragment. The fragment resources are then combined to produce the query resources. The basic assumptions for the new resource estimates are: * resources are acquired in Open() and released in Close(). * Blocking nodes call Open() on their child before acquiring their own resources (this required some backend changes). * Blocking nodes call Close() on their children before returning from Open() * The peak resource consumption of the query is the sum of the independent fragments (except for the parallel join build plans where we can assume there will be synchronisation). This is conservative but we don't synchronise fragment Open() and Close() across exchanges so can't make stronger assumptions in general. Also compute the sum of minimum reservations. This will be useful in the backend to determine exactly when all of the initial reservations have been claimed from a shared pool of initial reservations. Testing: * Updated planner tests to reflect behavioural changes. * Added extra resource requirement planner tests for unions, subplans, pipelines of blocking operators, and bushy join plans. * Added single-node plans to resource-requirements tests. These have more complex plan trees inside a single fragment, which is useful for testing the peak resource requirement logic. Change-Id: I492cf5052bb27e4e335395e2a8f8a3b07248ec9d --- M be/src/exec/blocking-join-node.cc M be/src/exec/blocking-join-node.h M be/src/exec/hash-join-node.cc M be/src/exec/partitioned-aggregation-node.cc M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/partitioned-hash-join-node.h M be/src/exec/sort-node.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M common/thrift/Frontend.thrift M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java M fe/src/main/java/org/apache/impala/planner/JoinNode.java M fe/src/main/java/org/apache/impala/planner/ParallelPlanner.java D fe/src/main/java/org/apache/impala/planner/PipelinedPlanNodeSet.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java A fe/src/main/java/org/apache/impala/planner/PlanVisitor.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/ResourceProfile.java M fe/src/main/java/org/apache/impala/planner/SubplanNode.java M fe/src/main/java/org/apache/impala/planner/UnionNode.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu-selectivity.test M testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test M testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/sort-expr-materialization.test M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test M testdata/workloads/functional-query/queries/QueryTest/explain-level0.test M testdata/workloads/functional-query/queries/QueryTest/explain-level1.test M testdata/workloads/functional-query/queries/QueryTest/explain-level2.test M testdata/workloads/functional-query/queries/QueryTest/explain-level3.test M testdata/workloads/functional-query/queries/QueryTest/stats-extrapolation.test 38 files changed, 2,767 insertions(+), 384 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/7223/6 -- To view, visit http://gerrit.cloudera.org:8080/7223 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I492cf5052bb27e4e335395e2a8f8a3b07248ec9d Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5483: Automatically disable codegen for small queries
Alex Behm has posted comments on this change. Change subject: IMPALA-5483: Automatically disable codegen for small queries .. Patch Set 3: (18 comments) http://gerrit.cloudera.org:8080/#/c/7153/3//COMMIT_MSG Commit Message: PS3, Line 15: a plan node > I hear your concerns and I do think we should think it through carefully. I Unless we are sure this is a common user scenario, I lean towards a default value of 0. I don't think the risk of regressions is worth the unknown benefit. I agree with Tim that the behavior is easy to explain, but it still means users might have to deal with tuning this value after upgrading. http://gerrit.cloudera.org:8080/#/c/7153/3/be/src/service/query-options.cc File be/src/service/query-options.cc: Line 497: query_options->__set_disable_codegen_rows_threshold(atoi(value.c_str())); Consider using StringParser::StringToInt() like in other places here, so we can properly validate the input. http://gerrit.cloudera.org:8080/#/c/7153/3/common/thrift/ImpalaInternalService.thrift File common/thrift/ImpalaInternalService.thrift: Line 255: // If the number of rows per node is below the threshold codegen will be automatically If the number of rows processed per node ... (to make it clear that these are the input rows and not the output rows) http://gerrit.cloudera.org:8080/#/c/7153/3/common/thrift/ImpalaService.thrift File common/thrift/ImpalaService.thrift: Line 279: // If the number of rows per node is below the threshold and disable_codegen is unset, If the number of rows processed per node http://gerrit.cloudera.org:8080/#/c/7153/3/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: Line 151: Needs PlannerTests. You can do something like PlannerTest.testComputeStatsMtDop(). Line 152: checkForDisableCodegen(rootFragment.getPlanRoot()); Add comment why we have to do this on the fragmented plan Line 477: boolean isSmallQuery = visitor.valid() && visitor.getMaxRowsProcessed() < threshold; remove visitor.valid() and inline result into the if Line 492: private void checkForDisableCodegen(PlanNode planRoot) { planRoot -> distributedPlan Line 493: // Determine the maximum number of rows processed by a instance of a plan node. This comments reads like the comment on MaxRowsProcessedVisitor.maxRowsProcessed_. I think we need crisper explanations for the two maximum numbers. http://gerrit.cloudera.org:8080/#/c/7153/3/fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java File fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java: Line 32: // True if we should abort because if we don't have valid estimates remove second "if" Line 47: if (caller instanceof HashJoinNode || caller instanceof NestedLoopJoinNode) { if (caller instanceof JoinNode) foundJoinNode_ = true; Line 60: || (missingStats && !scan.hasLimit() && scan.getConjuncts().isEmpty())) { Why would the existence of scan conjuncts make our estimate valid despite missing stats? Line 66: Math.max(maxRowsProcessedPerNode_, numRows / numNodes); Nit: Do the division as double and take the ceil, otherwise it seems like we're losing rows. Line 77: Math.max(maxRowsProcessedPerNode_, numRows / numNodes); Nit: Do the division as double and take the ceil, otherwise it seems like we're losing rows. Line 81: public boolean valid() { single line Line 96: return foundJoinNode_; Preconditions.checkState(valid_); seems appropriate here as well http://gerrit.cloudera.org:8080/#/c/7153/3/testdata/workloads/functional-query/queries/QueryTest/codegen.test File testdata/workloads/functional-query/queries/QueryTest/codegen.test: Line 2: QUERY Any reason not to make these PlannerTests? http://gerrit.cloudera.org:8080/#/c/7153/3/tests/common/test_dimensions.py File tests/common/test_dimensions.py: Line 139: disable_codegen_rows_threshold_options=[5000], Why not 0? For the same reason you use 0 below. -- To view, visit http://gerrit.cloudera.org:8080/7153 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I273bcee58641f5b97de52c0b2caab043c914b32e Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Juan Yu Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5483: Automatically disable codegen for small queries
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5483: Automatically disable codegen for small queries .. Patch Set 3: https://gerrit.cloudera.org/#/c/7257/ will also help make the primary key detection more reliable! -- To view, visit http://gerrit.cloudera.org:8080/7153 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I273bcee58641f5b97de52c0b2caab043c914b32e Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Juan Yu Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4669: [SECURITY] Add security library to build
Henry Robinson has posted comments on this change. Change subject: IMPALA-4669: [SECURITY] Add security library to build .. Patch Set 10: Michael - this includes the KRB5 and OpenSSL linking fixes. -- To view, visit http://gerrit.cloudera.org:8080/5717 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4f327810066bee7f3ac107b0295480fb9ed45e14 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5483: Automatically disable codegen for small queries
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5483: Automatically disable codegen for small queries .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/7153/3//COMMIT_MSG Commit Message: PS3, Line 15: a plan node > Multi-column PK is not uncommon. Even with accurate stats, Planner could ge I hear your concerns and I do think we should think it through carefully. I also think we shouldn't be too scared to change defaults when the current defaults don't make really sense. I'd struggle to explain to someone why Impala spends multiple seconds doing codegen when the planner estimates show that it's a small query that will take 10s or 100s of milliseconds to execute. It's easy to explain that Impala decided to disable codegen because the estimates showed it to be the right decision (but the estimates were wrong). If it's a bad fit for a particular workload it's possible to disable this or change the threshold. The current threshold is pretty safe according to my experiments. The motivation is that now if you have a workload with a mix of big and small queries, you get stuck in a situation where the small queries are burning lots of CPU doing unnecessary codegen. The only workaround is to manually disable codegen for the small queries, but that's not really practical to do in most cases. RE: disabling it for joins. Part of the motivation was that the small query optimisation doesn't work for joins. It was disabled because of a bug with num_nodes=1. RE: explain plans. It does show up in profiles with a string like "codegen disabled due to optimization hints" for the plan nodes. I could add it to explain plans too but it wasn't clear if it was worth adding even more detail to them. -- To view, visit http://gerrit.cloudera.org:8080/7153 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I273bcee58641f5b97de52c0b2caab043c914b32e Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Juan Yu Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-{4670,4672,4784}: Add RpcMgr and port Statestore services to KRPC
Henry Robinson has uploaded a new patch set (#15). Change subject: IMPALA-{4670,4672,4784}: Add RpcMgr and port Statestore services to KRPC .. IMPALA-{4670,4672,4784}: Add RpcMgr and port Statestore services to KRPC This patch adds the core abstractions necessary for using Kudu RPC for Impala's backend services. The StatestoreService and StatestoreSubscriberService are both ported to KRPC to demonstrate how the new RPC framework operates. The main new class is RpcMgr, which is responsible for both services run by a daemon and for obtaining clients to remote services. The new Rpc class helps clients build rpc invocation objects, and use them to call remote methods. Also done: * Utility code to convert from a Thrift structure to a Protobuf wrapper. Thrift RPCs do not need to be ported to Protobuf to work. * Added more build support for Protobuf generation. All library targets now depend on the Protobuf generation step, just as they do Thrift. * thrift-server-test is rewritten to use Beeswax as a test service, since the old StatestoreService has been removed. * Remove InProcessStatestore that was only used for statestore-test and mini-impala-cluster. As far as we know, mini-impala-cluster is unused by anyone. (IMPALA-4784) TESTING: * Impala's test suite passes. * statestore-test has been rewritten to use new statestore code. It includes all test cases from the now-removed Python test_statestore.py * rpc-mgr-test unit-tests the RpcMgr and Rpc classes. TODO: * Enable SSL and Kerberos support (already in Kudu's imported libraries) Change-Id: I8dbf12b9ecd71d26d239d31c19b487175194c766 --- M be/generated-sources/gen-cpp/CMakeLists.txt M be/src/catalog/CMakeLists.txt M be/src/catalog/catalog-server.cc M be/src/catalog/catalog-server.h M be/src/codegen/CMakeLists.txt M be/src/common/CMakeLists.txt M be/src/exec/CMakeLists.txt M be/src/exec/catalog-op-executor.cc M be/src/exec/kudu-util.h M be/src/experiments/CMakeLists.txt M be/src/exprs/CMakeLists.txt M be/src/rpc/CMakeLists.txt A be/src/rpc/common.proto A be/src/rpc/rpc-mgr-test.cc A be/src/rpc/rpc-mgr.cc A be/src/rpc/rpc-mgr.h A be/src/rpc/rpc-mgr.inline.h A be/src/rpc/rpc.h A be/src/rpc/rpc_test.proto M be/src/rpc/thrift-server-test.cc M be/src/rpc/thrift-util.h M be/src/runtime/CMakeLists.txt M be/src/runtime/bufferpool/CMakeLists.txt M be/src/runtime/exec-env.cc M be/src/runtime/exec-env.h M be/src/scheduling/CMakeLists.txt M be/src/service/CMakeLists.txt M be/src/service/client-request-state.cc M be/src/statestore/CMakeLists.txt D be/src/statestore/statestore-service-client-wrapper.h D be/src/statestore/statestore-subscriber-client-wrapper.h M be/src/statestore/statestore-subscriber.cc M be/src/statestore/statestore-subscriber.h M be/src/statestore/statestore-test.cc M be/src/statestore/statestore.cc M be/src/statestore/statestore.h A be/src/statestore/statestore.proto M be/src/statestore/statestored-main.cc M be/src/testutil/CMakeLists.txt M be/src/testutil/in-process-servers.cc M be/src/testutil/in-process-servers.h D be/src/testutil/mini-impala-cluster.cc M be/src/transport/CMakeLists.txt M be/src/udf/CMakeLists.txt M be/src/udf_samples/CMakeLists.txt M be/src/util/CMakeLists.txt M be/src/util/collection-metrics.h M be/src/util/counting-barrier.h M bin/start-impala-cluster.py M bin/start-impalad.sh M common/thrift/CMakeLists.txt M common/thrift/StatestoreService.thrift D tests/statestore/test_statestore.py 53 files changed, 1,733 insertions(+), 1,219 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/5720/15 -- To view, visit http://gerrit.cloudera.org:8080/5720 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8dbf12b9ecd71d26d239d31c19b487175194c766 Gerrit-PatchSet: 15 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Juan Yu Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-4669: [KRPC] Add kudu rpc library to build
Henry Robinson has uploaded a new patch set (#10). Change subject: IMPALA-4669: [KRPC] Add kudu_rpc library to build .. IMPALA-4669: [KRPC] Add kudu_rpc library to build Import FindKRPC.cmake from Apache Kudu. One minor linking change to krpc/CMakeLists.txt. Change-Id: I33203e95dff07c87a6ec5c7a31b7a583b91849bc --- M CMakeLists.txt M be/CMakeLists.txt A cmake_modules/FindKRPC.cmake 3 files changed, 129 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/5719/10 -- To view, visit http://gerrit.cloudera.org:8080/5719 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I33203e95dff07c87a6ec5c7a31b7a583b91849bc Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson
[Impala-ASF-CR] IMPALA-4889: Use client sidecars for Thrift RPCs
Hello Marcel Kornacker, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6473 to look at the new patch set (#8). Change subject: IMPALA-4889: Use client sidecars for Thrift RPCs .. IMPALA-4889: Use client sidecars for Thrift RPCs This patch changes the way Thrift structures are serialized with KRPC. Instead of serializing them to a byte stream, and then writing that stream to a Protobuf object which is serialized again en route to the wire, the Thrift objects are serialized to byte streams which are then directly written to the wire as a 'sidecar'. This saves a copy and serialization step both on the sender and receiver sides of the RPC. Change-Id: I0fcc87c6b67aa167d70ae022663b14bc90261c95 --- M be/src/rpc/common.proto M be/src/rpc/rpc-mgr-test.cc M be/src/rpc/rpc.h M be/src/rpc/thrift-util.h M be/src/statestore/statestore-subscriber.cc M be/src/statestore/statestore-test.cc M be/src/statestore/statestore.cc 7 files changed, 135 insertions(+), 26 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/6473/8 -- To view, visit http://gerrit.cloudera.org:8080/6473 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0fcc87c6b67aa167d70ae022663b14bc90261c95 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker
[Impala-ASF-CR] IMPALA-4856: Port data stream service to KRPC
Henry Robinson has posted comments on this change. Change subject: IMPALA-4856: Port data stream service to KRPC .. Patch Set 5: (22 comments) PS4 is a rebase. PS5 includes the review responses (so diff 4->5 if you want to see what changed). http://gerrit.cloudera.org:8080/#/c/7103/3/be/src/rpc/rpc.h File be/src/rpc/rpc.h: PS3, Line 106: Ownership is : // shared by the caller, and the RPC subsystem > Doesn't std::move transfer the ownership so the caller no longer shares the The shared_ptr is copied in. It's the copy that is then moved into the sidecar list. PS3, Line 143: are owned by the caller > the ownership is temporarily transferred to the RPC call when this function I don't think so - the RPC call has pointers, but doesn't have ownership in the sense that it has no responsibility for managing a reference count or freeing the memory. PS3, Line 147: > Having the names 'func', 'cb' and 'cb_wrapper' all close by each other make Done PS3, Line 153: > Does this move mean that the params_ member is invalid after this call? If Done PS3, Line 327: > Maybe name this 'completion_cb'. Done http://gerrit.cloudera.org:8080/#/c/7103/3/be/src/runtime/coordinator-backend-state.cc File be/src/runtime/coordinator-backend-state.cc: PS3, Line 389: equest->mutable_bloom_filter()->set_log_heap_space(0); : request->mutable_bloom_filter()->set_directory_sidecar_idx(-1); : } > Why wouldn't a move capture ensure the same thing? proto_filter is a const shared_ptr&. You can't move from it. Instead, we could have the argument be shared_ptr, and move from it here; they're basically equivalent, it's just a question of where you make the copy. http://gerrit.cloudera.org:8080/#/c/7103/3/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: PS3, Line 1207: VLOG_QUERY << "Not enough memory to allocate filter: " : << PrettyPrinter::Print(heap_space, TUnit::BYTES) : << " (query: " << coord->query_id() << ")"; : // Disable, as one missing update means a correct filter cannot be > I would add this to the commit message. This means we would take double the I don't think so - because params.directory is a sidecar I don't think it's been copied since it arrived on the socket. In the Thrift case, the bytestream had to be deserialized into a TBloomFilter. That's what's happening here - the equivalent 'deserialization' step. This path should only get taken the first time a filter arrives, and it does briefly keep two filters around (the sidecar should get destroyed as soon as the RPC is responded to). http://gerrit.cloudera.org:8080/#/c/7103/3/be/src/runtime/data-stream-recvr.cc File be/src/runtime/data-stream-recvr.cc: PS3, Line 280: blocked_senders_.front() > Is this a right way to dispose a unique_ptr? Good point - release() is clearer, and get() may have been a benign bug. http://gerrit.cloudera.org:8080/#/c/7103/3/be/src/runtime/data-stream-sender.cc File be/src/runtime/data-stream-sender.cc: PS3, Line 58: > Not used. Done PS3, Line 60: > Not used. Done PS3, Line 133: scoped_ptr batch_; > No one really calls GetNumDataBytesSent() (except from our BE test). So, I' We're gaining correctness - so worth doing (otherwise if someone decides to use it in the future, they might run into problems). PS3, Line 148: > A reader of this code might not immediately understand why this class needs I expanded the comment here. PS3, Line 170: > Why is this set in Init()? Wouldn't it ideally be set it in the constructor Moved to c'tor. PS3, Line 175: proto_batch_idx_ > Just want to make sure that this will increase the shared_ptr refcount? It Yep - this was a mistake. Removed auto to make it more explicit. PS3, Line 203: co > Prefer a more descriptive name "rpc_completion_cb" or something similar. Done PS3, Line 214: ck_guard > channel == nullptr Done PS3, Line 252: batch->tuple_data, &idx); > Is this transferring the ownership to the RPC subsystem ? AddSideCar() inte The ownership is shared with the batch object. AddSidecar() internally moves from the argument, which is a copy (i.e. its own reference). PS3, Line 266: .release(), rpc_complete_callback); > This is a subtle change in behavior from previous Impala version. In partic Any reasonably conservative timeout runs the risk of false negatives if a sender is blocked. I agree with your analysis about this being a change in behaviour. In practice, though, here's what I hope will happen: if one write to a node is slow enough to previously trigger the timeout, I would expect the statestore RPCs to also go slow (and they will time out); the node will be marked as offline and the query will be cancelled. If there is a situation where this RPC only is slow in writing (but all other RPCs to the server are ok), then I agree
[Impala-ASF-CR] IMPALA-4669: [SECURITY] Add security library to build
Henry Robinson has uploaded a new patch set (#9). Change subject: IMPALA-4669: [SECURITY] Add security library to build .. IMPALA-4669: [SECURITY] Add security library to build * Minor compilation fix * Set toolchain version to include gflags 2.2.0 and glog 0.3.4-p2 Change-Id: I4f327810066bee7f3ac107b0295480fb9ed45e14 --- M CMakeLists.txt M be/CMakeLists.txt M be/src/kudu/security/init.cc M be/src/kudu/security/token_verifier.cc M bin/bootstrap_toolchain.py M bin/impala-config.sh M cmake_modules/FindKerberos.cmake 7 files changed, 58 insertions(+), 22 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/5717/9 -- To view, visit http://gerrit.cloudera.org:8080/5717 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4f327810066bee7f3ac107b0295480fb9ed45e14 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho
[Impala-ASF-CR] IMPALA-4856: Port data stream service to KRPC
Henry Robinson has uploaded a new patch set (#5). Change subject: IMPALA-4856: Port data stream service to KRPC .. IMPALA-4856: Port data stream service to KRPC This patch ports the data-flow parts of ImpalaInternalService to KRPC. * ImpalaInternalService is split into two services. The first, ImpalaInternalService, deals with control messages for plan fragment instance execution, cancellation and reporting, and remains implemented in Thrift for now. The second, DataStreamService, handles large-payload RPCs for transmitting runtime filters and row batches between hosts. * In the DataStreamService, all RPCs use 'native' protobuf. The DataStreamService starts on the port previously reserved for the StatestoreSubscriberService (which is also a KRPC service), to avoid having to configure another port when starting Impala. When the ImpalaInternalService is ported to KRPC, all services will run on one port. * To support needing to address two different backend services, a data service port has been added to TBackendDescriptor. * This patch adds support for asynchronous RPCs to the RpcMgr and Rpc classes. Previously, Impala used fixed size thread pools + synchronous RPCs to achieve some parallelism for 'broadcast' RPCs like filter propagation, or a dedicated per-sender+receiver pair thread on the sender side in the DataStreamSender case. In this patch, the PublishFilter() and TransmitData() RPCs are sent asynchronously using KRPC's thread pools. * The TransmitData() protocol has changed to adapt to asynchronous RPCs. The full details are in data-stream-mgr.h. * As a result, DataStreamSender no longer creates a thread-per-connection on the sender side. * Both tuple transmission and runtime filter publication use sidecars to minimise the number of copies and serialization steps required. * Also include a fix for KUDU-2011 that properly allows sidecars to be shared between KRPC and the RPC caller (fixing IMPALA-5093, a corruption bug). * A large portion of this patch is the replacement of TRowBatch with its Protobuf equivalent, RowBatchPb. The replacement is a literal port of the data structure, and row-batch-test, row-batch-list-test and row-batch-serialize-benchmark continue to execute without major logic changes. * Simplify FindRecvr() logic in DataStreamManager. No-longer need to handle blocking sender-side, so no need for complex promise-based machinery. Instead, all senders with no receiver are added to a per-receiver list, which is processed when the receiver arrives. If it does not arrive promptly, the DataStreamManager cleans them up after FLAGS_datastream_sender_timeout_ms. * This patch also begins a clean-up of how ImpalaServer instances are created (by removing CreateImpalaServer), and clarifying the relationship between ExecEnv and ImpalaServer. ImpalaServer now follows the standard construct->Init()->Start()->Join() lifecycle that we use for other services. * Ensure that all addresses used for KRPCs are fully resolved, avoiding the need to resolve them for each RPC. TESTING --- * New tests added to rpc-mgr-test. TO DO - * Re-enable throughput and latency measurements per data-stream sender when that information is exposed from KRPC (KUDU-1738). * TLS and Kerberos are still not supported by KRPC in this patch. Change-Id: Ia66704be7a0a8162bb85556d07b583ec756c584b --- M .clang-format M CMakeLists.txt M be/CMakeLists.txt M be/src/benchmarks/bloom-filter-benchmark.cc M be/src/benchmarks/row-batch-serialize-benchmark.cc M be/src/common/init.cc M be/src/common/status.cc M be/src/common/status.h M be/src/exprs/expr-test.cc M be/src/kudu/rpc/rpc_sidecar.cc M be/src/kudu/rpc/rpc_sidecar.h M be/src/rpc/CMakeLists.txt M be/src/rpc/TAcceptQueueServer.cpp M be/src/rpc/common.proto M be/src/rpc/rpc-mgr-test.cc M be/src/rpc/rpc-mgr.h M be/src/rpc/rpc-mgr.inline.h M be/src/rpc/rpc.h M be/src/runtime/backend-client.h M be/src/runtime/client-cache-types.h M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/coordinator-backend-state.h M be/src/runtime/coordinator-filter-state.h M be/src/runtime/coordinator.cc M be/src/runtime/coordinator.h M be/src/runtime/data-stream-mgr.cc M be/src/runtime/data-stream-mgr.h M be/src/runtime/data-stream-recvr.cc M be/src/runtime/data-stream-recvr.h M be/src/runtime/data-stream-sender.cc M be/src/runtime/data-stream-sender.h M be/src/runtime/data-stream-test.cc M be/src/runtime/exec-env.cc M be/src/runtime/exec-env.h M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/row-batch-serialize-test.cc M be/src/runtime/row-batch.cc M be/src/runtime/row-batch.h M be/src/runtime/runtime-filter-bank.cc M be/src/runtime/runtime-filter-bank.h M be/src/scheduling/backend-config-test.cc M be/src/scheduling/back
[Impala-ASF-CR] IMPALA-4669: [SECURITY] Add security library to build
Henry Robinson has uploaded a new patch set (#10). Change subject: IMPALA-4669: [SECURITY] Add security library to build .. IMPALA-4669: [SECURITY] Add security library to build * Minor compilation fix * Set toolchain version to include gflags 2.2.0 and glog 0.3.4-p2 * Look for krb5 headers in toolchain (but dynamically link against libraries) Change-Id: I4f327810066bee7f3ac107b0295480fb9ed45e14 --- M CMakeLists.txt M be/CMakeLists.txt M be/src/kudu/security/init.cc M be/src/kudu/security/token_verifier.cc M bin/bootstrap_toolchain.py M bin/impala-config.sh M cmake_modules/FindKerberos.cmake 7 files changed, 58 insertions(+), 22 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/5717/10 -- To view, visit http://gerrit.cloudera.org:8080/5717 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4f327810066bee7f3ac107b0295480fb9ed45e14 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho
[Impala-ASF-CR] IMPALA-4669: [KRPC] Import RPC library from kudu@314c9d8
Henry Robinson has uploaded a new patch set (#10). Change subject: IMPALA-4669: [KRPC] Import RPC library from kudu@314c9d8 .. IMPALA-4669: [KRPC] Import RPC library from kudu@314c9d8 Change-Id: I06ab5b56312e482a27fa484414c338438ad6972c --- A be/src/kudu/rpc/CMakeLists.txt A be/src/kudu/rpc/acceptor_pool.cc A be/src/kudu/rpc/acceptor_pool.h A be/src/kudu/rpc/blocking_ops.cc A be/src/kudu/rpc/blocking_ops.h A be/src/kudu/rpc/client_negotiation.cc A be/src/kudu/rpc/client_negotiation.h A be/src/kudu/rpc/connection.cc A be/src/kudu/rpc/connection.h A be/src/kudu/rpc/constants.cc A be/src/kudu/rpc/constants.h A be/src/kudu/rpc/exactly_once_rpc-test.cc A be/src/kudu/rpc/inbound_call.cc A be/src/kudu/rpc/inbound_call.h A be/src/kudu/rpc/messenger.cc A be/src/kudu/rpc/messenger.h A be/src/kudu/rpc/mt-rpc-test.cc A be/src/kudu/rpc/negotiation-test.cc A be/src/kudu/rpc/negotiation.cc A be/src/kudu/rpc/negotiation.h A be/src/kudu/rpc/outbound_call.cc A be/src/kudu/rpc/outbound_call.h A be/src/kudu/rpc/protoc-gen-krpc.cc A be/src/kudu/rpc/proxy.cc A be/src/kudu/rpc/proxy.h A be/src/kudu/rpc/reactor-test.cc A be/src/kudu/rpc/reactor.cc A be/src/kudu/rpc/reactor.h A be/src/kudu/rpc/remote_method.cc A be/src/kudu/rpc/remote_method.h A be/src/kudu/rpc/remote_user.cc A be/src/kudu/rpc/remote_user.h A be/src/kudu/rpc/request_tracker-test.cc A be/src/kudu/rpc/request_tracker.cc A be/src/kudu/rpc/request_tracker.h A be/src/kudu/rpc/response_callback.h A be/src/kudu/rpc/result_tracker.cc A be/src/kudu/rpc/result_tracker.h A be/src/kudu/rpc/retriable_rpc.h A be/src/kudu/rpc/rpc-bench.cc A be/src/kudu/rpc/rpc-test-base.h A be/src/kudu/rpc/rpc-test.cc A be/src/kudu/rpc/rpc.cc A be/src/kudu/rpc/rpc.h A be/src/kudu/rpc/rpc_context.cc A be/src/kudu/rpc/rpc_context.h A be/src/kudu/rpc/rpc_controller.cc A be/src/kudu/rpc/rpc_controller.h A be/src/kudu/rpc/rpc_header.proto A be/src/kudu/rpc/rpc_introspection.proto A be/src/kudu/rpc/rpc_service.h A be/src/kudu/rpc/rpc_sidecar.cc A be/src/kudu/rpc/rpc_sidecar.h A be/src/kudu/rpc/rpc_stub-test.cc A be/src/kudu/rpc/rpcz_store.cc A be/src/kudu/rpc/rpcz_store.h A be/src/kudu/rpc/rtest.proto A be/src/kudu/rpc/rtest_diff_package.proto A be/src/kudu/rpc/sasl_common.cc A be/src/kudu/rpc/sasl_common.h A be/src/kudu/rpc/sasl_helper.cc A be/src/kudu/rpc/sasl_helper.h A be/src/kudu/rpc/serialization.cc A be/src/kudu/rpc/serialization.h A be/src/kudu/rpc/server_negotiation.cc A be/src/kudu/rpc/server_negotiation.h A be/src/kudu/rpc/service_if.cc A be/src/kudu/rpc/service_if.h A be/src/kudu/rpc/service_pool.cc A be/src/kudu/rpc/service_pool.h A be/src/kudu/rpc/service_queue-test.cc A be/src/kudu/rpc/service_queue.cc A be/src/kudu/rpc/service_queue.h A be/src/kudu/rpc/transfer.cc A be/src/kudu/rpc/transfer.h A be/src/kudu/rpc/user_credentials.cc A be/src/kudu/rpc/user_credentials.h 77 files changed, 20,264 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/18/5718/10 -- To view, visit http://gerrit.cloudera.org:8080/5718 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I06ab5b56312e482a27fa484414c338438ad6972c Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson
[Impala-ASF-CR] IMPALA-4669: [KUTIL] Add kudu util library to the build.
Hello Impala Public Jenkins, Michael Ho, Matthew Jacobs, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5715 to look at the new patch set (#19). Change subject: IMPALA-4669: [KUTIL] Add kudu_util library to the build. .. IMPALA-4669: [KUTIL] Add kudu_util library to the build. A few miscellaneous changes to allow kudu_util to compile with Impala. Add kudu_version.cc to substitute for the version.cc file that is automatically built during the full Kudu build. Set LZ4_DISABLE_DEPRECATE_WARNINGS to allow Kudu's compressor utility to use deprecated names for LZ4 methods. Add NO_NVM_SUPPORT flag to Kudu build (plan to upstream this later) to disable building with nvm support, removing a library dependency. Also remove imported FindOpenSSL.cmake in favour of the standard one provided by cmake itself. Change-Id: I8da798ee55506d6f969416b17c191eb03cb215f5 --- M CMakeLists.txt M be/CMakeLists.txt M be/src/common/CMakeLists.txt A be/src/common/kudu_version.cc M be/src/common/logging.cc M be/src/exec/kudu-util.h A be/src/kudu/gutil M be/src/kudu/util/CMakeLists.txt M be/src/kudu/util/cache.cc M be/src/kudu/util/compression/compression_codec.cc M be/src/kudu/util/env.cc M be/src/kudu/util/env_posix.cc M be/src/kudu/util/flags.cc A be/src/kudu/util/kudu_export.h M be/src/kudu/util/locks.h M be/src/kudu/util/logging.cc M be/src/kudu/util/minidump.cc D cmake_modules/FindOpenSSL.cmake 18 files changed, 140 insertions(+), 90 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/5715/19 -- To view, visit http://gerrit.cloudera.org:8080/5715 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8da798ee55506d6f969416b17c191eb03cb215f5 Gerrit-PatchSet: 19 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-{4670,4672,4784}: Add RpcMgr and port Statestore services to KRPC
Henry Robinson has uploaded a new patch set (#14). Change subject: IMPALA-{4670,4672,4784}: Add RpcMgr and port Statestore services to KRPC .. IMPALA-{4670,4672,4784}: Add RpcMgr and port Statestore services to KRPC This patch adds the core abstractions necessary for using Kudu RPC for Impala's backend services. The StatestoreService and StatestoreSubscriberService are both ported to KRPC to demonstrate how the new RPC framework operates. The main new class is RpcMgr, which is responsible for both services run by a daemon and for obtaining clients to remote services. The new Rpc class helps clients build rpc invocation objects, and use them to call remote methods. Also done: * Utility code to convert from a Thrift structure to a Protobuf wrapper. Thrift RPCs do not need to be ported to Protobuf to work. * Added more build support for Protobuf generation. All library targets now depend on the Protobuf generation step, just as they do Thrift. * thrift-server-test is rewritten to use Beeswax as a test service, since the old StatestoreService has been removed. * Remove InProcessStatestore that was only used for statestore-test and mini-impala-cluster. As far as we know, mini-impala-cluster is unused by anyone. (IMPALA-4784) TESTING: * Impala's test suite passes. * statestore-test has been rewritten to use new statestore code. It includes all test cases from the now-removed Python test_statestore.py * rpc-mgr-test unit-tests the RpcMgr and Rpc classes. TODO: * Enable SSL and Kerberos support (already in Kudu's imported libraries) Change-Id: I8dbf12b9ecd71d26d239d31c19b487175194c766 --- M be/generated-sources/gen-cpp/CMakeLists.txt M be/src/catalog/CMakeLists.txt M be/src/catalog/catalog-server.cc M be/src/catalog/catalog-server.h M be/src/codegen/CMakeLists.txt M be/src/common/CMakeLists.txt M be/src/exec/CMakeLists.txt M be/src/exec/catalog-op-executor.cc M be/src/exec/kudu-util.h M be/src/experiments/CMakeLists.txt M be/src/exprs/CMakeLists.txt M be/src/rpc/CMakeLists.txt A be/src/rpc/common.proto A be/src/rpc/rpc-mgr-test.cc A be/src/rpc/rpc-mgr.cc A be/src/rpc/rpc-mgr.h A be/src/rpc/rpc-mgr.inline.h A be/src/rpc/rpc.h A be/src/rpc/rpc_test.proto M be/src/rpc/thrift-server-test.cc M be/src/rpc/thrift-util.h M be/src/runtime/CMakeLists.txt M be/src/runtime/bufferpool/CMakeLists.txt M be/src/runtime/exec-env.cc M be/src/runtime/exec-env.h M be/src/scheduling/CMakeLists.txt M be/src/service/CMakeLists.txt M be/src/service/client-request-state.cc M be/src/statestore/CMakeLists.txt D be/src/statestore/statestore-service-client-wrapper.h D be/src/statestore/statestore-subscriber-client-wrapper.h M be/src/statestore/statestore-subscriber.cc M be/src/statestore/statestore-subscriber.h M be/src/statestore/statestore-test.cc M be/src/statestore/statestore.cc M be/src/statestore/statestore.h A be/src/statestore/statestore.proto M be/src/statestore/statestored-main.cc M be/src/testutil/CMakeLists.txt M be/src/testutil/in-process-servers.cc M be/src/testutil/in-process-servers.h D be/src/testutil/mini-impala-cluster.cc M be/src/transport/CMakeLists.txt M be/src/udf/CMakeLists.txt M be/src/udf_samples/CMakeLists.txt M be/src/util/CMakeLists.txt M be/src/util/collection-metrics.h M be/src/util/counting-barrier.h M bin/start-impala-cluster.py M bin/start-impalad.sh M common/thrift/CMakeLists.txt M common/thrift/StatestoreService.thrift D tests/statestore/test_statestore.py 53 files changed, 1,733 insertions(+), 1,219 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/5720/14 -- To view, visit http://gerrit.cloudera.org:8080/5720 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8dbf12b9ecd71d26d239d31c19b487175194c766 Gerrit-PatchSet: 14 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Juan Yu Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-4856: Port data stream service to KRPC
Henry Robinson has uploaded a new patch set (#4). Change subject: IMPALA-4856: Port data stream service to KRPC .. IMPALA-4856: Port data stream service to KRPC This patch ports the data-flow parts of ImpalaInternalService to KRPC. * ImpalaInternalService is split into two services. The first, ImpalaInternalService, deals with control messages for plan fragment instance execution, cancellation and reporting, and remains implemented in Thrift for now. The second, DataStreamService, handles large-payload RPCs for transmitting runtime filters and row batches between hosts. * In the DataStreamService, all RPCs use 'native' protobuf. The DataStreamService starts on the port previously reserved for the StatestoreSubscriberService (which is also a KRPC service), to avoid having to configure another port when starting Impala. When the ImpalaInternalService is ported to KRPC, all services will run on one port. * To support needing to address two different backend services, a data service port has been added to TBackendDescriptor. * This patch adds support for asynchronous RPCs to the RpcMgr and Rpc classes. Previously, Impala used fixed size thread pools + synchronous RPCs to achieve some parallelism for 'broadcast' RPCs like filter propagation, or a dedicated per-sender+receiver pair thread on the sender side in the DataStreamSender case. In this patch, the PublishFilter() and TransmitData() RPCs are sent asynchronously using KRPC's thread pools. * The TransmitData() protocol has changed to adapt to asynchronous RPCs. The full details are in data-stream-mgr.h. * As a result, DataStreamSender no longer creates a thread-per-connection on the sender side. * Both tuple transmission and runtime filter publication use sidecars to minimise the number of copies and serialization steps required. * Also include a fix for KUDU-2011 that properly allows sidecars to be shared between KRPC and the RPC caller (fixing IMPALA-5093, a corruption bug). * A large portion of this patch is the replacement of TRowBatch with its Protobuf equivalent, RowBatchPb. The replacement is a literal port of the data structure, and row-batch-test, row-batch-list-test and row-batch-serialize-benchmark continue to execute without major logic changes. * Simplify FindRecvr() logic in DataStreamManager. No-longer need to handle blocking sender-side, so no need for complex promise-based machinery. Instead, all senders with no receiver are added to a per-receiver list, which is processed when the receiver arrives. If it does not arrive promptly, the DataStreamManager cleans them up after FLAGS_datastream_sender_timeout_ms. * This patch also begins a clean-up of how ImpalaServer instances are created (by removing CreateImpalaServer), and clarifying the relationship between ExecEnv and ImpalaServer. ImpalaServer now follows the standard construct->Init()->Start()->Join() lifecycle that we use for other services. * Ensure that all addresses used for KRPCs are fully resolved, avoiding the need to resolve them for each RPC. TESTING --- * New tests added to rpc-mgr-test. TO DO - * Re-enable throughput and latency measurements per data-stream sender when that information is exposed from KRPC (KUDU-1738). * TLS and Kerberos are still not supported by KRPC in this patch. Change-Id: Ia66704be7a0a8162bb85556d07b583ec756c584b --- M .clang-format M CMakeLists.txt M be/CMakeLists.txt M be/src/benchmarks/bloom-filter-benchmark.cc M be/src/benchmarks/row-batch-serialize-benchmark.cc M be/src/common/init.cc M be/src/common/status.cc M be/src/common/status.h M be/src/exprs/expr-test.cc M be/src/kudu/rpc/rpc_sidecar.cc M be/src/kudu/rpc/rpc_sidecar.h M be/src/rpc/CMakeLists.txt M be/src/rpc/TAcceptQueueServer.cpp M be/src/rpc/common.proto M be/src/rpc/rpc-mgr-test.cc M be/src/rpc/rpc-mgr.h M be/src/rpc/rpc-mgr.inline.h M be/src/rpc/rpc.h M be/src/runtime/backend-client.h M be/src/runtime/client-cache-types.h M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/coordinator-backend-state.h M be/src/runtime/coordinator-filter-state.h M be/src/runtime/coordinator.cc M be/src/runtime/coordinator.h M be/src/runtime/data-stream-mgr.cc M be/src/runtime/data-stream-mgr.h M be/src/runtime/data-stream-recvr.cc M be/src/runtime/data-stream-recvr.h M be/src/runtime/data-stream-sender.cc M be/src/runtime/data-stream-sender.h M be/src/runtime/data-stream-test.cc M be/src/runtime/exec-env.cc M be/src/runtime/exec-env.h M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/row-batch-serialize-test.cc M be/src/runtime/row-batch.cc M be/src/runtime/row-batch.h M be/src/runtime/runtime-filter-bank.cc M be/src/runtime/runtime-filter-bank.h M be/src/scheduling/backend-config-test.cc M be/src/scheduling/back
[Impala-ASF-CR] IMPALA-5547: Rework FK/PK join detection.
Alex Behm has uploaded a new change for review. http://gerrit.cloudera.org:8080/7257 Change subject: IMPALA-5547: Rework FK/PK join detection. .. IMPALA-5547: Rework FK/PK join detection. Reworks the FK/PK join detection logic to: - more accurately recognize many-to-many joins - avoid dim/dim joins for multi-column PKs The new detection logic maintains our existing philosophy of generally assuming a FK/PK join, unless there is strong evidence to the contrary, as follows. For each set of simple equi-join conjuncts between two tables, we compute the joint NDV of the right-hand side columns by multiplication, and if the joint NDV is significantly smaller than the right-hand side row count, then we are fairly confident that the right-hand side is not a PK. Otherwise, we assume the set of conjuncts could represent a FK/PK relationship. Extends the explain plan to include the outcome of the FK/PK detection at EXPLAIN_LEVEL > STANDARD. Performance testing: 1. Full TPC-DS run on 10TB: - Q10 improved by >100x - Q72 improved by >25x - Q17,Q26,Q29 improved by 2x - Q64 regressed by 10x - Total runtime: Improved by 2x - Geomean: Minor improvement The regression of Q64 is understood and we will try to address it in follow-on changes. The previous plan was better by accident and not because of superior logic. 2. Nightly TPC-H and TPC-DS runs: - No perf differences Testing: - The existing planner test cover the changes. - Code/hdfs run passed. Change-Id: I49074fe743a28573cff541ef7dbd0edd88892067 --- M fe/src/main/java/org/apache/impala/analysis/JoinOperator.java M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/JoinNode.java M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test M testdata/workloads/functional-planner/queries/PlannerTest/hbase.test M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test M testdata/workloads/functional-planner/queries/PlannerTest/joins.test M testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-views.test M testdata/workloads/functional-query/queries/QueryTest/explain-level2.test M testdata/workloads/functional-query/queries/QueryTest/explain-level3.test 17 files changed, 1,500 insertions(+), 1,351 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/57/7257/1 -- To view, visit http://gerrit.cloudera.org:8080/7257 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I49074fe743a28573cff541ef7dbd0edd88892067 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm
[Impala-ASF-CR] IMPALA-4703: reservation denial debug action
Tim Armstrong has uploaded a new patch set (#5). Change subject: IMPALA-4703: reservation denial debug action .. IMPALA-4703: reservation denial debug action Add debug action to deny reservation increases with some probability. This allows us to test various scenarios, particularly: * The case when the node only gets its initial reservation and must run to completion without increasing its reservation. * The case when there is some memory pressure and the node sometimes gets a reservation increase and sometimes doesn't. E.g. to deny all reservation requests after an ExecNode has opened: set debug_action=-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0 This was applied to test_spilling. It caught a bug in the PAGG with spilling string aggregations. This required some minor extensions to the debug actions. * Allow debug actions that apply to all ExecNodes if node_id is -1. * Allow passing parameters to debug actions. The current grammar of the actions is not well-oriented towards extension, so I resorted to using @ as a new delimiter. I also optimised ExecDebugAction() so that it is much faster in the common case and extended --disable_mem_pools to prevent the buffer pool from holding onto unused buffers. Change-Id: Ied39bb091b12156e5dc61b528c6c0cd8de3fe657 --- M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/runtime/bufferpool/buffer-allocator.cc M be/src/runtime/bufferpool/buffer-pool.cc M be/src/runtime/bufferpool/buffer-pool.h M be/src/runtime/bufferpool/reservation-tracker.cc M be/src/runtime/bufferpool/reservation-tracker.h M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/debug-options.cc M be/src/runtime/debug-options.h M common/thrift/ImpalaInternalService.thrift M common/thrift/PlanNodes.thrift M tests/query_test/test_spilling.py 13 files changed, 139 insertions(+), 36 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/7022/5 -- To view, visit http://gerrit.cloudera.org:8080/7022 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ied39bb091b12156e5dc61b528c6c0cd8de3fe657 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4703: reservation denial debug action
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4703: reservation denial debug action .. Patch Set 3: (8 comments) Rebased onto my latest buffer pool preview patch and addressed comments. http://gerrit.cloudera.org:8080/#/c/7022/1/be/src/exec/exec-node.cc File be/src/exec/exec-node.cc: PS1, Line 452: T_ typo http://gerrit.cloudera.org:8080/#/c/7022/3/be/src/exec/exec-node.cc File be/src/exec/exec-node.cc: Line 439: LOG(INFO) << "Debug action on " << id_; > is that too much noise, or did you find it useful? Removed - missed removing this debugging code. Line 459: debug_action_param_.c_str(), debug_action_param_.size(), &parse_result); > does "1.0" actually produce probability of exactly 1? i.e. is it guaranteed You're right, there was an off-by-one bug that was hit if rand() returns exactly RAND_MAX http://gerrit.cloudera.org:8080/#/c/7022/3/be/src/runtime/bufferpool/buffer-pool.h File be/src/runtime/bufferpool/buffer-pool.h: Line 341: /// Call SetDebugDenyIncreaseReservation() on this client's ReservationTracker. > explain the parameters. even after reading the code, I'm not 100% sure what I don't think it's actually necessary, so I removed it. http://gerrit.cloudera.org:8080/#/c/7022/3/be/src/runtime/bufferpool/reservation-tracker.cc File be/src/runtime/bufferpool/reservation-tracker.cc: Line 154: if (increase_deny_delay_ == 0) { > is increase_deny_delay_ meant to get reset to an initial count at this poin It was meant to be one-shot. I removed it because the purpose I had in mind no longer applied. http://gerrit.cloudera.org:8080/#/c/7022/3/be/src/runtime/debug-options.h File be/src/runtime/debug-options.h: PS3, Line 55: // INVALID: debug options invalid > this comment seems to be superseded by enable(), but action_param_ might be This still seems relevant since enabled() depends on this. Or did it seem redundant? PS3, Line 60: invalid > this is a bit confusing given that INVALID is a phase. maybe explain instea Done http://gerrit.cloudera.org:8080/#/c/7022/1/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: Line 70: // A floating point number in range [0.0, 1.0] that will be the probability of denying Wording. -- To view, visit http://gerrit.cloudera.org:8080/7022 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ied39bb091b12156e5dc61b528c6c0cd8de3fe657 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4703: reservation denial debug action
Tim Armstrong has uploaded a new patch set (#4). Change subject: IMPALA-4703: reservation denial debug action .. IMPALA-4703: reservation denial debug action Add debug action to deny reservation increases with some probability. This allows us to test various scenarios, particularly: * The case when the node only gets its initial reservation and must run to completion without increasing its reservation. * The case when there is some memory pressure and the node sometimes gets a reservation increase and sometimes doesn't. E.g. to deny all reservation requests after an ExecNode has opened: set debug_action=-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0 This was applied to test_spilling. It caught a bug in the PAGG with spilling string aggregations. This required some minor extensions to the debug actions. * Allow debug actions that apply to all ExecNodes if node_id is -1. * Allow passing parameters to debug actions. The current grammar of the actions is not well-oriented towards extension, so I resorted to using @ as a new delimiter. I also optimised ExecDebugAction() so that it is much faster in the common case and extended --disable_mem_pools to prevent the buffer pool from holding onto unused buffers. Change-Id: Ied39bb091b12156e5dc61b528c6c0cd8de3fe657 --- M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/runtime/bufferpool/buffer-allocator.cc M be/src/runtime/bufferpool/buffer-pool.cc M be/src/runtime/bufferpool/buffer-pool.h M be/src/runtime/bufferpool/reservation-tracker.cc M be/src/runtime/bufferpool/reservation-tracker.h M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/debug-options.cc M be/src/runtime/debug-options.h M common/thrift/ImpalaInternalService.thrift M common/thrift/PlanNodes.thrift M tests/query_test/test_spilling.py 13 files changed, 139 insertions(+), 36 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/7022/4 -- To view, visit http://gerrit.cloudera.org:8080/7022 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ied39bb091b12156e5dc61b528c6c0cd8de3fe657 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht
[Impala-ASF-CR] PREVIEW: IMPALA-4674: Part 2: port backend exec to BufferPool
Tim Armstrong has uploaded a new patch set (#21). Change subject: PREVIEW: IMPALA-4674: Part 2: port backend exec to BufferPool .. PREVIEW: IMPALA-4674: Part 2: port backend exec to BufferPool Always create global BufferPool at startup using 80% of memory and limit reservations to 80% of query memory (same as BufferedBlockMgr). The query's initial reservation is claimed centrally (managed by the InitialReservations class) and distributed to query operators from there. Port ExecNodes to use BufferPool: * Each ExecNode has to claim its reservation during Open() * Port Sorter to use BufferPool. * Switch from BufferedTupleStream to BufferedTupleStreamV2 * Port HashTable to use BufferPool via a Suballocator. This also makes PAGG memory consumption more efficient (avoid wasting buffers) and improve the spilling algorithm: * Allow preaggs to execute with 0 reservation - if streams and hash tables cannot be allocated, it will pass through rows. * Halve the buffer requirement for spilling aggs - avoid allocating buffers for aggregated and unaggregated streams simultaneously. * Rebuild spilled partitions instead of repartitioning (IMPALA-2708) The BlockingJoinNode code is also fixed so that resources are claimed *after* the build side is opened in the case when there's an async build thread. This is required for some bushy plans, e.g. TPC-H Q 22. TODO in this patch: * Consider renaming buffer_pool_page_size, e.g. to spillable_page_size * Add maximum row size and minimum page size query options (dependent on planner changes). TODO in follow-up patches: * Rename BufferedTupleStreamV2 to BufferedTupleStream Testing: * Updated tests to reflect new memory requirements * TODO: recalibrate limits in test_mem_usage_scaling after planner min reservation changes Change-Id: I7fc7fe1c04e9dfb1a0c749fb56a5e0f2bf9c6c3e --- M be/src/codegen/gen_ir_descriptions.py M be/src/exec/analytic-eval-node.cc M be/src/exec/analytic-eval-node.h M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/exec/hash-table-test.cc M be/src/exec/hash-table.cc M be/src/exec/hash-table.h M be/src/exec/hash-table.inline.h M be/src/exec/nested-loop-join-builder.cc M be/src/exec/partitioned-aggregation-node-ir.cc M be/src/exec/partitioned-aggregation-node.cc M be/src/exec/partitioned-aggregation-node.h M be/src/exec/partitioned-hash-join-builder-ir.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/exec/partitioned-hash-join-node-ir.cc M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/partitioned-hash-join-node.h M be/src/exec/partitioned-hash-join-node.inline.h M be/src/exec/sort-node.cc M be/src/exec/sort-node.h M be/src/runtime/CMakeLists.txt D be/src/runtime/buffered-block-mgr-test.cc D be/src/runtime/buffered-block-mgr.cc D be/src/runtime/buffered-block-mgr.h D be/src/runtime/buffered-tuple-stream-test.cc M be/src/runtime/buffered-tuple-stream-v2.cc M be/src/runtime/buffered-tuple-stream-v2.h D be/src/runtime/buffered-tuple-stream.cc D be/src/runtime/buffered-tuple-stream.h D be/src/runtime/buffered-tuple-stream.inline.h M be/src/runtime/bufferpool/buffer-pool.cc M be/src/runtime/bufferpool/buffer-pool.h M be/src/runtime/bufferpool/reservation-tracker.h M be/src/runtime/disk-io-mgr.cc M be/src/runtime/exec-env.cc M be/src/runtime/exec-env.h M be/src/runtime/fragment-instance-state.cc A be/src/runtime/initial-reservations.cc A be/src/runtime/initial-reservations.h M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/row-batch.cc M be/src/runtime/row-batch.h M be/src/runtime/runtime-filter.h M be/src/runtime/runtime-state.cc M be/src/runtime/runtime-state.h M be/src/runtime/sorter.cc M be/src/runtime/sorter.h M be/src/runtime/test-env.cc M be/src/runtime/test-env.h M be/src/runtime/tmp-file-mgr-test.cc M be/src/runtime/tmp-file-mgr.h M be/src/service/client-request-state.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/bloom-filter.h M be/src/util/static-asserts.cc M common/thrift/Frontend.thrift M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/generate_error_codes.py M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/SortNode.java M fe/src/main/java/org/apache/impala/service/Frontend.java M testdata/workloads/functional-query/queries/QueryTest/analytic-fns.test M testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters_phj.test M testdata/workloads/functional-query/queries/QueryTest/spilling.test M tests/query_test/test_mem_usage_scaling.py M tests/query_test/test_
[Impala-ASF-CR] IMPALA-4674: Part 1: remove old aggs and joins
Tim Armstrong has uploaded a new patch set (#4). Change subject: IMPALA-4674: Part 1: remove old aggs and joins .. IMPALA-4674: Part 1: remove old aggs and joins This is intended to be merged at the same time as Part 2 but is separated out to make the change more reviewable. Part 2 assumes that it does not need special logic to handle this mode (e.g. because the old aggs and joins don't use reservation). Disable the --enable_partitioned_{aggregation,hash_join} options and remove all product and test code associated with them. Change-Id: I5ce2236d37c0ced188a4a81f7e00d4b8ac98e7e9 --- M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/exec/CMakeLists.txt D be/src/exec/aggregation-node-ir.cc D be/src/exec/aggregation-node.cc D be/src/exec/aggregation-node.h M be/src/exec/blocking-join-node.cc M be/src/exec/blocking-join-node.h M be/src/exec/exec-node.cc D be/src/exec/hash-join-node-ir.cc D be/src/exec/hash-join-node.cc D be/src/exec/hash-join-node.h M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/nested-loop-join-node.cc M be/src/exec/nested-loop-join-node.h D be/src/exec/old-hash-table-ir.cc D be/src/exec/old-hash-table-test.cc D be/src/exec/old-hash-table.cc D be/src/exec/old-hash-table.h D be/src/exec/old-hash-table.inline.h M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/partitioned-hash-join-node.h M be/src/exprs/agg-fn-evaluator.cc M be/src/runtime/row-batch.cc M be/src/runtime/row-batch.h M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/planner/JoinNode.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java D testdata/workloads/functional-query/queries/QueryTest/legacy-joins-aggs.test M tests/common/environ.py M tests/common/skip.py D tests/custom_cluster/test_legacy_joins_aggs.py M tests/metadata/test_ddl.py M tests/query_test/test_aggregation.py M tests/query_test/test_join_queries.py M tests/query_test/test_mt_dop.py M tests/query_test/test_nested_types.py M tests/query_test/test_runtime_filters.py M tests/query_test/test_scanners.py M tests/query_test/test_tpch_nested_queries.py 41 files changed, 31 insertions(+), 4,352 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/7102/4 -- To view, visit http://gerrit.cloudera.org:8080/7102 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I5ce2236d37c0ced188a4a81f7e00d4b8ac98e7e9 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong
[Impala-ASF-CR] IMPALA-5551: Fix AggregationNode::Close() when Prepare() fails
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5551: Fix AggregationNode::Close() when Prepare() fails .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/7254/1/be/src/exec/aggregation-node.cc File be/src/exec/aggregation-node.cc: Line 275: while (!output_iterator_.AtEnd()) { > Should this be inside the if() statement? Since it references dummy_dst. It doesn't matter too much so long as we can convince ourselves it's correct since this code is near EOL. -- To view, visit http://gerrit.cloudera.org:8080/7254 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia0f8cce7cf6e3183d3a5e145c2fcfa50f36c77e0 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5553: Fix expr-test in release builds
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5553: Fix expr-test in release builds .. Patch Set 1: Code-Review+2 Thanks for the cleanup -- To view, visit http://gerrit.cloudera.org:8080/7255 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id1257da350cb6cb69886e93f6615cdadd17c187d Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5551: Fix AggregationNode::Close() when Prepare() fails
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5551: Fix AggregationNode::Close() when Prepare() fails .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/7254/1/be/src/exec/aggregation-node.cc File be/src/exec/aggregation-node.cc: Line 275: while (!output_iterator_.AtEnd()) { Should this be inside the if() statement? Since it references dummy_dst. -- To view, visit http://gerrit.cloudera.org:8080/7254 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia0f8cce7cf6e3183d3a5e145c2fcfa50f36c77e0 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5553: Fix expr-test in release builds
Michael Ho has uploaded a new change for review. http://gerrit.cloudera.org:8080/7255 Change subject: IMPALA-5553: Fix expr-test in release builds .. IMPALA-5553: Fix expr-test in release builds expr-test fails in release build as it uses Literal::CreateLiteral() to create literal expressions. Literal::CreateLiteral() wraps ParseString() with DCHECK() so it becomes a no-op in release builds. This change fixes the issue by moving Literal::CreateLiteral() from literal.cc to expr-test.cc as it's only used by the test code. Also replaces DCHECK() wrapped around ParseString() with EXPECT_TRUE(). Verified the fix by building and running a release build of expr-test. Change-Id: Id1257da350cb6cb69886e93f6615cdadd17c187d --- M be/src/exprs/expr-test.cc M be/src/exprs/literal.cc M be/src/exprs/literal.h 3 files changed, 114 insertions(+), 102 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/7255/1 -- To view, visit http://gerrit.cloudera.org:8080/7255 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Id1257da350cb6cb69886e93f6615cdadd17c187d Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho
[Impala-ASF-CR] IMPALA-5551: Fix AggregationNode::Close() when Prepare() fails
Michael Ho has uploaded a new change for review. http://gerrit.cloudera.org:8080/7254 Change subject: IMPALA-5551: Fix AggregationNode::Close() when Prepare() fails .. IMPALA-5551: Fix AggregationNode::Close() when Prepare() fails A recent change moved the initialization of output_tuple_desc_ to the constructor of AggregationNode. This changes the order in which tuple_pool_ and output_tuple_desc_ are initialized. The code in AggregationNode::Close() made the assumption that tuple_pool_ cannot be NULL (although without a DCHECK) if output_tuple_desc_ is not NULL. This no longer holds in the new code. This change makes AggregationNode::Close() checks tuple_pool_ to see if it's NULL before using it. Change-Id: Ia0f8cce7cf6e3183d3a5e145c2fcfa50f36c77e0 --- M be/src/exec/aggregation-node.cc 1 file changed, 2 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/7254/1 -- To view, visit http://gerrit.cloudera.org:8080/7254 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ia0f8cce7cf6e3183d3a5e145c2fcfa50f36c77e0 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho
[Impala-ASF-CR] IMPALA-5548 Fix some minor nits with HDFS parquet column readers
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5548 Fix some minor nits with HDFS parquet column readers .. Patch Set 4: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/777/ -- To view, visit http://gerrit.cloudera.org:8080/7240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08e653cae6f2188599f4a23e4f44692166d9c119 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5548 Fix some minor nits with HDFS parquet column readers
Dan Hecht has posted comments on this change. Change subject: IMPALA-5548 Fix some minor nits with HDFS parquet column readers .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08e653cae6f2188599f4a23e4f44692166d9c119 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5548 Fix some minor nits with HDFS parquet column readers
Dan Hecht has posted comments on this change. Change subject: IMPALA-5548 Fix some minor nits with HDFS parquet column readers .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08e653cae6f2188599f4a23e4f44692166d9c119 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5483: Automatically disable codegen for small queries
Juan Yu has posted comments on this change. Change subject: IMPALA-5483: Automatically disable codegen for small queries .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/7153/3//COMMIT_MSG Commit Message: PS3, Line 15: a plan node > MaxRowsProcessedVisitor detects when there are missing or corrupt stats and Multi-column PK is not uncommon. Even with accurate stats, Planner could generate wrong plan, accidentally put small table on left side, and underestimate the join cardinality. I maybe too conservative. My philosophy is codegen is good in general, we should disable it only when we are certain it will be slow. when we are not certain, keep it as before. That can reduce the risk of regression. Maybe do the same as the small query optimization, skip join for now until we can do better estimations. Also some logging or indication in plan the codegen is disabled due to this optimization will help troubleshooting. -- To view, visit http://gerrit.cloudera.org:8080/7153 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I273bcee58641f5b97de52c0b2caab043c914b32e Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Juan Yu Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5517: Allow IMPALA LOGS DIR to be overridden
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5517: Allow IMPALA_LOGS_DIR to be overridden .. Patch Set 3: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/776/ -- To view, visit http://gerrit.cloudera.org:8080/7197 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4028813bd4f53815139225abd57845bb304ae3e4 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Knupp Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5548 Fix some minor nits with HDFS parquet column readers
Hello Michael Ho, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7240 to look at the new patch set (#3). Change subject: IMPALA-5548 Fix some minor nits with HDFS parquet column readers .. IMPALA-5548 Fix some minor nits with HDFS parquet column readers Replace some std::map instances with unordered maps. std::map is unnecessary in many cases where an unordered map should suffice, and in almost all circumstances, perform better. Also discovered was a slightly dangerous initialization of an empty NullOffsetIndicator, which could conceivably result in undefined behavior by writing arr[ofs] |= b, where ofs was mistakenly initialized to -1 (and b to 0, so such behavior may not be detected, but could cause a crash by underrunning a buffer). Also add warn unused to status results while we are changing this. Testing: Running exhaustive tests against a Jenkins build. Change-Id: I08e653cae6f2188599f4a23e4f44692166d9c119 --- M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scanner.h M be/src/exec/parquet-column-readers.h M be/src/runtime/descriptors.h 5 files changed, 22 insertions(+), 19 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/7240/3 -- To view, visit http://gerrit.cloudera.org:8080/7240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I08e653cae6f2188599f4a23e4f44692166d9c119 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-5517: Allow IMPALA LOGS DIR to be overridden
David Knupp has posted comments on this change. Change subject: IMPALA-5517: Allow IMPALA_LOGS_DIR to be overridden .. Patch Set 3: Code-Review+2 Rebase only. Carrying +2. -- To view, visit http://gerrit.cloudera.org:8080/7197 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4028813bd4f53815139225abd57845bb304ae3e4 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Knupp Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Michael Brown Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5548 Fix some minor issues with HDFS / parquet column readers
Hello Michael Ho, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7240 to look at the new patch set (#2). Change subject: IMPALA-5548 Fix some minor issues with HDFS / parquet column readers .. IMPALA-5548 Fix some minor issues with HDFS / parquet column readers Replace some std::map instances with unordered maps. std::map is unnecessary in many cases where an unordered map should suffice, and in almost all circumstances, perform better. Also discovered was a slightly dangerous initialization of an empty NullOffsetIndicator, which could conceivably result in undefined behavior by writing arr[ofs] |= b, where ofs was mistakenly initialized to -1 (and b to 0, so such behavior may not be detected, but could cause a crash by underrunning a buffer). Also add warn unused to status results while we are changing this. Testing: Running exhaustive tests against a Jenkins build. Change-Id: I08e653cae6f2188599f4a23e4f44692166d9c119 --- M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scanner.h M be/src/exec/parquet-column-readers.h M be/src/runtime/descriptors.h 5 files changed, 22 insertions(+), 19 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/7240/2 -- To view, visit http://gerrit.cloudera.org:8080/7240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I08e653cae6f2188599f4a23e4f44692166d9c119 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-5548 Fix some minor issues with HDFS / parquet column readers
Zach Amsden has posted comments on this change. Change subject: IMPALA-5548 Fix some minor issues with HDFS / parquet column readers .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/7240/1/be/src/exec/hdfs-scan-node-base.h File be/src/exec/hdfs-scan-node-base.h: Line 352: FileFormatsMap per_type_files_; > we iterate through this one. it doesn't look like the ordering has any inte Done. Not much point in putting this on HdfsFileFormat. Line 464: std::pair, int> FileTypeCountsMap; > we iterate this one to print some info, and it seems like a consistent orde Done -- To view, visit http://gerrit.cloudera.org:8080/7240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08e653cae6f2188599f4a23e4f44692166d9c119 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5540: Revert Sentry version back to 5.13
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5540: Revert Sentry version back to 5.13 .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/7247 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie4c29a69c90b0c5d06e17b46a837c880290f3b17 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5540: Revert Sentry version back to 5.13
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-5540: Revert Sentry version back to 5.13 .. IMPALA-5540: Revert Sentry version back to 5.13 Sentry has now fixed the problem on their end, so we can return to using 5.13. Change-Id: Ie4c29a69c90b0c5d06e17b46a837c880290f3b17 Reviewed-on: http://gerrit.cloudera.org:8080/7247 Reviewed-by: Henry Robinson Tested-by: Impala Public Jenkins --- M bin/impala-config.sh 1 file changed, 1 insertion(+), 1 deletion(-) Approvals: Impala Public Jenkins: Verified Henry Robinson: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/7247 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ie4c29a69c90b0c5d06e17b46a837c880290f3b17 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-4862: make resource profile consistent with backend behaviour
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4862: make resource profile consistent with backend behaviour .. Patch Set 5: (5 comments) I found a handful of subtle bugs. Most of them were caught by running tests on my buffer pool development branch and hittings DCHECKs that too many reservations were claimed. The parallel plan bugs were found by inspection: resources required by the parallel plans should always be <= 2x the resources required by the distributed plans. http://gerrit.cloudera.org:8080/#/c/7223/5/be/src/exec/blocking-join-node.cc File be/src/exec/blocking-join-node.cc: Line 156: if (status->ok()) *status = AcquireResourcesForBuild(state); I added AcquireResourcesForBuild() to defer consuming resources until the build side is open. Line 169: if (CanCloseBuildEarly() || !status->ok()) child(1)->Close(state); I added this to close the build side ASAP when possible. http://gerrit.cloudera.org:8080/#/c/7223/5/be/src/exec/sort-node.cc File be/src/exec/sort-node.cc: Line 170: // Unless we are inside a subplan expecting to call Open()/GetNext() on the child Moving this wasn't strictly necessary but it seems to make sense to close it once we're done with it. http://gerrit.cloudera.org:8080/#/c/7223/5/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java File fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java: Line 103: public boolean isBlockingNode() { return false; } This was also a bug. It's a streaming node! Added a planner test with a pipeline of analytics that makes it obvious. http://gerrit.cloudera.org:8080/#/c/7223/5/fe/src/main/java/org/apache/impala/planner/PlanFragment.java File fe/src/main/java/org/apache/impala/planner/PlanFragment.java: Line 239: if (sink_ instanceof JoinBuildSink) { This was another subtle bug - we were double-counting some fragments in the parallel plans because the join build fragment was included in the parent fragment (there isn't an exchange node separating the fragments)/. -- To view, visit http://gerrit.cloudera.org:8080/7223 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I492cf5052bb27e4e335395e2a8f8a3b07248ec9d Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans. .. Patch Set 6: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/6527/6/be/src/exec/kudu-scan-node.cc File be/src/exec/kudu-scan-node.cc: Line 183: DCHECK(row_batch != nullptr); Brief comment to explain this? It's subtle. -- To view, visit http://gerrit.cloudera.org:8080/6527 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4862: make resource profile consistent with backend behaviour
Tim Armstrong has uploaded a new patch set (#5). Change subject: IMPALA-4862: make resource profile consistent with backend behaviour .. IMPALA-4862: make resource profile consistent with backend behaviour This moves away from the PipelinedPlanNodeSet approach of enumerating sets of concurrently-executing nodes because unions would force creating many overlapping sets of nodes. The new approach computes the peak resources during Open() and the peak resources between Open() and Close() (i.e. while calling GetNext()) bottom-up for each plan node in a fragment. The fragment resources are then combined to produce the query resources. The basic assumptions for the new resource estimates are: * resources are acquired in Open() and released in Close(). * Blocking nodes call Open() on their child before acquiring their own resources (this required some backend changes). * Blocking nodes call Close() on their children before returning from Open() * The peak resource consumption of the query is the sum of the independent fragments (except for the parallel join build plans where we can assume there will be synchronisation). This is conservative but we don't synchronise fragment Open() and Close() across exchanges so can't make stronger assumptions in general. Also compute the sum of minimum reservations. This will be useful in the backend to determine exactly when all of the initial reservations have been claimed from a shared pool of initial reservations. Testing: * Updated planner tests to reflect behavioural changes. * Added extra resource requirement planner tests for unions, subplans, pipelines of blocking operators, and bushy join plans. * Added single-node plans to resource-requirements tests. These have more complex plan trees inside a single fragment, which is useful for testing the peak resource requirement logic. Change-Id: I492cf5052bb27e4e335395e2a8f8a3b07248ec9d --- M be/src/exec/blocking-join-node.cc M be/src/exec/blocking-join-node.h M be/src/exec/hash-join-node.cc M be/src/exec/partitioned-aggregation-node.cc M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/partitioned-hash-join-node.h M be/src/exec/sort-node.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M common/thrift/Frontend.thrift M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java M fe/src/main/java/org/apache/impala/planner/JoinNode.java M fe/src/main/java/org/apache/impala/planner/ParallelPlanner.java D fe/src/main/java/org/apache/impala/planner/PipelinedPlanNodeSet.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java A fe/src/main/java/org/apache/impala/planner/PlanVisitor.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/ResourceProfile.java M fe/src/main/java/org/apache/impala/planner/SubplanNode.java M fe/src/main/java/org/apache/impala/planner/UnionNode.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu-selectivity.test M testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test M testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/sort-expr-materialization.test M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test M testdata/workloads/functional-query/queries/QueryTest/explain-level0.test M testdata/workloads/functional-query/queries/QueryTest/explain-level1.test M testdata/workloads/functional-query/queries/QueryTest/explain-level2.test M testdata/workloads/functional-query/queries/QueryTest/explain-level3.test M testdata/workloads/functional-query/queries/QueryTest/stats-extrapolation.test 38 files changed, 2,763 insertions(+), 384 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/7223/5 -- To view, visit http://gerrit.cloudera.org:8080/7223 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I492cf5052bb27e4e335395e2a8f8a3b07248ec9d Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-3504: UDF for current timestamp in UTC
Bikramjeet Vig has posted comments on this change. Change subject: IMPALA-3504: UDF for current timestamp in UTC .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/7203/1/common/thrift/ImpalaInternalService.thrift File common/thrift/ImpalaInternalService.thrift: PS1, Line 371: a > as -> at I did a second look at this while looking at Dan's latest comment. I think it should remain as "set as the query submission time" since this means that it represents the query submission time. This would also remove the need to specify that it represents the same point in time as now_string -- To view, visit http://gerrit.cloudera.org:8080/7203 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I969fc805922f2bb9c8101e84f85ff2cc3b1b6729 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5431: Remove redundant path exists checks during table load
Hello Dimitris Tsirogiannis, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7095 to look at the new patch set (#4). Change subject: IMPALA-5431: Remove redundant path exists checks during table load .. IMPALA-5431: Remove redundant path exists checks during table load There are multiple places that do an exists() check on a path and then perform some subsequent action on it. This pattern results in two RPCs to the NN (one for the exists() check and one for the subsequent action). We can avoid the exists() check in these cases since most HDFS methods on paths throw a FileNotFoundException if the path does not exist. This can save an RPC to NN and improve the metadata loading time. Testing: Enough tests already cover this code path. This patch passed core and exhaustive tests. Change-Id: Id10ecf64ea2eda2d0f9299c0aa371933eca22281 --- M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java 2 files changed, 59 insertions(+), 23 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/7095/4 -- To view, visit http://gerrit.cloudera.org:8080/7095 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id10ecf64ea2eda2d0f9299c0aa371933eca22281 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis
[Impala-ASF-CR] IMPALA-5549: Remove deprecated fields from CatalogService API
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5549: Remove deprecated fields from CatalogService API .. Patch Set 3: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/775/ -- To view, visit http://gerrit.cloudera.org:8080/7248 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibc80e43392cdc85a841e670030e0988692bdf884 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5549: Remove deprecated fields from CatalogService API
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5549: Remove deprecated fields from CatalogService API .. Patch Set 3: Code-Review+2 Rebased. Keep Alex's +2 -- To view, visit http://gerrit.cloudera.org:8080/7248 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibc80e43392cdc85a841e670030e0988692bdf884 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5497: spilling hash joins that output build rows hit OOM
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-5497: spilling hash joins that output build rows hit OOM .. IMPALA-5497: spilling hash joins that output build rows hit OOM The bug is that the join tried to bring the next spilled partition into memory while still holding onto memory from the current partition. The fix is to return earlier if the output batch is at capacity so that resources are flushed. Also reduce some of the redundancy in the loop that drives the spilling logic and catch some dropped statuses.. Testing: The failure was originally reproduced by my IMPALA-4703 patch. I was able to cause a query failure with the current code by reducing the memory limit for an existing query. Before it failed with up to 12MB of memory. Now it succeeds with 8MB or less. Ran exhaustive build. Change-Id: I075388d348499c5692d044ac1bc38dd8dd0b10c7 Reviewed-on: http://gerrit.cloudera.org:8080/7180 Reviewed-by: Dan Hecht Tested-by: Impala Public Jenkins --- M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/exec/partitioned-hash-join-node-ir.cc M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/partitioned-hash-join-node.h M testdata/workloads/functional-query/queries/QueryTest/spilling.test 6 files changed, 85 insertions(+), 85 deletions(-) Approvals: Impala Public Jenkins: Verified Dan Hecht: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/7180 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I075388d348499c5692d044ac1bc38dd8dd0b10c7 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke
[Impala-ASF-CR] IMPALA-5497: spilling hash joins that output build rows hit OOM
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5497: spilling hash joins that output build rows hit OOM .. Patch Set 8: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/7180 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I075388d348499c5692d044ac1bc38dd8dd0b10c7 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5240: Allow config of number of disk I/O threads per disk type
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-5240: Allow config of number of disk I/O threads per disk type .. Patch Set 3: (7 comments) nice! thanks for adding the test, I think it looks good. I had a few small additional comments, and I realized I think we can simplify disk-io-mgr by removing one of the members. http://gerrit.cloudera.org:8080/#/c/7232/3/be/src/runtime/disk-io-mgr-test.cc File be/src/runtime/disk-io-mgr-test.cc: Line 1142: int num_io_threads_for_remote_disks = FLAGS_num_remote_hdfs_io_threads const PS3, Line 1160: 0 can you also set this to something else, e.g. 100 to show that it gets overwritten by the other parameters? PS3, Line 1165: == weird indentation, just put the operator on the previous line and use 4 spaces to tab http://gerrit.cloudera.org:8080/#/c/7232/3/be/src/runtime/disk-io-mgr.h File be/src/runtime/disk-io-mgr.h: PS3, Line 631: /// comment this is for testing only PS3, Line 638: /// - threads_per_rotational_disk: number of read threads to create per rotational : ///disk. This is also the max queue depth. : /// - threads_per_solid_state_disk: number of read threads to create per solid state : ///disk. This is also the max queue depth. these overwrite threads_per_disk PS3, Line 837: num_threads_per_disk_ the naming of this should probably be num_io_threads_per_disk_ to be consistent or better yet, I think we can remove this member altogether because we can just set num_io_threads_per_{rotational,ssd}_ to the value of FLAGS_num_threads_per_disk if the finer grained flags weren't set. http://gerrit.cloudera.org:8080/#/c/7232/3/be/src/util/thread.cc File be/src/util/thread.cc: PS3, Line 335: const nit: space after const -- To view, visit http://gerrit.cloudera.org:8080/7232 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I094aff3747104104fe0465d73dcdbef5d9892f7c Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5036: Parquet count star optimization
Dan Hecht has posted comments on this change. Change subject: IMPALA-5036: Parquet count star optimization .. Patch Set 5: (4 comments) http://gerrit.cloudera.org:8080/#/c/6812/5/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java: Line 886: prefix + "8InitZeroIN10impala_udf9BigIntValEEEvPNS2_15FunctionContextEPT_", rather than defining a new builtin, why don't we wrap the sum() with a coalesce or isnull? does that not work for some reason? http://gerrit.cloudera.org:8080/#/c/6812/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: PS5, Line 105: The caller passes in a flag to the constructor that indicates if the count(*) : * optimization can be applied to the query block of this scan. This scan node decides : * whether to apply the optimization or not This doesn't make a lot of sense to me. On one hand, it sayst he caller decides, but then it says that this node also decides. What's actually going on? Line 109: * do we actually explain in some comment how this optimization works at a high level? PS5, Line 140: This : // scan does additional analysis in init() to determine whether it is correct to apply : // the optimization. this seems to contradict the comment above that says the caller passes a flag to the constructor that indicates whether the optimization can be applied. -- To view, visit http://gerrit.cloudera.org:8080/6812 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I536b85c014821296aed68a0c68faadae96005e62 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5549: Remove deprecated fields from CatalogService API
Alex Behm has posted comments on this change. Change subject: IMPALA-5549: Remove deprecated fields from CatalogService API .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7248 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibc80e43392cdc85a841e670030e0988692bdf884 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5240: Allow config of number of disk I/O threads per disk type
Bikramjeet Vig has uploaded a new patch set (#3). Change subject: IMPALA-5240: Allow config of number of disk I/O threads per disk type .. IMPALA-5240: Allow config of number of disk I/O threads per disk type Currently Impala defaults to 8 threads per flash disk and 1 thread per rotational disk. This can be overridden with --num_threads_per_disk, but that sets the thread count independent of disk type. This would allow control of the number of disk I/O threads spawned independently for solid state disks using "--num_io_threads_per_solid_state_disk" and for rotational disks using "--num_io_threads_per_rotational_disk" as startup parameters. Testing: Added backend tests that verify if "num_threads_per_disk", "num_io_threads_per_solid_state_disk" and "num_io_threads_per_rotational_disk" are used to spawn threads appropriately Change-Id: I094aff3747104104fe0465d73dcdbef5d9892f7c --- M be/src/runtime/disk-io-mgr-test.cc M be/src/runtime/disk-io-mgr.cc M be/src/runtime/disk-io-mgr.h M be/src/util/thread.cc M be/src/util/thread.h M fe/src/test/java/org/apache/impala/testutil/SentryServicePinger.java 6 files changed, 78 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/7232/3 -- To view, visit http://gerrit.cloudera.org:8080/7232 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I094aff3747104104fe0465d73dcdbef5d9892f7c Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5240: Allow config of number of disk I/O threads per disk type
Bikramjeet Vig has posted comments on this change. Change subject: IMPALA-5240: Allow config of number of disk I/O threads per disk type .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/7232/1/be/src/runtime/disk-io-mgr.h File be/src/runtime/disk-io-mgr.h: PS1, Line 637: DiskIoMgr(int num_disks, int threads_per_disk, int min_buffer_size, : int max_buffer_size); : : /// Create DiskIoMgr with default configs. : DiskIoMgr(); : : /// Clean up all threads and resour > You can remove threads_per_disk and instead change the tests to pass the sa Done PS1, Line 637: DiskIoMgr(int num_disks, int threads_per_disk, int min_buffer_size, : int max_buffer_size); : : /// Create DiskIoMgr with default configs. : DiskIoMgr(); : : /// Clean up all threads and resour > You can also change DiskQueue to know how many threads it has, and update D Done -- To view, visit http://gerrit.cloudera.org:8080/7232 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I094aff3747104104fe0465d73dcdbef5d9892f7c Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4868: Increase TestRequestPoolService.testUpdatingConfigs timeout
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4868: Increase TestRequestPoolService.testUpdatingConfigs timeout .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/7118 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0b98d69fc3aa61a317944950d14eb93e1737250c Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4868: Increase TestRequestPoolService.testUpdatingConfigs timeout
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-4868: Increase TestRequestPoolService.testUpdatingConfigs timeout .. IMPALA-4868: Increase TestRequestPoolService.testUpdatingConfigs timeout A previous commit attempted to fix IMPALA-4868 by waiting longer for a new file to become visible to the Impala code responsible for parsing the admission control config file. The timeout was observed in another gerrit-verify job, so we should increase this timeout further. I don't see any indication that there is anything else wrong with the test, or that any other kind of failure occurred. This change gives the test up to 20 seconds for the new file to become visible and processed by the RequestPoolService. Change-Id: I0b98d69fc3aa61a317944950d14eb93e1737250c Reviewed-on: http://gerrit.cloudera.org:8080/7118 Reviewed-by: Matthew Jacobs Tested-by: Impala Public Jenkins --- M fe/src/test/java/org/apache/impala/util/TestRequestPoolService.java 1 file changed, 1 insertion(+), 1 deletion(-) Approvals: Impala Public Jenkins: Verified Matthew Jacobs: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/7118 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I0b98d69fc3aa61a317944950d14eb93e1737250c Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs
[Impala-ASF-CR] IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.
Henry Robinson has posted comments on this change. Change subject: IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans. .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/6527/5/be/src/exec/kudu-scan-node.cc File be/src/exec/kudu-scan-node.cc: Line 179: if (materialized_row_batches_->AddBatchWithTimeout(move(row_batch), 100)) { > I added a DCHECK that get hits immediately in the old buggy code with test_ Thanks Tim - you're quite right, and I misunderstood what you're saying. I think it's brittle to rely on the fact that move() may or may not result ultimately in a moved-from rvalue, but guess it's useful here with no obvious better implementation. -- To view, visit http://gerrit.cloudera.org:8080/6527 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: Yes
[Impala-ASF-CR] Bump Kudu version to c0798a9
Impala Public Jenkins has posted comments on this change. Change subject: Bump Kudu version to c0798a9 .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/7246 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3523f2b769cff6ab3a6aac97ec36f6bb3bda5e0f Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] Bump Kudu version to c0798a9
Impala Public Jenkins has submitted this change and it was merged. Change subject: Bump Kudu version to c0798a9 .. Bump Kudu version to c0798a9 Change-Id: I3523f2b769cff6ab3a6aac97ec36f6bb3bda5e0f Reviewed-on: http://gerrit.cloudera.org:8080/7246 Reviewed-by: Thomas Tauber-Marshall Tested-by: Impala Public Jenkins --- M bin/impala-config.sh 1 file changed, 2 insertions(+), 2 deletions(-) Approvals: Impala Public Jenkins: Verified Thomas Tauber-Marshall: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/7246 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I3523f2b769cff6ab3a6aac97ec36f6bb3bda5e0f Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-4863: Correctly account the file type and compression codec
anujphadke has posted comments on this change. Change subject: IMPALA-4863: Correctly account the file type and compression codec .. Patch Set 2: Defaulting filtered to false in the function definition instead of passing false at every function invocation. -- To view, visit http://gerrit.cloudera.org:8080/7245 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I797916505f62e568f4159e07099481b8ff571da2 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadke Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4863: Correctly account the file type and compression codec
anujphadke has uploaded a new patch set (#2). Change subject: IMPALA-4863: Correctly account the file type and compression codec .. IMPALA-4863: Correctly account the file type and compression codec If a scan range is filtered at runtime the scan node skips reading the range and never figures out the underlying compression codec used to compress the files. In such a scenario we default the compression codec to NONE which can be misleading. This change marks these files as filtered in the scan node profile e.g. - File Formats: PARQUET/SNAPPY:364 PARQUET(Filtered):1460 Change-Id: I797916505f62e568f4159e07099481b8ff571da2 --- M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scan-node.h 4 files changed, 21 insertions(+), 12 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/7245/2 -- To view, visit http://gerrit.cloudera.org:8080/7245 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I797916505f62e568f4159e07099481b8ff571da2 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadke Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.
Alex Behm has uploaded a new patch set (#6). Change subject: IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans. .. IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans. Implements HdfsScanner::GetNext() for the Avro, RC File, and Sequence File scanners. Changes ProcessSplit() to repeatedly call GetNext() to share the core scanning code between the legacy ProcessSplit() interface (ProcessSplit()) and the new GetNext() interface. Summary of changes: - Slightly change code flow for initial scan range that only parses the file header. The new code sets 'only_parsing_header_' in Open() and then honors that flag in GetNextInternal(). Before, all the logic was inside ProcessSpit(). - Replace 'finished_' with 'eos_'. - Add a RowBatch parameter to various functions. - Change Close() to free all resources when a nullptr RowBatch is passed. Testing: - Exhaustive tests passed on debug - Core tests passed on asan - TODO: Perf testing on cluster Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669 --- M be/src/exec/base-sequence-scanner.cc M be/src/exec/base-sequence-scanner.h M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/exec/hdfs-avro-scanner-ir.cc M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-avro-scanner.h M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-rcfile-scanner.cc M be/src/exec/hdfs-rcfile-scanner.h M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scan-node-mt.h M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scan-node.h M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/hdfs-sequence-scanner.cc M be/src/exec/hdfs-sequence-scanner.h M be/src/exec/hdfs-text-scanner.cc M be/src/exec/hdfs-text-scanner.h M be/src/exec/kudu-scan-node.cc M be/src/exec/scan-node.h M be/src/util/blocking-queue.h M testdata/workloads/functional-query/queries/DataErrorsTest/avro-errors.test 26 files changed, 703 insertions(+), 825 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/6527/6 -- To view, visit http://gerrit.cloudera.org:8080/6527 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke
[Impala-ASF-CR] IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.
Alex Behm has posted comments on this change. Change subject: IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans. .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/6527/5/be/src/exec/kudu-scan-node.cc File be/src/exec/kudu-scan-node.cc: Line 179: if (materialized_row_batches_->AddBatchWithTimeout(move(row_batch), 100)) { > My understand was that move() didn't actually modify the object, it just pr I added a DCHECK that get hits immediately in the old buggy code with test_cancellation.py on Kudu. I removed RowBatchQueue::AddBatchWithTimeout() to avoid confusion and am now calling BlockingPutWithTimeout() on the queue directly. -- To view, visit http://gerrit.cloudera.org:8080/6527 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3040 addendum: use specific build type timeout for slow builds
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-3040 addendum: use specific_build_type_timeout for slow builds .. Patch Set 4: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/7115 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I80f1c8a0e634a3726c53ef7297c5b162dd57a3a2 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Brown Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3040 addendum: use specific build type timeout for slow builds
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-3040 addendum: use specific_build_type_timeout for slow builds .. IMPALA-3040 addendum: use specific_build_type_timeout for slow builds IMPALA-3040 was initially fixed to use a timeout with HDFS caching tests, however some test executions against slow-running builds such as ASAN indicate this timeout may not be high enough. Use the specific_build_type_timeout() method to set a much higher timeout for slower builds such as ASAN. This allows us to virtually ignore timeout values on slow builds, but doesn't force us to unconditionally increase the timeout in a release or debug build. Testing: Ran all tests that use get_num_cache_requests() in a loop 100 times each under an ASAN build. All test iterations passed. Change-Id: I80f1c8a0e634a3726c53ef7297c5b162dd57a3a2 Reviewed-on: http://gerrit.cloudera.org:8080/7115 Reviewed-by: Michael Brown Tested-by: Impala Public Jenkins --- M tests/query_test/test_hdfs_caching.py 1 file changed, 3 insertions(+), 1 deletion(-) Approvals: Impala Public Jenkins: Verified Michael Brown: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/7115 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I80f1c8a0e634a3726c53ef7297c5b162dd57a3a2 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Brown Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.
Sailesh Mukil has posted comments on this change. Change subject: IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans. .. Patch Set 5: > (1 comment) Thanks Tim. This example makes things clearer. @Henry: The reason I said that using an rvalue ref is discouraged can be seen in Tim's example code. The caller knows exactly what to expect after calling pass_by_value(), i.e. the ownership is predictable. However, after pass_by_rvalue_ref(), the caller does not know who owns the 'p' member anymore, unless through other means; like explicitly checking the unique_ptr value, or getting a bool back that says if it was used or not, etc. -- To view, visit http://gerrit.cloudera.org:8080/6527 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans. .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/6527/5/be/src/exec/kudu-scan-node.cc File be/src/exec/kudu-scan-node.cc: Line 179: if (materialized_row_batches_->AddBatchWithTimeout(move(row_batch), 100)) { > You could add an output parameter to BlockingPutWithTimeout() (rather than My understand was that move() didn't actually modify the object, it just produced an rvalue ref to the object. Instead the destruction of the source object would happen when the move constructor or assign operator is invoked. This test program seems to confirm this understanding: #include #include using std::cout; using std::move; using std::unique_ptr; void pass_by_rvalue_ref(unique_ptr&& ref, bool do_assign) { if (do_assign) { unique_ptr tmp = move(ref); } } void pass_by_value(unique_ptrref) { } int main() { unique_ptr p(new int); cout << "Initial: " << p.get() << "\n"; pass_by_rvalue_ref(move(p), false); cout << "After pass by rvalue ref no assign: " << p.get() << "\n"; pass_by_value(move(p)); cout << "After pass by value: " << p.get() << "\n"; p.reset(new int); cout << "Initial: " << p.get() << "\n"; pass_by_rvalue_ref(move(p), true); cout << "After pass by rvalue ref assign: " << p.get() << "\n"; return 0; } The output is: tarmstrong@tarmstrong-box:~$ g++ -std=c++14 uniq.cc -o uniq && ./uniq Initial: 0x19d2c20 After pass by rvalue ref no assign: 0x19d2c20 After pass by value: 0 Initial: 0x19d2c20 After pass by rvalue ref assign: 0 -- To view, visit http://gerrit.cloudera.org:8080/6527 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.
Sailesh Mukil has posted comments on this change. Change subject: IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans. .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/6527/5/be/src/exec/kudu-scan-node.cc File be/src/exec/kudu-scan-node.cc: Line 179: if (materialized_row_batches_->AddBatchWithTimeout(move(row_batch), 100)) { > You could add an output parameter to BlockingPutWithTimeout() (rather than unique_ptr is already a move only class, which means it's limited to using move semantics and copying isn't allowed, something that rvalue refs try to provide for non-move-only classes/data types. That means that moving to an rvalue ref of a move only class is redundant, save for one subtle difference noted below. So the only difference between taking: func(unique_ptr<> arg) and func(unique_ptr<>&& arg) is that in the first case, the caller knows that after calling func, they're not supposed to use the unique_ptr member that was passed to arg. In the second case, the value of the unique_ptr member passed is dependent on what func() does with arg. In the above code, we're relying on this behavior to conditionally relinquish or retain the ownership of row_batch. Unless I got something wrong? -- To view, visit http://gerrit.cloudera.org:8080/6527 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.
Henry Robinson has posted comments on this change. Change subject: IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans. .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/6527/5/be/src/exec/kudu-scan-node.cc File be/src/exec/kudu-scan-node.cc: Line 179: if (materialized_row_batches_->AddBatchWithTimeout(move(row_batch), 100)) { > Just passing through. You could add an output parameter to BlockingPutWithTimeout() (rather than change the return type). Or the row batch queue should be of shared_ptr<>. I don't think we should rely on when move() is called in the blocking queue (and I'm not sure that analysis is quite right, as you would need to move() the row batch into AddBatchWithTimeout() to make it an rvalue-ref; i.e. before any timeout). Sailesh - Where did you see discouraging using unique_ptr<> as an rvalue-ref? The idea of rvalues is that you move() into them, which means you are explicitly relinquishing ownership of the moved-from object. -- To view, visit http://gerrit.cloudera.org:8080/6527 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5549: Remove deprecated fields from CatalogService API
Hello Bharath Vissapragada, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7248 to look at the new patch set (#2). Change subject: IMPALA-5549: Remove deprecated fields from CatalogService API .. IMPALA-5549: Remove deprecated fields from CatalogService API Remove from TCatalogUpdateResult the deprecated fields that are no longer used and modify the catalog to always populate the 'updated_catalog_objects' and 'removed_catalog_object' fields with the updated/removed catalog objects. Testing: Run core tests. Change-Id: Ibc80e43392cdc85a841e670030e0988692bdf884 --- M be/src/service/impala-server.cc M common/thrift/CatalogService.thrift M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java 3 files changed, 22 insertions(+), 103 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/7248/2 -- To view, visit http://gerrit.cloudera.org:8080/7248 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibc80e43392cdc85a841e670030e0988692bdf884 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis
[Impala-ASF-CR] IMPALA-5549: Remove deprecated fields from CatalogService API
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5549: Remove deprecated fields from CatalogService API .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/7248/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: Line 2916: } > May be add a Preconditions check on updatedPrivs.size() > 1 just so that it Good point. Done -- To view, visit http://gerrit.cloudera.org:8080/7248 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibc80e43392cdc85a841e670030e0988692bdf884 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5549: Remove deprecated fields from CatalogService API
Bharath Vissapragada has posted comments on this change. Change subject: IMPALA-5549: Remove deprecated fields from CatalogService API .. Patch Set 1: Code-Review+1 (1 comment) Ok, Thanks for clarifying. http://gerrit.cloudera.org:8080/#/c/7248/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: Line 2916: } May be add a Preconditions check on updatedPrivs.size() > 1 just so that it doesn't show up as an NPE in some edge case? (or) Probably surround the whole block with if (!updatedPrivs.isEmpty()) -- To view, visit http://gerrit.cloudera.org:8080/7248 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibc80e43392cdc85a841e670030e0988692bdf884 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.
Sailesh Mukil has posted comments on this change. Change subject: IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans. .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/6527/5/be/src/exec/kudu-scan-node.cc File be/src/exec/kudu-scan-node.cc: Line 179: if (materialized_row_batches_->AddBatchWithTimeout(move(row_batch), 100)) { > I think BlockingPutWithTimeout() actually already supports this fine since Just passing through. I agree with Tim's approach, however, passing a unique_ptr as an rvalue-ref is usually discouraged since the lifetime of the member from the caller side('row_batch' here) completely depends on the implementation of the function it's being passed to, which may not be evident to the caller. That being said, it's that ambiguity that helps us solve this problem and I don't see an easier way to do it, but leaving a comment about this would be good since this is so subtle, and would help from a readability perspective. -- To view, visit http://gerrit.cloudera.org:8080/6527 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5549: Remove deprecated fields from CatalogService API
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5549: Remove deprecated fields from CatalogService API .. Patch Set 1: > IIRC, this was introduced because the permanent UDF changes broke > some clients (ex: BDR) relying on the old API that returned a > single updated/removed object. Is it not a problem anymore? Just > want to be sure that we don't break them again. Since 5.12, BDR stopped using our catalog API. Hence, there is no need to keep these deprecated logic anymore. -- To view, visit http://gerrit.cloudera.org:8080/7248 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibc80e43392cdc85a841e670030e0988692bdf884 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5549: Remove deprecated fields from CatalogService API
Bharath Vissapragada has posted comments on this change. Change subject: IMPALA-5549: Remove deprecated fields from CatalogService API .. Patch Set 1: IIRC, this was introduced because the permanent UDF changes broke some clients (ex: BDR) relying on the old API that returned a single updated/removed object. Is it not a problem anymore? Just want to be sure that we don't break them again. -- To view, visit http://gerrit.cloudera.org:8080/7248 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibc80e43392cdc85a841e670030e0988692bdf884 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Bharath Vissapragada Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5532: Stack-allocate compressors in RowBatch (de)serialization
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5532: Stack-allocate compressors in RowBatch (de)serialization .. Patch Set 4: Sorry that was a link to the wrong commit, this is the branch with the correct one https://github.com/timarmstrong/incubator-impala/commits/fix-table-writers -- To view, visit http://gerrit.cloudera.org:8080/7226 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia4b5a8d2cc315db50e5d70b1191702206de3450d Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5549: Remove deprecated fields from CatalogService API
Dimitris Tsirogiannis has uploaded a new change for review. http://gerrit.cloudera.org:8080/7248 Change subject: IMPALA-5549: Remove deprecated fields from CatalogService API .. IMPALA-5549: Remove deprecated fields from CatalogService API Remove from TCatalogUpdateResult the deprecated fields that are no longer used and modify the catalog to always populate the 'updated_catalog_objects' and 'removed_catalog_object' fields with the updated/removed catalog objects. Testing: Run core tests. Change-Id: Ibc80e43392cdc85a841e670030e0988692bdf884 --- M be/src/service/impala-server.cc M common/thrift/CatalogService.thrift M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java 3 files changed, 28 insertions(+), 111 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/7248/1 -- To view, visit http://gerrit.cloudera.org:8080/7248 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ibc80e43392cdc85a841e670030e0988692bdf884 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis
[Impala-ASF-CR] IMPALA-5532: Stack-allocate compressors in RowBatch (de)serialization
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5532: Stack-allocate compressors in RowBatch (de)serialization .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/7226/4/be/src/exec/hdfs-parquet-table-writer.cc File be/src/exec/hdfs-parquet-table-writer.cc: Line 327: Status status = FinalizeCurrentPage(); This doesn't seem right - why return a Status if we're going to assume that it's always ok. I actually had a branch where I was going through and fixing all the dropped statuses found by GCC7's [[nodiscard]]. My fixes for propagating the table writer status are here: https://github.com/timarmstrong/incubator-impala/commit/6a9cb5a1d79c52001597b58eb073ff5b70a239f4 I already ran a build with these fixes. Maybe we can just fold that into here. -- To view, visit http://gerrit.cloudera.org:8080/7226 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia4b5a8d2cc315db50e5d70b1191702206de3450d Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5540: Revert Sentry version back to 5.13
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5540: Revert Sentry version back to 5.13 .. Patch Set 1: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/774/ -- To view, visit http://gerrit.cloudera.org:8080/7247 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie4c29a69c90b0c5d06e17b46a837c880290f3b17 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5540: Revert Sentry version back to 5.13
Henry Robinson has posted comments on this change. Change subject: IMPALA-5540: Revert Sentry version back to 5.13 .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7247 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie4c29a69c90b0c5d06e17b46a837c880290f3b17 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Henry Robinson Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5540: Revert Sentry version back to 5.13
Thomas Tauber-Marshall has uploaded a new change for review. http://gerrit.cloudera.org:8080/7247 Change subject: IMPALA-5540: Revert Sentry version back to 5.13 .. IMPALA-5540: Revert Sentry version back to 5.13 Sentry has now fixed the problem on their end, so we can return to using 5.13. Change-Id: Ie4c29a69c90b0c5d06e17b46a837c880290f3b17 --- M bin/impala-config.sh 1 file changed, 1 insertion(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/7247/1 -- To view, visit http://gerrit.cloudera.org:8080/7247 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ie4c29a69c90b0c5d06e17b46a837c880290f3b17 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-5532: Stack-allocate compressors in RowBatch (de)serialization
Henry Robinson has posted comments on this change. Change subject: IMPALA-5532: Stack-allocate compressors in RowBatch (de)serialization .. Patch Set 4: I fixed the placement of WARN_UNUSED_RESULT in the previous patchset, and also changed a method signature in the parquet writer so that the status of a compression operation could be checked. Please take another quick look at hdfs-parquet-table-writer.cc! -- To view, visit http://gerrit.cloudera.org:8080/7226 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia4b5a8d2cc315db50e5d70b1191702206de3450d Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5532: Stack-allocate compressors in RowBatch (de)serialization
Hello Impala Public Jenkins, Michael Ho, Sailesh Mukil, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7226 to look at the new patch set (#4). Change subject: IMPALA-5532: Stack-allocate compressors in RowBatch (de)serialization .. IMPALA-5532: Stack-allocate compressors in RowBatch (de)serialization Change allocation pattern for Codec objects in RowBatch to be stack-allocated. Make c'tors and Init() methods of codec implementations publicly visible in order to do so. Change signature of BaseColumnWriter::FinalizeCurrentPage() so that result of ProcessBlock32() can be properly checked. Change-Id: Ia4b5a8d2cc315db50e5d70b1191702206de3450d --- M be/src/exec/hdfs-parquet-table-writer.cc M be/src/experiments/compression-test.cc M be/src/runtime/row-batch.cc M be/src/util/codec.h M be/src/util/compress.h M be/src/util/decompress-test.cc M be/src/util/decompress.h 7 files changed, 119 insertions(+), 122 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/7226/4 -- To view, visit http://gerrit.cloudera.org:8080/7226 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia4b5a8d2cc315db50e5d70b1191702206de3450d Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5497: spilling hash joins that output build rows hit OOM
Dan Hecht has posted comments on this change. Change subject: IMPALA-5497: spilling hash joins that output build rows hit OOM .. Patch Set 8: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7180 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I075388d348499c5692d044ac1bc38dd8dd0b10c7 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4866: Hash join node does not apply limits correctly
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4866: Hash join node does not apply limits correctly .. Patch Set 4: Actually let's wait until https://gerrit.cloudera.org/#/c/7180 goes in then rebase on top of that. They should be independent but they're touching some of the same code so best to make sure they don't interfere with each other. -- To view, visit http://gerrit.cloudera.org:8080/6778 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I414124f8bb6f8b2af2df468e1c23418d05a0e29f Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadke Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5497: spilling hash joins that output build rows hit OOM
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5497: spilling hash joins that output build rows hit OOM .. Patch Set 8: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/773/ -- To view, visit http://gerrit.cloudera.org:8080/7180 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I075388d348499c5692d044ac1bc38dd8dd0b10c7 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5497: spilling hash joins that output build rows hit OOM
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5497: spilling hash joins that output build rows hit OOM .. Patch Set 6: (2 comments) http://gerrit.cloudera.org:8080/#/c/7180/6/be/src/exec/partitioned-hash-join-node-ir.cc File be/src/exec/partitioned-hash-join-node-ir.cc: Line 238: || JoinOp == TJoinOp::NULL_AWARE_LEFT_ANTI_JOIN) { > the old formatting seemed more readable, but up to you. Done http://gerrit.cloudera.org:8080/#/c/7180/6/be/src/exec/partitioned-hash-join-node.cc File be/src/exec/partitioned-hash-join-node.cc: Line 609: if (!output_build_partitions_.empty()) continue; > the control flow in this loop is still pretty hard to read, but not for thi Agreed. I think it really needs to be completely restructured. -- To view, visit http://gerrit.cloudera.org:8080/7180 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I075388d348499c5692d044ac1bc38dd8dd0b10c7 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5497: spilling hash joins that output build rows hit OOM
Hello Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7180 to look at the new patch set (#7). Change subject: IMPALA-5497: spilling hash joins that output build rows hit OOM .. IMPALA-5497: spilling hash joins that output build rows hit OOM The bug is that the join tried to bring the next spilled partition into memory while still holding onto memory from the current partition. The fix is to return earlier if the output batch is at capacity so that resources are flushed. Also reduce some of the redundancy in the loop that drives the spilling logic and catch some dropped statuses.. Testing: The failure was originally reproduced by my IMPALA-4703 patch. I was able to cause a query failure with the current code by reducing the memory limit for an existing query. Before it failed with up to 12MB of memory. Now it succeeds with 8MB or less. Ran exhaustive build. Change-Id: I075388d348499c5692d044ac1bc38dd8dd0b10c7 --- M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/exec/partitioned-hash-join-node-ir.cc M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/partitioned-hash-join-node.h M testdata/workloads/functional-query/queries/QueryTest/spilling.test 6 files changed, 85 insertions(+), 85 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/7180/7 -- To view, visit http://gerrit.cloudera.org:8080/7180 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I075388d348499c5692d044ac1bc38dd8dd0b10c7 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke