[Impala-ASF-CR] Add functional tests for compute stats with mt dop > 0.
Lars Volker has posted comments on this change. Change subject: Add functional tests for compute stats with mt_dop > 0. .. Patch Set 3: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/4879 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icd4e7e44f9f23f66e59ad1fb298e13da76ad817a Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-HasComments: No
[Impala-ASF-CR] Add functional tests for compute stats with mt dop > 0.
Alex Behm has uploaded a new patch set (#3). Change subject: Add functional tests for compute stats with mt_dop > 0. .. Add functional tests for compute stats with mt_dop > 0. Change-Id: Icd4e7e44f9f23f66e59ad1fb298e13da76ad817a --- A testdata/workloads/functional-query/queries/QueryTest/mt-dop-compute-stats.test M tests/query_test/test_mt_dop.py 2 files changed, 21 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/4879/3 -- To view, visit http://gerrit.cloudera.org:8080/4879 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Icd4e7e44f9f23f66e59ad1fb298e13da76ad817a Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] Add functional tests for compute stats with mt dop > 0.
Alex Behm has posted comments on this change. Change subject: Add functional tests for compute stats with mt_dop > 0. .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/4879/1/tests/query_test/test_mt_dop.py File tests/query_test/test_mt_dop.py: PS1, Line 52: location > Maybe add a comment "Create a second table pointing to the same data files" Done http://gerrit.cloudera.org:8080/#/c/4879/2/tests/query_test/test_mt_dop.py File tests/query_test/test_mt_dop.py: PS2, Line 50: tion(" > format Done -- To view, visit http://gerrit.cloudera.org:8080/4879 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icd4e7e44f9f23f66e59ad1fb298e13da76ad817a Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-HasComments: Yes
[Impala-ASF-CR] Add functional tests for compute stats with mt dop > 0.
Lars Volker has posted comments on this change. Change subject: Add functional tests for compute stats with mt_dop > 0. .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/4879/1/tests/query_test/test_mt_dop.py File tests/query_test/test_mt_dop.py: PS1, Line 51: execute_query_using_client > That version doesn't switch the DB and extracting the DB manually from the Cool, learned something. PS1, Line 52: location > I want to just create "clone" of an existing table that has the same format Maybe add a comment "Create a second table pointing to the same data files"? http://gerrit.cloudera.org:8080/#/c/4879/2/tests/query_test/test_mt_dop.py File tests/query_test/test_mt_dop.py: PS2, Line 50: fotmat format -- To view, visit http://gerrit.cloudera.org:8080/4879 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icd4e7e44f9f23f66e59ad1fb298e13da76ad817a Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-HasComments: Yes
[Impala-ASF-CR] Add functional tests for compute stats with mt dop > 0.
Alex Behm has posted comments on this change. Change subject: Add functional tests for compute stats with mt_dop > 0. .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/4879/1/tests/query_test/test_mt_dop.py File tests/query_test/test_mt_dop.py: PS1, Line 51: execute_query_using_client > I think self.execute_query does the same. That version doesn't switch the DB and extracting the DB manually from the vector is super painful. Added comment. PS1, Line 52: location > Why is it necessary to change the location here? I'm mostly curious because I want to just create "clone" of an existing table that has the same format and points to the same files. Create table like does not copy the location, so I need to specify it. Alternatives like CTAS don't work because it's not easy to specify the format in the CTAS, and we cannot insert into all formats. Line 54: new_vector = deepcopy(vector) > can you add a comment why this copy step is needed? For this test it's actually not needed because in every test we set the mt_dop exec option again. I was worried about modifying the vector in place and polluting the state of subsequent tests, but all tests use and set mt_dop anyway. -- To view, visit http://gerrit.cloudera.org:8080/4879 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icd4e7e44f9f23f66e59ad1fb298e13da76ad817a Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-HasComments: Yes
[Impala-ASF-CR] Add functional tests for compute stats with mt dop > 0.
Alex Behm has uploaded a new patch set (#2). Change subject: Add functional tests for compute stats with mt_dop > 0. .. Add functional tests for compute stats with mt_dop > 0. Change-Id: Icd4e7e44f9f23f66e59ad1fb298e13da76ad817a --- A testdata/workloads/functional-query/queries/QueryTest/mt-dop-compute-stats.test M tests/query_test/test_mt_dop.py 2 files changed, 20 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/4879/2 -- To view, visit http://gerrit.cloudera.org:8080/4879 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Icd4e7e44f9f23f66e59ad1fb298e13da76ad817a Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-4377: Add detailed error log when a UdfExecutorTest fails.
Alex Behm has uploaded a new change for review. http://gerrit.cloudera.org:8080/4881 Change subject: IMPALA-4377: Add detailed error log when a UdfExecutorTest fails. .. IMPALA-4377: Add detailed error log when a UdfExecutorTest fails. This is not a fix for IMPALA-4377 but should help with diagnosing the problem if we ever hit the failure again. Change-Id: Id94130715e4f342e4dd2f6cd137ac1eb7b1ecf2d --- M fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java 1 file changed, 106 insertions(+), 57 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/81/4881/1 -- To view, visit http://gerrit.cloudera.org:8080/4881 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Id94130715e4f342e4dd2f6cd137ac1eb7b1ecf2d Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm
[Impala-ASF-CR] IMPALA-3676: Use clang as a static analysis tool
Jim Apple has posted comments on this change. Change subject: IMPALA-3676: Use clang as a static analysis tool .. Patch Set 9: (3 comments) http://gerrit.cloudera.org:8080/#/c/4758/9/be/src/experiments/bit-stream-utils.8byte.inline.h File be/src/experiments/bit-stream-utils.8byte.inline.h: Line 100: if (num_bits - bit_offset_ < size) { > the dcheck at line 92 contradicts the need for this if-stmt. is the checke undone http://gerrit.cloudera.org:8080/#/c/4758/9/be/src/util/aligned-new.h File be/src/util/aligned-new.h: Line 50: }; > given that we only really use this for cache alignment, how did you test it I ran core tests. I did not add any unit tests. http://gerrit.cloudera.org:8080/#/c/4758/9/be/src/util/thread-pool.h File be/src/util/thread-pool.h: Line 34: class ThreadPool : public CacheLineAligned { > was this aligned before? if not, why are we aligning it now? Clang complained that it should have been aligned to meet the invariant we asked for on BlockingQueue -- To view, visit http://gerrit.cloudera.org:8080/4758 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4ed168488cb30ddeccd0087f3840541d858f9c06 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3676: Use clang as a static analysis tool
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4758 to look at the new patch set (#10). Change subject: IMPALA-3676: Use clang as a static analysis tool .. IMPALA-3676: Use clang as a static analysis tool This patch adds a script to run clang-tidy over the whole code base. It is a first step towards running clang-tidy over patches as a tool to help users spot bugs before code review. Because of the number of clang-tidy checks, this patch only addresses some of them. In particular, only checks starting with 'clang' are considered. Many of them which are flaky or not part of our style are excluded from the analysis. This patch also exlcudes some checks which are part of our current style but which would be too laborious to fix over the entire codebase, like using nullptr rather than NULL. This patch also fixes a number of small bugs found by clang-tidy. Finally, this patch adds the class AlignedNew, the purpose of which is to provide correct alignment on heap-allocated data. The global new operator only guarantees 16-byte alignment. A class that includes a member variable that must be aligned on a k-byte boundary for k>16 can inherit from AlignedNew to ensure correct alignment on the heap, quieting clang's -Wover-aligned warning. (Static and stack allocation are required by the standard to respect the alignment of the type and its member variables, so no extra code is needed for allocation in those places.) Change-Id: I4ed168488cb30ddeccd0087f3840541d858f9c06 --- A .clang-tidy M CMakeLists.txt M be/CMakeLists.txt M be/src/benchmarks/bloom-filter-benchmark.cc M be/src/benchmarks/in-predicate-benchmark.cc M be/src/benchmarks/parse-timestamp-benchmark.cc M be/src/catalog/catalog-server.cc M be/src/catalog/catalog-server.h M be/src/codegen/codegen-anyval.cc M be/src/codegen/impala-ir.cc M be/src/codegen/llvm-codegen.cc M be/src/common/compiler-util.h M be/src/common/init.cc M be/src/common/logging.h M be/src/common/status.h M be/src/exec/delimited-text-parser.cc M be/src/exec/delimited-text-parser.h M be/src/exec/exec-node.h M be/src/exec/external-data-source-executor.h M be/src/exec/hash-table-test.cc M be/src/exec/hbase-scan-node.h M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scan-node.h M be/src/exec/hdfs-table-writer.h M be/src/exec/parquet-column-readers.h M be/src/exec/partitioned-hash-join-builder-ir.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/scanner-context.inline.h M be/src/exec/write-stream.h M be/src/experiments/bit-stream-utils.8byte.inline.h M be/src/experiments/tuple-splitter-test.cc M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M be/src/exprs/compound-predicates.h M be/src/exprs/expr-test.cc M be/src/exprs/slot-ref.cc M be/src/exprs/string-functions-ir.cc M be/src/rpc/TAcceptQueueServer.h M be/src/rpc/authentication.cc M be/src/rpc/authentication.h M be/src/rpc/thrift-server.h M be/src/runtime/buffered-block-mgr.h M be/src/runtime/buffered-tuple-stream.cc M be/src/runtime/buffered-tuple-stream.h M be/src/runtime/coordinator.cc M be/src/runtime/coordinator.h M be/src/runtime/data-stream-sender.cc M be/src/runtime/data-stream-sender.h M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M be/src/runtime/exec-env.h M be/src/runtime/free-pool-test.cc M be/src/runtime/hbase-table.cc M be/src/runtime/hdfs-fs-cache.h M be/src/runtime/multi-precision-test.cc M be/src/runtime/runtime-filter-bank.h M be/src/runtime/runtime-filter.h M be/src/runtime/scoped-buffer.h M be/src/runtime/string-buffer.h M be/src/runtime/timestamp-parse-util.cc M be/src/runtime/timestamp-value.cc M be/src/runtime/tmp-file-mgr.cc M be/src/runtime/tmp-file-mgr.h M be/src/scheduling/query-schedule.cc M be/src/scheduling/query-schedule.h M be/src/scheduling/simple-scheduler-test.cc M be/src/service/fe-support.cc M be/src/service/impala-beeswax-server.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h M be/src/service/query-exec-state.cc M be/src/service/query-exec-state.h M be/src/service/query-result-set.cc M be/src/statestore/failure-detector.h M be/src/statestore/statestore-subscriber.h M be/src/statestore/statestore.cc M be/src/statestore/statestore.h M be/src/testutil/test-udas.cc M be/src/testutil/test-udas.h M be/src/transport/TSasl.h M be/src/transport/TSaslClientTransport.h M be/src/transport/TSaslServerTransport.h M be/src/transport/TSaslTransport.h M be/src/udf/uda-test-harness.h M be/src/udf/uda-test.cc M be/src/udf/udf-ir.cc M be/src/udf/udf.cc M be/src/udf/udf.h M be/src/udf_samples/uda-sample.cc A be/src/util/aligned-new.h M be/src/util/benchmark-test.cc M be/src/util/bit-stream-utils.inline.h M be/src/util/bit-util-test.cc M be/src/util/blocking-queue-test.cc M be/src/util/blocking-queue.h M be/src/util/bloom-filter-test.cc M b
[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-1286: Extract common conjuncts from disjunctions. .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/4877/1/fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java File fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java: Line 37: public class ExtractCommonConjunctRule implements ExprRewriteRule { where is this being applied? http://gerrit.cloudera.org:8080/#/c/4877/1/testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test File testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test: Line 3384: p_brand = 'Brand#12' > I'm not really sure how this patch handles a bushy predicate. Do we flatten i agree, i think we need normalization as a separate transformation step. this will be a prerequisite for other transformations as well, and having to bake normalization into every single transformation rule doesn't make sense. Line 3393: p_brand = 'Brand#23' additional idea for transformation rule: derive additional, non-essential disjunctive scan predicates if the selectivity of the base predicates is sufficiently small (for '=', say). in the example: brand = 12 or brand = 23 or brand = 34 as an extra scan predicate for 'parts', if 'brand' has a sufficiently high cardinality. -- To view, visit http://gerrit.cloudera.org:8080/4877 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4387: validate decimal type in Avro file schema
Dan Hecht has posted comments on this change. Change subject: IMPALA-4387: validate decimal type in Avro file schema .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/4876 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I25a326ee2220bc14d3b5f887dc288b4adf859cfc Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Ivanfi Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3676: Use clang as a static analysis tool
Dan Hecht has posted comments on this change. Change subject: IMPALA-3676: Use clang as a static analysis tool .. Patch Set 9: (4 comments) http://gerrit.cloudera.org:8080/#/c/4758/9/be/src/exec/hdfs-avro-scanner.cc File be/src/exec/hdfs-avro-scanner.cc: Line 933: builder->CreateCondBr(ret_val, end_field_block, bail_out); this still is buggy. but we have IMPALA-4061 to track that, so okay. http://gerrit.cloudera.org:8080/#/c/4758/9/be/src/experiments/bit-stream-utils.8byte.inline.h File be/src/experiments/bit-stream-utils.8byte.inline.h: Line 100: if (num_bits - bit_offset_ < size) { the dcheck at line 92 contradicts the need for this if-stmt. is the checker just getting confused, or is there really a path where we can have num_bits > size? if the checker is confused, can we just disable it here? adding this if-stmt makes it unclear whether num_bits <= size is really an invariant or not. http://gerrit.cloudera.org:8080/#/c/4758/9/be/src/util/aligned-new.h File be/src/util/aligned-new.h: Line 50: }; given that we only really use this for cache alignment, how did you test it? http://gerrit.cloudera.org:8080/#/c/4758/9/be/src/util/thread-pool.h File be/src/util/thread-pool.h: Line 34: class ThreadPool : public CacheLineAligned { was this aligned before? if not, why are we aligning it now? -- To view, visit http://gerrit.cloudera.org:8080/4758 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4ed168488cb30ddeccd0087f3840541d858f9c06 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4387: validate decimal type in Avro file schema
Lars Volker has posted comments on this change. Change subject: IMPALA-4387: validate decimal type in Avro file schema .. Patch Set 3: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/4876 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I25a326ee2220bc14d3b5f887dc288b4adf859cfc Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Ivanfi Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4387: validate decimal type in Avro file schema
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4387: validate decimal type in Avro file schema .. Patch Set 2: (4 comments) http://gerrit.cloudera.org:8080/#/c/4876/2//COMMIT_MSG Commit Message: PS2, Line 16: scans > nit: wrap at 80 chars. Done http://gerrit.cloudera.org:8080/#/c/4876/2/be/src/util/avro-util.cc File be/src/util/avro-util.cc: Line 109: case AVRO_BYTES: > maybe add a // fallthrough comment at the end of the line? It doesn't seem necessary when the case labels are adjacent like this, we don't do it in other parts of the codebase. Line 139: return Status::OK(); > Wouldn't it be better to return a non-OK status in a release build? If not, Good point, this value comes from the Avro library so we don't have any guarantee it is a supported type. Fixed it. http://gerrit.cloudera.org:8080/#/c/4876/2/be/src/util/avro-util.h File be/src/util/avro-util.h: PS2, Line 84: name > column_name? Done -- To view, visit http://gerrit.cloudera.org:8080/4876 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I25a326ee2220bc14d3b5f887dc288b4adf859cfc Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Ivanfi Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4387: validate decimal type in Avro file schema
Tim Armstrong has uploaded a new patch set (#3). Change subject: IMPALA-4387: validate decimal type in Avro file schema .. IMPALA-4387: validate decimal type in Avro file schema This patch prevents an invalid decimal type in an Avro file schema from crashing Impala. Most invalid Avro schemas are caught by the frontend, but file schemas still need to be validated by the backend. After this patch files with bad schemas are skipped. Testing: This was hit very rarely by the scanner fuzzing. Added a regression test that scans a file with a bad schema. Change-Id: I25a326ee2220bc14d3b5f887dc288b4adf859cfc --- M be/src/exec/hdfs-avro-scanner.cc M be/src/runtime/decimal-test.cc M be/src/runtime/types.h M be/src/util/avro-util.cc M be/src/util/avro-util.h M common/thrift/generate_error_codes.py M testdata/bad_avro_snap/README A testdata/bad_avro_snap/invalid_decimal_schema.avro M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M testdata/workloads/functional-query/queries/DataErrorsTest/avro-errors.test 11 files changed, 104 insertions(+), 27 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/76/4876/3 -- To view, visit http://gerrit.cloudera.org:8080/4876 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I25a326ee2220bc14d3b5f887dc288b4adf859cfc Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Ivanfi
[Impala-ASF-CR] IMPALA-4223: Handle truncated file read from HDFS cache
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4223: Handle truncated file read from HDFS cache .. Patch Set 1: Code-Review+1 Change looks good to me, I'm not convinced this was the problem we hit in IMPALA-4223 though, because it would not have made it this far when there was an error seeking in Open(). I think there's some protection against opening the wrong version of a HDFS file because the mtime of the cached file is checked (this check is definitely a good idea though) -- To view, visit http://gerrit.cloudera.org:8080/4828 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id1e1fdb0211819c5938956abb13b512350a46f1a Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4387: validate decimal type in Avro file schema
Lars Volker has posted comments on this change. Change subject: IMPALA-4387: validate decimal type in Avro file schema .. Patch Set 2: (4 comments) http://gerrit.cloudera.org:8080/#/c/4876/2//COMMIT_MSG Commit Message: PS2, Line 16: scans nit: wrap at 80 chars. http://gerrit.cloudera.org:8080/#/c/4876/2/be/src/util/avro-util.cc File be/src/util/avro-util.cc: Line 109: case AVRO_BYTES: maybe add a // fallthrough comment at the end of the line? Line 139: return Status::OK(); Wouldn't it be better to return a non-OK status in a release build? If not, can you add a comment explaining why / what the consequences are? http://gerrit.cloudera.org:8080/#/c/4876/2/be/src/util/avro-util.h File be/src/util/avro-util.h: PS2, Line 84: name column_name? -- To view, visit http://gerrit.cloudera.org:8080/4876 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I25a326ee2220bc14d3b5f887dc288b4adf859cfc Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Ivanfi Gerrit-HasComments: Yes
[Impala-ASF-CR] Add functional tests for compute stats with mt dop > 0.
Lars Volker has posted comments on this change. Change subject: Add functional tests for compute stats with mt_dop > 0. .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/4879/1/tests/query_test/test_mt_dop.py File tests/query_test/test_mt_dop.py: PS1, Line 51: execute_query_using_client I think self.execute_query does the same. PS1, Line 52: location Why is it necessary to change the location here? I'm mostly curious because I did something similar in a different change and didn't specify a location, so the data files ended up in the unique_database's directory. Line 54: new_vector = deepcopy(vector) can you add a comment why this copy step is needed? -- To view, visit http://gerrit.cloudera.org:8080/4879 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icd4e7e44f9f23f66e59ad1fb298e13da76ad817a Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3346: DeepCopy() Kudu rows into Impala tuples.
Dan Hecht has posted comments on this change. Change subject: IMPALA-3346: DeepCopy() Kudu rows into Impala tuples. .. Patch Set 5: Code-Review+2 +2 for backend, +1 for frontend. -- To view, visit http://gerrit.cloudera.org:8080/4862 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic911e4eff9fe98bf28d8a1bab5c9d7e9ab66d9cb Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4223: Handle truncated file read from HDFS cache
Lars Volker has posted comments on this change. Change subject: IMPALA-4223: Handle truncated file read from HDFS cache .. Patch Set 1: (1 comment) > (1 comment) > > > (1 comment) > > > > > (1 comment) > > > > > > Can we test this by having the test load metadata and then > > truncate > > > a cached file? > > > > We can try to do this in a custom cluster test. It needs to > follow > > the steps outlined here: > > https://issues.cloudera.org/browse/IMPALA-4223?focusedCommentId=212776&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-212776 > > > > They require changes to the system limits to allow for larger > > cached files, and to the data nodes to increase the file cache as > > well, so they might be rather disruptive. Once these are changed > we > > can write a custom cluster test to download, truncate, and upload > > files and run queries over them, checking that the correct log > > messages appear. > > > > Should we break this out into several Jiras / changes? The limits > > should be change in impala-setup, too. The datanode settings > change > > will be required to integrate this into fuzz testing, too. > > > > I will also have a look at the scanner fuzz test and see if it is > > easy to inject these error there. > > Yes, if it's a good amount of work to set up that kind of test, > let's save it for another step. It's probably better to spend time > with getting the fuzzer working rather than this one particular > test case. I Opened IMPALA-4394 and IMPALA-4395 to track both adding tests for this change and adding Hdfs caching to the fuzz tests. http://gerrit.cloudera.org:8080/#/c/4828/1/be/src/runtime/disk-io-mgr-scan-range.cc File be/src/runtime/disk-io-mgr-scan-range.cc: PS1, Line 448: disk > Okay. I'm fine either leaving this one as "disk" or changing both. Thanks. I'll leave it to keep any tooling that searches for this working. -- To view, visit http://gerrit.cloudera.org:8080/4828 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id1e1fdb0211819c5938956abb13b512350a46f1a Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3823: Add timer to measure Parquet footer reads
Sailesh Mukil has posted comments on this change. Change subject: IMPALA-3823: Add timer to measure Parquet footer reads .. Patch Set 11: Code-Review+2 (1 comment) Carry +2 http://gerrit.cloudera.org:8080/#/c/4371/10/be/src/util/runtime-profile-counters.h File be/src/util/runtime-profile-counters.h: Line 209: SummaryStatsCounter(TUnit::type unit, int32_t total_num_values, > I'm not sure what this comment means -- this is a constructor, so how can t Oops, that was meant for SetStats(). I put that here by mistake. I've removed this line here now. -- To view, visit http://gerrit.cloudera.org:8080/4371 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3823: Add timer to measure Parquet footer reads
Hello Henry Robinson, Matthew Jacobs, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4371 to look at the new patch set (#11). Change subject: IMPALA-3823: Add timer to measure Parquet footer reads .. IMPALA-3823: Add timer to measure Parquet footer reads It's been observed that Parquet footer reads perform poorly especially when reading from S3. This patch adds a timer "FooterProcessingTimer" which keeps a track of the average time each split of each scan node spends in reading and processing the parquet footer. Added a new utility counter called SummaryStatsCounter which keeps track of the min, max and average values seen so far from a set of values. This counter is used to calculate the min, max and average time taken to scan and process Parquet footers per query per node. The RuntimeProfile has also been updated to keep a track of, display and serialize this new counter to thrift. BE tests have been added to verify that this counter works fine. Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/util/runtime-profile-counters.h M be/src/util/runtime-profile-test.cc M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M common/thrift/RuntimeProfile.thrift M tests/query_test/test_scanners.py 8 files changed, 303 insertions(+), 16 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/4371/11 -- To view, visit http://gerrit.cloudera.org:8080/4371 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil
[Impala-ASF-CR] IMPALA-4278: Don't abort Catalog startup quickly if HMS is not present
Henry Robinson has posted comments on this change. Change subject: IMPALA-4278: Don't abort Catalog startup quickly if HMS is not present .. Patch Set 3: (8 comments) http://gerrit.cloudera.org:8080/#/c/4842/3/be/src/catalog/catalog.cc File be/src/catalog/catalog.cc: PS3, Line 41: spend on trying to establish connection to HMS the " : "first time on startup "wait to establish an initial connection to the HMS" PS3, Line 75: init_first init_first seems redundant, here and everywhere else. How about "initial_hms_cnxn_timeout_s" http://gerrit.cloudera.org:8080/#/c/4842/3/fe/src/main/java/org/apache/impala/catalog/Catalog.java File fe/src/main/java/org/apache/impala/catalog/Catalog.java: Line 67: protected final MetaStoreClientPool metaStoreClientPool_ = new MetaStoreClientPool(0, 0); long line PS3, Line 103: public void initMetaStoreClientPool(long initFirstClientTimeoutSeconds) { : metaStoreClientPool_.addClients(META_STORE_CLIENT_POOL_SIZE, : initFirstClientTimeoutSeconds); : } Why not call this in the c'tor any more? (if timeout is < 0, add no clients. Or maybe pass META_STORE_CLIENT_POOL_SIZE as a parameter to the c'tor). http://gerrit.cloudera.org:8080/#/c/4842/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: PS3, Line 163: super don't think you need super., do you? http://gerrit.cloudera.org:8080/#/c/4842/3/fe/src/main/java/org/apache/impala/catalog/MetaStoreClientPool.java File fe/src/main/java/org/apache/impala/catalog/MetaStoreClientPool.java: PS3, Line 156: Add 'numClients' to the client pool. :* 'initFirstClientTimeoutSeconds' specifies the time (in seconds) spent on trying to :* initialize the first MetaStore client before giving up. :*/ it might be clearer to call this, initially, as addClients(1, 120), then call addClients(9, 10) (or whatever the second timeout should be). The current API is a little confusing. http://gerrit.cloudera.org:8080/#/c/4842/3/tests/experiments/test_catalog_hms_failures.py File tests/experiments/test_catalog_hms_failures.py: PS3, Line 61: metadatra metadata PS3, Line 109: statestored.service.wait_for_live_subscriber if this fails, I think the test can leave the HMS not running. That will affect other tests. -- To view, visit http://gerrit.cloudera.org:8080/4842 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I83c70f939429e1d0d20284a1307f3ee1278ae047 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Henry Robinson Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3823: Add timer to measure Parquet footer reads
Dan Hecht has posted comments on this change. Change subject: IMPALA-3823: Add timer to measure Parquet footer reads .. Patch Set 10: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/4371/10/be/src/util/runtime-profile-counters.h File be/src/util/runtime-profile-counters.h: Line 209: // This replaces the existing counter with the new values that are passed as arguments. I'm not sure what this comment means -- this is a constructor, so how can the counter be existing already? -- To view, visit http://gerrit.cloudera.org:8080/4371 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4223: Handle truncated file read from HDFS cache
Dan Hecht has posted comments on this change. Change subject: IMPALA-4223: Handle truncated file read from HDFS cache .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/4828 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id1e1fdb0211819c5938956abb13b512350a46f1a Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4048: Misc. improvements to /sessions
Henry Robinson has uploaded a new patch set (#2). Change subject: IMPALA-4048: Misc. improvements to /sessions .. IMPALA-4048: Misc. improvements to /sessions * Make table searchable and sortable * Fix 'last accessed time' being printed in UTC * Made table contents more compact so that it mostly fits on screen * Clarify summary text re: active and inactive sessions * Include fix in mustache-cpp required to print 64-bit integers correctly (see https://github.com/henryr/cpp-mustache/commit/29768bf0e84f5a1e95e006fc64996d375499dbda) Change-Id: I14edcb6d60cf031a62c5a20b2d2b4d23248633a3 Testing: Visual inspection and manual sorting, searching etc. --- M be/src/service/impala-hs2-server.cc M be/src/service/impala-http-handler.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h M be/src/thirdparty/mustache/mustache.cc M www/sessions.tmpl 6 files changed, 79 insertions(+), 47 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/4880/2 -- To view, visit http://gerrit.cloudera.org:8080/4880 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I14edcb6d60cf031a62c5a20b2d2b4d23248633a3 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson
[Impala-ASF-CR] IMPALA-4048: Misc. improvements to /sessions
Henry Robinson has uploaded a new change for review. http://gerrit.cloudera.org:8080/4880 Change subject: IMPALA-4048: Misc. improvements to /sessions .. IMPALA-4048: Misc. improvements to /sessions * Make table searchable and sortable * Fix 'last accessed time' being printed in UTC * Made table contents more compact so that it mostly fits on screen * Clarify summary text re: active and inactive sessions * Include fix in mustache-cpp required to print 64-bit integers correctly (see https://github.com/henryr/cpp-mustache/commit/29768bf0e84f5a1e95e006fc64996d375499dbda) Change-Id: I14edcb6d60cf031a62c5a20b2d2b4d23248633a3 Testing: Visual inspection and manual sorting, searching etc. --- M be/src/service/impala-hs2-server.cc M be/src/service/impala-http-handler.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h M be/src/thirdparty/mustache/mustache.cc M www/sessions.tmpl 6 files changed, 74 insertions(+), 47 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/4880/1 -- To view, visit http://gerrit.cloudera.org:8080/4880 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I14edcb6d60cf031a62c5a20b2d2b4d23248633a3 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson
[Impala-ASF-CR] IMPALA-3200 (buffer pool): warn if Status is ignored
Dan Hecht has posted comments on this change. Change subject: IMPALA-3200 (buffer pool): warn if Status is ignored .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/4878/2/be/src/bufferpool/buffer-allocator.h File be/src/bufferpool/buffer-allocator.h: PS2, Line 38: MUST_USE(Status) > The type attribute would definitely make it easier to apply, I'm ok with ho I don't think we need to hold off. Let's just decide on the syntax and then revisit once we can apply to the type. -- To view, visit http://gerrit.cloudera.org:8080/4878 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8c26e7532b5f2c7fe167accc73179e8b72b192bc Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3200 (buffer pool): warn if Status is ignored
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3200 (buffer pool): warn if Status is ignored .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/4878/2/be/src/bufferpool/buffer-allocator.h File be/src/bufferpool/buffer-allocator.h: PS2, Line 38: MUST_USE(Status) > Good point! The type attribute would definitely make it easier to apply, I'm ok with holding off until we're in a position to do this, or only using it on key APIs. It doesn't look like GCC supports it as a type attribute until the unreleased version 7. -- To view, visit http://gerrit.cloudera.org:8080/4878 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8c26e7532b5f2c7fe167accc73179e8b72b192bc Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3346: DeepCopy() Kudu rows into Impala tuples.
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3346: DeepCopy() Kudu rows into Impala tuples. .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/4862/3/be/src/exec/kudu-scanner.cc File be/src/exec/kudu-scanner.cc: Line 183: Tuple* kudu_tuple = reinterpret_cast(const_cast(krow.cell(0))); > Logically it's not necessary, but getting rid of it leads to having to also Ok, no problem then. -- To view, visit http://gerrit.cloudera.org:8080/4862 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic911e4eff9fe98bf28d8a1bab5c9d7e9ab66d9cb Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR](asf-site) Follow Apache Project Branding Requirements
Henry Robinson has posted comments on this change. Change subject: Follow Apache Project Branding Requirements .. Patch Set 2: Code-Review+2 (2 comments) http://gerrit.cloudera.org:8080/#/c/4683/2/community.html File community.html: Line 147: Maybe lose some of this whitespace? http://gerrit.cloudera.org:8080/#/c/4683/2/downloads.html File downloads.html: PS2, Line 161: navbar Will trust you that this looks correct and is semantically reasonable. -- To view, visit http://gerrit.cloudera.org:8080/4683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I263790d7c9aba7bfc57de7c16a367fe5204c49ff Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: asf-site Gerrit-Owner: Jim Apple Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4387: validate decimal type in Avro file schema
Tim Armstrong has uploaded a new patch set (#2). Change subject: IMPALA-4387: validate decimal type in Avro file schema .. IMPALA-4387: validate decimal type in Avro file schema This patch prevents an invalid decimal type in an Avro file schema from crashing Impala. Most invalid Avro schemas are caught by the frontend, but file schemas still need to be validated by the backend. After this patch files with bad schemas are skipped. Testing: This was hit very rarely by the scanner fuzzing. Added a regression test that scans a file with a bad schema. Change-Id: I25a326ee2220bc14d3b5f887dc288b4adf859cfc --- M be/src/exec/hdfs-avro-scanner.cc M be/src/runtime/decimal-test.cc M be/src/runtime/types.h M be/src/util/avro-util.cc M be/src/util/avro-util.h M common/thrift/generate_error_codes.py M testdata/bad_avro_snap/README A testdata/bad_avro_snap/invalid_decimal_schema.avro M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M testdata/workloads/functional-query/queries/DataErrorsTest/avro-errors.test 11 files changed, 102 insertions(+), 26 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/76/4876/2 -- To view, visit http://gerrit.cloudera.org:8080/4876 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I25a326ee2220bc14d3b5f887dc288b4adf859cfc Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Zoltan Ivanfi
[Impala-ASF-CR] IMPALA-3346: DeepCopy() Kudu rows into Impala tuples.
Hello Matthew Jacobs, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4862 to look at the new patch set (#5). Change subject: IMPALA-3346: DeepCopy() Kudu rows into Impala tuples. .. IMPALA-3346: DeepCopy() Kudu rows into Impala tuples. Implements additional changes to make the memory layout of Kudu rows identical to Impala tuples. In particular, Kudu rows allocate a null bit even for non-nullable columns, and Impala now does the same for Kudu scan tuples. This change exploits the now-identical Kudu and Impala tuple layouts to avoid the expensive translation. Perf: Mostafa reported a 50% efficiency gain on full table scans. Testing: A private core/hdfs run passed. TODO: 1) Test cases with nullable/nonnullable non-PK slots. 2) Specify mem layout to client (depends on KUDU-1694) 3) Avoid mem copies (depends on KUDU-1695) Change-Id: Ic911e4eff9fe98bf28d8a1bab5c9d7e9ab66d9cb --- M be/src/exec/kudu-scanner.cc M be/src/exec/kudu-scanner.h M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java 5 files changed, 67 insertions(+), 202 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/62/4862/5 -- To view, visit http://gerrit.cloudera.org:8080/4862 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic911e4eff9fe98bf28d8a1bab5c9d7e9ab66d9cb Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-3346: DeepCopy() Kudu rows into Impala tuples.
Alex Behm has posted comments on this change. Change subject: IMPALA-3346: DeepCopy() Kudu rows into Impala tuples. .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/4862/4/be/src/exec/kudu-scanner.cc File be/src/exec/kudu-scanner.cc: PS4, Line 56: : // Maximum size of string values returned from Kudu. : static constexpr size_t MAX_KUDU_STRING_SIZE = 8 * (1 << 20); > not needed Oops. Forgot to remove. Done. -- To view, visit http://gerrit.cloudera.org:8080/4862 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic911e4eff9fe98bf28d8a1bab5c9d7e9ab66d9cb Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3346: DeepCopy() Kudu rows into Impala tuples.
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-3346: DeepCopy() Kudu rows into Impala tuples. .. Patch Set 4: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/4862/4/be/src/exec/kudu-scanner.cc File be/src/exec/kudu-scanner.cc: PS4, Line 56: : // Maximum size of string values returned from Kudu. : static constexpr size_t MAX_KUDU_STRING_SIZE = 8 * (1 << 20); not needed -- To view, visit http://gerrit.cloudera.org:8080/4862 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic911e4eff9fe98bf28d8a1bab5c9d7e9ab66d9cb Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3200 (buffer pool): warn if Status is ignored
Jim Apple has posted comments on this change. Change subject: IMPALA-3200 (buffer pool): warn if Status is ignored .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/4878/2/be/src/bufferpool/buffer-allocator.h File be/src/bufferpool/buffer-allocator.h: PS2, Line 38: MUST_USE(Status) > I guess we'd have to wait for C++17, where in the link Tim sent (http://en. Good point! I guess we could pass "c++1z" rather than "c++14", but we might also need to upgrade to a newer gcc in the toolchain: https://gcc.gnu.org/projects/cxx-status.html#cxx1z Another idea is to switch to clang for all builds. I tested it, and clang allows this attribute on types: struct[[gnu::warn_unused_result]] Foo { int bar; }; Foo Baz() { return {11}; } int main() { Baz(); const auto qux = Baz(); return qux.bar; } GCC (4.9.2) warns about using the attribute in the wrong place. Clang (3.8.0-p1) warns about the unused return value from the Baz call on the first line of main. -- To view, visit http://gerrit.cloudera.org:8080/4878 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8c26e7532b5f2c7fe167accc73179e8b72b192bc Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4314: Standardize on MT-related data structures
Henry Robinson has posted comments on this change. Change subject: IMPALA-4314: Standardize on MT-related data structures .. Patch Set 3: Code-Review+1 (4 comments) Looked at .cc / .thrift only. http://gerrit.cloudera.org:8080/#/c/4853/3/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: PS3, Line 246: boost remove boost:: PS3, Line 456: runtime Will you file a JIRA to fix this? http://gerrit.cloudera.org:8080/#/c/4853/3/be/src/scheduling/query-schedule.h File be/src/scheduling/query-schedule.h: Line 116: void Validate() const; Still missing mention of how this signals failure. PS3, Line 201: /// TODO: why are these part of the schedule? The alternative is that there is a pointer to the enclosing QueryExecState. -- To view, visit http://gerrit.cloudera.org:8080/4853 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I465d0e15e2cf17cafe4c747d34c8f595d3645151 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Marcel Kornacker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3200 (buffer pool): warn if Status is ignored
Dan Hecht has posted comments on this change. Change subject: IMPALA-3200 (buffer pool): warn if Status is ignored .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/4878/2/be/src/bufferpool/buffer-allocator.h File be/src/bufferpool/buffer-allocator.h: PS2, Line 38: MUST_USE(Status) > I would prefer MUST_USE(x) better as an API user, since I feel it gives me I guess we'd have to wait for C++17, where in the link Tim sent (http://en.cppreference.com/w/cpp/language/attributes) it appears to become usable as a type attribute. -- To view, visit http://gerrit.cloudera.org:8080/4878 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8c26e7532b5f2c7fe167accc73179e8b72b192bc Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3200 (buffer pool): warn if Status is ignored
Jim Apple has posted comments on this change. Change subject: IMPALA-3200 (buffer pool): warn if Status is ignored .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/4878/2/be/src/bufferpool/buffer-allocator.h File be/src/bufferpool/buffer-allocator.h: PS2, Line 38: MUST_USE(Status) > This is going to get pretty noisy if we start tagging all return types like I would prefer MUST_USE(x) better as an API user, since I feel it gives me information about the type like const and volatile do. Unfortunately, it looks like warn_unused_result is a function attribute, not a type attribute: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#Common-Function-Attributes -- To view, visit http://gerrit.cloudera.org:8080/4878 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8c26e7532b5f2c7fe167accc73179e8b72b192bc Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3200 (buffer pool): warn if Status is ignored
Dan Hecht has posted comments on this change. Change subject: IMPALA-3200 (buffer pool): warn if Status is ignored .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/4878/2//COMMIT_MSG Commit Message: PS2, Line 7: IMPALA-3200 (buffer pool) IMPALA-2615 http://gerrit.cloudera.org:8080/#/c/4878/2/be/src/bufferpool/buffer-allocator.h File be/src/bufferpool/buffer-allocator.h: PS2, Line 38: MUST_USE(Status) This is going to get pretty noisy if we start tagging all return types like this. In the future (perhaps after we fix the warnings over time), will it be feasible to mark the Status type itself with this attribute? And until then, an alternate syntax worth considering is to put the attribute after the function decl, like how Kudu does it, e.g.: Status SetFlushMode(FlushMode m) WARN_UNUSED_RESULT; To my eyes that's a bit easier to read since it's not so central to the declaration. But I don't feel too strongly about it if the consensus prefers the current syntax. -- To view, visit http://gerrit.cloudera.org:8080/4878 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8c26e7532b5f2c7fe167accc73179e8b72b192bc Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3200 (buffer pool): warn if Status is ignored
Jim Apple has posted comments on this change. Change subject: IMPALA-3200 (buffer pool): warn if Status is ignored .. Patch Set 2: Code-Review+1 (2 comments) +1, not +2, because I'm interested in hearing Dan's thoughts on this as well. http://gerrit.cloudera.org:8080/#/c/4878/2//COMMIT_MSG Commit Message: PS2, Line 9: STATUS_RETURN MUST_USE now PS2, Line 10: __attribute__((warn_unused_result)) expands to [[gnu::warn_unused_result]] now -- To view, visit http://gerrit.cloudera.org:8080/4878 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8c26e7532b5f2c7fe167accc73179e8b72b192bc Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4348 / IMPALA-4333: Improve coordinator fragment cancellation
Henry Robinson has submitted this change and it was merged. Change subject: IMPALA-4348 / IMPALA-4333: Improve coordinator fragment cancellation .. IMPALA-4348 / IMPALA-4333: Improve coordinator fragment cancellation This patch fixes two problems: 1. If a fragment instance is cancelled (by the coordinator) concurrently to the coordinator calling GetNext() on its PlanRootSink, GetNext() may access a destroyed PlanRootSink object and crash (IMPALA-4333). 2. If ExecPlanFragment() times out, but is successful, the coordinator will never call CloseConsumer() to release its side of the PlanRootSink, and the fragment instance will never finish (IMPALA-4348). The following changes address this: * Make PlanFragmentExecutor::Cancel() call CloseConsumer() on its root sink. This ensures that, no matter what the reason, the fragment instance will eventually terminate. Otherwise the coordinator may not call CloseConsumer() if it could not access the root instance's FragmentExecState. * Ensure that the PlanRootSink has a lifetime at least as long as the coordinator object by taking a shared_ptr reference to its parent FragmentExecState object. Even if the fragment instance finishes, the object will still be valid until the coordinator releases this reference. In the future, we expect this to be replaced by an explicit reference take / relinquish API. * Ensure the coordinator tries to cancel fragment instances if any attempt was made to start them, not just if the RPC was a success. This deals with the timeout case where the RPC was slow, but successful. This patch relies on a fix for IMPALA-4383, which ensures that fragment instances will always try to send reports, and so will always detect an error and cancel themselves. Testing: * Ran TestRPCTimeout.test_execplanfragment_timeout in a loop for 90 minutes. Patch addresses hangs and failure modes that were observed previously. * Added a manual sleep just before Coordinator::GetNext() calls root_sink_->GetNext(), and cancelled a query manually in that window. Confirmed that query cancels successfully. Change-Id: I5b4e25c1d658b3929182ba5e56b5c5e881dd394a Reviewed-on: http://gerrit.cloudera.org:8080/4865 Tested-by: Internal Jenkins Reviewed-by: Henry Robinson --- M be/src/exec/plan-root-sink.cc M be/src/exec/plan-root-sink.h M be/src/runtime/coordinator.cc M be/src/runtime/coordinator.h M be/src/runtime/plan-fragment-executor.cc 5 files changed, 65 insertions(+), 35 deletions(-) Approvals: Henry Robinson: Looks good to me, approved Internal Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/4865 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I5b4e25c1d658b3929182ba5e56b5c5e881dd394a Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4383: Ensure plan fragment report thread is always started
Henry Robinson has posted comments on this change. Change subject: IMPALA-4383: Ensure plan fragment report thread is always started .. Patch Set 2: Code-Review+2 Verified+1 Exhaustive build passed, GVO passed in conjunction with https://gerrit.cloudera.org/#/c/4865/ -- To view, visit http://gerrit.cloudera.org:8080/4864 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5d3cab4cc5245014758e6b70dec7a706a7fa929b Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Juan Yu Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4348 / IMPALA-4333: Improve coordinator fragment cancellation
Henry Robinson has posted comments on this change. Change subject: IMPALA-4348 / IMPALA-4333: Improve coordinator fragment cancellation .. Patch Set 6: Code-Review+2 Carry +2 -- To view, visit http://gerrit.cloudera.org:8080/4865 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5b4e25c1d658b3929182ba5e56b5c5e881dd394a Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4383: Ensure plan fragment report thread is always started
Henry Robinson has submitted this change and it was merged. Change subject: IMPALA-4383: Ensure plan fragment report thread is always started .. IMPALA-4383: Ensure plan fragment report thread is always started PlanFragmentExecutor::report_thread_active_ was set during OpenInternal() after ReportProfile() - run in a different thread - signalled that it was running. Howeer, ReportProfile() reads report_thread_active_ to determine if it should exit its loop. If it runs fast enough, ReportProfile() will never see report_thread_active_ == true. This patch moves setting report_thread_active_ to ReportProfile() - slightly earlier, but visible at almost the same time because it is now correctly protected by report_thread_lock_. Testing: * TestRPCTimeout would hit this in certain configurations. Ran test_execplanfragment_timeout() for 90 minutes with no failures; previously it would fail within the first five attempts repeatedly. Change-Id: I5d3cab4cc5245014758e6b70dec7a706a7fa929b Reviewed-on: http://gerrit.cloudera.org:8080/4864 Reviewed-by: Henry Robinson Tested-by: Henry Robinson --- M be/src/runtime/plan-fragment-executor.cc M be/src/runtime/plan-fragment-executor.h 2 files changed, 6 insertions(+), 2 deletions(-) Approvals: Henry Robinson: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/4864 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I5d3cab4cc5245014758e6b70dec7a706a7fa929b Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Juan Yu Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-3346: DeepCopy() Kudu rows into Impala tuples.
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4862 to look at the new patch set (#4). Change subject: IMPALA-3346: DeepCopy() Kudu rows into Impala tuples. .. IMPALA-3346: DeepCopy() Kudu rows into Impala tuples. Implements additional changes to make the memory layout of Kudu rows identical to Impala tuples. In particular, Kudu rows allocate a null bit even for non-nullable columns, and Impala now does the same for Kudu scan tuples. This change exploits the now-identical Kudu and Impala tuple layouts to avoid the expensive translation. Perf: Mostafa reported a 50% efficiency gain on full table scans. Testing: A private core/hdfs run passed. TODO: 1) Test cases with nullable/nonnullable non-PK slots. 2) Specify mem layout to client (depends on KUDU-1694) 3) Avoid mem copies (depends on KUDU-1695) Change-Id: Ic911e4eff9fe98bf28d8a1bab5c9d7e9ab66d9cb --- M be/src/exec/kudu-scanner.cc M be/src/exec/kudu-scanner.h M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java 5 files changed, 70 insertions(+), 202 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/62/4862/4 -- To view, visit http://gerrit.cloudera.org:8080/4862 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic911e4eff9fe98bf28d8a1bab5c9d7e9ab66d9cb Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-3346: DeepCopy() Kudu rows into Impala tuples.
Alex Behm has posted comments on this change. Change subject: IMPALA-3346: DeepCopy() Kudu rows into Impala tuples. .. Patch Set 3: (5 comments) http://gerrit.cloudera.org:8080/#/c/4862/3//COMMIT_MSG Commit Message: Line 17: > TODO: Done http://gerrit.cloudera.org:8080/#/c/4862/3/be/src/exec/kudu-scanner.cc File be/src/exec/kudu-scanner.cc: Line 183: Tuple* kudu_tuple = reinterpret_cast(const_cast(krow.cell(0))); > Is the const_cast necessary? We shouldn't be modifying the memory. Ok to le Logically it's not necessary, but getting rid of it leads to having to also fix these: * Tuple::DeepCopy() is not a const function * ExecNode::EvalConjuncts() takes a TupleRow* and not a const TupleRow* * ExprContext::Get*Val() take TupleRow* and not a const TupleRow*. These are called from EvalConjuncts(). Line 198: *batch_done = true; > Not your change but I think we should set *batch_done = false up the top of Done PS3, Line 215: 8 * (1 << 20) > Let's make this a named constant just for clarity. Removed this function in response to Matt's comment. http://gerrit.cloudera.org:8080/#/c/4862/3/be/src/exec/kudu-scanner.h File be/src/exec/kudu-scanner.h: PS3, Line 79: /// Asserts that all string slots in the given tuple have less than 8MB of data. : /// Kudu should never return values larger than 8MB. : void ValidateVarLenKuduData(const Tuple* tuple) const; > why do we need to bother checking this since it's their limitation not ours removed -- To view, visit http://gerrit.cloudera.org:8080/4862 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic911e4eff9fe98bf28d8a1bab5c9d7e9ab66d9cb Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3724: Support Kudu non-covering range partitions
Alex Behm has posted comments on this change. Change subject: IMPALA-3724: Support Kudu non-covering range partitions .. Patch Set 2: (10 comments) Had a pass over the tests. http://gerrit.cloudera.org:8080/#/c/4856/2/fe/src/main/java/org/apache/impala/util/KuduUtil.java File fe/src/main/java/org/apache/impala/util/KuduUtil.java: Line 117:* Builds and returns a range partition bound to be used in the creation of a Kudu remove "to be" to disambiguate ("bound" could be a interpreted as noun or "bound to be" as a verb") also use "range-partition bound" Line 167: TExprNode literal = boundaryVal.getNodes().get(0); check state that boundaryVal is a literal expr http://gerrit.cloudera.org:8080/#/c/4856/2/fe/src/test/java/org/apache/impala/analysis/ParserTest.java File fe/src/test/java/org/apache/impala/analysis/ParserTest.java: Line 1632: //TestUtils.assumeKuduIsSupported(); I know why this is here :) Line 2386: ParserError("CREATE TABLE Foo (i int) DISTRIBUTE BY RANGE(i) SPLIT ROWS ((1),(2))"); remove (there many things we don't support) Line 2408: ParserError("CREATE TABLE Foo (a int) DISTRIBUTE BY RANGE (a) ()"); Test that something like >= does not parse. We might consider parsing all comparison ops and giving a nicer error message during analysis. You can think about it. Line 2419: ParserError("CREATE TABLE Foo (a int) DISTRIBUTE BY RANGE (a) " + add tests with PARTITION 1 < VALUE and PARTITION VALUES = 10 (mismatched comparison op and VALUE/VALUES) Line 2561: ParsesOk("CREATE TABLE Foo PRIMARY KEY (a, b) DISTRIBUTE BY HASH (b) INTO 2 " + add one CTAS test with RANGE and some PARTITIONs http://gerrit.cloudera.org:8080/#/c/4856/2/testdata/workloads/functional-query/queries/QueryTest/kudu_create.test File testdata/workloads/functional-query/queries/QueryTest/kudu_create.test: Line 91: (partitiion values <= 1, partition 1 < value <= 10, partition 10 < values) stored as kudu; typo: partitiion http://gerrit.cloudera.org:8080/#/c/4856/2/testdata/workloads/functional-query/queries/QueryTest/kudu_partition_ddl.test File testdata/workloads/functional-query/queries/QueryTest/kudu_partition_ddl.test: Line 68: (partition values <= 10, partition 10 < values <=20, partition 20 < values <= 30, nit: space after second <= Line 127: primary key (id, name)) distribute by hash(id) into 4 buckets, range(id, name) is there a limit on the number of range partitions? -- To view, visit http://gerrit.cloudera.org:8080/4856 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6799c01a37003f0f4c068d911a13e3f060110a06 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3200 (buffer pool): warn if Status is ignored
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3200 (buffer pool): warn if Status is ignored .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/4878/1//COMMIT_MSG Commit Message: Line 14: The macro is only used in the buffer pool for now, but we can use it in > I saw a JIRA you filed about this where you found a likely bug. Maybe refer Done http://gerrit.cloudera.org:8080/#/c/4878/1/be/src/common/status.h File be/src/common/status.h: PS1, Line 262: __attribute__((warn_unused_result)) > This can be [[gnu::warn_unused_result]] with the new C++11 syntax Done. Looks like eventually this will be standardised as [[nodiscard]] http://en.cppreference.com/w/cpp/language/attributes Line 262: #define STATUS_RETURN __attribute__((warn_unused_result)) Status > How would you feel about making this a function-like macro, like MUST_USE(T Done -- To view, visit http://gerrit.cloudera.org:8080/4878 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8c26e7532b5f2c7fe167accc73179e8b72b192bc Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3200 (buffer pool): warn if Status is ignored
Tim Armstrong has uploaded a new patch set (#2). Change subject: IMPALA-3200 (buffer pool): warn if Status is ignored .. IMPALA-3200 (buffer pool): warn if Status is ignored This introduces a STATUS_RETURN macro that expands to __attribute__((warn_unused_result)) Status. It can be used in function declarations in place of Status to issue warnings if a return Status is ignored. E.g. I found a bug IMPALA-4391 by applying it to some other places in the code. The macro is only used in the buffer pool for now, but we can use it in other places in follow-up patches if we agree on the pattern. Change-Id: I8c26e7532b5f2c7fe167accc73179e8b72b192bc --- M be/src/bufferpool/buffer-allocator.h M be/src/bufferpool/buffer-pool.h M be/src/common/status.h 3 files changed, 16 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/4878/2 -- To view, visit http://gerrit.cloudera.org:8080/4878 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8c26e7532b5f2c7fe167accc73179e8b72b192bc Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple
[Impala-ASF-CR] IMPALA-4379: Fix and test Kudu table type checking
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-4379: Fix and test Kudu table type checking .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/4857/2/testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test File testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test: Line 46: ImpalaRuntimeException: Table 'simple_new' does not represent a Kudu table. Expected storage_handler 'com.cloudera.kudu.hive.KuduStorageHandler' but found 'foo' > Agree, I'm inclined to disallow altering storage handlers altogether. Why n They have weird behavior that I think is bad too. It doesn't allow you to use ALTER TABLE for any non-native tables. For native tables, it allows you to set storage_handler, but that will make the table a non-native table and then it can't be changed or even dropped. hive> alter table i1740_alter set tblproperties ('storage_handler'=''); OK Time taken: 0.132 seconds hive> alter table i1740_alter set tblproperties ('storage_handler'='foo'); FAILED: SemanticException [Error 10134]: ALTER TABLE cannot be used for a non-native table i1740_alter After this, nothing works on this table. I think I'll have to go to Mysql to drop it. -- To view, visit http://gerrit.cloudera.org:8080/4857 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I475273cbbf4110db8d0f78ddf9a56abfc6221e3e Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3342: Add thread counters to monitor plan fragment execution
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3342: Add thread counters to monitor plan fragment execution .. Patch Set 7: Code-Review+1 (2 comments) LGTM once you make a couple of small changes. http://gerrit.cloudera.org:8080/#/c/4633/7/be/src/runtime/runtime-state.cc File be/src/runtime/runtime-state.cc: PS7, Line 137: PlanFragmentThread Let's call it "PlanFragmentThreads" to make it clear that there may be multiple threads. http://gerrit.cloudera.org:8080/#/c/4633/7/be/src/runtime/runtime-state.h File be/src/runtime/runtime-state.h: Line 365: /// Total CPU utilization for all threads in this plan fragment. Let's group it with the other timers - above total_storage_wait_timer_ -- To view, visit http://gerrit.cloudera.org:8080/4633 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ifa88aa6f3371fa42d11ecc122f43c7d83623c300 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadke Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Yonghyun Hwang Gerrit-Reviewer: anujphadke Gerrit-HasComments: Yes
[Impala-ASF-CR] Add functional tests for compute stats with mt dop > 0.
Alex Behm has uploaded a new change for review. http://gerrit.cloudera.org:8080/4879 Change subject: Add functional tests for compute stats with mt_dop > 0. .. Add functional tests for compute stats with mt_dop > 0. Change-Id: Icd4e7e44f9f23f66e59ad1fb298e13da76ad817a --- A testdata/workloads/functional-query/queries/QueryTest/mt-dop-compute-stats.test M tests/query_test/test_mt_dop.py 2 files changed, 20 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/4879/1 -- To view, visit http://gerrit.cloudera.org:8080/4879 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Icd4e7e44f9f23f66e59ad1fb298e13da76ad817a Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm
[Impala-ASF-CR] Add all build targets to CMake and speed up builds
Tim Armstrong has posted comments on this change. Change subject: Add all build targets to CMake and speed up builds .. Patch Set 3: Confirmed I was able to build on all of our supported platforms with the change. Will wait until our builds stabilise before merging. -- To view, visit http://gerrit.cloudera.org:8080/4790 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I23617adf13bdeb034c24f6bba14b5ae480e8dd26 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4372: 'Describe formatted' returns types in upper case
Thomas Tauber-Marshall has posted comments on this change. Change subject: IMPALA-4372: 'Describe formatted' returns types in upper case .. Patch Set 2: test_describe_formatted did not catch this bug because it was silently failing. The primary issue there was that it wasn't passing a 'transport' param to the jdbc client, and it uses the workload runner infrastructure, which logs errors it encounters but doesn't fail tests (in query_exec_functions.py:run_query_capture_results). To prevent other tests from making the same mistake, I added an assert in JdbcQueryExecConfig that ensures that the transport is set. There was also an incorrect number of parameters being passed to the HiveQueryResult constructor in create_exec_result that I fixed. Finally, it turns out that we handle NULLs differently from Hive. For now, I've left our NULL behavior as is, and just modified the test to allow NULLs to equal empty strings, which also allowed me to remove an xfail from the test. -- To view, visit http://gerrit.cloudera.org:8080/4861 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I274b97d4d1247244247fb38a5ca7f4c10bba8d22 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.
Bharath Vissapragada has posted comments on this change. Change subject: IMPALA-1286: Extract common conjuncts from disjunctions. .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/4877/1/fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java File fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java: Line 42: if (!Expr.IS_OR_PREDICATE.apply(expr)) return expr; Isn't this check a little weak in the sense that it only checks the root's Op type to be OR. IIUC, a case like ((a OR b) OR (a AND c)) still passes and returns 'a' as the common conjunct. Is it better to have a check that makes sure the expr is in a proper disjunctive normal form? http://gerrit.cloudera.org:8080/#/c/4877/1/testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test File testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test: Line 3384: p_brand = 'Brand#12' I'm not really sure how this patch handles a bushy predicate. Do we flatten them somewhere and convert it to a DNF before applying the new rule? (May be I misread the code) For example: ((a or (b and c)) or ((a and b) or (d or f))) The tests here and elsewhere seem to be handling simple predicates, so I'm wondering how cases like above are dealt with. -- To view, visit http://gerrit.cloudera.org:8080/4877 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4372: 'Describe formatted' returns types in upper case
Thomas Tauber-Marshall has uploaded a new patch set (#2). Change subject: IMPALA-4372: 'Describe formatted' returns types in upper case .. IMPALA-4372: 'Describe formatted' returns types in upper case A recent change caused 'describe formatted' to display the types in all upper case, but we want 'describe formatted' to match Hive's 'describe' output, which displays the types in lower case. This patch also fixes several problems with test_describe_formatted, which was encountering an error but reporting success. Change-Id: I274b97d4d1247244247fb38a5ca7f4c10bba8d22 --- M fe/src/main/java/org/apache/impala/catalog/Column.java M testdata/workloads/functional-query/queries/QueryTest/avro-schema-changes.test M tests/common/impala_test_suite.py M tests/metadata/test_metadata_query_statements.py M tests/performance/query_exec_functions.py M tests/performance/query_executor.py 6 files changed, 43 insertions(+), 26 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/61/4861/2 -- To view, visit http://gerrit.cloudera.org:8080/4861 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I274b97d4d1247244247fb38a5ca7f4c10bba8d22 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Dimitris Tsirogiannis
[Impala-ASF-CR] IMPALA-3200 (buffer pool): warn if Status is ignored
Jim Apple has posted comments on this change. Change subject: IMPALA-3200 (buffer pool): warn if Status is ignored .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/4878/1//COMMIT_MSG Commit Message: Line 14: The macro is only used in the buffer pool for now, but we can use it in I saw a JIRA you filed about this where you found a likely bug. Maybe reference that here? http://gerrit.cloudera.org:8080/#/c/4878/1/be/src/common/status.h File be/src/common/status.h: Line 262: #define STATUS_RETURN __attribute__((warn_unused_result)) Status How would you feel about making this a function-like macro, like MUST_USE(Type) __attribute__((warn_unused_result)) Type? PS1, Line 262: __attribute__((warn_unused_result)) This can be [[gnu::warn_unused_result]] with the new C++11 syntax -- To view, visit http://gerrit.cloudera.org:8080/4878 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8c26e7532b5f2c7fe167accc73179e8b72b192bc Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3346: DeepCopy() Kudu rows into Impala tuples.
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-3346: DeepCopy() Kudu rows into Impala tuples. .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/4862/3//COMMIT_MSG Commit Message: Line 17: TODO: 1) Test cases with nullable/nonnullable slots. 2) Specify mem layout to client (depends on KUDU-1694) 3) Avoid mem copies (depends on KUDU-1695) http://gerrit.cloudera.org:8080/#/c/4862/3/be/src/exec/kudu-scanner.h File be/src/exec/kudu-scanner.h: PS3, Line 79: /// Asserts that all string slots in the given tuple have less than 8MB of data. : /// Kudu should never return values larger than 8MB. : void ValidateVarLenKuduData(const Tuple* tuple) const; why do we need to bother checking this since it's their limitation not ours? if they change their buffer sizes in the future then we have to change this as well. -- To view, visit http://gerrit.cloudera.org:8080/4862 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic911e4eff9fe98bf28d8a1bab5c9d7e9ab66d9cb Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3552: make incremental stats max serialized size configurable
Yonghyun Hwang has posted comments on this change. Change subject: IMPALA-3552: make incremental stats max serialized size configurable .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/4867/1/be/src/catalog/catalog.cc File be/src/catalog/catalog.cc: PS1, Line 78: catalog_class_, catalog_ctor_, : load_in_background, num_metadata_loading_threads, sentry_config, : FlagToTLogLevel(FLAGS_v), FlagToTLogLevel(FLAGS_non_impala_java_vlog), : auth_to_local, principal > No for both. Then, let's include it here. :) http://gerrit.cloudera.org:8080/#/c/4867/1/fe/src/main/java/org/apache/impala/service/JniCatalog.java File fe/src/main/java/org/apache/impala/service/JniCatalog.java: PS1, Line 83: boolean loadInBackground, int numMetadataLoadingThreads, : String sentryServiceConfig, int impalaLogLevel, int otherLogLevel, : boolean allowAuthToLocal, String kerberosPrincipal > Its a good ramp-up task and I think we should do it here. This clean-up has Agreed. However, I think current fix contains enough changes. I will create a new jira ticket and take care of the clean up right after this commit. I want to minimize the scope of changes for one commit. -- To view, visit http://gerrit.cloudera.org:8080/4867 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I33684725a61eabc67237503e61178305d37d3cb5 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Yonghyun Hwang Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Huaisi Xu Gerrit-Reviewer: Yonghyun Hwang Gerrit-Reviewer: Yonghyun Hwang Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2523: Make HdfsTableSink aware of clustered input
Tim Armstrong has posted comments on this change. Change subject: IMPALA-2523: Make HdfsTableSink aware of clustered input .. Patch Set 4: (17 comments) Thanks for adding the tests, this is looking pretty good - I think it could be refined and polished a bit more but no major concerns here. http://gerrit.cloudera.org:8080/#/c/4863/4/be/src/exec/hdfs-table-sink.cc File be/src/exec/hdfs-table-sink.cc: Line 268: // Pass the row batch to the writer. If new_file is returned true then the current I find this comment a little confusing. Maybe explain more the "why", i.e. that the rows may be written to multiple files. Line 272: do { I feel like this might be clearer if you write it as while(true) and then break out if 'new_file' is false. Line 288: // TODO: Should we adapt hdfs writer to accept start,num pair instead of a vector? My guess is that it would only help a little bit. It's nice to use the same code path for this and dynamic partitioning at least for now. Line 296: PartitionPair* next_partition_pair = NULL; It would be more efficient to keep track of the previous key and comparing directly instead of looking up the partition hash table for every row. It's a little more complicated but I think would help perf. PS4, Line 301: Flush I think it might be clearer to just say "Write rows" instead of "flush". Flush makes me think it's writing to the filesystem, but it's really just pushing the rows into the HdfsTableWriter, which does its own buffering. Line 301: // Flush previous partition I think the comments in here are a little too low level. Here maybe a single comment along the lines of "Done with previous partition - write rows and close." might be more helpful. Line 310: // Collect row index This comment isn't too helpful. PS4, Line 313: Flush "Write rows to" is probably clearer than "flush". PS4, Line 607: WriteRowsOfPartition WriteRowsToPartition? http://gerrit.cloudera.org:8080/#/c/4863/4/be/src/exec/hdfs-table-sink.h File be/src/exec/hdfs-table-sink.h: PS4, Line 183: rows Maybe reword to be clearer, took me a while to figure out: "a vector of indices of the rows in the current batch to insert into the partition" Line 209: /// Writes all rows of a partition to the partitions writer and clears the row index This could be clearer that it's only writing the rows with indices in 'partition_pair', it kind-of sounds like it's writing all the rows in 'batch' into the partition. PS4, Line 209: partitions partition's (missing apostrophe). PS4, Line 215: is expected to This could be more declarative - i.e. "must" instead of "is expected to" http://gerrit.cloudera.org:8080/#/c/4863/4/tests/query_test/test_insert.py File tests/query_test/test_insert.py: PS4, Line 71: text parquet, right? http://gerrit.cloudera.org:8080/#/c/4863/4/tests/query_test/test_insert_behaviour.py File tests/query_test/test_insert_behaviour.py: Line 476: Hmm, I think we could also do with a test that writes some large partitions split across files. E.g. do a CTAS of lineitem partitioned by some low-NDV column. We could also maybe adjust PARQUET_FILE_SIZE to force it to split it into more files. Depending on how long it takes, might make sense to only run it in exhaustive. Line 477: Extra blank line Line 501: assert len(files) == 1 Comment why we only expect one file - it might not be that obvious to people looking at this code later. -- To view, visit http://gerrit.cloudera.org:8080/4863 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibeda0bdabbfe44c8ac95bf7c982a75649e1b82d0 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.
Tim Armstrong has posted comments on this change. Change subject: IMPALA-1286: Extract common conjuncts from disjunctions. .. Patch Set 1: (6 comments) http://gerrit.cloudera.org:8080/#/c/4877/1/fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java File fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java: Line 31: * Extracts common conjuncts from disjunctive CompoundPredicates. Might be worth commenting on how this handles > 2 disjuncts. I.e. you need to apply the rule from the bottom-up. Line 49: if (child1Conjuncts.contains(conjunct)) { This is n^2 so will get very slow if we have long lists of disjuncts, e.g. machine-generated queries. This is fine for small n but I think we should seriously consider switching to HashMaps for child1Conjuncts and commonConjuncts for n > some threshold. The core logic I think is the same if you factor it out into a function that operates on Collection. Line 72: if (!child0Conjuncts.isEmpty()) { At this point both child0Conjuncts and child1Conjuncts are non-empty because we would have returned early otherwise. So I think these branches are unnecessary. PS1, Line 84: newDisjuncts.size() > 1 See above - this should always be true. http://gerrit.cloudera.org:8080/#/c/4877/1/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java File fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java: Line 146: "(!(int_col=5 or tinyint_col > 9) and double_col = 7) or " + Slightly tangential, but should we apply De Morgan's first to normalise the disjunct into a conjunct? That would help if we had something like: (!(int_col=5 or tinyint_col > 9) and double_col = 7) or (!(int_col = 5) and !(tinyint_col > 9) and double_col = 8) I guess in general it would be nice to do some basic normalisation of expressions in other cases, e.g. convert !(tinyint_col > 9) to tinyint_col <= 9, convert 9 < tinyint_col to tinyint_col <= 9, etc. I think that would make the common conjunct extraction a bit more robust. You can rewrite the query but avoiding that is the motivation for this in the first place, so it would be nice if it worked in some more simple cases. Line 153: "(int_col < 10 and id between 5 and 6) or " + Does common conjunct extraction work if you have a BETWEEN expression and the equivalent expression after the rewrite? Should work if we apply the extraction after the BETWEEN rewrite, but would be nice to test. -- To view, visit http://gerrit.cloudera.org:8080/4877 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3342: Add thread counters to monitor plan fragment execution
anujphadke has posted comments on this change. Change subject: IMPALA-3342: Add thread counters to monitor plan fragment execution .. Patch Set 7: (3 comments) http://gerrit.cloudera.org:8080/#/c/4633/6//COMMIT_MSG Commit Message: Line 9: This change removes the use of total_cpu_timer which incorrectly > Yes Done http://gerrit.cloudera.org:8080/#/c/4633/6/be/src/runtime/runtime-state.cc File be/src/runtime/runtime-state.cc: Line 138 > Why don't we create the thread counters here like before? This seems much c Done and I can access the private variable here too. http://gerrit.cloudera.org:8080/#/c/4633/6/be/src/runtime/runtime-state.h File be/src/runtime/runtime-state.h: Line 242: /// 'failed_allocation_size' is zero, nothing about the allocation size is logged. > The convention is that the trailing underscore means that the member is pri Done. I start the timer in the runtime-state instead of in the plan-fragment-executor object. -- To view, visit http://gerrit.cloudera.org:8080/4633 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ifa88aa6f3371fa42d11ecc122f43c7d83623c300 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadke Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Yonghyun Hwang Gerrit-Reviewer: anujphadke Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3342: Add thread counters to monitor plan fragment execution
anujphadke has uploaded a new patch set (#7). Change subject: IMPALA-3342: Add thread counters to monitor plan fragment execution .. IMPALA-3342: Add thread counters to monitor plan fragment execution This change removes the use of total_cpu_timer which incorrectly monitors the CPU time. Adding THREAD_COUNTERS to measure the user and sys time in plan fragment execution. This also accounts for the time spent in the hdfs/kudu scanner and in a blocking join. Snippet of a query plan with the newly added PlanFragment THREAD_COUNTERS: Fragment F00: Instance 22405f2d916ba9e6:41f8692c0009 . - PerHostPeakMemUsage: 978.06 MB (1025568440) - PlanFragmentThreadInvoluntaryContextSwitches: 12.85K (12846) - PlanFragmentThreadTotalWallClockTime: 5m25s - PlanFragmentThreadSysTime: 147.673ms - PlanFragmentThreadUserTime: 20s710ms - PlanFragmentThreadVoluntaryContextSwitches: 13.43K (13427) Change-Id: Ifa88aa6f3371fa42d11ecc122f43c7d83623c300 --- M be/src/exec/blocking-join-node.cc M be/src/exec/hdfs-scan-node.cc M be/src/exec/kudu-scan-node.cc M be/src/runtime/plan-fragment-executor.cc M be/src/runtime/plan-fragment-executor.h M be/src/runtime/runtime-state.cc M be/src/runtime/runtime-state.h 7 files changed, 19 insertions(+), 26 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/4633/7 -- To view, visit http://gerrit.cloudera.org:8080/4633 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifa88aa6f3371fa42d11ecc122f43c7d83623c300 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadke Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Yonghyun Hwang Gerrit-Reviewer: anujphadke
[Impala-ASF-CR] IMPALA-3487: validate decimal type in Avro file schema
Tim Armstrong has uploaded a new change for review. http://gerrit.cloudera.org:8080/4876 Change subject: IMPALA-3487: validate decimal type in Avro file schema .. IMPALA-3487: validate decimal type in Avro file schema This patch prevents an invalid decimal type in an Avro file schema from crashing Impala. Most invalid Avro schemas are caught by the frontend, but file schemas still need to be validated by the backend. After this patch files with bad schemas are skipped. Testing: This was hit very rarely by the scanner fuzzing. Added a regression test that scans a file with a bad schema. Change-Id: I25a326ee2220bc14d3b5f887dc288b4adf859cfc --- M be/src/exec/hdfs-avro-scanner.cc M be/src/runtime/decimal-test.cc M be/src/runtime/types.h M be/src/util/avro-util.cc M be/src/util/avro-util.h M common/thrift/generate_error_codes.py M testdata/bad_avro_snap/README A testdata/bad_avro_snap/invalid_decimal_schema.avro M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M testdata/workloads/functional-query/queries/DataErrorsTest/avro-errors.test 11 files changed, 102 insertions(+), 26 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/76/4876/1 -- To view, visit http://gerrit.cloudera.org:8080/4876 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I25a326ee2220bc14d3b5f887dc288b4adf859cfc Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong
[Impala-ASF-CR] IMPALA-3200 (buffer pool): warn if Status is ignored
Tim Armstrong has uploaded a new change for review. http://gerrit.cloudera.org:8080/4878 Change subject: IMPALA-3200 (buffer pool): warn if Status is ignored .. IMPALA-3200 (buffer pool): warn if Status is ignored This introduces a STATUS_RETURN macro that expands to __attribute__((warn_unused_result)) Status. It can be used in function declarations in place of Status to issue warnings if a return Status is ignored. The macro is only used in the buffer pool for now, but we can use it in other places in follow-up patches if we agree on the pattern. Change-Id: I8c26e7532b5f2c7fe167accc73179e8b72b192bc --- M be/src/bufferpool/buffer-allocator.h M be/src/bufferpool/buffer-pool.h M be/src/common/status.h 3 files changed, 16 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/4878/1 -- To view, visit http://gerrit.cloudera.org:8080/4878 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I8c26e7532b5f2c7fe167accc73179e8b72b192bc Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong
[Impala-ASF-CR] IMPALA-3346: DeepCopy() Kudu rows into Impala tuples.
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3346: DeepCopy() Kudu rows into Impala tuples. .. Patch Set 3: Code-Review+1 (3 comments) So much simpler. Just a few minor things. http://gerrit.cloudera.org:8080/#/c/4862/3/be/src/exec/kudu-scanner.cc File be/src/exec/kudu-scanner.cc: Line 183: Tuple* kudu_tuple = reinterpret_cast(const_cast(krow.cell(0))); Is the const_cast necessary? We shouldn't be modifying the memory. Ok to leave if there's some reason the types make it difficult to avoid. Line 198: *batch_done = true; Not your change but I think we should set *batch_done = false up the top of the function instead of relying on the caller to initialize. That's what we do almost everywhere else in the code. PS3, Line 215: 8 * (1 << 20) Let's make this a named constant just for clarity. -- To view, visit http://gerrit.cloudera.org:8080/4862 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic911e4eff9fe98bf28d8a1bab5c9d7e9ab66d9cb Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.
Alex Behm has uploaded a new change for review. http://gerrit.cloudera.org:8080/4877 Change subject: IMPALA-1286: Extract common conjuncts from disjunctions. .. IMPALA-1286: Extract common conjuncts from disjunctions. Adds a new ExprRewriteRule to extract common conjuncts from disjunctions. Examples: (a AND b AND c) OR (b AND d) ==> b AND ((a AND c) OR (d)) (a AND b) OR (a AND b) ==> a AND b (a AND b AND c) OR (c) ==> c Testing: Added a new unit test in ExprRewriteRulesTest. Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad --- M fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java A fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java M fe/src/test/java/org/apache/impala/analysis/ExprTest.java M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-views.test 7 files changed, 198 insertions(+), 17 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/4877/1 -- To view, visit http://gerrit.cloudera.org:8080/4877 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm
[Impala-ASF-CR] IMPALA-4309: Introduce Expr rewrite phase and supporting classes.
Hello Marcel Kornacker, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4746 to look at the new patch set (#9). Change subject: IMPALA-4309: Introduce Expr rewrite phase and supporting classes. .. IMPALA-4309: Introduce Expr rewrite phase and supporting classes. Introduces a new phase for rewriting Exprs after analysis and before subquery rewriting. The transformed Exprs replace the original ones in analyzed statements. If Exprs were changed, the whole statement is reset() and re-analyzed, similar to how subqueries are rewritten. If both Exprs and subqueries are rewritten there is only one re-analysis of the changed statement. The following new classes work together to perform transformations: 1. ExprRewriteRule - base class for Expr transformation rules 2. ExprRewriter - drives the transformation of Exprs using a list of ExprRewriteRules Statements that have exprs to be rewritten need to implement a new method rewriteExprs() that accepts an ExprRewriter. As an example, this patch adds a rule for converting BetweenPredicates into their equivalent CompoundPredicates. The BetweenPredicate has been notoriously buggy due to a lack of such a separate rewrite phase and is now cleaned up. Testing: 1. Added a new test for checking that the rewrite framework covers all relevant statements, clauses and can properly handle nested statements and subqueries. 2. Added a new test for ExprRewriteRules and implemented tests for the BetweenPredicate rewrite. 2. There are many existing tests for BetweePredicates and they all exercise the new rewrite rule/phase. 3. Ran a private core/hdfs run and it passed. Change-Id: I2279dc984bcf7742db4fa3b1aa67283ecbb05e6e --- M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/BetweenPredicate.java M fe/src/main/java/org/apache/impala/analysis/CreateOrAlterViewStmtBase.java M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java M fe/src/main/java/org/apache/impala/analysis/SelectList.java M fe/src/main/java/org/apache/impala/analysis/SelectListItem.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/StatementBase.java M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java M fe/src/main/java/org/apache/impala/analysis/TableRef.java M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java M fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java A fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java A fe/src/main/java/org/apache/impala/rewrite/ExprRewriteRule.java A fe/src/main/java/org/apache/impala/rewrite/ExprRewriter.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java A fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java A fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-views.test 30 files changed, 726 insertions(+), 216 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/4746/9 -- To view, visit http://gerrit.cloudera.org:8080/4746 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2279dc984bcf7742db4fa3b1aa67283ecbb05e6e Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4348 / IMPALA-4333: Improve coordinator fragment cancellation
Internal Jenkins has posted comments on this change. Change subject: IMPALA-4348 / IMPALA-4333: Improve coordinator fragment cancellation .. Patch Set 6: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/4865 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5b4e25c1d658b3929182ba5e56b5c5e881dd394a Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No