[Impala-ASF-CR] IMPALA-4259: build Impala without any test cluster setup.
Internal Jenkins has posted comments on this change. Change subject: IMPALA-4259: build Impala without any test cluster setup. .. Patch Set 7: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/4685 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I429da7bc6681b16c07fe58bb3efac6d1a8579137 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions
Tim Armstrong has posted comments on this change. Change subject: IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions .. Patch Set 4: Rebase onto Alex's change. -- To view, visit http://gerrit.cloudera.org:8080/4655 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4023: don't attach buffered tuple streams to batches
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4023: don't attach buffered tuple streams to batches .. Patch Set 8: Code-Review+1 Carry -- To view, visit http://gerrit.cloudera.org:8080/4448 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6471422f86ce71e4c6ab277a27651bc2e8ff Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions
Tim Armstrong has uploaded a new patch set (#4). Change subject: IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions .. IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions This change enables codegen for all builtin aggregate functions, e.g. timestamp functions and group_concat. There are several parts to the change: * Adding support for generic UDAs. Previous the codegen code did not handle multiple input arguments or NULL return values. * Defaulting to using the UDA interface when there is not a special codegen path (we have implementations of all builtin aggregate functions for the interpreted path). * Remove all the logic to disable codegen for the special cases that now are supported. Also fix the generation of code to get/set NULL bits since I needed to add functionality there anyway. Testing: Add tests that check that codegen was enabled for builtin aggregate functions. Also fix some gaps in the preexisting tests. Also add tests for UDAs that check input/output nulls are handled correctly, in anticipation of enabling codegen for arbitrary UDAs. Perf: Ran local TPC-H and targeted perf. Spent a lot of time on TPC-H Q1, since my original approach regressed it ~5%. In the end the problem was to do with the ordering of loads/stores to the slot and null bit in the generated code: the previous version of the code exploited some properties of the particular aggregate function. I ended up replicating this behaviour to avoid regressing perf. Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260 --- M be/src/benchmarks/hash-benchmark.cc M be/src/codegen/codegen-anyval.cc M be/src/codegen/codegen-anyval.h M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/llvm-codegen-test.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/aggregation-node.cc M be/src/exec/exec-node.cc M be/src/exec/hash-join-node.cc M be/src/exec/hash-table.cc M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/old-hash-table.cc M be/src/exec/partitioned-aggregation-node-ir.cc M be/src/exec/partitioned-aggregation-node.cc M be/src/exec/partitioned-aggregation-node.h M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/text-converter.cc M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/case-expr.cc M be/src/exprs/compound-predicates.cc M be/src/exprs/expr.cc M be/src/exprs/literal.cc M be/src/exprs/null-literal.cc M be/src/exprs/scalar-fn-call.cc M be/src/exprs/slot-ref.cc M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M be/src/runtime/tuple.cc M be/src/testutil/test-udas.cc M be/src/testutil/test-udfs.cc M be/src/util/tuple-row-compare.cc M testdata/workloads/functional-query/queries/QueryTest/uda.test M tests/common/test_result_verifier.py M tests/query_test/test_aggregation.py M tests/query_test/test_udfs.py 37 files changed, 744 insertions(+), 484 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/4655/4 -- To view, visit http://gerrit.cloudera.org:8080/4655 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions
Tim Armstrong has uploaded a new patch set (#3). Change subject: IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions .. IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions This change enables codegen for all builtin aggregate functions, e.g. timestamp functions and group_concat. There are several parts to the change: * Adding support for generic UDAs. Previous the codegen code did not handle multiple input arguments or NULL return values. * Defaulting to using the UDA interface when there is not a special codegen path (we have implementations of all builtin aggregate functions for the interpreted path). * Remove all the logic to disable codegen for the special cases that now are supported. Also fix the generation of code to get/set NULL bits since I needed to add functionality there anyway. Testing: Add tests that check that codegen was enabled for builtin aggregate functions. Also fix some gaps in the preexisting tests. Also add tests for UDAs that check input/output nulls are handled correctly, in anticipation of enabling codegen for arbitrary UDAs. Perf: Ran local TPC-H and targeted perf. Spent a lot of time on TPC-H Q1, since my original approach regressed it ~5%. In the end the problem was to do with the ordering of loads/stores to the slot and null bit in the generated code: the previous version of the code exploited some properties of the particular aggregate function. I ended up replicating this behaviour to avoid regressing perf. Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260 --- M be/src/benchmarks/hash-benchmark.cc M be/src/codegen/codegen-anyval.cc M be/src/codegen/codegen-anyval.h M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/llvm-codegen-test.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/aggregation-node.cc M be/src/exec/exec-node.cc M be/src/exec/hash-join-node.cc M be/src/exec/hash-table.cc M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/old-hash-table.cc M be/src/exec/partitioned-aggregation-node-ir.cc M be/src/exec/partitioned-aggregation-node.cc M be/src/exec/partitioned-aggregation-node.h M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/text-converter.cc M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/case-expr.cc M be/src/exprs/compound-predicates.cc M be/src/exprs/expr.cc M be/src/exprs/literal.cc M be/src/exprs/null-literal.cc M be/src/exprs/scalar-fn-call.cc M be/src/exprs/slot-ref.cc M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M be/src/runtime/tuple.cc M be/src/testutil/test-udas.cc M be/src/testutil/test-udfs.cc M be/src/util/tuple-row-compare.cc M fe/src/main/java/org/apache/impala/analysis/AggregateInfoBase.java M testdata/workloads/functional-query/queries/QueryTest/uda.test M tests/common/test_result_verifier.py M tests/query_test/test_aggregation.py M tests/query_test/test_udfs.py 38 files changed, 748 insertions(+), 486 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/4655/3 -- To view, visit http://gerrit.cloudera.org:8080/4655 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4123: Fast bit unpacking .. Patch Set 8: (11 comments) http://gerrit.cloudera.org:8080/#/c/4494/8//COMMIT_MSG Commit Message: PS8, Line 17: 64 > 32, now? Done PS8, Line 18: ors at 64-bit boundaries > What does it mean to have an or at a k-bit boundary? Reworded to be hopefully clearer. http://gerrit.cloudera.org:8080/#/c/4494/8/be/src/benchmarks/bit-packing-benchmark.cc File be/src/benchmarks/bit-packing-benchmark.cc: Line 273: #include "common/names.h" > This isn't needed if using namespace std;, yes? I think I prefer keeping this and removing the "using namespace" declarations below. http://gerrit.cloudera.org:8080/#/c/4494/8/be/src/benchmarks/bswap-benchmark.cc File be/src/benchmarks/bswap-benchmark.cc: Line 123 > Thanks for fixing this. When this is committed, can you file a bug against Sure, will do. http://gerrit.cloudera.org:8080/#/c/4494/8/be/src/testutil/mem-util.h File be/src/testutil/mem-util.h: PS8, Line 31: posix_memalign > #include stdlib.h. I'd say cstdlib, but posix_memalign doesn't seem to be p Added cstdlib. I think in practice it's reasonable to assume that cstdlib includes the same things as stdlib.h in some form. The glibc one from the toolchain literally has #include in it. Line 45: private: > DISALLOW_COPY_AND_ASSIGN Done http://gerrit.cloudera.org:8080/#/c/4494/8/be/src/util/bit-packing-test.cc File be/src/util/bit-packing-test.cc: Line 39: void UnpackSubset(const uint32_t* in, const uint8_t* packed, int num_in_values, > I think it would aid clarity to add a short description of what the meaning Added descriptions and elaborated a little on what the function is doing. PS8, Line 45: uint32_t > const uint32_t * in Done PS8, Line 72: INFO > Was this just for debugging, or did you mean to leave it in? Didn't mean to leave it in - removed. PS8, Line 84: 8 > Why 8 and not 4? Used CHAR_BIT and added an intermediate variable to make it a bit clearer. http://gerrit.cloudera.org:8080/#/c/4494/8/be/src/util/bit-packing.inline.h File be/src/util/bit-packing.inline.h: PS8, Line 64: //LOG(INFO) << "bit_width " << BIT_WIDTH << "\n" : // << "in_bytes " << in_bytes << " num_values " << num_values << " max_input_values " << max_input_values << "\n" :// << "values_to_read " << values_to_read << " batches_to_read " << batches_to_read << "\n" :// << "remainder_values " << remainder_values; > remove Oops, sorry missed this. -- To view, visit http://gerrit.cloudera.org:8080/4494 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking
Tim Armstrong has uploaded a new patch set (#9). Change subject: IMPALA-4123: Fast bit unpacking .. IMPALA-4123: Fast bit unpacking Adds utility functions for fast unpacking of batches of bit-packed values. These support reading batches of any number of values provided that the start of the batch is aligned to a byte boundary. Callers that want to read smaller batches that don't align to byte boundaries will need to implement their own buffering. The unpacking code uses only portable C++ and no SIMD intrinsics, but is fairly efficient because unpacking a full batch of 32 values compiles down to 32-bit loads, shifts by constants, masks by constants, bitwise ors when a value straddles 32-bit words and stores. Further speedups should be possible using SIMD intrinsics. Testing: Added unit tests for unpacking, exhaustively covering different bitwidths with additional test dimensions (memory alignment, various input sizes, etc). Tested under ASAN to ensure the bit unpacking doesn't read past the end of buffers. Perf: Added microbenchmark that shows on average an 8-9x speedup over the existing BitReader code. Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622 --- M be/src/benchmarks/CMakeLists.txt A be/src/benchmarks/bit-packing-benchmark.cc M be/src/benchmarks/bswap-benchmark.cc A be/src/testutil/mem-util.h M be/src/util/CMakeLists.txt A be/src/util/bit-packing-test.cc A be/src/util/bit-packing.h A be/src/util/bit-packing.inline.h M be/src/util/bit-stream-utils.h M be/src/util/bit-stream-utils.inline.h 10 files changed, 883 insertions(+), 39 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/4494/9 -- To view, visit http://gerrit.cloudera.org:8080/4494 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4188: Leopard: support external Docker volumes
David Knupp has posted comments on this change. Change subject: IMPALA-4188: Leopard: support external Docker volumes .. Patch Set 5: (8 comments) http://gerrit.cloudera.org:8080/#/c/4678/5//COMMIT_MSG Commit Message: PS5, Line 10: use Would "mount" be an appropriate verb. If so, I think it's more descriptive. PS5, Line 19: need the "need to move the"? PS5, Line 21: rsync Why rsync versus, say, just cp or mv? http://gerrit.cloudera.org:8080/#/c/4678/5/tests/comparison/leopard/README File tests/comparison/leopard/README: PS5, Line 17: Basic Configuration It's a nit, but it might be nicer if this line, and "External Volume Configuration" below, looked a little more like headers. PS5, Line 43: use "mount external Docker volumes that contain the necessary testdata"? PS5, Line 54: path on TARGET_HOST where the : external volume will reside This is just a directory, right? http://gerrit.cloudera.org:8080/#/c/4678/5/tests/comparison/leopard/impala_docker_env.py File tests/comparison/leopard/impala_docker_env.py: Line 298: if os.environ.get('KUDU_IS_SUPPORTED') == 'true': Does it makes sense to reference KUDU-1419 here, or at least the README, to explain why the extra work is being done? Line 308: 'rsync -e "ssh -i {priv_key} -o StrictHostKeyChecking=no ' I'm just curious -- how long does this process take? -- To view, visit http://gerrit.cloudera.org:8080/4678 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia7d9d9253fcd7e3905e389ddeb1438cee3e24480 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael BrownGerrit-Reviewer: David Knupp Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3002/1473: Cardinality observability cleanup
Thomas Tauber-Marshall has uploaded a new patch set (#2). Change subject: IMPALA-3002/1473: Cardinality observability cleanup .. IMPALA-3002/1473: Cardinality observability cleanup IMPALA-3002: The shell prints an incorrect value for '#Rows' in the exec summary for broadcast nodes due to incorrect logic around whether to use max or agg stats. This patch makes the behavior consistent with the way the be treats exec summaries in summary-util.cc IMPALA-1473: When there is a merging exchange with a limit, we may copy rows into the output batch beyond the limit. In this case, we currently update the output batch's size to reflect the limit, but we also need to update ExecNode::num_rows_returned_ or the exec summary may show that the exchange node returned more rows than it really did. Additionally, PlanFragmentExecutor::GetNext does not update rows_produced_counter_ in some cases, leading the runtime profile to display an incorrect value for 'RowsProduced'. Change-Id: I386719370386c9cff09b8b35d15dc712dc6480aa --- M be/src/exec/exchange-node.cc M be/src/runtime/plan-fragment-executor.cc M shell/impala_client.py A tests/query_test/test_observability.py 4 files changed, 45 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/4679/2 -- To view, visit http://gerrit.cloudera.org:8080/4679 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I386719370386c9cff09b8b35d15dc712dc6480aa Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Matthew Jacobs
[Impala-ASF-CR] IMPALA-3971, IMPALA-3229: Bootstrap an Impala dev environment
Jim Apple has posted comments on this change. Change subject: IMPALA-3971, IMPALA-3229: Bootstrap an Impala dev environment .. Patch Set 4: Tim, I added you in case you wanted to talk about how this might work with Ubuntu 16.04 -- To view, visit http://gerrit.cloudera.org:8080/4674 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If166a8a286d7559af547da39f6cc09e723f34c7e Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim AppleGerrit-Reviewer: David Knupp Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4259: build Impala without any test cluster setup.
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4259: build Impala without any test cluster setup. .. Patch Set 7: Code-Review+2 The merge failed because the merge jump runs buildall.sh from a different directory. Calling make without changing directories broke that. -- To view, visit http://gerrit.cloudera.org:8080/4685 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I429da7bc6681b16c07fe58bb3efac6d1a8579137 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4080, IMPALA-3638: Introduce ExecNode::Codegen()
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4080, IMPALA-3638: Introduce ExecNode::Codegen() .. Patch Set 3: (8 comments) Looking good, just some small things then I'll +1. http://gerrit.cloudera.org:8080/#/c/4651/1/be/src/exec/aggregation-node.cc File be/src/exec/aggregation-node.cc: Line 174: void AggregationNode::Codegen(RuntimeState* state) { > I thought about something similar but this has the side effect of putting " Ah, I didn't think about that. This works ok for now then. http://gerrit.cloudera.org:8080/#/c/4651/3/be/src/exec/aggregation-node.cc File be/src/exec/aggregation-node.cc: Line 169: runtime_profile()->AddCodegenMsg(false, "disabled by query option DISABLE_CODEGEN"); Thanks, I think this will be clearer. It's still a little weird that we don't print anything when expr codegen is disabled (since that can make a bit perf difference), but I guess that will be fixed naturally by a follow-up patch. http://gerrit.cloudera.org:8080/#/c/4651/3/be/src/exec/hash-join-node.cc File be/src/exec/hash-join-node.cc: Line 172: Function* codegen_process_probe_batch_fn = CodegenProcessProbeBatch(codegen, hash_fn); Long line. http://gerrit.cloudera.org:8080/#/c/4651/3/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: Line 622: return Status("Disabled by query option."); Dead code I think. http://gerrit.cloudera.org:8080/#/c/4651/3/be/src/exec/hdfs-scan-node-base.cc File be/src/exec/hdfs-scan-node-base.cc: Line 153: Status HdfsScanNodeBase::Prepare(RuntimeState* state) { It doesn't look like we add the codegen disabled message hre. http://gerrit.cloudera.org:8080/#/c/4651/3/be/src/exec/hdfs-text-scanner.cc File be/src/exec/hdfs-text-scanner.cc: Line 700: return Status("Disabled by query option."); Dead code? http://gerrit.cloudera.org:8080/#/c/4651/3/be/src/exprs/scalar-fn-call.cc File be/src/exprs/scalar-fn-call.cc: Line 562: #define ARG_DECL_LIST(n) \ Thanks for doing this. Can you add a comment just giving an example of what the macros expand to? http://gerrit.cloudera.org:8080/#/c/4651/3/be/src/service/fe-support.cc File be/src/service/fe-support.cc: Line 104: state.CreateCodegen(), env, JniUtil::internal_exc_class(), result_bytes); Tab? -- To view, visit http://gerrit.cloudera.org:8080/4651 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I207566bc9f4c6a159271ecdbc4bbdba3d78c6651 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael HoGerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4188: Leopard: support external Docker volumes
Michael Brown has posted comments on this change. Change subject: IMPALA-4188: Leopard: support external Docker volumes .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/4678/4/tests/comparison/leopard/impala_docker_env.py File tests/comparison/leopard/impala_docker_env.py: PS4, Line 23: normpath > Perhaps parentheses around these. It took me a little bit to parse this. Done PS4, Line 55: os.path.sep + join_path('home', DOCKER_USER_NAME, 'Impala', 'testdata', 'cluster') > This is a bit long, and make it hard to read. Maybe it could be saved as DE Done -- To view, visit http://gerrit.cloudera.org:8080/4678 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia7d9d9253fcd7e3905e389ddeb1438cee3e24480 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael BrownGerrit-Reviewer: David Knupp Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4188: Leopard: support external Docker volumes
Michael Brown has uploaded a new patch set (#5). Change subject: IMPALA-4188: Leopard: support external Docker volumes .. IMPALA-4188: Leopard: support external Docker volumes To be able to run the Random Query Generator with Impala and Kudu, we need to use external Docker volumes as a workaround to KUDU-1419. This patch introduces a series of environment variables a user may tweak in order to help with that purpose. The patch assumes a viable, reasonable Docker container based on a standard Linux distribution like Ubuntu 14. To assist users, I've updated the Leopard README with instructions on the environment variables' meanings. The gist here is that the container is the source of truth, which means to create an external volume, we need the testdata off the container onto the host running Docker Engine. To do that we suggest a strategy using rsync via passwordless SSH key. Testing: I used a Cloudera Docker container that has Impala in /home/dev/Impala. Before, Kudu would fail to start due to KUDU-1419. Now, we load testdata into an external volume, build Impala, run the minicluster including Kudu, and can access the tpch_kudu data. I made flake8 fixes as well. flake8 on this file is now clean. Change-Id: Ia7d9d9253fcd7e3905e389ddeb1438cee3e24480 --- M tests/comparison/leopard/README M tests/comparison/leopard/impala_docker_env.py 2 files changed, 198 insertions(+), 56 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/4678/5 -- To view, visit http://gerrit.cloudera.org:8080/4678 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia7d9d9253fcd7e3905e389ddeb1438cee3e24480 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael BrownGerrit-Reviewer: David Knupp Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-4259: build Impala without any test cluster setup.
Hello David Knupp, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4685 to look at the new patch set (#6). Change subject: IMPALA-4259: build Impala without any test cluster setup. .. IMPALA-4259: build Impala without any test cluster setup. The main outcome of this change is to avoid making unnecessary modification to the Impala or other source trees when we don't need the test cluster. To achieve that, this refactors the script to make the flow easier to understand and makes it more consistent which build steps are executed in which modes. Change-Id: I429da7bc6681b16c07fe58bb3efac6d1a8579137 --- M bin/clean.sh M buildall.sh M ext-data-source/CMakeLists.txt M fe/CMakeLists.txt 4 files changed, 179 insertions(+), 163 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/85/4685/6 -- To view, visit http://gerrit.cloudera.org:8080/4685 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I429da7bc6681b16c07fe58bb3efac6d1a8579137 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4259: build Impala without any test cluster setup.
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4259: build Impala without any test cluster setup. .. Patch Set 5: (2 comments) http://gerrit.cloudera.org:8080/#/c/4685/5/buildall.sh File buildall.sh: Line 263: NEED_TEST_CLUSTER=1 > NEED_MINICLUSTER? Done Line 316: pushd "${IMPALA_FE_DIR}" > I'm a little confused by having a "fe" cmake target that we are using in bu Done. Also made the fe/ext-data-source targets use mvn-quiet. -- To view, visit http://gerrit.cloudera.org:8080/4685 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I429da7bc6681b16c07fe58bb3efac6d1a8579137 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3718: Add test cancellation tests for Kudu
Alex Behm has posted comments on this change. Change subject: IMPALA-3718: Add test_cancellation tests for Kudu .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/4700/1/tests/query_test/test_cancellation.py File tests/query_test/test_cancellation.py: Line 94: pytest.skip() What happens if we don't skip this? Are those edge cases potentially interesting? -- To view, visit http://gerrit.cloudera.org:8080/4700 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icf3d3853e7075991f6d12f125407ebdbe6a287e2 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Alex Behm Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4259: build Impala without any test cluster setup.
Alex Behm has posted comments on this change. Change subject: IMPALA-4259: build Impala without any test cluster setup. .. Patch Set 5: (2 comments) So much better http://gerrit.cloudera.org:8080/#/c/4685/5/buildall.sh File buildall.sh: Line 263: NEED_TEST_CLUSTER=1 NEED_MINICLUSTER? Line 316: pushd "${IMPALA_FE_DIR}" I'm a little confused by having a "fe" cmake target that we are using in build_fe() but not here. Looks like building the fe with make would take care of building its dependencies also, so maybe we can simplify the flow here? -- To view, visit http://gerrit.cloudera.org:8080/4685 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I429da7bc6681b16c07fe58bb3efac6d1a8579137 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR](hadoop-next) IMPALA-4277: temporary hack to avoid compile errors for HDFS block location API
Tim Armstrong has uploaded a new change for review. http://gerrit.cloudera.org:8080/4701 Change subject: IMPALA-4277: temporary hack to avoid compile errors for HDFS block location API .. IMPALA-4277: temporary hack to avoid compile errors for HDFS block location API This is a temporary workaround to get Impala building against HDFS 3.0 that can be undone once IMPALA-4172 is committed. This builds if I put together a directory with the minimum files required to build the backend against HDFS: hadoop-3.0.0-alpha1-cdh6.x-SNAPSHOT/lib/native/libgplcompression.a hadoop-3.0.0-alpha1-cdh6.x-SNAPSHOT/lib/native/libhdfs.so.0.0.0 hadoop-3.0.0-alpha1-cdh6.x-SNAPSHOT/lib/native/libgplcompression.so hadoop-3.0.0-alpha1-cdh6.x-SNAPSHOT/lib/native/libgplcompression.so.0.0.0 hadoop-3.0.0-alpha1-cdh6.x-SNAPSHOT/lib/native/libhdfs.so hadoop-3.0.0-alpha1-cdh6.x-SNAPSHOT/lib/native/libgplcompression.la hadoop-3.0.0-alpha1-cdh6.x-SNAPSHOT/lib/native/libhdfs.a hadoop-3.0.0-alpha1-cdh6.x-SNAPSHOT/lib/native/libgplcompression.so.0 hadoop-3.0.0-alpha1-cdh6.x-SNAPSHOT/include/hdfs.h Change-Id: Ice765b1be62c0c3d1b99282d8c652dd35d3d766d --- M bin/impala-config.sh M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/service/JniFrontend.java 3 files changed, 26 insertions(+), 15 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/01/4701/1 -- To view, visit http://gerrit.cloudera.org:8080/4701 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ice765b1be62c0c3d1b99282d8c652dd35d3d766d Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: hadoop-next Gerrit-Owner: Tim Armstrong
[Impala-ASF-CR] WIP: Use AUTO FLUSH BACKGROUND for Kudu sink
Todd Lipcon has uploaded a new patch set (#2). Change subject: WIP: Use AUTO_FLUSH_BACKGROUND for Kudu sink .. WIP: Use AUTO_FLUSH_BACKGROUND for Kudu sink Change-Id: I665ce82d1caf64cf5bdf4bc0e15edbf9f1dc5ba0 --- M be/src/exec/kudu-table-sink-test.cc M be/src/exec/kudu-table-sink.cc M be/src/exec/kudu-table-sink.h M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift 7 files changed, 73 insertions(+), 29 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/4670/2 -- To view, visit http://gerrit.cloudera.org:8080/4670 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I665ce82d1caf64cf5bdf4bc0e15edbf9f1dc5ba0 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Todd Lipcon
[Impala-ASF-CR] IMPALA-3718: Add test cancellation tests for Kudu
Matthew Jacobs has uploaded a new change for review. http://gerrit.cloudera.org:8080/4700 Change subject: IMPALA-3718: Add test_cancellation tests for Kudu .. IMPALA-3718: Add test_cancellation tests for Kudu Additional functional tests for Kudu. Change-Id: Icf3d3853e7075991f6d12f125407ebdbe6a287e2 --- M tests/query_test/test_cancellation.py 1 file changed, 13 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/4700/1 -- To view, visit http://gerrit.cloudera.org:8080/4700 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Icf3d3853e7075991f6d12f125407ebdbe6a287e2 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs
[Impala-ASF-CR] IMPALA-4287: EE tests fail to run when KUDU IS SUPPORTED�lse
Alex Behm has posted comments on this change. Change subject: IMPALA-4287: EE tests fail to run when KUDU_IS_SUPPORTED=false .. Patch Set 1: Code-Review+2 Thanks! -- To view, visit http://gerrit.cloudera.org:8080/4697 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib97572b94c54bb36ad1d7184aa45744e8a42ffa4 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Alex Behm Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3943: Address post-merge comments.
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3943: Address post-merge comments. .. Patch Set 1: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/4696 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I72ccf00191afddb8583ac961f1eaf11e5eb28791 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2789: More compact mem layout with null bits at the end.
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-2789: More compact mem layout with null bits at the end. .. Patch Set 7: Code-Review+1 Thanks! -- To view, visit http://gerrit.cloudera.org:8080/4673 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2789: More compact mem layout with null bits at the end.
Hello Matthew Jacobs, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4673 to look at the new patch set (#7). Change subject: IMPALA-2789: More compact mem layout with null bits at the end. .. IMPALA-2789: More compact mem layout with null bits at the end. There are two motivations for this change: 1. Reduce memory consumption. 2. Pave the way for full memory layout compatibility between Impala and Kudu to eventually enable zero-copy scans. This patch is a only first step towards that goal. New Memory Layout Slots are placed in descending order by size with trailing bytes to store null flags. Null flags are omitted for non-nullable slots. There is no padding between tuples when stored back-to-back in a row batch. Example: select bool_col, int_col, string_col, smallint_col from functional.alltypes Slots: string_col|int_col|smallint_col|bool_col|null_byte Offsets: 0 16 20 22 23 The main change is to move the null indicators to the end of tuples. The new memory layout is fully packed with no padding in between slots or tuples. Performance: Our standard cluster perf tests showed no significant difference in query response times as well as consumed cycles, and a slight reduction in peak memory consumption. Testing: An exhaustive test run passed. Ran a few select tests like TPC-H/DS with ASAN locally. These follow-on changes are planned: 1. Planner needs to mark slots non-nullable if they correspond to a non-nullable Kudu column. 2. Update Kudu scan node to copy tuples with memcpy. 3. Kudu client needs to support transferring ownership of the tuple memory (maybe do direct and indirect buffers separately). 4. Update Kudu scan node to use memory transfer instead of copy Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6 --- M be/src/benchmarks/row-batch-serialize-benchmark.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/kudu-scanner.cc M be/src/exec/kudu-scanner.h M be/src/exec/row-batch-list-test.cc M be/src/exec/text-converter.cc M be/src/runtime/buffered-tuple-stream-test.cc M be/src/runtime/collection-value-builder-test.cc M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M be/src/runtime/row-batch-serialize-test.cc M be/src/runtime/row-batch-test.cc M be/src/runtime/tuple.cc M be/src/runtime/tuple.h M be/src/service/frontend.cc M be/src/service/frontend.h M be/src/testutil/desc-tbl-builder.cc M be/src/testutil/desc-tbl-builder.h M common/thrift/Frontend.thrift M fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java M fe/src/main/java/org/apache/impala/service/JniFrontend.java M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java 26 files changed, 402 insertions(+), 312 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/4673/7 -- To view, visit http://gerrit.cloudera.org:8080/4673 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR](hadoop-next) IMPALA-4277: bump Hadoop component versions except for Hadoop itself
Tim Armstrong has uploaded a new change for review. http://gerrit.cloudera.org:8080/4698 Change subject: IMPALA-4277: bump Hadoop component versions except for Hadoop itself .. IMPALA-4277: bump Hadoop component versions except for Hadoop itself The Hive version bump required some minor changes to deal with renames and added function arguments. The other components required no changes to build the frontend against their API (I haven't been able to test the JNI backend stuff). IMPALA-4172 is blocking the hadoop version bump. This builds if I manually put hive_metastore.thrift in the right place and set SKIP_TOOLCHAIN_BOOTSTRAP=false. Tests can't run since we don't have the special tarballs available that are used for the test cluster. I manually constructed the Hive build dependency by taking hive_metastore.thrift and putting it at $CDH_COMPONENTS_HOME/hive-2.1.0-cdh6.x-SNAPSHOT/src/metastore/if/hive_metastore.thrift Change-Id: Ia0be6bbe76d929ceaeb2fa9ac4ebba1820c4dab7 --- M bin/impala-config.sh M common/thrift/TCLIService.thrift M common/thrift/cli_service.thrift M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java M fe/src/main/java/org/apache/impala/analysis/CreateDataSrcStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateDbStmt.java M fe/src/main/java/org/apache/impala/analysis/DropDataSrcStmt.java M fe/src/main/java/org/apache/impala/analysis/Subquery.java M fe/src/main/java/org/apache/impala/analysis/TableName.java M fe/src/main/java/org/apache/impala/analysis/TypeDef.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java M fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/service/FrontendTest.java 16 files changed, 38 insertions(+), 40 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/4698/1 -- To view, visit http://gerrit.cloudera.org:8080/4698 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ia0be6bbe76d929ceaeb2fa9ac4ebba1820c4dab7 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: hadoop-next Gerrit-Owner: Tim Armstrong
[Impala-ASF-CR] IMPALA-2789: More compact mem layout with null bits at the end.
Alex Behm has posted comments on this change. Change subject: IMPALA-2789: More compact mem layout with null bits at the end. .. Patch Set 6: (2 comments) http://gerrit.cloudera.org:8080/#/c/4673/6//COMMIT_MSG Commit Message: Line 9: The main motivation of this change is to pave the way for full > There are 2 main motivations of this change, no? Done Line 12: that goal. > Can you mention what is left after this? Done -- To view, visit http://gerrit.cloudera.org:8080/4673 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4287: EE tests fail to run when KUDU IS SUPPORTED�lse
Matthew Jacobs has uploaded a new change for review. http://gerrit.cloudera.org:8080/4697 Change subject: IMPALA-4287: EE tests fail to run when KUDU_IS_SUPPORTED=false .. IMPALA-4287: EE tests fail to run when KUDU_IS_SUPPORTED=false When Kudu tests were added to the functional-query test dimensions, we needed to make sure the dimension was skipped when KUDU_IS_SUPPORTED=false. Change-Id: Ib97572b94c54bb36ad1d7184aa45744e8a42ffa4 --- M tests/common/test_dimensions.py 1 file changed, 4 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/4697/1 -- To view, visit http://gerrit.cloudera.org:8080/4697 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ib97572b94c54bb36ad1d7184aa45744e8a42ffa4 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs
[Impala-ASF-CR] IMPALA-3943: Do not throw scan errors for empty Parquet files.
Alex Behm has posted comments on this change. Change subject: IMPALA-3943: Do not throw scan errors for empty Parquet files. .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/4693/3/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: Line 413: if (row_group.num_rows == 0 || file_metadata_.num_rows == 0) continue; > I actually find it really confusing because without context it's almost imp http://gerrit.cloudera.org:8080/4696 -- To view, visit http://gerrit.cloudera.org:8080/4693 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I50ac3df6ff24bc5c384ef22e0f804a5132adb62e Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3943: Address post-merge comments.
Alex Behm has uploaded a new change for review. http://gerrit.cloudera.org:8080/4696 Change subject: IMPALA-3943: Address post-merge comments. .. IMPALA-3943: Address post-merge comments. Adds code comments and issues a warning for Parquet files with num_rows=0 but at least one non-empty row group. Change-Id: I72ccf00191afddb8583ac961f1eaf11e5eb28791 --- M be/src/exec/hdfs-parquet-scanner.cc M common/thrift/generate_error_codes.py M testdata/workloads/functional-query/queries/QueryTest/parquet-zero-rows.test 3 files changed, 29 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/4696/1 -- To view, visit http://gerrit.cloudera.org:8080/4696 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I72ccf00191afddb8583ac961f1eaf11e5eb28791 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm
[Impala-ASF-CR] Impala-3342 Adding thread counters to measure time spent during plan fragment execution
Tim Armstrong has posted comments on this change. Change subject: Impala-3342 Adding thread counters to measure time spent during plan fragment execution .. Patch Set 2: (4 comments) http://gerrit.cloudera.org:8080/#/c/4633/2//COMMIT_MSG Commit Message: Line 14: What does the profile look like? Would be helpful to see what the plan fragment profile looks like. http://gerrit.cloudera.org:8080/#/c/4633/2/be/src/exec/hdfs-scan-node.cc File be/src/exec/hdfs-scan-node.cc: PS2, Line 360: SCOPED_THREAD_COUNTER_MEASUREMENT(runtime_state_->plan_fragment_counters()); > Is this thread safe? Yep, scanner_thread_counters() is also shared between scanner threads. http://gerrit.cloudera.org:8080/#/c/4633/2/be/src/runtime/plan-fragment-executor.cc File be/src/runtime/plan-fragment-executor.cc: Line 213: ADD_COUNTER(profile(), PER_HOST_PEAK_MEM_COUNTER, TUnit::BYTES); Let's create the new counter here along with the other ones so that the setup happens in the same place. http://gerrit.cloudera.org:8080/#/c/4633/2/be/src/runtime/runtime-state.h File be/src/runtime/runtime-state.h: Line 241: ///Fragment thread counters The comment isn't too helpful. Maybe: /// Total CPU utilization for all threads in this plan fragment. -- To view, visit http://gerrit.cloudera.org:8080/4633 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ifa88aa6f3371fa42d11ecc122f43c7d83623c300 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadkeGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Yonghyun Hwang Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4188: Leopard: support external Docker volumes
David Knupp has posted comments on this change. Change subject: IMPALA-4188: Leopard: support external Docker volumes .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/4678/4/tests/comparison/leopard/impala_docker_env.py File tests/comparison/leopard/impala_docker_env.py: PS4, Line 23: normpath Perhaps parentheses around these. It took me a little bit to parse this. (join as join_path, normpath) (Probably would have been easier if I were familiar with normpath.) PS4, Line 55: os.path.sep + join_path('home', DOCKER_USER_NAME, 'Impala', 'testdata', 'cluster') This is a bit long, and make it hard to read. Maybe it could be saved as DEFAULT_TESTDATA_VOLUME_PATH? -- To view, visit http://gerrit.cloudera.org:8080/4678 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia7d9d9253fcd7e3905e389ddeb1438cee3e24480 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael BrownGerrit-Reviewer: David Knupp Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3644 Make predicate order deterministic
Lars Volker has posted comments on this change. Change subject: IMPALA-3644 Make predicate order deterministic .. Patch Set 4: This change now still passes on Java 7 and fixes all but 3 tests in PlannerTest. I will run the test again on Java 8 and attach the resulting error log to the Jira, so we can continue to investigate. Can we go on and merge this one, so it doesn't rot? -- To view, visit http://gerrit.cloudera.org:8080/4671 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id11010bfeaff368869e6d430eeb4773ddf41faff Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3644 Make predicate order deterministic
Lars Volker has uploaded a new patch set (#3). Change subject: IMPALA-3644 Make predicate order deterministic .. IMPALA-3644 Make predicate order deterministic This adds a tie-break to make sure that we sort predicates in a deterministic order on Java 7 and 8. This was suggested by Alex in IMPALA-3644. There are still three broken tests, for which I will open subsequent Jiras and link them in IMPALA-3644. Change-Id: Id11010bfeaff368869e6d430eeb4773ddf41faff --- M fe/src/main/java/org/apache/impala/planner/PlanNode.java M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test M testdata/workloads/functional-planner/queries/PlannerTest/conjunct-ordering.test M testdata/workloads/functional-planner/queries/PlannerTest/data-source-tables.test M testdata/workloads/functional-planner/queries/PlannerTest/hbase.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view-limit.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test M testdata/workloads/functional-planner/queries/PlannerTest/joins.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/lineage.test M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test M testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-views.test M testdata/workloads/functional-planner/queries/PlannerTest/union.test 23 files changed, 394 insertions(+), 388 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/4671/3 -- To view, visit http://gerrit.cloudera.org:8080/4671 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id11010bfeaff368869e6d430eeb4773ddf41faff Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-4270: Gracefully fail unsupported queries with mt dop > 0.
Alex Behm has posted comments on this change. Change subject: IMPALA-4270: Gracefully fail unsupported queries with mt_dop > 0. .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/4677/1/testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test File testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test: Line 5: PLAN > but it's not executable, because the join sink isn't executable. We need the current mode for executing joins in subplans where the build will hopefully never be in a separate fragment. -- To view, visit http://gerrit.cloudera.org:8080/4677 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I91a60ea7b6e3ae4ee44be856615ddd3cd0af476d Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4188: Leopard: support external Docker volumes
Michael Brown has uploaded a new patch set (#4). Change subject: IMPALA-4188: Leopard: support external Docker volumes .. IMPALA-4188: Leopard: support external Docker volumes To be able to run the Random Query Generator with Impala and Kudu, we need to use external Docker volumes as a workaround to KUDU-1419. This patch introduces a series of environment variables a user may tweak in order to help with that purpose. The patch assumes a viable, reasonable Docker container based on a standard Linux distribution like Ubuntu 14. To assist users, I've updated the Leopard README with instructions on the environment variables' meanings. The gist here is that the container is the source of truth, which means to create an external volume, we need the testdata off the container onto the host running Docker Engine. To do that we suggest a strategy using rsync via passwordless SSH key. Testing: I used a Cloudera Docker container that has Impala in /home/dev/Impala. Before, Kudu would fail to start due to KUDU-1419. Now, we load testdata into an external volume, build Impala, run the minicluster including Kudu, and can access the tpch_kudu data. I made flake8 fixes as well. flake8 on this file is now clean. Change-Id: Ia7d9d9253fcd7e3905e389ddeb1438cee3e24480 --- M tests/comparison/leopard/README M tests/comparison/leopard/impala_docker_env.py 2 files changed, 192 insertions(+), 56 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/4678/4 -- To view, visit http://gerrit.cloudera.org:8080/4678 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia7d9d9253fcd7e3905e389ddeb1438cee3e24480 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael BrownGerrit-Reviewer: David Knupp Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Michael Brown has posted comments on this change. Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables .. Patch Set 5: (7 comments) http://gerrit.cloudera.org:8080/#/c/4414/5/bin/start-impala-cluster.py File bin/start-impala-cluster.py: PS5, Line 73: parser.add_option("--kudu_masters", default="127.0.0.1", Could we import the default value from tests.common instead of hardcoding it? http://gerrit.cloudera.org:8080/#/c/4414/5/tests/common/kudu_test_suite.py File tests/common/kudu_test_suite.py: PS5, Line 38: @SkipIf.kudu_not_supported This is extremely risky due to bugs in our version of old pytest. See IMPALA-3614 for an example of how this is risky. I think you will have to work around this by skipping in setup_class. http://gerrit.cloudera.org:8080/#/c/4414/5/tests/custom_cluster/test_kudu.py File tests/custom_cluster/test_kudu.py: PS5, Line 1: # Copyright (c) 2016 Cloudera, Inc. All rights reserved. Remove Cloudera copyright and make sure the license text is correct otherwise. http://gerrit.cloudera.org:8080/#/c/4414/5/tests/metadata/test_ddl.py File tests/metadata/test_ddl.py: PS5, Line 220: Why remove this skip? PS5, Line 221: Is this removal safe? PS5, Line 221: self.expected_exceptions = 2 I can't see where this is used; delete? http://gerrit.cloudera.org:8080/#/c/4414/5/tests/query_test/test_kudu.py File tests/query_test/test_kudu.py: PS5, Line 87: print("Describe formatted output:") : pprint(table_desc) Use LOG.info() -- To view, visit http://gerrit.cloudera.org:8080/4414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4270: Gracefully fail unsupported queries with mt dop > 0.
Alex Behm has uploaded a new patch set (#3). Change subject: IMPALA-4270: Gracefully fail unsupported queries with mt_dop > 0. .. IMPALA-4270: Gracefully fail unsupported queries with mt_dop > 0. MT_DOP > 0 is only supported for plans without distributed joins or table sinks. Adds validation to fail unsupported queries gracefully in planning. For scans in queries that are executable with MT_DOP > 0 we either use the optimized MT scan node BE implementation (only Parquet), or we use the conventional scan node with num_scanner_threads=1. TODO: Still need to add end-to-end tests. Change-Id: I91a60ea7b6e3ae4ee44be856615ddd3cd0af476d --- M be/src/exec/exec-node.cc M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/PlannerContext.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java A testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test 10 files changed, 367 insertions(+), 33 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/4677/3 -- To view, visit http://gerrit.cloudera.org:8080/4677 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I91a60ea7b6e3ae4ee44be856615ddd3cd0af476d Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Marcel Kornacker
[Impala-ASF-CR] IMPALA-4270: Gracefully fail unsupported queries with mt dop > 0.
Alex Behm has posted comments on this change. Change subject: IMPALA-4270: Gracefully fail unsupported queries with mt_dop > 0. .. Patch Set 2: (9 comments) http://gerrit.cloudera.org:8080/#/c/4677/2/be/src/exec/exec-node.cc File be/src/exec/exec-node.cc: Line 272: DCHECK_EQ(state->query_options().num_scanner_threads, 1); > roll into a single dcheck with a compound predicate Done http://gerrit.cloudera.org:8080/#/c/4677/2/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: Line 205: > trailing Done Line 207: // If this is true then the MT_DOP query option must be > 0. > leave todo that it should be removed later Done http://gerrit.cloudera.org:8080/#/c/4677/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: Line 314:* Retrurns a set of file formats being scanned. > spelling Done Line 562: msg.hdfs_scan_node.setUse_mt_scan_nodeIsSet(useMtScanNode_); > ..IsSet? it doesn't look like you're calling the actual setter Weird. That seemed to work also. Fixed. http://gerrit.cloudera.org:8080/#/c/4677/2/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: Line 206: if (ctx_.getQueryOptions().mt_dop > 0) { > the if is redundant, but you can make it a checkstate at the start of the f Added Preconditions check. http://gerrit.cloudera.org:8080/#/c/4677/2/fe/src/main/java/org/apache/impala/planner/PlannerContext.java File fe/src/main/java/org/apache/impala/planner/PlannerContext.java: Line 83: public TQueryOptions getQueryOptions() { return getRootAnalyzer().getQueryOptions(); } > why the change? I added a convenience function Analyzer.getQueryOptions() and thought it would make sense to use it here. http://gerrit.cloudera.org:8080/#/c/4677/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java: Line 155:* - MT_DOP > 0 is not supported for plans with distributed joins or table sinks. > it's only supported for QueryStmts. take a look at Frontend.createExecReque My thinking was that from a user's perspective running a statement with mt_dop > 0 should have one of two outcomes: 1. The statement runs fine in multi-threaded mode. 2. The statement returns an error indicating that mt_dop is not supported for that statement. If that's the behavior we want, then it's irrelevant whether we can actually generate parallel plans for inserts/ctas. http://gerrit.cloudera.org:8080/#/c/4677/2/testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test File testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test: Line 51: | tuple-ids=0,1 row-size=8B cardinality=unavailable > what happened here? functional_parquet has no stats. I preferred to write the tests on Parquet because that's the preferred file format, happy to change it back or add more tests? -- To view, visit http://gerrit.cloudera.org:8080/4677 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I91a60ea7b6e3ae4ee44be856615ddd3cd0af476d Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4047: Remove occurrences of 'CDH'/'cdh' from repo
Alex Behm has posted comments on this change. Change subject: IMPALA-4047: Remove occurrences of 'CDH'/'cdh' from repo .. Patch Set 9: Code-Review+2 Thanks Lars! -- To view, visit http://gerrit.cloudera.org:8080/4187 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icb37e2ef0cd9fa0e581d359c5dd3db7812b7b2c8 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2789: More compact mem layout with null bits at the end.
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-2789: More compact mem layout with null bits at the end. .. Patch Set 6: Code-Review+1 (2 comments) Thanks! LGTM, just a few comments for the commit msg. I didn't look at the codegen code changes in detail. http://gerrit.cloudera.org:8080/#/c/4673/6//COMMIT_MSG Commit Message: Line 9: The main motivation of this change is to pave the way for full There are 2 main motivations of this change, no? 1) More compact memory layout. 2) Zero copy (or at least avoiding conversions) with Kudu scans... (as you already have here) Line 12: that goal. Can you mention what is left after this? My understanding is that it is (correct me if I'm wrong): 1) planner needs to set null flags for nullable columns 2) kudu scan node can now _copy_ mem, i.e. rip out conversion code 2) kudu client needs to support mem xfer 3) kudu scan node updated to use mem xfer instead of copying (KUDU-1695) -- To view, visit http://gerrit.cloudera.org:8080/4673 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4259: build Impala without any test cluster setup.
David Knupp has posted comments on this change. Change subject: IMPALA-4259: build Impala without any test cluster setup. .. Patch Set 4: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/4685 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I429da7bc6681b16c07fe58bb3efac6d1a8579137 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: David Knupp Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3971, IMPALA-3229: Bootstrap an Impala dev environment
Jim Apple has posted comments on this change. Change subject: IMPALA-3971, IMPALA-3229: Bootstrap an Impala dev environment .. Patch Set 4: Code-Review+1 (1 comment) Carry David's +1 http://gerrit.cloudera.org:8080/#/c/4674/3/bin/bootstrap_development.sh File bin/bootstrap_development.sh: PS3, Line 24: # The intended user is a person who wants to start contributing code to Impala. This : # script serves as an executable reference point for how to get started. > I guess I would also add, in some fashion, the following assumptions that t Done -- To view, visit http://gerrit.cloudera.org:8080/4674 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If166a8a286d7559af547da39f6cc09e723f34c7e Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim AppleGerrit-Reviewer: David Knupp Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3971, IMPALA-3229: Bootstrap an Impala dev environment
Hello David Knupp, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4674 to look at the new patch set (#4). Change subject: IMPALA-3971, IMPALA-3229: Bootstrap an Impala dev environment .. IMPALA-3971, IMPALA-3229: Bootstrap an Impala dev environment This script bootstraps an Impala dev environment on Ubuntu 14.04. It is not hermetic -- it changes some config files for the user and for the OS. It is green on Jenkins, and it runs in about 6.5 hours. The intention is to have this script run in a CI tool for post-commit testing, with the hope that this will make it easier for new developers to get a working development environment. Previously, the new developer workflow lived on wiki pages and tended to bit-rot. Still left to do: migrating the install script into the official Impala repo. Change-Id: If166a8a286d7559af547da39f6cc09e723f34c7e --- A bin/bootstrap_development.sh 1 file changed, 80 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/4674/4 -- To view, visit http://gerrit.cloudera.org:8080/4674 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: If166a8a286d7559af547da39f6cc09e723f34c7e Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim AppleGerrit-Reviewer: David Knupp Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-3943: Do not throw scan errors for empty Parquet files.
Internal Jenkins has submitted this change and it was merged. Change subject: IMPALA-3943: Do not throw scan errors for empty Parquet files. .. IMPALA-3943: Do not throw scan errors for empty Parquet files. For Parquet files with no row groups but with num_rows=0 in the file footer the Parquet scanner returns an error indicating that the file is invalid. This behavior is a regression from previous Impala versions which used to accept such files. This patch restores the previous behavior and adds tests. Change-Id: I50ac3df6ff24bc5c384ef22e0f804a5132adb62e Reviewed-on: http://gerrit.cloudera.org:8080/4693 Reviewed-by: Alex BehmTested-by: Internal Jenkins --- M be/src/exec/hdfs-parquet-scanner.cc M testdata/data/README A testdata/data/zero_rows_one_row_group.parquet A testdata/data/zero_rows_zero_row_groups.parquet A testdata/workloads/functional-query/queries/QueryTest/parquet-zero-rows.test M tests/query_test/test_scanners.py 6 files changed, 65 insertions(+), 1 deletion(-) Approvals: Internal Jenkins: Verified Alex Behm: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/4693 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I50ac3df6ff24bc5c384ef22e0f804a5132adb62e Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Marcel Kornacker
[Impala-ASF-CR] IMPALA-3943: Do not throw scan errors for empty Parquet files.
Internal Jenkins has posted comments on this change. Change subject: IMPALA-3943: Do not throw scan errors for empty Parquet files. .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/4693 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I50ac3df6ff24bc5c384ef22e0f804a5132adb62e Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4274: hang in buffered-block-mgr-test
Internal Jenkins has submitted this change and it was merged. Change subject: IMPALA-4274: hang in buffered-block-mgr-test .. IMPALA-4274: hang in buffered-block-mgr-test We started seeing hangs in CreateDestroyMulti() where a thread was recursively acquiring static_block_mgrs_lock_. This is only possible because a shared_ptr is destroyed while holding the lock. The fix is to reset the shared_ptr only after releasing the lock. Testing: I was unable to reproduce the hang locally, but the callstack in the JIRA was a strong enough smoking gun to feel confident that this should fix the hang. Change-Id: I21f3da1d09cdd101a28ee850f46f24acd361b604 Reviewed-on: http://gerrit.cloudera.org:8080/4690 Reviewed-by: Alex BehmReviewed-by: Marcel Kornacker Tested-by: Internal Jenkins --- M be/src/runtime/buffered-block-mgr.cc 1 file changed, 7 insertions(+), 3 deletions(-) Approvals: Marcel Kornacker: Looks good to me, approved Internal Jenkins: Verified Alex Behm: Looks good to me, but someone else must approve -- To view, visit http://gerrit.cloudera.org:8080/4690 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I21f3da1d09cdd101a28ee850f46f24acd361b604 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Marcel Kornacker
[Impala-ASF-CR] IMPALA-4274: hang in buffered-block-mgr-test
Internal Jenkins has posted comments on this change. Change subject: IMPALA-4274: hang in buffered-block-mgr-test .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/4690 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I21f3da1d09cdd101a28ee850f46f24acd361b604 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: No
[Impala-ASF-CR] Reduce LLVM module's preparation time by lazily creating the IRFunction::Type to LLVM::Function* mappings.
Michael Ho has posted comments on this change. Change subject: Reduce LLVM module's preparation time by lazily creating the IRFunction::Type to LLVM::Function* mappings. .. Patch Set 1: (4 comments) http://gerrit.cloudera.org:8080/#/c/4691/1/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: Line 206: for (int i = IRFunction::FN_START; i < IRFunction::FN_END; ++i) { > Maybe should be a new function like ValidateFunctionMappings()? IMHO, this seems pretty readable and self-contained without factoring it into a separate function. Will add some comments though. Line 209: if (init_codegen->module_->getFunction(fn_name) == NULL) { > We shouldn't hit this unless we messed up the build or the function names r Or if the in memory copy of the bitcode is corrupted somehow. That said, it can get corrupted any time since this check and GetFunction() will return NULL if the bitcode is corrupted somehow so may be okay to keep it as a DCHECK. PS1, Line 407: StringRef > Should be able to construct the StringRef from the std::string directly wit Done Line 668: StringRef fn_name(FN_MAPPINGS[ir_type].fn_name.c_str()); > This conversion shouldn't be necessary either. Done -- To view, visit http://gerrit.cloudera.org:8080/4691 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I61ab9fa8cca5a0909bb716c3c62819da3e3b3041 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael HoGerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] Reduce LLVM module's preparation time by lazily creating the IRFunction::Type to LLVM::Function* mappings.
Michael Ho has uploaded a new patch set (#2). Change subject: Reduce LLVM module's preparation time by lazily creating the IRFunction::Type to LLVM::Function* mappings. .. Reduce LLVM module's preparation time by lazily creating the IRFunction::Type to LLVM::Function* mappings. Previously, when creating a LlvmCodeGen object, we run an O(mn) algorithm to map the IRFunction::Type to the actual LLVM::Function object in the module. m is the size of IRFunction::Type enum and n is the total number of functions in the module. This is a waste of time if we only use few functions from the module. This change reduces the preparation time of a simple query from 23ms to 10ms. select count(*) from tpch100_parquet.lineitem where l_orderkey > 20; Change-Id: I61ab9fa8cca5a0909bb716c3c62819da3e3b3041 --- M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/llvm-codegen.cc 2 files changed, 175 insertions(+), 123 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/4691/2 -- To view, visit http://gerrit.cloudera.org:8080/4691 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I61ab9fa8cca5a0909bb716c3c62819da3e3b3041 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael HoGerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-3983/IMPALA-3974: Delete function jar resources after load
Bharath Vissapragada has posted comments on this change. Change subject: IMPALA-3983/IMPALA-3974: Delete function jar resources after load .. Patch Set 5: Henry, do you have any further comments on this one? Thanks. -- To view, visit http://gerrit.cloudera.org:8080/4617 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5f9dedb5b342415380c83e61a72eb497371a8199 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Henry Robinson Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4080, IMPALA-3638: Introduce ExecNode::Codegen()
Michael Ho has uploaded a new patch set (#3). Change subject: IMPALA-4080, IMPALA-3638: Introduce ExecNode::Codegen() .. IMPALA-4080, IMPALA-3638: Introduce ExecNode::Codegen() This patch is mostly mechanical move of codegen related logic from each exec node's Prepare() to its Codegen() function. After this change, code generation will no longer happen in Prepare(). Instead, it will happen after Prepare() completes in PlanFragmentExecutor. This is an intermediate step towards the final goal of sharing compiled code among fragment instances in multi-threading. As part of the clean up, this change also removes the logic for lazy codegen object creation. In other words, if codegen is enabled, the codegen object will always be created. This simplifies some of the logic in ScalarFnCall::Prepare() and various Codegen() functions by reducing error checking needed. This change also removes the logic added for tackling IMPALA-1755 as it's not needed anymore after the clean up. The clean up also rectifies a not so well documented situation. Previously, even if a user explicitly sets DISABLE_CODEGEN to true, we may still codegen a UDF if it was written in LLVM IR or if it has more than 8 arguments. This patch enforces the query option by failing the query in both cases. To run the query, the user must enable codegen. This change also extends the number of arguments supported in the interpretation path of ScalarFn to 20. Change-Id: I207566bc9f4c6a159271ecdbc4bbdba3d78c6651 --- M be/src/exec/aggregation-node.cc M be/src/exec/aggregation-node.h M be/src/exec/exchange-node.cc M be/src/exec/exchange-node.h M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/exec/hash-join-node.cc M be/src/exec/hash-join-node.h M be/src/exec/hash-table.cc M be/src/exec/hash-table.h M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-avro-scanner.h M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-sequence-scanner.cc M be/src/exec/hdfs-text-scanner.cc M be/src/exec/old-hash-table.cc M be/src/exec/old-hash-table.h M be/src/exec/partitioned-aggregation-node.cc M be/src/exec/partitioned-aggregation-node.h M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/partitioned-hash-join-node.h M be/src/exec/sort-node.cc M be/src/exec/sort-node.h M be/src/exec/topn-node.cc M be/src/exec/topn-node.h M be/src/exprs/case-expr.cc M be/src/exprs/case-expr.h M be/src/exprs/compound-predicates.cc M be/src/exprs/compound-predicates.h M be/src/exprs/conditional-functions.cc M be/src/exprs/conditional-functions.h M be/src/exprs/expr.cc M be/src/exprs/expr.h M be/src/exprs/hive-udf-call.cc M be/src/exprs/hive-udf-call.h M be/src/exprs/is-not-empty-predicate.cc M be/src/exprs/is-not-empty-predicate.h M be/src/exprs/literal.cc M be/src/exprs/literal.h M be/src/exprs/null-literal.cc M be/src/exprs/null-literal.h M be/src/exprs/scalar-fn-call.cc M be/src/exprs/scalar-fn-call.h M be/src/exprs/slot-ref.cc M be/src/exprs/slot-ref.h M be/src/exprs/tuple-is-null-predicate.cc M be/src/exprs/tuple-is-null-predicate.h M be/src/runtime/plan-fragment-executor.cc M be/src/runtime/runtime-state.cc M be/src/runtime/runtime-state.h M be/src/runtime/sorted-run-merger.h M be/src/runtime/sorter.cc M be/src/runtime/sorter.h M be/src/runtime/tuple.cc M be/src/runtime/tuple.h M be/src/service/fe-support.cc M be/src/testutil/test-udfs.cc M be/src/util/tuple-row-compare.cc M be/src/util/tuple-row-compare.h M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M testdata/workloads/functional-query/queries/QueryTest/udf-errors.test M testdata/workloads/functional-query/queries/QueryTest/udf.test M tests/query_test/test_udfs.py 69 files changed, 597 insertions(+), 673 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/4651/3 -- To view, visit http://gerrit.cloudera.org:8080/4651 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I207566bc9f4c6a159271ecdbc4bbdba3d78c6651 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael HoGerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong