[Impala-ASF-CR] IMPALA-5666: ASAN poisoning for MemPool and BufferPool
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5666: ASAN poisoning for MemPool and BufferPool .. Patch Set 4: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1047/ -- To view, visit http://gerrit.cloudera.org:8080/7591 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8d5a28dfee2b7c631981aac75524bde9acc0b36a Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry RobinsonGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5666: ASAN poisoning for MemPool and BufferPool
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5666: ASAN poisoning for MemPool and BufferPool .. Patch Set 4: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/1046/ -- To view, visit http://gerrit.cloudera.org:8080/7591 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8d5a28dfee2b7c631981aac75524bde9acc0b36a Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry RobinsonGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5666: ASAN poisoning for MemPool and BufferPool
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5666: ASAN poisoning for MemPool and BufferPool .. Patch Set 4: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1046/ -- To view, visit http://gerrit.cloudera.org:8080/7591 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8d5a28dfee2b7c631981aac75524bde9acc0b36a Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry RobinsonGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5666: ASAN poisoning for MemPool and BufferPool
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5666: ASAN poisoning for MemPool and BufferPool .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7591 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8d5a28dfee2b7c631981aac75524bde9acc0b36a Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry RobinsonGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3208: max row size option
Tim Armstrong has uploaded a new patch set (#8). Change subject: IMPALA-3208: max_row_size option .. IMPALA-3208: max_row_size option This is a preview because it is missing tests. I have manually tested it and it is behaving it as expected so far. Adds support for a "max_row_size" query option that instructs Impala to reserve enough memory to process rows of the specified size. For spilling operators, the planner reserves enough memory to process rows of this size. The advantage of this compared to simply specifying larger values for min_spillable_buffer_size and default_spillable_buffer_size is that operators may be able to handler larger rows without increasing the size of all their buffers. The default value is 512KB. I picked that number because it doesn't increase minimum reservations *too* much but should be large enough for almost all reasonable workloads. This is implemented in the aggs and joins using the variable page size support added to BufferedTupleStream in an earlier commit. The synopsis is that each stream requires reservation for one default-sized page per read and write iterator, and temporarily requires reservation for a max-sized page when reading or writing larger pages. The max-sized write reservation is released immediately after the row is appended and the max-size read reservation is released after advancing to the next row. The sorter and analytic simply use max-sized buffers for all pages in the stream. Testing: Updated existing planner tests to reflect default max_row_size. Added new planner tests to test the effect of the query option. Added "set" test to check validation of query option. Added end-to-end tests exercising spilling operators with large rows with and without spilling induced by SET_DENY_RESERVATION_PROBABILITY. TODO: the end-to-end tests are very memory hungry and may need some stabilisation to execute in parallel. Change-Id: Ic70f6dddbcef124bb4b329ffa2e42a74a1826570 --- M be/src/exec/partitioned-aggregation-node.cc M be/src/exec/partitioned-aggregation-node.h M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/exec/partitioned-hash-join-node.cc M be/src/runtime/buffered-tuple-stream.cc M be/src/runtime/sorter.cc M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M common/thrift/generate_error_codes.py M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/ResourceProfile.java A fe/src/main/java/org/apache/impala/planner/ResourceProfileBuilder.java M fe/src/main/java/org/apache/impala/planner/SortNode.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test M testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test A testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test M testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test M testdata/workloads/functional-query/queries/QueryTest/set.test A testdata/workloads/functional-query/queries/QueryTest/spilling-large-rows.test M testdata/workloads/functional-query/queries/QueryTest/spilling.test M tests/query_test/test_spilling.py 34 files changed, 1,001 insertions(+), 152 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/7629/8 -- To view, visit http://gerrit.cloudera.org:8080/7629 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic70f6dddbcef124bb4b329ffa2e42a74a1826570 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5412: Fix scan result with partitions on same file
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5412: Fix scan result with partitions on same file .. Patch Set 7: Running tests against the Impala-lzo branch with matching change: https://jenkins.impala.io/job/parallel-all-tests/1281/ -- To view, visit http://gerrit.cloudera.org:8080/7625 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Gabor KaszabGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5681: release reservation from blocking operators
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5681: release reservation from blocking operators .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/7619/4/be/src/exec/partitioned-aggregation-node.cc File be/src/exec/partitioned-aggregation-node.cc: PS4, Line 1203: > put a comma here Done -- To view, visit http://gerrit.cloudera.org:8080/7619 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6f4d0ad127d5fcd14b9821a7c127eec11d98692f Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5681: release reservation from blocking operators
Hello Thomas Tauber-Marshall, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7619 to look at the new patch set (#5). Change subject: IMPALA-5681: release reservation from blocking operators .. IMPALA-5681: release reservation from blocking operators When an in-memory blocking aggregation or join is in the GetNext() phase where it is outputting accumulated rows then we expect memory consumption to monotonically decrease because no more rows will be accumulated in memory. This change adds support to release unused reservation and makes use of it for in-memory aggregations and sorts. We don't release memory for operators with spilled data, since they may need the reservation to bring it back into memory. We also don't release memory in subplans, since it will probably be used in a later iteration of the subplan. Testing: Updated spilling test that now requires less memory. Ran stress test binary search on tpch_parquet. No changes, except Q18 now requires 325MB instead of 450MB to execute without spilling. Ran query with two sorts in the same pipeline and watched /memz to confirm that the first node in the pipeline was incrementally releasing memory. Added a regression test based on this experiment. Added a backend test to directly test reservation decreasing. Change-Id: I6f4d0ad127d5fcd14b9821a7c127eec11d98692f --- M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/exec/partitioned-aggregation-node.cc M be/src/exec/sort-node.cc M be/src/exec/sort-node.h M be/src/runtime/bufferpool/buffer-pool-internal.h M be/src/runtime/bufferpool/buffer-pool-test.cc M be/src/runtime/bufferpool/buffer-pool.cc M be/src/runtime/bufferpool/buffer-pool.h M be/src/runtime/sorter.cc M be/src/runtime/sorter.h M testdata/workloads/functional-query/queries/QueryTest/spilling.test A testdata/workloads/tpch/queries/sort-reservation-usage.test M tests/query_test/test_sort.py 14 files changed, 163 insertions(+), 11 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/7619/5 -- To view, visit http://gerrit.cloudera.org:8080/7619 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6f4d0ad127d5fcd14b9821a7c127eec11d98692f Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-2615: support [[nodiscard]] on Status
Tim Armstrong has posted comments on this change. Change subject: IMPALA-2615: support [[nodiscard]] on Status .. Patch Set 6: (8 comments) http://gerrit.cloudera.org:8080/#/c/7253/5/be/CMakeLists.txt File be/CMakeLists.txt: Line 73: if (CMAKE_CXX_COMPILER_VERSION VERSION_GREATER 7.0) > I find it is a confusing pattern. I don't actually see a reasonable alternative that doesn't involve extra redundant code or a lot of refactoring. The current approach is to build up the argument strings then switch on ${CMAKE_BUILD_TYPE} below to determine whether it's a GCC or Clang build. I added a comment to explain why it works. We can remove the branch and clean it up when we upgrade GCC. Line 74: # -Wno-placement-new: avoid spurious warnings from boost::function. > Are those warnings really spurious? I can't tell immediately from the link. It seems ok - the memory region written is large enough. The function_buffer union is large enough at 8 bytes, it's just the data member that is too small at 1 byte. Line 75: # TODO: remove when we upgrade Boost: https://github.com/boostorg/function/pull/9 > Thanks for looking into that. -faligned-new I think is fine, since it's enabling improved behaviour that is part of the C++17 standard instead of masking an issue. I don't think we should let upgrading Boost block upgrading GCC - I have a feeling that the Boost upgrade could be a lot of work. http://gerrit.cloudera.org:8080/#/c/7253/6/be/src/statestore/statestore.h File be/src/statestore/statestore.h: Line 443: /// the > fix wrap Done http://gerrit.cloudera.org:8080/#/c/7253/6/be/src/testutil/in-process-servers.cc File be/src/testutil/in-process-servers.cc: Line 71: Status status = impala->SetCatalogInitialized(); > const Done http://gerrit.cloudera.org:8080/#/c/7253/6/be/src/util/parquet-reader.cc File be/src/util/parquet-reader.cc: Line 138: // Returns the number of rows specified by the header. > pre-existing issue, but: can you add a note that this function can exit(1) Done http://gerrit.cloudera.org:8080/#/c/7253/6/be/src/util/runtime-profile.cc File be/src/util/runtime-profile.cc: Line 748: MakeScopeExitTrigger([]() { compressor->Close(); }); > Should Close() be added to the destructor (after making it unipotent)? We've been trying to standardise elsewhere on explicit resource release with trivial destructors. The argument is that implicit sequencing of non-trivial teardown steps is very hard to understand. I put up a page in the wiki that documents some of the patterns in the code: https://cwiki.apache.org/confluence/display/IMPALA/Resource+Management+Best+Practices+in+Impala http://gerrit.cloudera.org:8080/#/c/7253/6/be/src/util/runtime-profile.h File be/src/util/runtime-profile.h: Line 268: Status SerializeToArchiveString(std::string* out) const WARN_UNUSED_RESULT; > Makes the case for a Maybe template class that can hold a T or a Status. I don't feel strongly. I like those types in other languages but they don't seem to work as well in C++ where there isn't a convenient language construct to branch on the result and unpack the value. We have to use LLVM's ErrorOr in a couple of places and IMO the code wasn't simpler than using an output argument. It looks like C++17 has std::optional, which is available earlier as std::experimental::optional http://en.cppreference.com/w/cpp/experimental/optional -- To view, visit http://gerrit.cloudera.org:8080/7253 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I972543af2e9f98b12dcbb5479b4c1a7d53952197 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2615: support [[nodiscard]] on Status
Tim Armstrong has uploaded a new patch set (#7). Change subject: IMPALA-2615: support [[nodiscard]] on Status .. IMPALA-2615: support [[nodiscard]] on Status This is the set of changes required to get Impala to compile on GCC 7 using the [[nodiscard]] attribute, which generates a warning whenever a status is dropped. It is not enabled on the current default compiler GCC 4.9.2 or Clang 3.8 so I added WARN_UNUSED_RESULT in various classes so that we can catch the dropped statuses with our current toolchain. The changes are: * Use the new [[nodiscard]] attribute and fix all the dropped statuses. Many were innocuous or very improbably but some appear to be actual bugs. Adds a discard_result() function that explicitly ignores the result of a function. * Removes the bad JNI pattern of checking for exceptions after DeleteGlobalRef(), which doesn't throw. * Fix miscellaneous compile errors and warnings. * Remove use of ptr_vector, which pulls in headers with deprecated things. * Fix a memory lifetime bug with default_fs_ (it was masked by the old refcounted std::string implementation). Change-Id: I972543af2e9f98b12dcbb5479b4c1a7d53952197 --- M be/CMakeLists.txt M be/src/benchmarks/bit-packing-benchmark.cc M be/src/benchmarks/expr-benchmark.cc M be/src/benchmarks/hash-benchmark.cc M be/src/benchmarks/network-perf-benchmark.cc M be/src/benchmarks/row-batch-serialize-benchmark.cc M be/src/catalog/catalog-server.cc M be/src/catalog/catalog-util.cc M be/src/catalog/catalogd-main.cc M be/src/codegen/codegen-symbol-emitter.cc M be/src/codegen/llvm-codegen-test.cc M be/src/common/compiler-util.h M be/src/common/init.cc M be/src/common/status.h M be/src/exec/external-data-source-executor.cc M be/src/exec/hbase-scan-node.cc M be/src/exec/hbase-table-scanner.cc M be/src/exec/hbase-table-scanner.h M be/src/exec/hbase-table-writer.cc M be/src/exec/hdfs-scan-node.cc M be/src/exec/kudu-scan-node.cc M be/src/exec/kudu-table-sink.cc M be/src/exec/kudu-table-sink.h M be/src/exec/kudu-util.h M be/src/experiments/compression-test.cc M be/src/exprs/expr-codegen-test.cc M be/src/exprs/expr-test.cc M be/src/exprs/hive-udf-call.cc M be/src/rpc/auth-provider.h M be/src/rpc/thrift-client.cc M be/src/rpc/thrift-client.h M be/src/runtime/buffered-tuple-stream-test.cc M be/src/runtime/bufferpool/buffer-pool-test.cc M be/src/runtime/client-cache.cc M be/src/runtime/client-cache.h M be/src/runtime/collection-value-builder.h M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/data-stream-recvr.cc M be/src/runtime/data-stream-test.cc M be/src/runtime/disk-io-mgr.cc M be/src/runtime/exec-env.cc M be/src/runtime/exec-env.h M be/src/runtime/fragment-instance-state.cc M be/src/runtime/hbase-table-factory.cc M be/src/runtime/hbase-table.cc M be/src/runtime/parallel-executor.cc M be/src/runtime/tmp-file-mgr-test.cc M be/src/runtime/tmp-file-mgr.cc M be/src/scheduling/scheduler-test-util.cc M be/src/scheduling/scheduler-test.cc M be/src/service/client-request-state.cc M be/src/service/fe-support.cc M be/src/service/impala-beeswax-server.cc M be/src/service/impala-hs2-server.cc M be/src/service/impala-http-handler.cc M be/src/service/impala-server.cc M be/src/service/impalad-main.cc M be/src/service/query-options-test.cc M be/src/statestore/statestore.cc M be/src/statestore/statestore.h M be/src/statestore/statestored-main.cc M be/src/testutil/death-test-util.h M be/src/testutil/fault-injection-util.cc M be/src/testutil/impalad-query-executor.cc M be/src/testutil/in-process-servers.cc M be/src/testutil/in-process-servers.h M be/src/util/benchmark.cc M be/src/util/bit-util-test.cc M be/src/util/codec.h M be/src/util/filesystem-util.h M be/src/util/hdfs-util-test.cc M be/src/util/jni-util.cc M be/src/util/jni-util.h M be/src/util/memory-metrics.h M be/src/util/metrics-test.cc M be/src/util/network-util.h M be/src/util/parquet-reader.cc M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M be/src/util/thread-pool.h M be/src/util/thread.cc M be/src/util/thread.h 82 files changed, 397 insertions(+), 305 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/7253/7 -- To view, visit http://gerrit.cloudera.org:8080/7253 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I972543af2e9f98b12dcbb5479b4c1a7d53952197 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-3931: arbitrary fixed-size uda intermediate types
Hello Matthew Jacobs, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7526 to look at the new patch set (#11). Change subject: IMPALA-3931: arbitrary fixed-size uda intermediate types .. IMPALA-3931: arbitrary fixed-size uda intermediate types Make many builtin aggregate functions use fixed-length intermediate types: * avg() * ndv() * stddev(), variance(), etc * distinctpc(), distinctpcsa() sample(), appx_median(), histogram() and group_concat() actually allocate var-len data so aren't changed. This has some major benefits: * Spill-to-disk works properly with these aggregations. * Aggregations are more efficient because there is one less pointer indirection. * Aggregations use less memory, because we don't need an extra 12-byte StringValue for the indirection. Adds a special-purpose internal type FIXED_UDA_INTERMEDIATE. The type is represented in the same way as CHAR - a fixed-size array of bytes, stored inline in tuples. However, it is not user-visible and does not support CHAR semantics, i.e. users can't declare tables, functions, etc with the type. The pointer and length is passed into aggregate functions wrapped in a StringVal. Updates some internal codegen functions to work better with the new type. E.g. store values directly into the result tuple instead of via an intermediate stack allocation. Testing: This change only affects builtin aggregate functions, for which we have test coverage already. If we were to allow wider use of this type, it would need further testing. Added an analyzer test to ensure we can't use the type for UDAs. Added a regression test for spilling avg(). Added a regression test for UDA with CHAR intermediate hitting DCHECK. Perf: Ran TPC-H locally. TPC-H Q17, which has a high-cardinality AVG(), improved dramatically. +--+---+-++++ | Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | +--+---+-++++ | TPCH(60) | parquet / none / none | 18.44 | -17.54%| 11.92 | -5.34% | +--+---+-++++ +--+--+---++-++---++-+---+ | Workload | Query| File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Num Clients | Iters | +--+--+---++-++---++-+---+ | TPCH(60) | TPCH-Q12 | parquet / none / none | 18.40 | 17.64 | +4.32% | 0.77% | 1.09%| 1 | 5 | | TPCH(60) | TPCH-Q22 | parquet / none / none | 7.07 | 6.90| +2.36% | 0.28% | 0.30%| 1 | 5 | | TPCH(60) | TPCH-Q3 | parquet / none / none | 12.37 | 12.11 | +2.10% | 0.18% | 0.15%| 1 | 5 | | TPCH(60) | TPCH-Q7 | parquet / none / none | 42.48 | 42.09 | +0.93% | 2.45% | 0.80%| 1 | 5 | | TPCH(60) | TPCH-Q6 | parquet / none / none | 3.18 | 3.15| +0.89% | 0.67% | 0.76%| 1 | 5 | | TPCH(60) | TPCH-Q19 | parquet / none / none | 7.24 | 7.20| +0.50% | 0.95% | 0.67%| 1 | 5 | | TPCH(60) | TPCH-Q10 | parquet / none / none | 13.37 | 13.30 | +0.50% | 0.48% | 1.39%| 1 | 5 | | TPCH(60) | TPCH-Q5 | parquet / none / none | 7.47 | 7.44| +0.36% | 0.58% | 0.54%| 1 | 5 | | TPCH(60) | TPCH-Q11 | parquet / none / none | 2.03 | 2.02| +0.06% | 0.26% | 1.95%| 1 | 5 | | TPCH(60) | TPCH-Q4 | parquet / none / none | 5.48 | 5.50| -0.27% | 0.62% | 1.12%| 1 | 5 | | TPCH(60) | TPCH-Q13 | parquet / none / none | 22.11 | 22.18 | -0.31% | 0.18% | 0.55%| 1 | 5 | | TPCH(60) | TPCH-Q15 | parquet / none / none | 8.45 | 8.48| -0.32% | 0.40% | 0.47%| 1 | 5 | | TPCH(60) | TPCH-Q9 | parquet / none / none | 33.39 | 33.66 | -0.81% | 0.75% | 0.59%| 1 | 5 | | TPCH(60) | TPCH-Q21 | parquet / none / none | 71.34 | 72.07 | -1.01% | 1.84% | 1.79%| 1 | 5 | | TPCH(60) | TPCH-Q14 | parquet / none / none | 5.93 | 6.00| -1.07% | 0.15% | 0.69%| 1 | 5 | | TPCH(60) | TPCH-Q20 | parquet / none / none | 5.72 | 5.79| -1.09% | 0.59% | 0.51%| 1 | 5 | | TPCH(60) | TPCH-Q18 | parquet / none / none | 45.42 |
[Impala-ASF-CR] IMPALA-3931: arbitrary fixed-size uda intermediate types
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3931: arbitrary fixed-size uda intermediate types .. Patch Set 11: Code-Review+1 Rebase -- To view, visit http://gerrit.cloudera.org:8080/7526 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ife90cf27989f98ffb5ef5c39f1e09ce92e8cb87c Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3931: arbitrary fixed-size uda intermediate types
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3931: arbitrary fixed-size uda intermediate types .. Patch Set 8: (4 comments) http://gerrit.cloudera.org:8080/#/c/7526/8/be/src/codegen/codegen-anyval.h File be/src/codegen/codegen-anyval.h: Line 54: /// TYPE_DECIMAL/DecimalVal (isn't lowered): > Please update this comment to include TYPE_FIXED_UDA_INTERMEDIATE Done http://gerrit.cloudera.org:8080/#/c/7526/8/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: PS8, Line 467: // Represent this as an array of bytes. : return ArrayType::get(GetType(TYPE_TINYINT), type.len); > It's a bit unfortunate that this is inconsistent with CodegenAnyVal::GetUnl I think this makes sense so long as we make it clear that the function returns the internal representation of the type. It's only called with an arbitrary ColumnType from GetLlvmStruct() and from CodegenAnyVal, where it's expecting the get the native type as it's represented in the tuple. I updated the comment. http://gerrit.cloudera.org:8080/#/c/7526/8/be/src/exec/partitioned-aggregation-node.cc File be/src/exec/partitioned-aggregation-node.cc: PS8, Line 1541: // call void @"void impala::AggregateFunctions::HllUpdate"( : // %"class.impala_udf::FunctionContext"* %agg_fn_ctx, : // %"struct.impala_udf::TimestampVal"* %input_unlowered_ptr, : // %"struct.impala_udf::StringVal"* %dst_unlowered_ptr) : // %anyval_result = load { i64, i8* }, { i64, i8* }* %dst_lowered_ptr : // %7 = extractvalue { i64, i8* } %anyval_result, 0 : // %8 = ashr i64 %7, 32 : // %9 = trunc i64 %8 to i32 : // %10 = insertvalue %"struct.impala::StringValue" zeroinitializer, i32 %9, 1 : // %11 = extractvalue { i64, i8* } %anyval_result, 1 : // %12 = insertvalue %"struct.impala::StringValue" %10, i8* %11, 0 : // store %"struct.impala::StringValue" %12, %"struct.impala::StringValue"* %dst_slot_ptr : // br label %ret > Does this need some updating ? Done http://gerrit.cloudera.org:8080/#/c/7526/8/be/src/exprs/agg-fn-evaluator.cc File be/src/exprs/agg-fn-evaluator.cc: PS8, Line 478: TYPE_FIXED_UDA_INTERMEDIATE: > This seems a bit tricky as the Serialize/Finalize functions need to be awar A DCHECK doesn't seem appropriate since a UDA could set the incorrect thing. Turns out that this code is reachable by declaring a UDA with char intermediate and a serialize function. I added a test with a CHAR intermediate that DCHECKs on master. The other option that makes sense to me is to simply ignore the return value and not warn, which is what the codegen'd path does anyway. -- To view, visit http://gerrit.cloudera.org:8080/7526 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ife90cf27989f98ffb5ef5c39f1e09ce92e8cb87c Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3931: arbitrary fixed-size uda intermediate types
Hello Matthew Jacobs, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7526 to look at the new patch set (#9). Change subject: IMPALA-3931: arbitrary fixed-size uda intermediate types .. IMPALA-3931: arbitrary fixed-size uda intermediate types Make many builtin aggregate functions use fixed-length intermediate types: * avg() * ndv() * stddev(), variance(), etc * distinctpc(), distinctpcsa() sample(), appx_median(), histogram() and group_concat() actually allocate var-len data so aren't changed. This has some major benefits: * Spill-to-disk works properly with these aggregations. * Aggregations are more efficient because there is one less pointer indirection. * Aggregations use less memory, because we don't need an extra 12-byte StringValue for the indirection. Adds a special-purpose internal type FIXED_UDA_INTERMEDIATE. The type is represented in the same way as CHAR - a fixed-size array of bytes, stored inline in tuples. However, it is not user-visible and does not support CHAR semantics, i.e. users can't declare tables, functions, etc with the type. The pointer and length is passed into aggregate functions wrapped in a StringVal. Updates some internal codegen functions to work better with the new type. E.g. store values directly into the result tuple instead of via an intermediate stack allocation. Testing: This change only affects builtin aggregate functions, for which we have test coverage already. If we were to allow wider use of this type, it would need further testing. Added an analyzer test to ensure we can't use the type for UDAs. Added a regression test for spilling avg(). Added a regression test for UDA with CHAR intermediate hitting DCHECK. Perf: Ran TPC-H locally. TPC-H Q17, which has a high-cardinality AVG(), improved dramatically. +--+---+-++++ | Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | +--+---+-++++ | TPCH(60) | parquet / none / none | 18.44 | -17.54%| 11.92 | -5.34% | +--+---+-++++ +--+--+---++-++---++-+---+ | Workload | Query| File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Num Clients | Iters | +--+--+---++-++---++-+---+ | TPCH(60) | TPCH-Q12 | parquet / none / none | 18.40 | 17.64 | +4.32% | 0.77% | 1.09%| 1 | 5 | | TPCH(60) | TPCH-Q22 | parquet / none / none | 7.07 | 6.90| +2.36% | 0.28% | 0.30%| 1 | 5 | | TPCH(60) | TPCH-Q3 | parquet / none / none | 12.37 | 12.11 | +2.10% | 0.18% | 0.15%| 1 | 5 | | TPCH(60) | TPCH-Q7 | parquet / none / none | 42.48 | 42.09 | +0.93% | 2.45% | 0.80%| 1 | 5 | | TPCH(60) | TPCH-Q6 | parquet / none / none | 3.18 | 3.15| +0.89% | 0.67% | 0.76%| 1 | 5 | | TPCH(60) | TPCH-Q19 | parquet / none / none | 7.24 | 7.20| +0.50% | 0.95% | 0.67%| 1 | 5 | | TPCH(60) | TPCH-Q10 | parquet / none / none | 13.37 | 13.30 | +0.50% | 0.48% | 1.39%| 1 | 5 | | TPCH(60) | TPCH-Q5 | parquet / none / none | 7.47 | 7.44| +0.36% | 0.58% | 0.54%| 1 | 5 | | TPCH(60) | TPCH-Q11 | parquet / none / none | 2.03 | 2.02| +0.06% | 0.26% | 1.95%| 1 | 5 | | TPCH(60) | TPCH-Q4 | parquet / none / none | 5.48 | 5.50| -0.27% | 0.62% | 1.12%| 1 | 5 | | TPCH(60) | TPCH-Q13 | parquet / none / none | 22.11 | 22.18 | -0.31% | 0.18% | 0.55%| 1 | 5 | | TPCH(60) | TPCH-Q15 | parquet / none / none | 8.45 | 8.48| -0.32% | 0.40% | 0.47%| 1 | 5 | | TPCH(60) | TPCH-Q9 | parquet / none / none | 33.39 | 33.66 | -0.81% | 0.75% | 0.59%| 1 | 5 | | TPCH(60) | TPCH-Q21 | parquet / none / none | 71.34 | 72.07 | -1.01% | 1.84% | 1.79%| 1 | 5 | | TPCH(60) | TPCH-Q14 | parquet / none / none | 5.93 | 6.00| -1.07% | 0.15% | 0.69%| 1 | 5 | | TPCH(60) | TPCH-Q20 | parquet / none / none | 5.72 | 5.79| -1.09% | 0.59% | 0.51%| 1 | 5 | | TPCH(60) | TPCH-Q18 | parquet / none / none | 45.42 | 45.93
[Impala-ASF-CR] IMPALA-5743: Support TLS version configuration for Thrift servers
Henry Robinson has posted comments on this change. Change subject: IMPALA-5743: Support TLS version configuration for Thrift servers .. Patch Set 3: This patch passes BE tests on a Centos 6 machine with OpenSSL v1.0.0. -- To view, visit http://gerrit.cloudera.org:8080/7606 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4c68a6c9658ddbfbe8025f2021fd5ed7a9dec5a5 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry RobinsonGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5743: Support TLS version configuration for Thrift servers
Henry Robinson has uploaded a new patch set (#3). Change subject: IMPALA-5743: Support TLS version configuration for Thrift servers .. IMPALA-5743: Support TLS version configuration for Thrift servers * Add --ssl_minimum_version which controls the minimum SSL/TLS version that clients and servers will use when negotiating a secure connection. * Two kinds of version specification are allowed: 'TLSv1.1' enables TLSv1.1 and all subsequent verisons. 'TLSv1.1_only' enables only TLSv1.1. The latter is not exposed in user-facing text as it is typically only used for testing. * Handle case where platform may not support TLSv1.1 or v1.2 by checking OpenSSL version number. * Bump Thrift toolchain version to -p10. Testing: * New tests in thrift-server-test.cc. In particular, test all 36 configurations of client and server protocol versions, and ensure that the expected successes or failures are seen. Change-Id: I4c68a6c9658ddbfbe8025f2021fd5ed7a9dec5a5 --- M be/src/rpc/thrift-client.cc M be/src/rpc/thrift-client.h M be/src/rpc/thrift-server-test.cc M be/src/rpc/thrift-server.cc M be/src/rpc/thrift-server.h M be/src/service/impala-server.cc M bin/impala-config.sh 7 files changed, 185 insertions(+), 26 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/7606/3 -- To view, visit http://gerrit.cloudera.org:8080/7606 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4c68a6c9658ddbfbe8025f2021fd5ed7a9dec5a5 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry RobinsonGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Sailesh Mukil
[Impala-ASF-CR] IMPALA-5666: ASAN poisoning for MemPool and BufferPool
Henry Robinson has posted comments on this change. Change subject: IMPALA-5666: ASAN poisoning for MemPool and BufferPool .. Patch Set 3: BE and EE tests pass with ASAN enabled with this patch. -- To view, visit http://gerrit.cloudera.org:8080/7591 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8d5a28dfee2b7c631981aac75524bde9acc0b36a Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry RobinsonGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4669: [SECURITY] Add security library to build
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4669: [SECURITY] Add security library to build .. Patch Set 18: Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/1045/ -- To view, visit http://gerrit.cloudera.org:8080/5717 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4f327810066bee7f3ac107b0295480fb9ed45e14 Gerrit-PatchSet: 18 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry RobinsonGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4833: Compute precise per-host reservation size
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4833: Compute precise per-host reservation size .. Patch Set 7: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/7630 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Idbcd1e9b1be14edc4017b4907e83f9d56059fbac Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4833: Compute precise per-host reservation size
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-4833: Compute precise per-host reservation size .. IMPALA-4833: Compute precise per-host reservation size Before this change, the per-host reservation size was computed by the Planner. However, scheduling happens after planning, so the Planner must assume that all fragments run on all hosts, and the reservation size is likely much larger than it needs to be. This moves the computation of the per-host reservation size to the BE where it can be computed more precisely. This also includes a number of plan/profile changes. Change-Id: Idbcd1e9b1be14edc4017b4907e83f9d56059fbac Reviewed-on: http://gerrit.cloudera.org:8080/7630 Reviewed-by: Matthew JacobsTested-by: Impala Public Jenkins --- M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/coordinator-backend-state.h M be/src/runtime/coordinator.cc M be/src/runtime/query-state.cc M be/src/scheduling/admission-controller.cc M be/src/scheduling/query-schedule.cc M be/src/scheduling/query-schedule.h M be/src/scheduling/scheduler-test-util.cc M be/src/scheduling/scheduler.cc M be/src/scheduling/scheduler.h M be/src/service/client-request-state.cc M be/src/service/impala-server.cc M common/thrift/Frontend.thrift M common/thrift/ImpalaInternalService.thrift M common/thrift/Planner.thrift M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/Planner.java M testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test M testdata/workloads/functional-query/queries/QueryTest/explain-level0.test M testdata/workloads/functional-query/queries/QueryTest/explain-level1.test M testdata/workloads/functional-query/queries/QueryTest/explain-level2.test M testdata/workloads/functional-query/queries/QueryTest/explain-level3.test M testdata/workloads/functional-query/queries/QueryTest/stats-extrapolation.test M tests/custom_cluster/test_coordinators.py A tests/custom_cluster/test_mem_reservations.py 27 files changed, 390 insertions(+), 241 deletions(-) Approvals: Impala Public Jenkins: Verified Matthew Jacobs: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/7630 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Idbcd1e9b1be14edc4017b4907e83f9d56059fbac Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong