[Impala-ASF-CR] IMPALA-5666: ASAN poisoning for MemPool and BufferPool

2017-08-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5666: ASAN poisoning for MemPool and BufferPool
..


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1047/

-- 
To view, visit http://gerrit.cloudera.org:8080/7591
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8d5a28dfee2b7c631981aac75524bde9acc0b36a
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5666: ASAN poisoning for MemPool and BufferPool

2017-08-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5666: ASAN poisoning for MemPool and BufferPool
..


Patch Set 4: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/1046/

-- 
To view, visit http://gerrit.cloudera.org:8080/7591
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8d5a28dfee2b7c631981aac75524bde9acc0b36a
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5666: ASAN poisoning for MemPool and BufferPool

2017-08-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5666: ASAN poisoning for MemPool and BufferPool
..


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1046/

-- 
To view, visit http://gerrit.cloudera.org:8080/7591
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8d5a28dfee2b7c631981aac75524bde9acc0b36a
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5666: ASAN poisoning for MemPool and BufferPool

2017-08-12 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5666: ASAN poisoning for MemPool and BufferPool
..


Patch Set 3: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7591
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8d5a28dfee2b7c631981aac75524bde9acc0b36a
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3208: max row size option

2017-08-12 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#8).

Change subject: IMPALA-3208: max_row_size option
..

IMPALA-3208: max_row_size option

This is a preview because it is missing tests. I have manually tested
it and it is behaving it as expected so far.

Adds support for a "max_row_size" query option that instructs Impala
to reserve enough memory to process rows of the specified size. For
spilling operators, the planner reserves enough memory to process
rows of this size. The advantage of this compared to simply
specifying larger values for min_spillable_buffer_size and
default_spillable_buffer_size is that operators may be able to
handler larger rows without increasing the size of all their
buffers.

The default value is 512KB. I picked that number because it doesn't
increase minimum reservations *too* much but should be large enough
for almost all reasonable workloads.

This is implemented in the aggs and joins using the variable page size
support added to BufferedTupleStream in an earlier commit. The synopsis
is that each stream requires reservation for one default-sized page
per read and write iterator, and temporarily requires reservation
for a max-sized page when reading or writing larger pages. The
max-sized write reservation is released immediately after the row
is appended and the max-size read reservation is released after
advancing to the next row.

The sorter and analytic simply use max-sized buffers for all pages
in the stream.

Testing:
Updated existing planner tests to reflect default max_row_size. Added
new planner tests to test the effect of the query option.

Added "set" test to check validation of query option.

Added end-to-end tests exercising spilling operators with large rows
with and without spilling induced by SET_DENY_RESERVATION_PROBABILITY.

TODO: the end-to-end tests are very memory hungry and may need some
stabilisation to execute in parallel.

Change-Id: Ic70f6dddbcef124bb4b329ffa2e42a74a1826570
---
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-aggregation-node.h
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/partitioned-hash-join-node.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/sorter.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/ResourceProfile.java
A fe/src/main/java/org/apache/impala/planner/ResourceProfileBuilder.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
M testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
A testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
M testdata/workloads/functional-query/queries/QueryTest/set.test
A testdata/workloads/functional-query/queries/QueryTest/spilling-large-rows.test
M testdata/workloads/functional-query/queries/QueryTest/spilling.test
M tests/query_test/test_spilling.py
34 files changed, 1,001 insertions(+), 152 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/7629/8
-- 
To view, visit http://gerrit.cloudera.org:8080/7629
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic70f6dddbcef124bb4b329ffa2e42a74a1826570
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5412: Fix scan result with partitions on same file

2017-08-12 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5412: Fix scan result with partitions on same file
..


Patch Set 7:

Running tests against the Impala-lzo branch with matching change: 
https://jenkins.impala.io/job/parallel-all-tests/1281/

-- 
To view, visit http://gerrit.cloudera.org:8080/7625
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5681: release reservation from blocking operators

2017-08-12 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5681: release reservation from blocking operators
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7619/4/be/src/exec/partitioned-aggregation-node.cc
File be/src/exec/partitioned-aggregation-node.cc:

PS4, Line 1203:  
> put a comma here
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7619
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6f4d0ad127d5fcd14b9821a7c127eec11d98692f
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5681: release reservation from blocking operators

2017-08-12 Thread Tim Armstrong (Code Review)
Hello Thomas Tauber-Marshall,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7619

to look at the new patch set (#5).

Change subject: IMPALA-5681: release reservation from blocking operators
..

IMPALA-5681: release reservation from blocking operators

When an in-memory blocking aggregation or join is in the GetNext()
phase where it is outputting accumulated rows then we expect
memory consumption to monotonically decrease because no more
rows will be accumulated in memory.

This change adds support to release unused reservation and makes
use of it for in-memory aggregations and sorts.

We don't release memory for operators with spilled data, since they
may need the reservation to bring it back into memory. We also
don't release memory in subplans, since it will probably be used
in a later iteration of the subplan.

Testing:
Updated spilling test that now requires less memory.

Ran stress test binary search on tpch_parquet. No changes, except
Q18 now requires 325MB instead of 450MB to execute without spilling.

Ran query with two sorts in the same pipeline and watched /memz to
confirm that the first node in the pipeline was incrementally releasing
memory. Added a regression test based on this experiment.

Added a backend test to directly test reservation decreasing.

Change-Id: I6f4d0ad127d5fcd14b9821a7c127eec11d98692f
---
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/runtime/bufferpool/buffer-pool-internal.h
M be/src/runtime/bufferpool/buffer-pool-test.cc
M be/src/runtime/bufferpool/buffer-pool.cc
M be/src/runtime/bufferpool/buffer-pool.h
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M testdata/workloads/functional-query/queries/QueryTest/spilling.test
A testdata/workloads/tpch/queries/sort-reservation-usage.test
M tests/query_test/test_sort.py
14 files changed, 163 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/7619/5
-- 
To view, visit http://gerrit.cloudera.org:8080/7619
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6f4d0ad127d5fcd14b9821a7c127eec11d98692f
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-2615: support [[nodiscard]] on Status

2017-08-12 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-2615: support [[nodiscard]] on Status
..


Patch Set 6:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/7253/5/be/CMakeLists.txt
File be/CMakeLists.txt:

Line 73: if (CMAKE_CXX_COMPILER_VERSION VERSION_GREATER 7.0)
> I find it is a confusing pattern.
I don't actually see a reasonable alternative that doesn't involve extra 
redundant code or a lot of refactoring. The current approach is to build up the 
argument strings then switch on ${CMAKE_BUILD_TYPE} below to determine whether 
it's a GCC or Clang build.

I added a comment to explain why it works. We can remove the branch and clean 
it up when we upgrade GCC.


Line 74:   #  -Wno-placement-new: avoid spurious warnings from boost::function.
> Are those warnings really spurious? I can't tell immediately from the link.
It seems ok - the memory region written is large enough. The function_buffer 
union is large enough at 8 bytes, it's just the data member that is too small 
at 1 byte.


Line 75:   #   TODO: remove when we upgrade Boost: 
https://github.com/boostorg/function/pull/9
> Thanks for looking into that.
-faligned-new I think is fine, since it's enabling improved behaviour that is 
part of the C++17 standard instead of masking an issue.

I don't think we should let upgrading Boost block upgrading GCC - I have a 
feeling that the Boost upgrade could be a lot of work.


http://gerrit.cloudera.org:8080/#/c/7253/6/be/src/statestore/statestore.h
File be/src/statestore/statestore.h:

Line 443:   /// the
> fix wrap
Done


http://gerrit.cloudera.org:8080/#/c/7253/6/be/src/testutil/in-process-servers.cc
File be/src/testutil/in-process-servers.cc:

Line 71:   Status status = impala->SetCatalogInitialized();
> const
Done


http://gerrit.cloudera.org:8080/#/c/7253/6/be/src/util/parquet-reader.cc
File be/src/util/parquet-reader.cc:

Line 138: // Returns the number of rows specified by the header.
> pre-existing issue, but: can you add a note that this function can exit(1) 
Done


http://gerrit.cloudera.org:8080/#/c/7253/6/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

Line 748:   MakeScopeExitTrigger([]() { compressor->Close(); });
> Should Close() be added to the destructor (after making it unipotent)?
We've been trying to standardise elsewhere on explicit resource release with 
trivial destructors. The argument is that implicit sequencing of non-trivial 
teardown steps is very hard to understand.

I put up a page in the wiki that documents some of the patterns in the code: 
https://cwiki.apache.org/confluence/display/IMPALA/Resource+Management+Best+Practices+in+Impala


http://gerrit.cloudera.org:8080/#/c/7253/6/be/src/util/runtime-profile.h
File be/src/util/runtime-profile.h:

Line 268:   Status SerializeToArchiveString(std::string* out) const 
WARN_UNUSED_RESULT;
> Makes the case for a Maybe template class that can hold a T or a Status.
I don't feel strongly. I like those types in other languages but they don't 
seem to work as well in C++ where there isn't a convenient language construct 
to branch on the result and unpack the value.

We have to use LLVM's ErrorOr in a couple of places and IMO the code wasn't 
simpler than using an output argument.

It looks like C++17 has std::optional, which is available earlier as 
std::experimental::optional 
http://en.cppreference.com/w/cpp/experimental/optional


-- 
To view, visit http://gerrit.cloudera.org:8080/7253
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I972543af2e9f98b12dcbb5479b4c1a7d53952197
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2615: support [[nodiscard]] on Status

2017-08-12 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#7).

Change subject: IMPALA-2615: support [[nodiscard]] on Status
..

IMPALA-2615: support [[nodiscard]] on Status

This is the set of changes required to get Impala to compile on
GCC 7 using the [[nodiscard]] attribute, which generates a warning
whenever a status is dropped. It is not enabled on the current
default compiler GCC 4.9.2 or Clang 3.8 so I added WARN_UNUSED_RESULT
in various classes so that we can catch the dropped statuses with
our current toolchain.

The changes are:
* Use the new [[nodiscard]] attribute and fix all the dropped
  statuses. Many were innocuous or very improbably but some appear to
  be actual bugs. Adds a discard_result() function that explicitly
  ignores the result of a function.
* Removes the bad JNI pattern of checking for exceptions after
  DeleteGlobalRef(), which doesn't throw.
* Fix miscellaneous compile errors and warnings.
* Remove use of ptr_vector, which pulls in headers with deprecated
  things.
* Fix a memory lifetime bug with default_fs_ (it was masked by
  the old refcounted std::string implementation).

Change-Id: I972543af2e9f98b12dcbb5479b4c1a7d53952197
---
M be/CMakeLists.txt
M be/src/benchmarks/bit-packing-benchmark.cc
M be/src/benchmarks/expr-benchmark.cc
M be/src/benchmarks/hash-benchmark.cc
M be/src/benchmarks/network-perf-benchmark.cc
M be/src/benchmarks/row-batch-serialize-benchmark.cc
M be/src/catalog/catalog-server.cc
M be/src/catalog/catalog-util.cc
M be/src/catalog/catalogd-main.cc
M be/src/codegen/codegen-symbol-emitter.cc
M be/src/codegen/llvm-codegen-test.cc
M be/src/common/compiler-util.h
M be/src/common/init.cc
M be/src/common/status.h
M be/src/exec/external-data-source-executor.cc
M be/src/exec/hbase-scan-node.cc
M be/src/exec/hbase-table-scanner.cc
M be/src/exec/hbase-table-scanner.h
M be/src/exec/hbase-table-writer.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/kudu-scan-node.cc
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-table-sink.h
M be/src/exec/kudu-util.h
M be/src/experiments/compression-test.cc
M be/src/exprs/expr-codegen-test.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/hive-udf-call.cc
M be/src/rpc/auth-provider.h
M be/src/rpc/thrift-client.cc
M be/src/rpc/thrift-client.h
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/bufferpool/buffer-pool-test.cc
M be/src/runtime/client-cache.cc
M be/src/runtime/client-cache.h
M be/src/runtime/collection-value-builder.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/data-stream-recvr.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/disk-io-mgr.cc
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/hbase-table-factory.cc
M be/src/runtime/hbase-table.cc
M be/src/runtime/parallel-executor.cc
M be/src/runtime/tmp-file-mgr-test.cc
M be/src/runtime/tmp-file-mgr.cc
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler-test.cc
M be/src/service/client-request-state.cc
M be/src/service/fe-support.cc
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-hs2-server.cc
M be/src/service/impala-http-handler.cc
M be/src/service/impala-server.cc
M be/src/service/impalad-main.cc
M be/src/service/query-options-test.cc
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
M be/src/statestore/statestored-main.cc
M be/src/testutil/death-test-util.h
M be/src/testutil/fault-injection-util.cc
M be/src/testutil/impalad-query-executor.cc
M be/src/testutil/in-process-servers.cc
M be/src/testutil/in-process-servers.h
M be/src/util/benchmark.cc
M be/src/util/bit-util-test.cc
M be/src/util/codec.h
M be/src/util/filesystem-util.h
M be/src/util/hdfs-util-test.cc
M be/src/util/jni-util.cc
M be/src/util/jni-util.h
M be/src/util/memory-metrics.h
M be/src/util/metrics-test.cc
M be/src/util/network-util.h
M be/src/util/parquet-reader.cc
M be/src/util/runtime-profile.cc
M be/src/util/runtime-profile.h
M be/src/util/thread-pool.h
M be/src/util/thread.cc
M be/src/util/thread.h
82 files changed, 397 insertions(+), 305 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/7253/7
-- 
To view, visit http://gerrit.cloudera.org:8080/7253
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I972543af2e9f98b12dcbb5479b4c1a7d53952197
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-3931: arbitrary fixed-size uda intermediate types

2017-08-12 Thread Tim Armstrong (Code Review)
Hello Matthew Jacobs,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7526

to look at the new patch set (#11).

Change subject: IMPALA-3931: arbitrary fixed-size uda intermediate types
..

IMPALA-3931: arbitrary fixed-size uda intermediate types

Make many builtin aggregate functions use fixed-length intermediate
types:
* avg()
* ndv()
* stddev(), variance(), etc
* distinctpc(), distinctpcsa()

sample(), appx_median(), histogram() and group_concat() actually
allocate var-len data so aren't changed.

This has some major benefits:
* Spill-to-disk works properly with these aggregations.
* Aggregations are more efficient because there is one less pointer
  indirection.
* Aggregations use less memory, because we don't need an extra 12-byte
  StringValue for the indirection.

Adds a special-purpose internal type FIXED_UDA_INTERMEDIATE. The type
is represented in the same way as CHAR - a fixed-size array of bytes,
stored inline in tuples. However, it is not user-visible and does
not support CHAR semantics, i.e. users can't declare tables, functions,
etc with the type. The pointer and length is passed into aggregate functions
wrapped in a StringVal.

Updates some internal codegen functions to work better with the new
type. E.g. store values directly into the result tuple instead of
via an intermediate stack allocation.

Testing:
This change only affects builtin aggregate functions, for which we
have test coverage already. If we were to allow wider use of this type,
it would need further testing.

Added an analyzer test to ensure we can't use the type for UDAs.

Added a regression test for spilling avg().

Added a regression test for UDA with CHAR intermediate hitting DCHECK.

Perf:
Ran TPC-H locally. TPC-H Q17, which has a high-cardinality AVG(),
improved dramatically.

+--+---+-++++
| Workload | File Format   | Avg (s) | Delta(Avg) | GeoMean(s) | 
Delta(GeoMean) |
+--+---+-++++
| TPCH(60) | parquet / none / none | 18.44   | -17.54%| 11.92  | -5.34% 
|
+--+---+-++++

+--+--+---++-++---++-+---+
| Workload | Query| File Format   | Avg(s) | Base Avg(s) | 
Delta(Avg) | StdDev(%) | Base StdDev(%) | Num Clients | Iters |
+--+--+---++-++---++-+---+
| TPCH(60) | TPCH-Q12 | parquet / none / none | 18.40  | 17.64   | +4.32%   
  |   0.77%   |   1.09%| 1   | 5 |
| TPCH(60) | TPCH-Q22 | parquet / none / none | 7.07   | 6.90| +2.36%   
  |   0.28%   |   0.30%| 1   | 5 |
| TPCH(60) | TPCH-Q3  | parquet / none / none | 12.37  | 12.11   | +2.10%   
  |   0.18%   |   0.15%| 1   | 5 |
| TPCH(60) | TPCH-Q7  | parquet / none / none | 42.48  | 42.09   | +0.93%   
  |   2.45%   |   0.80%| 1   | 5 |
| TPCH(60) | TPCH-Q6  | parquet / none / none | 3.18   | 3.15| +0.89%   
  |   0.67%   |   0.76%| 1   | 5 |
| TPCH(60) | TPCH-Q19 | parquet / none / none | 7.24   | 7.20| +0.50%   
  |   0.95%   |   0.67%| 1   | 5 |
| TPCH(60) | TPCH-Q10 | parquet / none / none | 13.37  | 13.30   | +0.50%   
  |   0.48%   |   1.39%| 1   | 5 |
| TPCH(60) | TPCH-Q5  | parquet / none / none | 7.47   | 7.44| +0.36%   
  |   0.58%   |   0.54%| 1   | 5 |
| TPCH(60) | TPCH-Q11 | parquet / none / none | 2.03   | 2.02| +0.06%   
  |   0.26%   |   1.95%| 1   | 5 |
| TPCH(60) | TPCH-Q4  | parquet / none / none | 5.48   | 5.50| -0.27%   
  |   0.62%   |   1.12%| 1   | 5 |
| TPCH(60) | TPCH-Q13 | parquet / none / none | 22.11  | 22.18   | -0.31%   
  |   0.18%   |   0.55%| 1   | 5 |
| TPCH(60) | TPCH-Q15 | parquet / none / none | 8.45   | 8.48| -0.32%   
  |   0.40%   |   0.47%| 1   | 5 |
| TPCH(60) | TPCH-Q9  | parquet / none / none | 33.39  | 33.66   | -0.81%   
  |   0.75%   |   0.59%| 1   | 5 |
| TPCH(60) | TPCH-Q21 | parquet / none / none | 71.34  | 72.07   | -1.01%   
  |   1.84%   |   1.79%| 1   | 5 |
| TPCH(60) | TPCH-Q14 | parquet / none / none | 5.93   | 6.00| -1.07%   
  |   0.15%   |   0.69%| 1   | 5 |
| TPCH(60) | TPCH-Q20 | parquet / none / none | 5.72   | 5.79| -1.09%   
  |   0.59%   |   0.51%| 1   | 5 |
| TPCH(60) | TPCH-Q18 | parquet / none / none | 45.42  | 

[Impala-ASF-CR] IMPALA-3931: arbitrary fixed-size uda intermediate types

2017-08-12 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3931: arbitrary fixed-size uda intermediate types
..


Patch Set 11: Code-Review+1

Rebase

-- 
To view, visit http://gerrit.cloudera.org:8080/7526
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ife90cf27989f98ffb5ef5c39f1e09ce92e8cb87c
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3931: arbitrary fixed-size uda intermediate types

2017-08-12 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3931: arbitrary fixed-size uda intermediate types
..


Patch Set 8:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/7526/8/be/src/codegen/codegen-anyval.h
File be/src/codegen/codegen-anyval.h:

Line 54: /// TYPE_DECIMAL/DecimalVal (isn't lowered):
> Please update this comment to include TYPE_FIXED_UDA_INTERMEDIATE
Done


http://gerrit.cloudera.org:8080/#/c/7526/8/be/src/codegen/llvm-codegen.cc
File be/src/codegen/llvm-codegen.cc:

PS8, Line 467: // Represent this as an array of bytes.
 :   return ArrayType::get(GetType(TYPE_TINYINT), type.len);
> It's a bit unfortunate that this is inconsistent with CodegenAnyVal::GetUnl
I think this makes sense so long as we make it clear that the function returns 
the internal representation of the type. It's only called with an arbitrary 
ColumnType from GetLlvmStruct() and from CodegenAnyVal, where it's expecting 
the get the native type as it's represented in the tuple.

I updated the comment.


http://gerrit.cloudera.org:8080/#/c/7526/8/be/src/exec/partitioned-aggregation-node.cc
File be/src/exec/partitioned-aggregation-node.cc:

PS8, Line 1541: //   call void @"void 
impala::AggregateFunctions::HllUpdate"(
  : //   %"class.impala_udf::FunctionContext"* %agg_fn_ctx,
  : //   %"struct.impala_udf::TimestampVal"* 
%input_unlowered_ptr,
  : //   %"struct.impala_udf::StringVal"* %dst_unlowered_ptr)
  : //   %anyval_result = load { i64, i8* }, { i64, i8* }* 
%dst_lowered_ptr
  : //   %7 = extractvalue { i64, i8* } %anyval_result, 0
  : //   %8 = ashr i64 %7, 32
  : //   %9 = trunc i64 %8 to i32
  : //   %10 = insertvalue %"struct.impala::StringValue" 
zeroinitializer, i32 %9, 1
  : //   %11 = extractvalue { i64, i8* } %anyval_result, 1
  : //   %12 = insertvalue %"struct.impala::StringValue" %10, i8* 
%11, 0
  : //   store %"struct.impala::StringValue" %12, 
%"struct.impala::StringValue"* %dst_slot_ptr
  : //   br label %ret
> Does this need some updating ?
Done


http://gerrit.cloudera.org:8080/#/c/7526/8/be/src/exprs/agg-fn-evaluator.cc
File be/src/exprs/agg-fn-evaluator.cc:

PS8, Line 478: TYPE_FIXED_UDA_INTERMEDIATE:
> This seems a bit tricky as the Serialize/Finalize functions need to be awar
A DCHECK doesn't seem appropriate since a UDA could set the incorrect thing. 
Turns out that this code is reachable by declaring a UDA with char intermediate 
and a serialize function. I added a test with a CHAR intermediate that DCHECKs 
on master.

The other option that makes sense to me is to simply ignore the return value 
and not warn, which is what the codegen'd path does anyway.


-- 
To view, visit http://gerrit.cloudera.org:8080/7526
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ife90cf27989f98ffb5ef5c39f1e09ce92e8cb87c
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3931: arbitrary fixed-size uda intermediate types

2017-08-12 Thread Tim Armstrong (Code Review)
Hello Matthew Jacobs,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7526

to look at the new patch set (#9).

Change subject: IMPALA-3931: arbitrary fixed-size uda intermediate types
..

IMPALA-3931: arbitrary fixed-size uda intermediate types

Make many builtin aggregate functions use fixed-length intermediate
types:
* avg()
* ndv()
* stddev(), variance(), etc
* distinctpc(), distinctpcsa()

sample(), appx_median(), histogram() and group_concat() actually
allocate var-len data so aren't changed.

This has some major benefits:
* Spill-to-disk works properly with these aggregations.
* Aggregations are more efficient because there is one less pointer
  indirection.
* Aggregations use less memory, because we don't need an extra 12-byte
  StringValue for the indirection.

Adds a special-purpose internal type FIXED_UDA_INTERMEDIATE. The type
is represented in the same way as CHAR - a fixed-size array of bytes,
stored inline in tuples. However, it is not user-visible and does
not support CHAR semantics, i.e. users can't declare tables, functions,
etc with the type. The pointer and length is passed into aggregate functions
wrapped in a StringVal.

Updates some internal codegen functions to work better with the new
type. E.g. store values directly into the result tuple instead of
via an intermediate stack allocation.

Testing:
This change only affects builtin aggregate functions, for which we
have test coverage already. If we were to allow wider use of this type,
it would need further testing.

Added an analyzer test to ensure we can't use the type for UDAs.

Added a regression test for spilling avg().

Added a regression test for UDA with CHAR intermediate hitting DCHECK.

Perf:
Ran TPC-H locally. TPC-H Q17, which has a high-cardinality AVG(),
improved dramatically.

+--+---+-++++
| Workload | File Format   | Avg (s) | Delta(Avg) | GeoMean(s) | 
Delta(GeoMean) |
+--+---+-++++
| TPCH(60) | parquet / none / none | 18.44   | -17.54%| 11.92  | -5.34% 
|
+--+---+-++++

+--+--+---++-++---++-+---+
| Workload | Query| File Format   | Avg(s) | Base Avg(s) | 
Delta(Avg) | StdDev(%) | Base StdDev(%) | Num Clients | Iters |
+--+--+---++-++---++-+---+
| TPCH(60) | TPCH-Q12 | parquet / none / none | 18.40  | 17.64   | +4.32%   
  |   0.77%   |   1.09%| 1   | 5 |
| TPCH(60) | TPCH-Q22 | parquet / none / none | 7.07   | 6.90| +2.36%   
  |   0.28%   |   0.30%| 1   | 5 |
| TPCH(60) | TPCH-Q3  | parquet / none / none | 12.37  | 12.11   | +2.10%   
  |   0.18%   |   0.15%| 1   | 5 |
| TPCH(60) | TPCH-Q7  | parquet / none / none | 42.48  | 42.09   | +0.93%   
  |   2.45%   |   0.80%| 1   | 5 |
| TPCH(60) | TPCH-Q6  | parquet / none / none | 3.18   | 3.15| +0.89%   
  |   0.67%   |   0.76%| 1   | 5 |
| TPCH(60) | TPCH-Q19 | parquet / none / none | 7.24   | 7.20| +0.50%   
  |   0.95%   |   0.67%| 1   | 5 |
| TPCH(60) | TPCH-Q10 | parquet / none / none | 13.37  | 13.30   | +0.50%   
  |   0.48%   |   1.39%| 1   | 5 |
| TPCH(60) | TPCH-Q5  | parquet / none / none | 7.47   | 7.44| +0.36%   
  |   0.58%   |   0.54%| 1   | 5 |
| TPCH(60) | TPCH-Q11 | parquet / none / none | 2.03   | 2.02| +0.06%   
  |   0.26%   |   1.95%| 1   | 5 |
| TPCH(60) | TPCH-Q4  | parquet / none / none | 5.48   | 5.50| -0.27%   
  |   0.62%   |   1.12%| 1   | 5 |
| TPCH(60) | TPCH-Q13 | parquet / none / none | 22.11  | 22.18   | -0.31%   
  |   0.18%   |   0.55%| 1   | 5 |
| TPCH(60) | TPCH-Q15 | parquet / none / none | 8.45   | 8.48| -0.32%   
  |   0.40%   |   0.47%| 1   | 5 |
| TPCH(60) | TPCH-Q9  | parquet / none / none | 33.39  | 33.66   | -0.81%   
  |   0.75%   |   0.59%| 1   | 5 |
| TPCH(60) | TPCH-Q21 | parquet / none / none | 71.34  | 72.07   | -1.01%   
  |   1.84%   |   1.79%| 1   | 5 |
| TPCH(60) | TPCH-Q14 | parquet / none / none | 5.93   | 6.00| -1.07%   
  |   0.15%   |   0.69%| 1   | 5 |
| TPCH(60) | TPCH-Q20 | parquet / none / none | 5.72   | 5.79| -1.09%   
  |   0.59%   |   0.51%| 1   | 5 |
| TPCH(60) | TPCH-Q18 | parquet / none / none | 45.42  | 45.93 

[Impala-ASF-CR] IMPALA-5743: Support TLS version configuration for Thrift servers

2017-08-12 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-5743: Support TLS version configuration for Thrift 
servers
..


Patch Set 3:

This patch passes BE tests on a Centos 6 machine with OpenSSL v1.0.0.

-- 
To view, visit http://gerrit.cloudera.org:8080/7606
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4c68a6c9658ddbfbe8025f2021fd5ed7a9dec5a5
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5743: Support TLS version configuration for Thrift servers

2017-08-12 Thread Henry Robinson (Code Review)
Henry Robinson has uploaded a new patch set (#3).

Change subject: IMPALA-5743: Support TLS version configuration for Thrift 
servers
..

IMPALA-5743: Support TLS version configuration for Thrift servers

* Add --ssl_minimum_version which controls the minimum SSL/TLS version
  that clients and servers will use when negotiating a secure
  connection.
* Two kinds of version specification are allowed: 'TLSv1.1' enables
  TLSv1.1 and all subsequent verisons. 'TLSv1.1_only' enables only
  TLSv1.1. The latter is not exposed in user-facing text as it is
  typically only used for testing.
* Handle case where platform may not support TLSv1.1 or v1.2 by checking
  OpenSSL version number.
* Bump Thrift toolchain version to -p10.

Testing:
* New tests in thrift-server-test.cc. In particular, test all 36
  configurations of client and server protocol versions, and ensure that
  the expected successes or failures are seen.

Change-Id: I4c68a6c9658ddbfbe8025f2021fd5ed7a9dec5a5
---
M be/src/rpc/thrift-client.cc
M be/src/rpc/thrift-client.h
M be/src/rpc/thrift-server-test.cc
M be/src/rpc/thrift-server.cc
M be/src/rpc/thrift-server.h
M be/src/service/impala-server.cc
M bin/impala-config.sh
7 files changed, 185 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/7606/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7606
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4c68a6c9658ddbfbe8025f2021fd5ed7a9dec5a5
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-5666: ASAN poisoning for MemPool and BufferPool

2017-08-12 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-5666: ASAN poisoning for MemPool and BufferPool
..


Patch Set 3:

BE and EE tests pass with ASAN enabled with this patch.

-- 
To view, visit http://gerrit.cloudera.org:8080/7591
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8d5a28dfee2b7c631981aac75524bde9acc0b36a
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4669: [SECURITY] Add security library to build

2017-08-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4669: [SECURITY] Add security library to build
..


Patch Set 18:

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/1045/

-- 
To view, visit http://gerrit.cloudera.org:8080/5717
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4f327810066bee7f3ac107b0295480fb9ed45e14
Gerrit-PatchSet: 18
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4833: Compute precise per-host reservation size

2017-08-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4833: Compute precise per-host reservation size
..


Patch Set 7: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Idbcd1e9b1be14edc4017b4907e83f9d56059fbac
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4833: Compute precise per-host reservation size

2017-08-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4833: Compute precise per-host reservation size
..


IMPALA-4833: Compute precise per-host reservation size

Before this change, the per-host reservation size was computed
by the Planner. However, scheduling happens after planning,
so the Planner must assume that all fragments run on all
hosts, and the reservation size is likely much larger than
it needs to be.

This moves the computation of the per-host reservation size
to the BE where it can be computed more precisely. This also
includes a number of plan/profile changes.

Change-Id: Idbcd1e9b1be14edc4017b4907e83f9d56059fbac
Reviewed-on: http://gerrit.cloudera.org:8080/7630
Reviewed-by: Matthew Jacobs 
Tested-by: Impala Public Jenkins
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/query-state.cc
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/query-schedule.cc
M be/src/scheduling/query-schedule.h
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler.cc
M be/src/scheduling/scheduler.h
M be/src/service/client-request-state.cc
M be/src/service/impala-server.cc
M common/thrift/Frontend.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/Planner.thrift
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level0.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level1.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level2.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level3.test
M testdata/workloads/functional-query/queries/QueryTest/stats-extrapolation.test
M tests/custom_cluster/test_coordinators.py
A tests/custom_cluster/test_mem_reservations.py
27 files changed, 390 insertions(+), 241 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Matthew Jacobs: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/7630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Idbcd1e9b1be14edc4017b4907e83f9d56059fbac
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong