[Impala-ASF-CR] IMPALA-5553: Fix expr-test in release builds

2017-06-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5553: Fix expr-test in release builds
..


IMPALA-5553: Fix expr-test in release builds

expr-test fails in release build as it uses
Literal::CreateLiteral() to create literal
expressions. Literal::CreateLiteral() wraps
ParseString() with DCHECK() so it becomes
a no-op in release builds.

This change fixes the issue by moving
Literal::CreateLiteral() from literal.cc to
expr-test.cc as it's only used by the test
code. Also replaces DCHECK() wrapped around
ParseString() with EXPECT_TRUE().

Verified the fix by building and running
a release build of expr-test.

Change-Id: Id1257da350cb6cb69886e93f6615cdadd17c187d
Reviewed-on: http://gerrit.cloudera.org:8080/7255
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins
---
M be/src/exprs/expr-test.cc
M be/src/exprs/literal.cc
M be/src/exprs/literal.h
3 files changed, 114 insertions(+), 102 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Tim Armstrong: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/7255
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Id1257da350cb6cb69886e93f6615cdadd17c187d
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5553: Fix expr-test in release builds

2017-06-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5553: Fix expr-test in release builds
..


Patch Set 1: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7255
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id1257da350cb6cb69886e93f6615cdadd17c187d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5553: Fix expr-test in release builds

2017-06-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5553: Fix expr-test in release builds
..


Patch Set 1:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/778/

-- 
To view, visit http://gerrit.cloudera.org:8080/7255
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id1257da350cb6cb69886e93f6615cdadd17c187d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5548 Fix some minor nits with HDFS parquet column readers

2017-06-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5548 Fix some minor nits with HDFS parquet column readers
..


IMPALA-5548 Fix some minor nits with HDFS parquet column readers

Replace some std::map instances with unordered maps.  std::map is
unnecessary in many cases where an unordered map should suffice, and
in almost all circumstances, perform better.  Also discovered was a
slightly dangerous initialization of an empty NullOffsetIndicator,
which could conceivably result in undefined behavior by writing
arr[ofs] |= b, where ofs was mistakenly initialized to -1 (and b to 0,
so such behavior may not be detected, but could cause a crash by
underrunning a buffer).  Also add warn unused to status results while
we are changing this.

Testing: Running exhaustive tests against a Jenkins build.

Change-Id: I08e653cae6f2188599f4a23e4f44692166d9c119
Reviewed-on: http://gerrit.cloudera.org:8080/7240
Reviewed-by: Dan Hecht 
Tested-by: Impala Public Jenkins
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scanner.h
M be/src/exec/parquet-column-readers.h
M be/src/runtime/descriptors.h
5 files changed, 22 insertions(+), 19 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Dan Hecht: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/7240
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I08e653cae6f2188599f4a23e4f44692166d9c119
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-4703: reservation denial debug action

2017-06-21 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4703: reservation denial debug action
..


Patch Set 5: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7022
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ied39bb091b12156e5dc61b528c6c0cd8de3fe657
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5548 Fix some minor nits with HDFS parquet column readers

2017-06-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5548 Fix some minor nits with HDFS parquet column readers
..


Patch Set 4: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7240
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08e653cae6f2188599f4a23e4f44692166d9c119
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4856: Port data stream service to KRPC

2017-06-21 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4856: Port data stream service to KRPC
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7103/3/be/src/runtime/data-stream-sender.cc
File be/src/runtime/data-stream-sender.cc:

PS3, Line 252: batch->compressed_tuple_data
> The ownership is shared with the batch object. AddSidecar() internally move
I see. I am still getting used to this subtly of passing shared_ptr as argument.


-- 
To view, visit http://gerrit.cloudera.org:8080/7103
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia66704be7a0a8162bb85556d07b583ec756c584b
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5547: Rework FK/PK join detection.

2017-06-21 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change.

Change subject: IMPALA-5547: Rework FK/PK join detection.
..


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/7257/1/fe/src/main/java/org/apache/impala/planner/JoinNode.java
File fe/src/main/java/org/apache/impala/planner/JoinNode.java:

Line 91:   protected List> fkPkEqJoinConjuncts_;
Class member is only used in one function in the base class.  Maybe document 
that this is preserved just for printing values and wrap it with a getter so it 
can't be modified?


Line 253:   return getFkPkJoinCardinality(eqJoinConjunctSlots, lhsCard, 
rhsCard);
This computes estimates based on both PK conjuncts and non-PK conjuncts using 
the same formula, instead of filtering out non-PK joins.  If we have a RHS 
which is an extremely selective non-PK, this could bias the cardinality 
estimate to a very low value, since we take the minimum computed cardinality 
from these assumed PK values.

Is there any value in preserving the >> construction, or should 
getFkPkEqJoinConjuncts consider the conjuncts by tid order and just return a 
flattened list (which can be passed to getFkPkJoinCardinality)


Line 275: for (List fkPkCandi: 
scanSlotsByTids.values()) {
fkPkCandi - this could use a better name, it isn't clear to me what this is 
trying to describe.


-- 
To view, visit http://gerrit.cloudera.org:8080/7257
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I49074fe743a28573cff541ef7dbd0edd88892067
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5517: Allow IMPALA LOGS DIR to be overridden

2017-06-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5517: Allow IMPALA_LOGS_DIR to be overridden
..


IMPALA-5517: Allow IMPALA_LOGS_DIR to be overridden

Tested by exporting a different IMPALA_LOGS_DIR path, then confirming
that it was not changed by sourcing impala-config.sh.

Change-Id: I4028813bd4f53815139225abd57845bb304ae3e4
Reviewed-on: http://gerrit.cloudera.org:8080/7197
Reviewed-by: David Knupp 
Tested-by: Impala Public Jenkins
---
M bin/impala-config.sh
1 file changed, 2 insertions(+), 1 deletion(-)

Approvals:
  Impala Public Jenkins: Verified
  David Knupp: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/7197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I4028813bd4f53815139225abd57845bb304ae3e4
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Brown 


[Impala-ASF-CR] IMPALA-5517: Allow IMPALA LOGS DIR to be overridden

2017-06-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5517: Allow IMPALA_LOGS_DIR to be overridden
..


Patch Set 3: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4028813bd4f53815139225abd57845bb304ae3e4
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5532: Stack-allocate compressors in RowBatch (de)serialization

2017-06-21 Thread Henry Robinson (Code Review)
Henry Robinson has abandoned this change.

Change subject: IMPALA-5532: Stack-allocate compressors in RowBatch 
(de)serialization
..


Abandoned

Bad merge - will restore when I fix it.

-- 
To view, visit http://gerrit.cloudera.org:8080/7226
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: Ia4b5a8d2cc315db50e5d70b1191702206de3450d
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5532: Stack-allocate compressors in RowBatch (de)serialization

2017-06-21 Thread Henry Robinson (Code Review)
Hello Impala Public Jenkins, Michael Ho, Sailesh Mukil, Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7226

to look at the new patch set (#5).

Change subject: IMPALA-5532: Stack-allocate compressors in RowBatch 
(de)serialization
..

IMPALA-5532: Stack-allocate compressors in RowBatch (de)serialization

Change allocation pattern for Codec objects in RowBatch to be
stack-allocated. Make c'tors and Init() methods of codec implementations
publicly visible in order to do so.

Refactor HdfsParquetWriter etc. to deal properly with Status returned in
all cases.

Change-Id: Ia4b5a8d2cc315db50e5d70b1191702206de3450d
---
M be/src/exec/hdfs-avro-table-writer.cc
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-sequence-table-writer.cc
M be/src/exec/parquet-column-stats.cc
M be/src/exec/parquet-column-stats.h
M be/src/exec/parquet-column-stats.inline.h
M be/src/experiments/compression-test.cc
M be/src/runtime/row-batch.cc
M be/src/util/codec.h
M be/src/util/compress.h
M be/src/util/decompress-test.cc
M be/src/util/decompress.h
12 files changed, 160 insertions(+), 151 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/7226/5
-- 
To view, visit http://gerrit.cloudera.org:8080/7226
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia4b5a8d2cc315db50e5d70b1191702206de3450d
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5549: Remove deprecated fields from CatalogService API

2017-06-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5549: Remove deprecated fields from CatalogService API
..


IMPALA-5549: Remove deprecated fields from CatalogService API

Remove from TCatalogUpdateResult the deprecated fields that are no longer
used and modify the catalog to always populate the
'updated_catalog_objects' and 'removed_catalog_object' fields with the
updated/removed catalog objects.

Testing: Run core tests.

Change-Id: Ibc80e43392cdc85a841e670030e0988692bdf884
Reviewed-on: http://gerrit.cloudera.org:8080/7248
Reviewed-by: Dimitris Tsirogiannis 
Tested-by: Impala Public Jenkins
---
M be/src/service/impala-server.cc
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
3 files changed, 22 insertions(+), 103 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Dimitris Tsirogiannis: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/7248
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ibc80e43392cdc85a841e670030e0988692bdf884
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins


[Impala-ASF-CR] IMPALA-5549: Remove deprecated fields from CatalogService API

2017-06-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5549: Remove deprecated fields from CatalogService API
..


Patch Set 3: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7248
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc80e43392cdc85a841e670030e0988692bdf884
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4862: make resource profile consistent with backend behaviour

2017-06-21 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#6).

Change subject: IMPALA-4862: make resource profile consistent with backend 
behaviour
..

IMPALA-4862: make resource profile consistent with backend behaviour

This moves away from the PipelinedPlanNodeSet approach of enumerating
sets of concurrently-executing nodes because unions would force
creating many overlapping sets of nodes. The new approach computes
the peak resources during Open() and the peak resources between Open()
and Close() (i.e. while calling GetNext()) bottom-up for each plan node
in a fragment. The fragment resources are then combined to produce the
query resources.

The basic assumptions for the new resource estimates are:
* resources are acquired in Open() and released in Close().
* Blocking nodes call Open() on their child before acquiring
  their own resources (this required some backend changes).
* Blocking nodes call Close() on their children before returning
  from Open()
* The peak resource consumption of the query is the sum of the
  independent fragments (except for the parallel join build plans
  where we can assume there will be synchronisation). This is
  conservative but we don't synchronise fragment Open() and Close()
  across exchanges so can't make stronger assumptions in general.

Also compute the sum of minimum reservations. This will be useful
in the backend to determine exactly when all of the initial
reservations have been claimed from a shared pool of initial reservations.

Testing:
* Updated planner tests to reflect behavioural changes.
* Added extra resource requirement planner tests for unions, subplans,
  pipelines of blocking operators, and bushy join plans.
* Added single-node plans to resource-requirements tests. These have
  more complex plan trees inside a single fragment, which is useful
  for testing the peak resource requirement logic.

Change-Id: I492cf5052bb27e4e335395e2a8f8a3b07248ec9d
---
M be/src/exec/blocking-join-node.cc
M be/src/exec/blocking-join-node.h
M be/src/exec/hash-join-node.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/partitioned-hash-join-node.h
M be/src/exec/sort-node.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M common/thrift/Frontend.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/ParallelPlanner.java
D fe/src/main/java/org/apache/impala/planner/PipelinedPlanNodeSet.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
A fe/src/main/java/org/apache/impala/planner/PlanVisitor.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/ResourceProfile.java
M fe/src/main/java/org/apache/impala/planner/SubplanNode.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/kudu-selectivity.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/sort-expr-materialization.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level0.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level1.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level2.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level3.test
M testdata/workloads/functional-query/queries/QueryTest/stats-extrapolation.test
38 files changed, 2,767 insertions(+), 384 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/7223/6
-- 
To view, visit http://gerrit.cloudera.org:8080/7223
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I492cf5052bb27e4e335395e2a8f8a3b07248ec9d
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5483: Automatically disable codegen for small queries

2017-06-21 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5483: Automatically disable codegen for small queries
..


Patch Set 3:

(18 comments)

http://gerrit.cloudera.org:8080/#/c/7153/3//COMMIT_MSG
Commit Message:

PS3, Line 15: a plan node
> I hear your concerns and I do think we should think it through carefully. I
Unless we are sure this is a common user scenario, I lean towards a default 
value of 0. I don't think the risk of regressions is worth the unknown benefit.

I agree with Tim that the behavior is easy to explain, but it still means users 
might have to deal with tuning this value after upgrading.


http://gerrit.cloudera.org:8080/#/c/7153/3/be/src/service/query-options.cc
File be/src/service/query-options.cc:

Line 497: 
query_options->__set_disable_codegen_rows_threshold(atoi(value.c_str()));
Consider using StringParser::StringToInt() like in other places here, so we can 
properly validate the input.


http://gerrit.cloudera.org:8080/#/c/7153/3/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

Line 255:   // If the number of rows per node is below the threshold codegen 
will be automatically
If the number of rows processed per node ...

(to make it clear that these are the input rows and not the output rows)


http://gerrit.cloudera.org:8080/#/c/7153/3/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

Line 279:   // If the number of rows per node is below the threshold and 
disable_codegen is unset,
If the number of rows processed per node


http://gerrit.cloudera.org:8080/#/c/7153/3/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

Line 151: 
Needs PlannerTests. You can do something like 
PlannerTest.testComputeStatsMtDop().


Line 152: checkForDisableCodegen(rootFragment.getPlanRoot());
Add comment why we have to do this on the fragmented plan


Line 477: boolean isSmallQuery = visitor.valid() && 
visitor.getMaxRowsProcessed() < threshold;
remove visitor.valid() and inline result into the if


Line 492:   private void checkForDisableCodegen(PlanNode planRoot) {
planRoot -> distributedPlan


Line 493: // Determine the maximum number of rows processed by a instance 
of a plan node.
This comments reads like the comment on 
MaxRowsProcessedVisitor.maxRowsProcessed_. I think we need crisper explanations 
for the two maximum numbers.


http://gerrit.cloudera.org:8080/#/c/7153/3/fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java
File fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java:

Line 32:   // True if we should abort because if we don't have valid estimates
remove second "if"


Line 47: if (caller instanceof HashJoinNode || caller instanceof 
NestedLoopJoinNode) {
if (caller instanceof JoinNode) foundJoinNode_ = true;


Line 60:   || (missingStats && !scan.hasLimit() && 
scan.getConjuncts().isEmpty())) {
Why would the existence of scan conjuncts make our estimate valid despite 
missing stats?


Line 66:   Math.max(maxRowsProcessedPerNode_, numRows / numNodes);
Nit: Do the division as double and take the ceil, otherwise it seems like we're 
losing rows.


Line 77:   Math.max(maxRowsProcessedPerNode_, numRows / numNodes);
Nit: Do the division as double and take the ceil, otherwise it seems like we're 
losing rows.


Line 81:   public boolean valid() {
single line


Line 96: return foundJoinNode_;
Preconditions.checkState(valid_); seems appropriate here as well


http://gerrit.cloudera.org:8080/#/c/7153/3/testdata/workloads/functional-query/queries/QueryTest/codegen.test
File testdata/workloads/functional-query/queries/QueryTest/codegen.test:

Line 2:  QUERY
Any reason not to make these PlannerTests?


http://gerrit.cloudera.org:8080/#/c/7153/3/tests/common/test_dimensions.py
File tests/common/test_dimensions.py:

Line 139:   
disable_codegen_rows_threshold_options=[5000],
Why not 0? For the same reason you use 0 below.


-- 
To view, visit http://gerrit.cloudera.org:8080/7153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I273bcee58641f5b97de52c0b2caab043c914b32e
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Juan Yu 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5483: Automatically disable codegen for small queries

2017-06-21 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5483: Automatically disable codegen for small queries
..


Patch Set 3:

https://gerrit.cloudera.org/#/c/7257/ will also help make the primary key 
detection more reliable!

-- 
To view, visit http://gerrit.cloudera.org:8080/7153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I273bcee58641f5b97de52c0b2caab043c914b32e
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Juan Yu 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4669: [SECURITY] Add security library to build

2017-06-21 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-4669: [SECURITY] Add security library to build
..


Patch Set 10:

Michael - this includes the KRB5 and OpenSSL linking fixes.

-- 
To view, visit http://gerrit.cloudera.org:8080/5717
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4f327810066bee7f3ac107b0295480fb9ed45e14
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5483: Automatically disable codegen for small queries

2017-06-21 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5483: Automatically disable codegen for small queries
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7153/3//COMMIT_MSG
Commit Message:

PS3, Line 15: a plan node
> Multi-column PK is not uncommon. Even with accurate stats, Planner could ge
I hear your concerns and I do think we should think it through carefully. I 
also think we shouldn't be too scared to change defaults when the current 
defaults don't make really sense. I'd struggle to explain to someone why Impala 
spends multiple seconds doing codegen when the planner estimates show that it's 
a small query that will take 10s or 100s of milliseconds to execute. It's easy 
to explain that Impala decided to disable codegen because the estimates showed 
it to be the right decision (but the estimates were wrong).

If it's a bad fit for a particular workload it's possible to disable this or 
change the threshold. The current threshold is pretty safe according to my 
experiments.

The motivation is that now if you have a workload with a mix of big and small 
queries, you get stuck in a situation where the small queries are burning lots 
of CPU doing unnecessary codegen. The only workaround is to manually disable 
codegen for the small queries, but that's not really practical to do in most 
cases.

RE: disabling it for joins. Part of the motivation was that the small query 
optimisation doesn't work for joins. It was disabled because of a bug with 
num_nodes=1.

RE: explain plans. It does show up in profiles with a string like "codegen 
disabled due to optimization hints" for the plan nodes. I could add it to 
explain plans too but it wasn't clear if it was worth adding even more detail 
to them.


-- 
To view, visit http://gerrit.cloudera.org:8080/7153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I273bcee58641f5b97de52c0b2caab043c914b32e
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Juan Yu 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-{4670,4672,4784}: Add RpcMgr and port Statestore services to KRPC

2017-06-21 Thread Henry Robinson (Code Review)
Henry Robinson has uploaded a new patch set (#15).

Change subject: IMPALA-{4670,4672,4784}: Add RpcMgr and port Statestore 
services to KRPC
..

IMPALA-{4670,4672,4784}: Add RpcMgr and port Statestore services to KRPC

This patch adds the core abstractions necessary for using Kudu RPC for
Impala's backend services. The StatestoreService and
StatestoreSubscriberService are both ported to KRPC to demonstrate how
the new RPC framework operates.

The main new class is RpcMgr, which is responsible for both services run
by a daemon and for obtaining clients to remote services. The new Rpc
class helps clients build rpc invocation objects, and use them to call
remote methods.

Also done:
 * Utility code to convert from a Thrift structure to a Protobuf
   wrapper. Thrift RPCs do not need to be ported to Protobuf to work.
 * Added more build support for Protobuf generation. All library targets
   now depend on the Protobuf generation step, just as they do Thrift.
 * thrift-server-test is rewritten to use Beeswax as a test service,
   since the old StatestoreService has been removed.
 * Remove InProcessStatestore that was only used for statestore-test and
   mini-impala-cluster. As far as we know, mini-impala-cluster is unused
   by anyone. (IMPALA-4784)

TESTING:
 * Impala's test suite passes.
 * statestore-test has been rewritten to use new statestore code. It
   includes all test cases from the now-removed Python test_statestore.py
 * rpc-mgr-test unit-tests the RpcMgr and Rpc classes.

TODO:
 * Enable SSL and Kerberos support (already in Kudu's imported
   libraries)

Change-Id: I8dbf12b9ecd71d26d239d31c19b487175194c766
---
M be/generated-sources/gen-cpp/CMakeLists.txt
M be/src/catalog/CMakeLists.txt
M be/src/catalog/catalog-server.cc
M be/src/catalog/catalog-server.h
M be/src/codegen/CMakeLists.txt
M be/src/common/CMakeLists.txt
M be/src/exec/CMakeLists.txt
M be/src/exec/catalog-op-executor.cc
M be/src/exec/kudu-util.h
M be/src/experiments/CMakeLists.txt
M be/src/exprs/CMakeLists.txt
M be/src/rpc/CMakeLists.txt
A be/src/rpc/common.proto
A be/src/rpc/rpc-mgr-test.cc
A be/src/rpc/rpc-mgr.cc
A be/src/rpc/rpc-mgr.h
A be/src/rpc/rpc-mgr.inline.h
A be/src/rpc/rpc.h
A be/src/rpc/rpc_test.proto
M be/src/rpc/thrift-server-test.cc
M be/src/rpc/thrift-util.h
M be/src/runtime/CMakeLists.txt
M be/src/runtime/bufferpool/CMakeLists.txt
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/scheduling/CMakeLists.txt
M be/src/service/CMakeLists.txt
M be/src/service/client-request-state.cc
M be/src/statestore/CMakeLists.txt
D be/src/statestore/statestore-service-client-wrapper.h
D be/src/statestore/statestore-subscriber-client-wrapper.h
M be/src/statestore/statestore-subscriber.cc
M be/src/statestore/statestore-subscriber.h
M be/src/statestore/statestore-test.cc
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
A be/src/statestore/statestore.proto
M be/src/statestore/statestored-main.cc
M be/src/testutil/CMakeLists.txt
M be/src/testutil/in-process-servers.cc
M be/src/testutil/in-process-servers.h
D be/src/testutil/mini-impala-cluster.cc
M be/src/transport/CMakeLists.txt
M be/src/udf/CMakeLists.txt
M be/src/udf_samples/CMakeLists.txt
M be/src/util/CMakeLists.txt
M be/src/util/collection-metrics.h
M be/src/util/counting-barrier.h
M bin/start-impala-cluster.py
M bin/start-impalad.sh
M common/thrift/CMakeLists.txt
M common/thrift/StatestoreService.thrift
D tests/statestore/test_statestore.py
53 files changed, 1,733 insertions(+), 1,219 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/5720/15
-- 
To view, visit http://gerrit.cloudera.org:8080/5720
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8dbf12b9ecd71d26d239d31c19b487175194c766
Gerrit-PatchSet: 15
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Juan Yu 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-4669: [KRPC] Add kudu rpc library to build

2017-06-21 Thread Henry Robinson (Code Review)
Henry Robinson has uploaded a new patch set (#10).

Change subject: IMPALA-4669: [KRPC] Add kudu_rpc library to build
..

IMPALA-4669: [KRPC] Add kudu_rpc library to build

Import FindKRPC.cmake from Apache Kudu.

One minor linking change to krpc/CMakeLists.txt.

Change-Id: I33203e95dff07c87a6ec5c7a31b7a583b91849bc
---
M CMakeLists.txt
M be/CMakeLists.txt
A cmake_modules/FindKRPC.cmake
3 files changed, 129 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/5719/10
-- 
To view, visit http://gerrit.cloudera.org:8080/5719
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I33203e95dff07c87a6ec5c7a31b7a583b91849bc
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 


[Impala-ASF-CR] IMPALA-4889: Use client sidecars for Thrift RPCs

2017-06-21 Thread Henry Robinson (Code Review)
Hello Marcel Kornacker,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6473

to look at the new patch set (#8).

Change subject: IMPALA-4889: Use client sidecars for Thrift RPCs
..

IMPALA-4889: Use client sidecars for Thrift RPCs

This patch changes the way Thrift structures are serialized with
KRPC. Instead of serializing them to a byte stream, and then writing
that stream to a Protobuf object which is serialized again en route to
the wire, the Thrift objects are serialized to byte streams which are
then directly written to the wire as a 'sidecar'. This saves a copy and
serialization step both on the sender and receiver sides of the RPC.

Change-Id: I0fcc87c6b67aa167d70ae022663b14bc90261c95
---
M be/src/rpc/common.proto
M be/src/rpc/rpc-mgr-test.cc
M be/src/rpc/rpc.h
M be/src/rpc/thrift-util.h
M be/src/statestore/statestore-subscriber.cc
M be/src/statestore/statestore-test.cc
M be/src/statestore/statestore.cc
7 files changed, 135 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/6473/8
-- 
To view, visit http://gerrit.cloudera.org:8080/6473
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0fcc87c6b67aa167d70ae022663b14bc90261c95
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 


[Impala-ASF-CR] IMPALA-4856: Port data stream service to KRPC

2017-06-21 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-4856: Port data stream service to KRPC
..


Patch Set 5:

(22 comments)

PS4 is a rebase. PS5 includes the review responses (so diff 4->5 if you want to 
see what changed).

http://gerrit.cloudera.org:8080/#/c/7103/3/be/src/rpc/rpc.h
File be/src/rpc/rpc.h:

PS3, Line 106: Ownership is
 :   // shared by the caller, and the RPC subsystem
> Doesn't std::move transfer the ownership so the caller no longer shares the
The shared_ptr is copied in. It's the copy that is then moved into the sidecar 
list.


PS3, Line 143: are owned by the caller
> the ownership is temporarily transferred to the RPC call when this function
I don't think so - the RPC call has pointers, but doesn't have ownership in the 
sense that it has no responsibility for managing a reference count or freeing 
the memory.


PS3, Line 147: 
> Having the names 'func', 'cb' and 'cb_wrapper' all close by each other make
Done


PS3, Line 153: 
> Does this move mean that the params_ member is invalid after this call? If 
Done


PS3, Line 327: 
> Maybe name this 'completion_cb'.
Done


http://gerrit.cloudera.org:8080/#/c/7103/3/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

PS3, Line 389: equest->mutable_bloom_filter()->set_log_heap_space(0);
 : 
request->mutable_bloom_filter()->set_directory_sidecar_idx(-1);
 :   }
> Why wouldn't a move capture ensure the same thing?
proto_filter is a const shared_ptr&. You can't move from it. Instead, we could 
have the argument be shared_ptr, and move from it here; 
they're basically equivalent, it's just a question of where you make the copy.


http://gerrit.cloudera.org:8080/#/c/7103/3/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

PS3, Line 1207:   VLOG_QUERY << "Not enough memory to allocate filter: "
  :  << PrettyPrinter::Print(heap_space, 
TUnit::BYTES)
  :  << " (query: " << coord->query_id() << ")";
  :   // Disable, as one missing update means a correct filter 
cannot be 
> I would add this to the commit message. This means we would take double the
I don't think so - because params.directory is a sidecar I don't think it's 
been copied since it arrived on the socket. In the Thrift case, the bytestream 
had to be deserialized into a TBloomFilter. That's what's happening here - the 
equivalent 'deserialization' step.

This path should only get taken the first time a filter arrives, and it does 
briefly keep two filters around (the sidecar should get destroyed as soon as 
the RPC is responded to).


http://gerrit.cloudera.org:8080/#/c/7103/3/be/src/runtime/data-stream-recvr.cc
File be/src/runtime/data-stream-recvr.cc:

PS3, Line 280: blocked_senders_.front()
> Is this a right way to dispose a unique_ptr?
Good point - release() is clearer, and get() may have been a benign bug.


http://gerrit.cloudera.org:8080/#/c/7103/3/be/src/runtime/data-stream-sender.cc
File be/src/runtime/data-stream-sender.cc:

PS3, Line 58: 
> Not used.
Done


PS3, Line 60: 
> Not used.
Done


PS3, Line 133: scoped_ptr batch_;
> No one really calls GetNumDataBytesSent() (except from our BE test). So, I'
We're gaining correctness - so worth doing (otherwise if someone decides to use 
it in the future, they might run into problems).


PS3, Line 148: 
> A reader of this code might not immediately understand why this class needs
I expanded the comment here.


PS3, Line 170: 
> Why is this set in Init()? Wouldn't it ideally be set it in the constructor
Moved to c'tor.


PS3, Line 175: proto_batch_idx_
> Just want to make sure that this will increase the shared_ptr refcount? It 
Yep - this was a mistake. Removed auto to make it more explicit.


PS3, Line 203: co
> Prefer a more descriptive name "rpc_completion_cb" or something similar.
Done


PS3, Line 214: ck_guard
> channel == nullptr
Done


PS3, Line 252: batch->tuple_data, &idx);
> Is this transferring the ownership to the RPC subsystem ? AddSideCar() inte
The ownership is shared with the batch object. AddSidecar() internally moves 
from the argument, which is a copy (i.e. its own reference).


PS3, Line 266: .release(), rpc_complete_callback);
> This is a subtle change in behavior from previous Impala version. In partic
Any reasonably conservative timeout runs the risk of false negatives if a 
sender is blocked.

I agree with your analysis about this being a change in behaviour. In practice, 
though, here's what I hope will happen: if one write to a node is slow enough 
to previously trigger the timeout, I would expect the statestore RPCs to also 
go slow (and they will time out); the node will be marked as offline and the 
query will be cancelled. 

If there is a situation where this RPC only is slow in writing (but all other 
RPCs to the server are ok), then I agree

[Impala-ASF-CR] IMPALA-4669: [SECURITY] Add security library to build

2017-06-21 Thread Henry Robinson (Code Review)
Henry Robinson has uploaded a new patch set (#9).

Change subject: IMPALA-4669: [SECURITY] Add security library to build
..

IMPALA-4669: [SECURITY] Add security library to build

* Minor compilation fix
* Set toolchain version to include gflags 2.2.0 and glog 0.3.4-p2

Change-Id: I4f327810066bee7f3ac107b0295480fb9ed45e14
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/kudu/security/init.cc
M be/src/kudu/security/token_verifier.cc
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M cmake_modules/FindKerberos.cmake
7 files changed, 58 insertions(+), 22 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/5717/9
-- 
To view, visit http://gerrit.cloudera.org:8080/5717
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4f327810066bee7f3ac107b0295480fb9ed45e14
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 


[Impala-ASF-CR] IMPALA-4856: Port data stream service to KRPC

2017-06-21 Thread Henry Robinson (Code Review)
Henry Robinson has uploaded a new patch set (#5).

Change subject: IMPALA-4856: Port data stream service to KRPC
..

IMPALA-4856: Port data stream service to KRPC

This patch ports the data-flow parts of ImpalaInternalService to KRPC.

* ImpalaInternalService is split into two services. The first,
  ImpalaInternalService, deals with control messages for plan fragment
  instance execution, cancellation and reporting, and remains
  implemented in Thrift for now. The second, DataStreamService, handles
  large-payload RPCs for transmitting runtime filters and row batches
  between hosts.

* In the DataStreamService, all RPCs use 'native' protobuf. The
  DataStreamService starts on the port previously reserved for the
  StatestoreSubscriberService (which is also a KRPC service), to avoid
  having to configure another port when starting Impala. When the
  ImpalaInternalService is ported to KRPC, all services will run on one
  port.

* To support needing to address two different backend services, a data
  service port has been added to TBackendDescriptor.

* This patch adds support for asynchronous RPCs to the RpcMgr and Rpc
  classes. Previously, Impala used fixed size thread pools + synchronous
  RPCs to achieve some parallelism for 'broadcast' RPCs like filter
  propagation, or a dedicated per-sender+receiver pair thread on the
  sender side in the DataStreamSender case. In this patch, the
  PublishFilter() and TransmitData() RPCs are sent asynchronously using
  KRPC's thread pools.

* The TransmitData() protocol has changed to adapt to asynchronous
  RPCs. The full details are in data-stream-mgr.h.

* As a result, DataStreamSender no longer creates a
  thread-per-connection on the sender side.

* Both tuple transmission and runtime filter publication use sidecars to
  minimise the number of copies and serialization steps required.

* Also include a fix for KUDU-2011 that properly allows sidecars to be
  shared between KRPC and the RPC caller (fixing IMPALA-5093, a
  corruption bug).

* A large portion of this patch is the replacement of TRowBatch with its
  Protobuf equivalent, RowBatchPb. The replacement is a literal port of
  the data structure, and row-batch-test, row-batch-list-test and
  row-batch-serialize-benchmark continue to execute without major logic
  changes.

* Simplify FindRecvr() logic in DataStreamManager. No-longer need to
  handle blocking sender-side, so no need for complex promise-based
  machinery. Instead, all senders with no receiver are added to a
  per-receiver list, which is processed when the receiver arrives. If it
  does not arrive promptly, the DataStreamManager cleans them up after
  FLAGS_datastream_sender_timeout_ms.

* This patch also begins a clean-up of how ImpalaServer instances are
  created (by removing CreateImpalaServer), and clarifying the
  relationship between ExecEnv and ImpalaServer. ImpalaServer now
  follows the standard construct->Init()->Start()->Join() lifecycle that
  we use for other services.

* Ensure that all addresses used for KRPCs are fully resolved, avoiding
  the need to resolve them for each RPC.

TESTING
---

* New tests added to rpc-mgr-test.

TO DO
-

* Re-enable throughput and latency measurements per data-stream sender
  when that information is exposed from KRPC (KUDU-1738).

* TLS and Kerberos are still not supported by KRPC in this patch.

Change-Id: Ia66704be7a0a8162bb85556d07b583ec756c584b
---
M .clang-format
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/benchmarks/bloom-filter-benchmark.cc
M be/src/benchmarks/row-batch-serialize-benchmark.cc
M be/src/common/init.cc
M be/src/common/status.cc
M be/src/common/status.h
M be/src/exprs/expr-test.cc
M be/src/kudu/rpc/rpc_sidecar.cc
M be/src/kudu/rpc/rpc_sidecar.h
M be/src/rpc/CMakeLists.txt
M be/src/rpc/TAcceptQueueServer.cpp
M be/src/rpc/common.proto
M be/src/rpc/rpc-mgr-test.cc
M be/src/rpc/rpc-mgr.h
M be/src/rpc/rpc-mgr.inline.h
M be/src/rpc/rpc.h
M be/src/runtime/backend-client.h
M be/src/runtime/client-cache-types.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/data-stream-mgr.cc
M be/src/runtime/data-stream-mgr.h
M be/src/runtime/data-stream-recvr.cc
M be/src/runtime/data-stream-recvr.h
M be/src/runtime/data-stream-sender.cc
M be/src/runtime/data-stream-sender.h
M be/src/runtime/data-stream-test.cc
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/row-batch-serialize-test.cc
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M be/src/scheduling/backend-config-test.cc
M be/src/scheduling/back

[Impala-ASF-CR] IMPALA-4669: [SECURITY] Add security library to build

2017-06-21 Thread Henry Robinson (Code Review)
Henry Robinson has uploaded a new patch set (#10).

Change subject: IMPALA-4669: [SECURITY] Add security library to build
..

IMPALA-4669: [SECURITY] Add security library to build

* Minor compilation fix
* Set toolchain version to include gflags 2.2.0 and glog 0.3.4-p2
* Look for krb5 headers in toolchain (but dynamically link against libraries)

Change-Id: I4f327810066bee7f3ac107b0295480fb9ed45e14
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/kudu/security/init.cc
M be/src/kudu/security/token_verifier.cc
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M cmake_modules/FindKerberos.cmake
7 files changed, 58 insertions(+), 22 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/5717/10
-- 
To view, visit http://gerrit.cloudera.org:8080/5717
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4f327810066bee7f3ac107b0295480fb9ed45e14
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 


[Impala-ASF-CR] IMPALA-4669: [KRPC] Import RPC library from kudu@314c9d8

2017-06-21 Thread Henry Robinson (Code Review)
Henry Robinson has uploaded a new patch set (#10).

Change subject: IMPALA-4669: [KRPC] Import RPC library from kudu@314c9d8
..

IMPALA-4669: [KRPC] Import RPC library from kudu@314c9d8

Change-Id: I06ab5b56312e482a27fa484414c338438ad6972c
---
A be/src/kudu/rpc/CMakeLists.txt
A be/src/kudu/rpc/acceptor_pool.cc
A be/src/kudu/rpc/acceptor_pool.h
A be/src/kudu/rpc/blocking_ops.cc
A be/src/kudu/rpc/blocking_ops.h
A be/src/kudu/rpc/client_negotiation.cc
A be/src/kudu/rpc/client_negotiation.h
A be/src/kudu/rpc/connection.cc
A be/src/kudu/rpc/connection.h
A be/src/kudu/rpc/constants.cc
A be/src/kudu/rpc/constants.h
A be/src/kudu/rpc/exactly_once_rpc-test.cc
A be/src/kudu/rpc/inbound_call.cc
A be/src/kudu/rpc/inbound_call.h
A be/src/kudu/rpc/messenger.cc
A be/src/kudu/rpc/messenger.h
A be/src/kudu/rpc/mt-rpc-test.cc
A be/src/kudu/rpc/negotiation-test.cc
A be/src/kudu/rpc/negotiation.cc
A be/src/kudu/rpc/negotiation.h
A be/src/kudu/rpc/outbound_call.cc
A be/src/kudu/rpc/outbound_call.h
A be/src/kudu/rpc/protoc-gen-krpc.cc
A be/src/kudu/rpc/proxy.cc
A be/src/kudu/rpc/proxy.h
A be/src/kudu/rpc/reactor-test.cc
A be/src/kudu/rpc/reactor.cc
A be/src/kudu/rpc/reactor.h
A be/src/kudu/rpc/remote_method.cc
A be/src/kudu/rpc/remote_method.h
A be/src/kudu/rpc/remote_user.cc
A be/src/kudu/rpc/remote_user.h
A be/src/kudu/rpc/request_tracker-test.cc
A be/src/kudu/rpc/request_tracker.cc
A be/src/kudu/rpc/request_tracker.h
A be/src/kudu/rpc/response_callback.h
A be/src/kudu/rpc/result_tracker.cc
A be/src/kudu/rpc/result_tracker.h
A be/src/kudu/rpc/retriable_rpc.h
A be/src/kudu/rpc/rpc-bench.cc
A be/src/kudu/rpc/rpc-test-base.h
A be/src/kudu/rpc/rpc-test.cc
A be/src/kudu/rpc/rpc.cc
A be/src/kudu/rpc/rpc.h
A be/src/kudu/rpc/rpc_context.cc
A be/src/kudu/rpc/rpc_context.h
A be/src/kudu/rpc/rpc_controller.cc
A be/src/kudu/rpc/rpc_controller.h
A be/src/kudu/rpc/rpc_header.proto
A be/src/kudu/rpc/rpc_introspection.proto
A be/src/kudu/rpc/rpc_service.h
A be/src/kudu/rpc/rpc_sidecar.cc
A be/src/kudu/rpc/rpc_sidecar.h
A be/src/kudu/rpc/rpc_stub-test.cc
A be/src/kudu/rpc/rpcz_store.cc
A be/src/kudu/rpc/rpcz_store.h
A be/src/kudu/rpc/rtest.proto
A be/src/kudu/rpc/rtest_diff_package.proto
A be/src/kudu/rpc/sasl_common.cc
A be/src/kudu/rpc/sasl_common.h
A be/src/kudu/rpc/sasl_helper.cc
A be/src/kudu/rpc/sasl_helper.h
A be/src/kudu/rpc/serialization.cc
A be/src/kudu/rpc/serialization.h
A be/src/kudu/rpc/server_negotiation.cc
A be/src/kudu/rpc/server_negotiation.h
A be/src/kudu/rpc/service_if.cc
A be/src/kudu/rpc/service_if.h
A be/src/kudu/rpc/service_pool.cc
A be/src/kudu/rpc/service_pool.h
A be/src/kudu/rpc/service_queue-test.cc
A be/src/kudu/rpc/service_queue.cc
A be/src/kudu/rpc/service_queue.h
A be/src/kudu/rpc/transfer.cc
A be/src/kudu/rpc/transfer.h
A be/src/kudu/rpc/user_credentials.cc
A be/src/kudu/rpc/user_credentials.h
77 files changed, 20,264 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/18/5718/10
-- 
To view, visit http://gerrit.cloudera.org:8080/5718
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I06ab5b56312e482a27fa484414c338438ad6972c
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 


[Impala-ASF-CR] IMPALA-4669: [KUTIL] Add kudu util library to the build.

2017-06-21 Thread Henry Robinson (Code Review)
Hello Impala Public Jenkins, Michael Ho, Matthew Jacobs, Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5715

to look at the new patch set (#19).

Change subject: IMPALA-4669: [KUTIL] Add kudu_util library to the build.
..

IMPALA-4669: [KUTIL] Add kudu_util library to the build.

A few miscellaneous changes to allow kudu_util to compile with Impala.

Add kudu_version.cc to substitute for the version.cc file that is
automatically built during the full Kudu build.

Set LZ4_DISABLE_DEPRECATE_WARNINGS to allow Kudu's compressor utility to
use deprecated names for LZ4 methods.

Add NO_NVM_SUPPORT flag to Kudu build (plan to upstream this later) to
disable building with nvm support, removing a library dependency.

Also remove imported FindOpenSSL.cmake in favour of the standard one provided
by cmake itself.

Change-Id: I8da798ee55506d6f969416b17c191eb03cb215f5
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/common/CMakeLists.txt
A be/src/common/kudu_version.cc
M be/src/common/logging.cc
M be/src/exec/kudu-util.h
A be/src/kudu/gutil
M be/src/kudu/util/CMakeLists.txt
M be/src/kudu/util/cache.cc
M be/src/kudu/util/compression/compression_codec.cc
M be/src/kudu/util/env.cc
M be/src/kudu/util/env_posix.cc
M be/src/kudu/util/flags.cc
A be/src/kudu/util/kudu_export.h
M be/src/kudu/util/locks.h
M be/src/kudu/util/logging.cc
M be/src/kudu/util/minidump.cc
D cmake_modules/FindOpenSSL.cmake
18 files changed, 140 insertions(+), 90 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/5715/19
-- 
To view, visit http://gerrit.cloudera.org:8080/5715
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8da798ee55506d6f969416b17c191eb03cb215f5
Gerrit-PatchSet: 19
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-{4670,4672,4784}: Add RpcMgr and port Statestore services to KRPC

2017-06-21 Thread Henry Robinson (Code Review)
Henry Robinson has uploaded a new patch set (#14).

Change subject: IMPALA-{4670,4672,4784}: Add RpcMgr and port Statestore 
services to KRPC
..

IMPALA-{4670,4672,4784}: Add RpcMgr and port Statestore services to KRPC

This patch adds the core abstractions necessary for using Kudu RPC for
Impala's backend services. The StatestoreService and
StatestoreSubscriberService are both ported to KRPC to demonstrate how
the new RPC framework operates.

The main new class is RpcMgr, which is responsible for both services run
by a daemon and for obtaining clients to remote services. The new Rpc
class helps clients build rpc invocation objects, and use them to call
remote methods.

Also done:
 * Utility code to convert from a Thrift structure to a Protobuf
   wrapper. Thrift RPCs do not need to be ported to Protobuf to work.
 * Added more build support for Protobuf generation. All library targets
   now depend on the Protobuf generation step, just as they do Thrift.
 * thrift-server-test is rewritten to use Beeswax as a test service,
   since the old StatestoreService has been removed.
 * Remove InProcessStatestore that was only used for statestore-test and
   mini-impala-cluster. As far as we know, mini-impala-cluster is unused
   by anyone. (IMPALA-4784)

TESTING:
 * Impala's test suite passes.
 * statestore-test has been rewritten to use new statestore code. It
   includes all test cases from the now-removed Python test_statestore.py
 * rpc-mgr-test unit-tests the RpcMgr and Rpc classes.

TODO:
 * Enable SSL and Kerberos support (already in Kudu's imported
   libraries)

Change-Id: I8dbf12b9ecd71d26d239d31c19b487175194c766
---
M be/generated-sources/gen-cpp/CMakeLists.txt
M be/src/catalog/CMakeLists.txt
M be/src/catalog/catalog-server.cc
M be/src/catalog/catalog-server.h
M be/src/codegen/CMakeLists.txt
M be/src/common/CMakeLists.txt
M be/src/exec/CMakeLists.txt
M be/src/exec/catalog-op-executor.cc
M be/src/exec/kudu-util.h
M be/src/experiments/CMakeLists.txt
M be/src/exprs/CMakeLists.txt
M be/src/rpc/CMakeLists.txt
A be/src/rpc/common.proto
A be/src/rpc/rpc-mgr-test.cc
A be/src/rpc/rpc-mgr.cc
A be/src/rpc/rpc-mgr.h
A be/src/rpc/rpc-mgr.inline.h
A be/src/rpc/rpc.h
A be/src/rpc/rpc_test.proto
M be/src/rpc/thrift-server-test.cc
M be/src/rpc/thrift-util.h
M be/src/runtime/CMakeLists.txt
M be/src/runtime/bufferpool/CMakeLists.txt
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/scheduling/CMakeLists.txt
M be/src/service/CMakeLists.txt
M be/src/service/client-request-state.cc
M be/src/statestore/CMakeLists.txt
D be/src/statestore/statestore-service-client-wrapper.h
D be/src/statestore/statestore-subscriber-client-wrapper.h
M be/src/statestore/statestore-subscriber.cc
M be/src/statestore/statestore-subscriber.h
M be/src/statestore/statestore-test.cc
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
A be/src/statestore/statestore.proto
M be/src/statestore/statestored-main.cc
M be/src/testutil/CMakeLists.txt
M be/src/testutil/in-process-servers.cc
M be/src/testutil/in-process-servers.h
D be/src/testutil/mini-impala-cluster.cc
M be/src/transport/CMakeLists.txt
M be/src/udf/CMakeLists.txt
M be/src/udf_samples/CMakeLists.txt
M be/src/util/CMakeLists.txt
M be/src/util/collection-metrics.h
M be/src/util/counting-barrier.h
M bin/start-impala-cluster.py
M bin/start-impalad.sh
M common/thrift/CMakeLists.txt
M common/thrift/StatestoreService.thrift
D tests/statestore/test_statestore.py
53 files changed, 1,733 insertions(+), 1,219 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/5720/14
-- 
To view, visit http://gerrit.cloudera.org:8080/5720
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8dbf12b9ecd71d26d239d31c19b487175194c766
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Juan Yu 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-4856: Port data stream service to KRPC

2017-06-21 Thread Henry Robinson (Code Review)
Henry Robinson has uploaded a new patch set (#4).

Change subject: IMPALA-4856: Port data stream service to KRPC
..

IMPALA-4856: Port data stream service to KRPC

This patch ports the data-flow parts of ImpalaInternalService to KRPC.

* ImpalaInternalService is split into two services. The first,
  ImpalaInternalService, deals with control messages for plan fragment
  instance execution, cancellation and reporting, and remains
  implemented in Thrift for now. The second, DataStreamService, handles
  large-payload RPCs for transmitting runtime filters and row batches
  between hosts.

* In the DataStreamService, all RPCs use 'native' protobuf. The
  DataStreamService starts on the port previously reserved for the
  StatestoreSubscriberService (which is also a KRPC service), to avoid
  having to configure another port when starting Impala. When the
  ImpalaInternalService is ported to KRPC, all services will run on one
  port.

* To support needing to address two different backend services, a data
  service port has been added to TBackendDescriptor.

* This patch adds support for asynchronous RPCs to the RpcMgr and Rpc
  classes. Previously, Impala used fixed size thread pools + synchronous
  RPCs to achieve some parallelism for 'broadcast' RPCs like filter
  propagation, or a dedicated per-sender+receiver pair thread on the
  sender side in the DataStreamSender case. In this patch, the
  PublishFilter() and TransmitData() RPCs are sent asynchronously using
  KRPC's thread pools.

* The TransmitData() protocol has changed to adapt to asynchronous
  RPCs. The full details are in data-stream-mgr.h.

* As a result, DataStreamSender no longer creates a
  thread-per-connection on the sender side.

* Both tuple transmission and runtime filter publication use sidecars to
  minimise the number of copies and serialization steps required.

* Also include a fix for KUDU-2011 that properly allows sidecars to be
  shared between KRPC and the RPC caller (fixing IMPALA-5093, a
  corruption bug).

* A large portion of this patch is the replacement of TRowBatch with its
  Protobuf equivalent, RowBatchPb. The replacement is a literal port of
  the data structure, and row-batch-test, row-batch-list-test and
  row-batch-serialize-benchmark continue to execute without major logic
  changes.

* Simplify FindRecvr() logic in DataStreamManager. No-longer need to
  handle blocking sender-side, so no need for complex promise-based
  machinery. Instead, all senders with no receiver are added to a
  per-receiver list, which is processed when the receiver arrives. If it
  does not arrive promptly, the DataStreamManager cleans them up after
  FLAGS_datastream_sender_timeout_ms.

* This patch also begins a clean-up of how ImpalaServer instances are
  created (by removing CreateImpalaServer), and clarifying the
  relationship between ExecEnv and ImpalaServer. ImpalaServer now
  follows the standard construct->Init()->Start()->Join() lifecycle that
  we use for other services.

* Ensure that all addresses used for KRPCs are fully resolved, avoiding
  the need to resolve them for each RPC.

TESTING
---

* New tests added to rpc-mgr-test.

TO DO
-

* Re-enable throughput and latency measurements per data-stream sender
  when that information is exposed from KRPC (KUDU-1738).

* TLS and Kerberos are still not supported by KRPC in this patch.

Change-Id: Ia66704be7a0a8162bb85556d07b583ec756c584b
---
M .clang-format
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/benchmarks/bloom-filter-benchmark.cc
M be/src/benchmarks/row-batch-serialize-benchmark.cc
M be/src/common/init.cc
M be/src/common/status.cc
M be/src/common/status.h
M be/src/exprs/expr-test.cc
M be/src/kudu/rpc/rpc_sidecar.cc
M be/src/kudu/rpc/rpc_sidecar.h
M be/src/rpc/CMakeLists.txt
M be/src/rpc/TAcceptQueueServer.cpp
M be/src/rpc/common.proto
M be/src/rpc/rpc-mgr-test.cc
M be/src/rpc/rpc-mgr.h
M be/src/rpc/rpc-mgr.inline.h
M be/src/rpc/rpc.h
M be/src/runtime/backend-client.h
M be/src/runtime/client-cache-types.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/data-stream-mgr.cc
M be/src/runtime/data-stream-mgr.h
M be/src/runtime/data-stream-recvr.cc
M be/src/runtime/data-stream-recvr.h
M be/src/runtime/data-stream-sender.cc
M be/src/runtime/data-stream-sender.h
M be/src/runtime/data-stream-test.cc
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/row-batch-serialize-test.cc
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M be/src/scheduling/backend-config-test.cc
M be/src/scheduling/back

[Impala-ASF-CR] IMPALA-5547: Rework FK/PK join detection.

2017-06-21 Thread Alex Behm (Code Review)
Alex Behm has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7257

Change subject: IMPALA-5547: Rework FK/PK join detection.
..

IMPALA-5547: Rework FK/PK join detection.

Reworks the FK/PK join detection logic to:
- more accurately recognize many-to-many joins
- avoid dim/dim joins for multi-column PKs

The new detection logic maintains our existing philosophy of generally
assuming a FK/PK join, unless there is strong evidence to the
contrary, as follows.

For each set of simple equi-join conjuncts between two tables, we
compute the joint NDV of the right-hand side columns by
multiplication, and if the joint NDV is significantly smaller than
the right-hand side row count, then we are fairly confident that the
right-hand side is not a PK. Otherwise, we assume the set of conjuncts
could represent a FK/PK relationship.

Extends the explain plan to include the outcome of the FK/PK detection
at EXPLAIN_LEVEL > STANDARD.

Performance testing:
1. Full TPC-DS run on 10TB:
   - Q10 improved by >100x
   - Q72 improved by >25x
   - Q17,Q26,Q29 improved by 2x
   - Q64 regressed by 10x
   - Total runtime: Improved by 2x
   - Geomean: Minor improvement
   The regression of Q64 is understood and we will try to address it
   in follow-on changes. The previous plan was better by accident and
   not because of superior logic.
2. Nightly TPC-H and TPC-DS runs:
   - No perf differences

Testing:
- The existing planner test cover the changes.
- Code/hdfs run passed.

Change-Id: I49074fe743a28573cff541ef7dbd0edd88892067
---
M fe/src/main/java/org/apache/impala/analysis/JoinOperator.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
M testdata/workloads/functional-planner/queries/PlannerTest/hbase.test
M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-views.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level2.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level3.test
17 files changed, 1,500 insertions(+), 1,351 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/57/7257/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7257
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I49074fe743a28573cff541ef7dbd0edd88892067
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 


[Impala-ASF-CR] IMPALA-4703: reservation denial debug action

2017-06-21 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#5).

Change subject: IMPALA-4703: reservation denial debug action
..

IMPALA-4703: reservation denial debug action

Add debug action to deny reservation increases with some probability.
This allows us to test various scenarios, particularly:
* The case when the node only gets its initial reservation and must
  run to completion without increasing its reservation.
* The case when there is some memory pressure and the node sometimes
  gets a reservation increase and sometimes doesn't.

E.g. to deny all reservation requests after an ExecNode has opened:

  set debug_action=-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0

This was applied to test_spilling. It caught a bug in the PAGG
with spilling string aggregations.

This required some minor extensions to the debug actions.
* Allow debug actions that apply to all ExecNodes if node_id is -1.
* Allow passing parameters to debug actions. The current grammar of the
  actions is not well-oriented towards extension, so I resorted to using
  @ as a new delimiter.

I also optimised ExecDebugAction() so that it is much faster in the
common case and extended --disable_mem_pools to prevent the buffer pool
from holding onto unused buffers.

Change-Id: Ied39bb091b12156e5dc61b528c6c0cd8de3fe657
---
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/runtime/bufferpool/buffer-allocator.cc
M be/src/runtime/bufferpool/buffer-pool.cc
M be/src/runtime/bufferpool/buffer-pool.h
M be/src/runtime/bufferpool/reservation-tracker.cc
M be/src/runtime/bufferpool/reservation-tracker.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/debug-options.cc
M be/src/runtime/debug-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/PlanNodes.thrift
M tests/query_test/test_spilling.py
13 files changed, 139 insertions(+), 36 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/7022/5
-- 
To view, visit http://gerrit.cloudera.org:8080/7022
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ied39bb091b12156e5dc61b528c6c0cd8de3fe657
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4703: reservation denial debug action

2017-06-21 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4703: reservation denial debug action
..


Patch Set 3:

(8 comments)

Rebased onto my latest buffer pool preview patch and addressed comments.

http://gerrit.cloudera.org:8080/#/c/7022/1/be/src/exec/exec-node.cc
File be/src/exec/exec-node.cc:

PS1, Line 452: T_
typo


http://gerrit.cloudera.org:8080/#/c/7022/3/be/src/exec/exec-node.cc
File be/src/exec/exec-node.cc:

Line 439:   LOG(INFO) << "Debug action on " << id_;
> is that too much noise, or did you find it useful?
Removed - missed removing this debugging code.


Line 459:   debug_action_param_.c_str(), debug_action_param_.size(), 
&parse_result);
> does "1.0" actually produce probability of exactly 1? i.e. is it guaranteed
You're right, there was an off-by-one bug that was hit if rand() returns 
exactly RAND_MAX


http://gerrit.cloudera.org:8080/#/c/7022/3/be/src/runtime/bufferpool/buffer-pool.h
File be/src/runtime/bufferpool/buffer-pool.h:

Line 341:   /// Call SetDebugDenyIncreaseReservation() on this client's 
ReservationTracker.
> explain the parameters. even after reading the code, I'm not 100% sure what
I don't think it's actually necessary, so I removed it.


http://gerrit.cloudera.org:8080/#/c/7022/3/be/src/runtime/bufferpool/reservation-tracker.cc
File be/src/runtime/bufferpool/reservation-tracker.cc:

Line 154: if (increase_deny_delay_ == 0) {
> is increase_deny_delay_ meant to get reset to an initial count at this poin
It was meant to be one-shot. I removed it because the purpose I had in mind no 
longer applied.


http://gerrit.cloudera.org:8080/#/c/7022/3/be/src/runtime/debug-options.h
File be/src/runtime/debug-options.h:

PS3, Line 55: // INVALID: debug options invalid
> this comment seems to be superseded by enable(), but action_param_ might be
This still seems relevant since enabled() depends on this. Or did it seem 
redundant?


PS3, Line 60: invalid
> this is a bit confusing given that INVALID is a phase. maybe explain instea
Done


http://gerrit.cloudera.org:8080/#/c/7022/1/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

Line 70:   // A floating point number in range [0.0, 1.0] that will be the 
probability of denying
Wording.


-- 
To view, visit http://gerrit.cloudera.org:8080/7022
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ied39bb091b12156e5dc61b528c6c0cd8de3fe657
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4703: reservation denial debug action

2017-06-21 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#4).

Change subject: IMPALA-4703: reservation denial debug action
..

IMPALA-4703: reservation denial debug action

Add debug action to deny reservation increases with some probability.
This allows us to test various scenarios, particularly:
* The case when the node only gets its initial reservation and must
  run to completion without increasing its reservation.
* The case when there is some memory pressure and the node sometimes
  gets a reservation increase and sometimes doesn't.

E.g. to deny all reservation requests after an ExecNode has opened:

  set debug_action=-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0

This was applied to test_spilling. It caught a bug in the PAGG
with spilling string aggregations.

This required some minor extensions to the debug actions.
* Allow debug actions that apply to all ExecNodes if node_id is -1.
* Allow passing parameters to debug actions. The current grammar of the
  actions is not well-oriented towards extension, so I resorted to using
  @ as a new delimiter.

I also optimised ExecDebugAction() so that it is much faster in the
common case and extended --disable_mem_pools to prevent the buffer pool
from holding onto unused buffers.

Change-Id: Ied39bb091b12156e5dc61b528c6c0cd8de3fe657
---
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/runtime/bufferpool/buffer-allocator.cc
M be/src/runtime/bufferpool/buffer-pool.cc
M be/src/runtime/bufferpool/buffer-pool.h
M be/src/runtime/bufferpool/reservation-tracker.cc
M be/src/runtime/bufferpool/reservation-tracker.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/debug-options.cc
M be/src/runtime/debug-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/PlanNodes.thrift
M tests/query_test/test_spilling.py
13 files changed, 139 insertions(+), 36 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/7022/4
-- 
To view, visit http://gerrit.cloudera.org:8080/7022
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ied39bb091b12156e5dc61b528c6c0cd8de3fe657
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 


[Impala-ASF-CR] PREVIEW: IMPALA-4674: Part 2: port backend exec to BufferPool

2017-06-21 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#21).

Change subject: PREVIEW: IMPALA-4674: Part 2: port backend exec to BufferPool
..

PREVIEW: IMPALA-4674: Part 2: port backend exec to BufferPool

Always create global BufferPool at startup using 80% of memory and
limit reservations to 80% of query memory (same as BufferedBlockMgr).
The query's initial reservation is claimed centrally (managed by
the InitialReservations class) and distributed to query operators
from there.

Port ExecNodes to use BufferPool:
  * Each ExecNode has to claim its reservation during Open()
  * Port Sorter to use BufferPool.
  * Switch from BufferedTupleStream to BufferedTupleStreamV2
  * Port HashTable to use BufferPool via a Suballocator.

This also makes PAGG memory consumption more efficient (avoid wasting buffers)
and improve the spilling algorithm:
* Allow preaggs to execute with 0 reservation - if streams and hash tables
  cannot be allocated, it will pass through rows.
* Halve the buffer requirement for spilling aggs - avoid allocating
  buffers for aggregated and unaggregated streams simultaneously.
* Rebuild spilled partitions instead of repartitioning (IMPALA-2708)

The BlockingJoinNode code is also fixed so that resources are claimed
*after* the build side is opened in the case when there's an async
build thread. This is required for some bushy plans, e.g. TPC-H
Q 22.

TODO in this patch:
 * Consider renaming buffer_pool_page_size, e.g. to spillable_page_size
 * Add maximum row size and minimum page size query options (dependent
  on planner changes).

TODO in follow-up patches:
* Rename BufferedTupleStreamV2 to BufferedTupleStream

Testing:
* Updated tests to reflect new memory requirements
* TODO: recalibrate limits in test_mem_usage_scaling after planner
  min reservation changes

Change-Id: I7fc7fe1c04e9dfb1a0c749fb56a5e0f2bf9c6c3e
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/analytic-eval-node.cc
M be/src/exec/analytic-eval-node.h
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hash-table-test.cc
M be/src/exec/hash-table.cc
M be/src/exec/hash-table.h
M be/src/exec/hash-table.inline.h
M be/src/exec/nested-loop-join-builder.cc
M be/src/exec/partitioned-aggregation-node-ir.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-aggregation-node.h
M be/src/exec/partitioned-hash-join-builder-ir.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/partitioned-hash-join-node-ir.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/partitioned-hash-join-node.h
M be/src/exec/partitioned-hash-join-node.inline.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/runtime/CMakeLists.txt
D be/src/runtime/buffered-block-mgr-test.cc
D be/src/runtime/buffered-block-mgr.cc
D be/src/runtime/buffered-block-mgr.h
D be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/buffered-tuple-stream-v2.cc
M be/src/runtime/buffered-tuple-stream-v2.h
D be/src/runtime/buffered-tuple-stream.cc
D be/src/runtime/buffered-tuple-stream.h
D be/src/runtime/buffered-tuple-stream.inline.h
M be/src/runtime/bufferpool/buffer-pool.cc
M be/src/runtime/bufferpool/buffer-pool.h
M be/src/runtime/bufferpool/reservation-tracker.h
M be/src/runtime/disk-io-mgr.cc
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/runtime/fragment-instance-state.cc
A be/src/runtime/initial-reservations.cc
A be/src/runtime/initial-reservations.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M be/src/runtime/runtime-filter.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/runtime/test-env.cc
M be/src/runtime/test-env.h
M be/src/runtime/tmp-file-mgr-test.cc
M be/src/runtime/tmp-file-mgr.h
M be/src/service/client-request-state.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/bloom-filter.h
M be/src/util/static-asserts.cc
M common/thrift/Frontend.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M testdata/workloads/functional-query/queries/QueryTest/analytic-fns.test
M 
testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters_phj.test
M testdata/workloads/functional-query/queries/QueryTest/spilling.test
M tests/query_test/test_mem_usage_scaling.py
M tests/query_test/test_

[Impala-ASF-CR] IMPALA-4674: Part 1: remove old aggs and joins

2017-06-21 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#4).

Change subject: IMPALA-4674: Part 1: remove old aggs and joins
..

IMPALA-4674: Part 1: remove old aggs and joins

This is intended to be merged at the same time as Part 2 but is
separated out to make the change more reviewable. Part 2 assumes
that it does not need special logic to handle this mode (e.g.
because the old aggs and joins don't use reservation).

Disable the --enable_partitioned_{aggregation,hash_join} options
and remove all product and test code associated with them.

Change-Id: I5ce2236d37c0ced188a4a81f7e00d4b8ac98e7e9
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/CMakeLists.txt
D be/src/exec/aggregation-node-ir.cc
D be/src/exec/aggregation-node.cc
D be/src/exec/aggregation-node.h
M be/src/exec/blocking-join-node.cc
M be/src/exec/blocking-join-node.h
M be/src/exec/exec-node.cc
D be/src/exec/hash-join-node-ir.cc
D be/src/exec/hash-join-node.cc
D be/src/exec/hash-join-node.h
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/nested-loop-join-node.cc
M be/src/exec/nested-loop-join-node.h
D be/src/exec/old-hash-table-ir.cc
D be/src/exec/old-hash-table-test.cc
D be/src/exec/old-hash-table.cc
D be/src/exec/old-hash-table.h
D be/src/exec/old-hash-table.inline.h
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/partitioned-hash-join-node.h
M be/src/exprs/agg-fn-evaluator.cc
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
D testdata/workloads/functional-query/queries/QueryTest/legacy-joins-aggs.test
M tests/common/environ.py
M tests/common/skip.py
D tests/custom_cluster/test_legacy_joins_aggs.py
M tests/metadata/test_ddl.py
M tests/query_test/test_aggregation.py
M tests/query_test/test_join_queries.py
M tests/query_test/test_mt_dop.py
M tests/query_test/test_nested_types.py
M tests/query_test/test_runtime_filters.py
M tests/query_test/test_scanners.py
M tests/query_test/test_tpch_nested_queries.py
41 files changed, 31 insertions(+), 4,352 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/7102/4
-- 
To view, visit http://gerrit.cloudera.org:8080/7102
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5ce2236d37c0ced188a4a81f7e00d4b8ac98e7e9
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5551: Fix AggregationNode::Close() when Prepare() fails

2017-06-21 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5551: Fix AggregationNode::Close() when Prepare() fails
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7254/1/be/src/exec/aggregation-node.cc
File be/src/exec/aggregation-node.cc:

Line 275:   while (!output_iterator_.AtEnd()) {
> Should this be inside the if() statement? Since it references dummy_dst.
It doesn't matter too much so long as we can convince ourselves it's correct 
since this code is near EOL.


-- 
To view, visit http://gerrit.cloudera.org:8080/7254
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia0f8cce7cf6e3183d3a5e145c2fcfa50f36c77e0
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5553: Fix expr-test in release builds

2017-06-21 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5553: Fix expr-test in release builds
..


Patch Set 1: Code-Review+2

Thanks for the cleanup

-- 
To view, visit http://gerrit.cloudera.org:8080/7255
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id1257da350cb6cb69886e93f6615cdadd17c187d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5551: Fix AggregationNode::Close() when Prepare() fails

2017-06-21 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5551: Fix AggregationNode::Close() when Prepare() fails
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7254/1/be/src/exec/aggregation-node.cc
File be/src/exec/aggregation-node.cc:

Line 275:   while (!output_iterator_.AtEnd()) {
Should this be inside the if() statement? Since it references dummy_dst.


-- 
To view, visit http://gerrit.cloudera.org:8080/7254
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia0f8cce7cf6e3183d3a5e145c2fcfa50f36c77e0
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5553: Fix expr-test in release builds

2017-06-21 Thread Michael Ho (Code Review)
Michael Ho has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7255

Change subject: IMPALA-5553: Fix expr-test in release builds
..

IMPALA-5553: Fix expr-test in release builds

expr-test fails in release build as it uses
Literal::CreateLiteral() to create literal
expressions. Literal::CreateLiteral() wraps
ParseString() with DCHECK() so it becomes
a no-op in release builds.

This change fixes the issue by moving
Literal::CreateLiteral() from literal.cc to
expr-test.cc as it's only used by the test
code. Also replaces DCHECK() wrapped around
ParseString() with EXPECT_TRUE().

Verified the fix by building and running
a release build of expr-test.

Change-Id: Id1257da350cb6cb69886e93f6615cdadd17c187d
---
M be/src/exprs/expr-test.cc
M be/src/exprs/literal.cc
M be/src/exprs/literal.h
3 files changed, 114 insertions(+), 102 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/7255/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7255
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Id1257da350cb6cb69886e93f6615cdadd17c187d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 


[Impala-ASF-CR] IMPALA-5551: Fix AggregationNode::Close() when Prepare() fails

2017-06-21 Thread Michael Ho (Code Review)
Michael Ho has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7254

Change subject: IMPALA-5551: Fix AggregationNode::Close() when Prepare() fails
..

IMPALA-5551: Fix AggregationNode::Close() when Prepare() fails

A recent change moved the initialization of output_tuple_desc_
to the constructor of AggregationNode. This changes the order
in which tuple_pool_ and output_tuple_desc_ are initialized.
The code in AggregationNode::Close() made the assumption that
tuple_pool_ cannot be NULL (although without a DCHECK) if
output_tuple_desc_ is not NULL. This no longer holds in the
new code.

This change makes AggregationNode::Close() checks tuple_pool_
to see if it's NULL before using it.

Change-Id: Ia0f8cce7cf6e3183d3a5e145c2fcfa50f36c77e0
---
M be/src/exec/aggregation-node.cc
1 file changed, 2 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/7254/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7254
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia0f8cce7cf6e3183d3a5e145c2fcfa50f36c77e0
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 


[Impala-ASF-CR] IMPALA-5548 Fix some minor nits with HDFS parquet column readers

2017-06-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5548 Fix some minor nits with HDFS parquet column readers
..


Patch Set 4:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/777/

-- 
To view, visit http://gerrit.cloudera.org:8080/7240
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08e653cae6f2188599f4a23e4f44692166d9c119
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5548 Fix some minor nits with HDFS parquet column readers

2017-06-21 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5548 Fix some minor nits with HDFS parquet column readers
..


Patch Set 4: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7240
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08e653cae6f2188599f4a23e4f44692166d9c119
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5548 Fix some minor nits with HDFS parquet column readers

2017-06-21 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5548 Fix some minor nits with HDFS parquet column readers
..


Patch Set 3: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7240
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08e653cae6f2188599f4a23e4f44692166d9c119
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5483: Automatically disable codegen for small queries

2017-06-21 Thread Juan Yu (Code Review)
Juan Yu has posted comments on this change.

Change subject: IMPALA-5483: Automatically disable codegen for small queries
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7153/3//COMMIT_MSG
Commit Message:

PS3, Line 15: a plan node
> MaxRowsProcessedVisitor detects when there are missing or corrupt stats and
Multi-column PK is not uncommon. Even with accurate stats, Planner could 
generate wrong plan, accidentally put small table on left side, and 
underestimate the join cardinality.
I maybe too conservative. My philosophy is codegen is good in general, we 
should disable it only when we are certain it will be slow. when we are not 
certain, keep it as before. That can reduce the risk of regression. Maybe do 
the same as the small query optimization, skip join for now until we can do 
better estimations.
Also some logging or indication in plan the codegen is disabled due to this 
optimization will help troubleshooting.


-- 
To view, visit http://gerrit.cloudera.org:8080/7153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I273bcee58641f5b97de52c0b2caab043c914b32e
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Juan Yu 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5517: Allow IMPALA LOGS DIR to be overridden

2017-06-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5517: Allow IMPALA_LOGS_DIR to be overridden
..


Patch Set 3:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/776/

-- 
To view, visit http://gerrit.cloudera.org:8080/7197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4028813bd4f53815139225abd57845bb304ae3e4
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5548 Fix some minor nits with HDFS parquet column readers

2017-06-21 Thread Zach Amsden (Code Review)
Hello Michael Ho,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7240

to look at the new patch set (#3).

Change subject: IMPALA-5548 Fix some minor nits with HDFS parquet column readers
..

IMPALA-5548 Fix some minor nits with HDFS parquet column readers

Replace some std::map instances with unordered maps.  std::map is
unnecessary in many cases where an unordered map should suffice, and
in almost all circumstances, perform better.  Also discovered was a
slightly dangerous initialization of an empty NullOffsetIndicator,
which could conceivably result in undefined behavior by writing
arr[ofs] |= b, where ofs was mistakenly initialized to -1 (and b to 0,
so such behavior may not be detected, but could cause a crash by
underrunning a buffer).  Also add warn unused to status results while
we are changing this.

Testing: Running exhaustive tests against a Jenkins build.

Change-Id: I08e653cae6f2188599f4a23e4f44692166d9c119
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scanner.h
M be/src/exec/parquet-column-readers.h
M be/src/runtime/descriptors.h
5 files changed, 22 insertions(+), 19 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/7240/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7240
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I08e653cae6f2188599f4a23e4f44692166d9c119
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-5517: Allow IMPALA LOGS DIR to be overridden

2017-06-21 Thread David Knupp (Code Review)
David Knupp has posted comments on this change.

Change subject: IMPALA-5517: Allow IMPALA_LOGS_DIR to be overridden
..


Patch Set 3: Code-Review+2

Rebase only. Carrying +2.

-- 
To view, visit http://gerrit.cloudera.org:8080/7197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4028813bd4f53815139225abd57845bb304ae3e4
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5548 Fix some minor issues with HDFS / parquet column readers

2017-06-21 Thread Zach Amsden (Code Review)
Hello Michael Ho,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7240

to look at the new patch set (#2).

Change subject: IMPALA-5548 Fix some minor issues with HDFS / parquet column 
readers
..

IMPALA-5548 Fix some minor issues with HDFS / parquet column readers

Replace some std::map instances with unordered maps.  std::map is
unnecessary in many cases where an unordered map should suffice, and in
almost all circumstances, perform better.  Also discovered was a
slightly dangerous initialization of an empty NullOffsetIndicator, which
could conceivably result in undefined behavior by writing arr[ofs] |= b,
where ofs was mistakenly initialized to -1 (and b to 0, so such behavior
may not be detected, but could cause a crash by underrunning a buffer).
Also add warn unused to status results while we are changing this.

Testing: Running exhaustive tests against a Jenkins build.

Change-Id: I08e653cae6f2188599f4a23e4f44692166d9c119
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scanner.h
M be/src/exec/parquet-column-readers.h
M be/src/runtime/descriptors.h
5 files changed, 22 insertions(+), 19 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/7240/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7240
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I08e653cae6f2188599f4a23e4f44692166d9c119
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-5548 Fix some minor issues with HDFS / parquet column readers

2017-06-21 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change.

Change subject: IMPALA-5548 Fix some minor issues with HDFS / parquet column 
readers
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7240/1/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

Line 352:   FileFormatsMap per_type_files_;
> we iterate through this one. it doesn't look like the ordering has any inte
Done.  Not much point in putting this on HdfsFileFormat.


Line 464:   std::pair, int> 
FileTypeCountsMap;
> we iterate this one to print some info, and it seems like a consistent orde
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7240
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08e653cae6f2188599f4a23e4f44692166d9c119
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5540: Revert Sentry version back to 5.13

2017-06-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5540: Revert Sentry version back to 5.13
..


Patch Set 1: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7247
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4c29a69c90b0c5d06e17b46a837c880290f3b17
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5540: Revert Sentry version back to 5.13

2017-06-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5540: Revert Sentry version back to 5.13
..


IMPALA-5540: Revert Sentry version back to 5.13

Sentry has now fixed the problem on their end, so we can
return to using 5.13.

Change-Id: Ie4c29a69c90b0c5d06e17b46a837c880290f3b17
Reviewed-on: http://gerrit.cloudera.org:8080/7247
Reviewed-by: Henry Robinson 
Tested-by: Impala Public Jenkins
---
M bin/impala-config.sh
1 file changed, 1 insertion(+), 1 deletion(-)

Approvals:
  Impala Public Jenkins: Verified
  Henry Robinson: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/7247
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ie4c29a69c90b0c5d06e17b46a837c880290f3b17
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins


[Impala-ASF-CR] IMPALA-4862: make resource profile consistent with backend behaviour

2017-06-21 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4862: make resource profile consistent with backend 
behaviour
..


Patch Set 5:

(5 comments)

I found a handful of subtle bugs. Most of them were caught by running tests on 
my buffer pool development branch and hittings DCHECKs that too many 
reservations were claimed. The parallel plan bugs were found by inspection: 
resources required by the parallel plans should always be <= 2x the resources 
required by the distributed plans.

http://gerrit.cloudera.org:8080/#/c/7223/5/be/src/exec/blocking-join-node.cc
File be/src/exec/blocking-join-node.cc:

Line 156:   if (status->ok()) *status = AcquireResourcesForBuild(state);
I added AcquireResourcesForBuild() to defer consuming resources until the build 
side is open.


Line 169:   if (CanCloseBuildEarly() || !status->ok()) child(1)->Close(state);
I added this to close the build side ASAP when possible.


http://gerrit.cloudera.org:8080/#/c/7223/5/be/src/exec/sort-node.cc
File be/src/exec/sort-node.cc:

Line 170:   // Unless we are inside a subplan expecting to call 
Open()/GetNext() on the child
Moving this wasn't strictly necessary but it seems to make sense to close it 
once we're done with it.


http://gerrit.cloudera.org:8080/#/c/7223/5/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
File fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java:

Line 103:   public boolean isBlockingNode() { return false; }
This was also a bug. It's a streaming node! Added a planner test with a 
pipeline of analytics that makes it obvious.


http://gerrit.cloudera.org:8080/#/c/7223/5/fe/src/main/java/org/apache/impala/planner/PlanFragment.java
File fe/src/main/java/org/apache/impala/planner/PlanFragment.java:

Line 239: if (sink_ instanceof JoinBuildSink) {
This was another subtle bug - we were double-counting some fragments in the 
parallel plans because the join build fragment was included in the parent 
fragment (there isn't an exchange node separating the fragments)/.


-- 
To view, visit http://gerrit.cloudera.org:8080/7223
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I492cf5052bb27e4e335395e2a8f8a3b07248ec9d
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.

2017-06-21 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.
..


Patch Set 6: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6527/6/be/src/exec/kudu-scan-node.cc
File be/src/exec/kudu-scan-node.cc:

Line 183:   DCHECK(row_batch != nullptr);
Brief comment to explain this? It's subtle.


-- 
To view, visit http://gerrit.cloudera.org:8080/6527
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4862: make resource profile consistent with backend behaviour

2017-06-21 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#5).

Change subject: IMPALA-4862: make resource profile consistent with backend 
behaviour
..

IMPALA-4862: make resource profile consistent with backend behaviour

This moves away from the PipelinedPlanNodeSet approach of enumerating
sets of concurrently-executing nodes because unions would force
creating many overlapping sets of nodes. The new approach computes
the peak resources during Open() and the peak resources between Open()
and Close() (i.e. while calling GetNext()) bottom-up for each plan node
in a fragment. The fragment resources are then combined to produce the
query resources.

The basic assumptions for the new resource estimates are:
* resources are acquired in Open() and released in Close().
* Blocking nodes call Open() on their child before acquiring
  their own resources (this required some backend changes).
* Blocking nodes call Close() on their children before returning
  from Open()
* The peak resource consumption of the query is the sum of the
  independent fragments (except for the parallel join build plans
  where we can assume there will be synchronisation). This is
  conservative but we don't synchronise fragment Open() and Close()
  across exchanges so can't make stronger assumptions in general.

Also compute the sum of minimum reservations. This will be useful
in the backend to determine exactly when all of the initial
reservations have been claimed from a shared pool of initial reservations.

Testing:
* Updated planner tests to reflect behavioural changes.
* Added extra resource requirement planner tests for unions, subplans,
  pipelines of blocking operators, and bushy join plans.
* Added single-node plans to resource-requirements tests. These have
  more complex plan trees inside a single fragment, which is useful
  for testing the peak resource requirement logic.

Change-Id: I492cf5052bb27e4e335395e2a8f8a3b07248ec9d
---
M be/src/exec/blocking-join-node.cc
M be/src/exec/blocking-join-node.h
M be/src/exec/hash-join-node.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/partitioned-hash-join-node.h
M be/src/exec/sort-node.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M common/thrift/Frontend.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/ParallelPlanner.java
D fe/src/main/java/org/apache/impala/planner/PipelinedPlanNodeSet.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
A fe/src/main/java/org/apache/impala/planner/PlanVisitor.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/ResourceProfile.java
M fe/src/main/java/org/apache/impala/planner/SubplanNode.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/kudu-selectivity.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/sort-expr-materialization.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level0.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level1.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level2.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level3.test
M testdata/workloads/functional-query/queries/QueryTest/stats-extrapolation.test
38 files changed, 2,763 insertions(+), 384 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/7223/5
-- 
To view, visit http://gerrit.cloudera.org:8080/7223
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I492cf5052bb27e4e335395e2a8f8a3b07248ec9d
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-3504: UDF for current timestamp in UTC

2017-06-21 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has posted comments on this change.

Change subject: IMPALA-3504: UDF for current timestamp in UTC
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7203/1/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

PS1, Line 371:  a
> as -> at
I did a second look at this while looking at Dan's latest comment. I think it 
should remain as "set as the query submission time" since this means that it 
represents the query submission time.
This would also remove the need to specify that it represents the same point in 
time as now_string


-- 
To view, visit http://gerrit.cloudera.org:8080/7203
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I969fc805922f2bb9c8101e84f85ff2cc3b1b6729
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5431: Remove redundant path exists checks during table load

2017-06-21 Thread Bharath Vissapragada (Code Review)
Hello Dimitris Tsirogiannis,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7095

to look at the new patch set (#4).

Change subject: IMPALA-5431: Remove redundant path exists checks during table 
load
..

IMPALA-5431: Remove redundant path exists checks during table load

There are multiple places that do an exists() check on a path and then
perform some subsequent action on it. This pattern results in two
RPCs to the NN (one for the exists() check and one for the subsequent
action). We can avoid the exists() check in these cases since most HDFS
methods on paths throw a FileNotFoundException if the path does not
exist. This can save an RPC to NN and improve the metadata loading time.

Testing: Enough tests already cover this code path. This patch
passed core and exhaustive tests.

Change-Id: Id10ecf64ea2eda2d0f9299c0aa371933eca22281
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
2 files changed, 59 insertions(+), 23 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/7095/4
-- 
To view, visit http://gerrit.cloudera.org:8080/7095
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id10ecf64ea2eda2d0f9299c0aa371933eca22281
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 


[Impala-ASF-CR] IMPALA-5549: Remove deprecated fields from CatalogService API

2017-06-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5549: Remove deprecated fields from CatalogService API
..


Patch Set 3:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/775/

-- 
To view, visit http://gerrit.cloudera.org:8080/7248
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc80e43392cdc85a841e670030e0988692bdf884
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5549: Remove deprecated fields from CatalogService API

2017-06-21 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5549: Remove deprecated fields from CatalogService API
..


Patch Set 3: Code-Review+2

Rebased. Keep Alex's +2

-- 
To view, visit http://gerrit.cloudera.org:8080/7248
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc80e43392cdc85a841e670030e0988692bdf884
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5497: spilling hash joins that output build rows hit OOM

2017-06-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5497: spilling hash joins that output build rows hit OOM
..


IMPALA-5497: spilling hash joins that output build rows hit OOM

The bug is that the join tried to bring the next spilled partition into
memory while still holding onto memory from the current partition.
The fix is to return earlier if the output batch is at capacity so
that resources are flushed.

Also reduce some of the redundancy in the loop that drives the spilling
logic and catch some dropped statuses..

Testing:
The failure was originally reproduced by my IMPALA-4703 patch. I was
able to cause a query failure with the current code by reducing the
memory limit for an existing query. Before it failed with up to 12MB of
memory. Now it succeeds with 8MB or less.

Ran exhaustive build.

Change-Id: I075388d348499c5692d044ac1bc38dd8dd0b10c7
Reviewed-on: http://gerrit.cloudera.org:8080/7180
Reviewed-by: Dan Hecht 
Tested-by: Impala Public Jenkins
---
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/partitioned-hash-join-node-ir.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/partitioned-hash-join-node.h
M testdata/workloads/functional-query/queries/QueryTest/spilling.test
6 files changed, 85 insertions(+), 85 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Dan Hecht: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/7180
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I075388d348499c5692d044ac1bc38dd8dd0b10c7
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 


[Impala-ASF-CR] IMPALA-5497: spilling hash joins that output build rows hit OOM

2017-06-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5497: spilling hash joins that output build rows hit OOM
..


Patch Set 8: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7180
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I075388d348499c5692d044ac1bc38dd8dd0b10c7
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5240: Allow config of number of disk I/O threads per disk type

2017-06-21 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5240: Allow config of number of disk I/O threads per 
disk type
..


Patch Set 3:

(7 comments)

nice! thanks for adding the test, I think it looks good. I had a few small 
additional comments, and I realized I think we can simplify disk-io-mgr by 
removing one of the members.

http://gerrit.cloudera.org:8080/#/c/7232/3/be/src/runtime/disk-io-mgr-test.cc
File be/src/runtime/disk-io-mgr-test.cc:

Line 1142:   int num_io_threads_for_remote_disks = 
FLAGS_num_remote_hdfs_io_threads
const


PS3, Line 1160: 0
can you also set this to something else, e.g. 100 to show that it gets 
overwritten by the other parameters?


PS3, Line 1165: == 
weird indentation, just put the operator on the previous line and use 4 spaces 
to tab


http://gerrit.cloudera.org:8080/#/c/7232/3/be/src/runtime/disk-io-mgr.h
File be/src/runtime/disk-io-mgr.h:

PS3, Line 631: ///
comment this is for testing only


PS3, Line 638:   ///  - threads_per_rotational_disk: number of read threads to 
create per rotational
 :   ///disk. This is also the max queue depth.
 :   ///  - threads_per_solid_state_disk: number of read threads to 
create per solid state
 :   ///disk. This is also the max queue depth.
these overwrite threads_per_disk


PS3, Line 837: num_threads_per_disk_
the naming of this should probably be num_io_threads_per_disk_ to be consistent

or better yet, I think we can remove this member altogether because we can just 
set num_io_threads_per_{rotational,ssd}_ to the value of 
FLAGS_num_threads_per_disk if the finer grained flags weren't set.


http://gerrit.cloudera.org:8080/#/c/7232/3/be/src/util/thread.cc
File be/src/util/thread.cc:

PS3, Line 335: const
nit: space after const


-- 
To view, visit http://gerrit.cloudera.org:8080/7232
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I094aff3747104104fe0465d73dcdbef5d9892f7c
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5036: Parquet count star optimization

2017-06-21 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5036: Parquet count star optimization
..


Patch Set 5:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/6812/5/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java:

Line 886: prefix + 
"8InitZeroIN10impala_udf9BigIntValEEEvPNS2_15FunctionContextEPT_",
rather than defining a new builtin, why don't we wrap the sum() with a coalesce 
or isnull? does that not work for some reason?


http://gerrit.cloudera.org:8080/#/c/6812/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

PS5, Line 105:  The caller passes in a flag to the constructor that indicates 
if the count(*)
 :  * optimization can be applied to the query block of this scan. 
This scan node decides
 :  * whether to apply the optimization or not
This doesn't make a lot of sense to me. On one hand, it sayst he caller 
decides, but then it says that this node also decides. What's actually going on?


Line 109:  *
do we actually explain in some comment how this optimization works at a high 
level?


PS5, Line 140:  This
 :   // scan does additional analysis in init() to determine 
whether it is correct to apply
 :   // the optimization.
this seems to contradict the comment above that says the caller passes a flag 
to the constructor that indicates whether the optimization can be applied.


-- 
To view, visit http://gerrit.cloudera.org:8080/6812
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I536b85c014821296aed68a0c68faadae96005e62
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5549: Remove deprecated fields from CatalogService API

2017-06-21 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5549: Remove deprecated fields from CatalogService API
..


Patch Set 2: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7248
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc80e43392cdc85a841e670030e0988692bdf884
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5240: Allow config of number of disk I/O threads per disk type

2017-06-21 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has uploaded a new patch set (#3).

Change subject: IMPALA-5240: Allow config of number of disk I/O threads per 
disk type
..

IMPALA-5240: Allow config of number of disk I/O threads per disk type

Currently Impala defaults to 8 threads per flash disk and 1 thread per
rotational disk. This can be overridden with --num_threads_per_disk,
but that sets the thread count independent of disk type.

This would allow control of the number of disk I/O threads spawned
independently for solid state disks using
"--num_io_threads_per_solid_state_disk" and for rotational disks using
"--num_io_threads_per_rotational_disk" as startup parameters.

Testing:
Added backend tests that verify if "num_threads_per_disk",
"num_io_threads_per_solid_state_disk" and
"num_io_threads_per_rotational_disk" are used to spawn threads
appropriately

Change-Id: I094aff3747104104fe0465d73dcdbef5d9892f7c
---
M be/src/runtime/disk-io-mgr-test.cc
M be/src/runtime/disk-io-mgr.cc
M be/src/runtime/disk-io-mgr.h
M be/src/util/thread.cc
M be/src/util/thread.h
M fe/src/test/java/org/apache/impala/testutil/SentryServicePinger.java
6 files changed, 78 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/7232/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7232
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I094aff3747104104fe0465d73dcdbef5d9892f7c
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5240: Allow config of number of disk I/O threads per disk type

2017-06-21 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has posted comments on this change.

Change subject: IMPALA-5240: Allow config of number of disk I/O threads per 
disk type
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7232/1/be/src/runtime/disk-io-mgr.h
File be/src/runtime/disk-io-mgr.h:

PS1, Line 637:   DiskIoMgr(int num_disks, int threads_per_disk, int 
min_buffer_size,
 :   int max_buffer_size);
 : 
 :   /// Create DiskIoMgr with default configs.
 :   DiskIoMgr();
 : 
 :   /// Clean up all threads and resour
> You can remove threads_per_disk and instead change the tests to pass the sa
Done


PS1, Line 637:   DiskIoMgr(int num_disks, int threads_per_disk, int 
min_buffer_size,
 :   int max_buffer_size);
 : 
 :   /// Create DiskIoMgr with default configs.
 :   DiskIoMgr();
 : 
 :   /// Clean up all threads and resour
> You can also change DiskQueue to know how many threads it has, and update D
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7232
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I094aff3747104104fe0465d73dcdbef5d9892f7c
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4868: Increase TestRequestPoolService.testUpdatingConfigs timeout

2017-06-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4868: Increase 
TestRequestPoolService.testUpdatingConfigs timeout
..


Patch Set 2: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7118
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0b98d69fc3aa61a317944950d14eb93e1737250c
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4868: Increase TestRequestPoolService.testUpdatingConfigs timeout

2017-06-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4868: Increase 
TestRequestPoolService.testUpdatingConfigs timeout
..


IMPALA-4868: Increase TestRequestPoolService.testUpdatingConfigs timeout

A previous commit attempted to fix IMPALA-4868 by waiting
longer for a new file to become visible to the Impala code
responsible for parsing the admission control config file.
The timeout was observed in another gerrit-verify job, so we
should increase this timeout further. I don't see any
indication that there is anything else wrong with the test,
or that any other kind of failure occurred.

This change gives the test up to 20 seconds for the new file
to become visible and processed by the RequestPoolService.

Change-Id: I0b98d69fc3aa61a317944950d14eb93e1737250c
Reviewed-on: http://gerrit.cloudera.org:8080/7118
Reviewed-by: Matthew Jacobs 
Tested-by: Impala Public Jenkins
---
M fe/src/test/java/org/apache/impala/util/TestRequestPoolService.java
1 file changed, 1 insertion(+), 1 deletion(-)

Approvals:
  Impala Public Jenkins: Verified
  Matthew Jacobs: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/7118
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I0b98d69fc3aa61a317944950d14eb93e1737250c
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 


[Impala-ASF-CR] IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.

2017-06-21 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6527/5/be/src/exec/kudu-scan-node.cc
File be/src/exec/kudu-scan-node.cc:

Line 179:   if 
(materialized_row_batches_->AddBatchWithTimeout(move(row_batch), 100)) {
> I added a DCHECK that get hits immediately in the old buggy code with test_
Thanks Tim - you're quite right, and I misunderstood what you're saying. I 
think it's brittle to rely on the fact that move() may or may not result 
ultimately in a moved-from rvalue, but guess it's useful here with no obvious 
better implementation.


-- 
To view, visit http://gerrit.cloudera.org:8080/6527
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Bump Kudu version to c0798a9

2017-06-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: Bump Kudu version to c0798a9
..


Patch Set 1: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7246
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3523f2b769cff6ab3a6aac97ec36f6bb3bda5e0f
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] Bump Kudu version to c0798a9

2017-06-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: Bump Kudu version to c0798a9
..


Bump Kudu version to c0798a9

Change-Id: I3523f2b769cff6ab3a6aac97ec36f6bb3bda5e0f
Reviewed-on: http://gerrit.cloudera.org:8080/7246
Reviewed-by: Thomas Tauber-Marshall 
Tested-by: Impala Public Jenkins
---
M bin/impala-config.sh
1 file changed, 2 insertions(+), 2 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Thomas Tauber-Marshall: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/7246
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I3523f2b769cff6ab3a6aac97ec36f6bb3bda5e0f
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4863: Correctly account the file type and compression codec

2017-06-21 Thread anujphadke (Code Review)
anujphadke has posted comments on this change.

Change subject: IMPALA-4863: Correctly account the file type and compression 
codec
..


Patch Set 2:

Defaulting filtered to false in the function definition instead of passing 
false at every function invocation.

-- 
To view, visit http://gerrit.cloudera.org:8080/7245
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I797916505f62e568f4159e07099481b8ff571da2
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4863: Correctly account the file type and compression codec

2017-06-21 Thread anujphadke (Code Review)
anujphadke has uploaded a new patch set (#2).

Change subject: IMPALA-4863: Correctly account the file type and compression 
codec
..

IMPALA-4863: Correctly account the file type and compression codec

If a scan range is filtered at runtime the scan node skips reading
the range and never figures out the underlying compression codec used
to compress the files. In such a scenario we default the compression
codec to NONE which can be misleading. This change marks these files
as filtered in the scan node profile

e.g. - File Formats: PARQUET/SNAPPY:364 PARQUET(Filtered):1460

Change-Id: I797916505f62e568f4159e07099481b8ff571da2
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
4 files changed, 21 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/7245/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7245
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I797916505f62e568f4159e07099481b8ff571da2
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.

2017-06-21 Thread Alex Behm (Code Review)
Alex Behm has uploaded a new patch set (#6).

Change subject: IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.
..

IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.

Implements HdfsScanner::GetNext() for the Avro, RC File, and
Sequence File scanners. Changes ProcessSplit() to repeatedly call
GetNext() to share the core scanning code between the legacy
ProcessSplit() interface (ProcessSplit()) and the new GetNext()
interface.

Summary of changes:
- Slightly change code flow for initial scan range that
  only parses the file header. The new code sets
  'only_parsing_header_' in Open() and then honors
  that flag in GetNextInternal(). Before, all the logic
  was inside ProcessSpit().
- Replace 'finished_' with 'eos_'.
- Add a RowBatch parameter to various functions.
- Change Close() to free all resources when a nullptr
  RowBatch is passed.

Testing:
- Exhaustive tests passed on debug
- Core tests passed on asan
- TODO: Perf testing on cluster

Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669
---
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/base-sequence-scanner.h
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hdfs-avro-scanner-ir.cc
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-rcfile-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node-mt.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-sequence-scanner.h
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M be/src/exec/kudu-scan-node.cc
M be/src/exec/scan-node.h
M be/src/util/blocking-queue.h
M testdata/workloads/functional-query/queries/DataErrorsTest/avro-errors.test
26 files changed, 703 insertions(+), 825 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/6527/6
-- 
To view, visit http://gerrit.cloudera.org:8080/6527
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 


[Impala-ASF-CR] IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.

2017-06-21 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6527/5/be/src/exec/kudu-scan-node.cc
File be/src/exec/kudu-scan-node.cc:

Line 179:   if 
(materialized_row_batches_->AddBatchWithTimeout(move(row_batch), 100)) {
> My understand was that move() didn't actually modify the object, it just pr
I added a DCHECK that get hits immediately in the old buggy code with 
test_cancellation.py on Kudu.

I removed RowBatchQueue::AddBatchWithTimeout() to avoid confusion and am now 
calling BlockingPutWithTimeout() on the queue directly.


-- 
To view, visit http://gerrit.cloudera.org:8080/6527
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3040 addendum: use specific build type timeout for slow builds

2017-06-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-3040 addendum: use specific_build_type_timeout for slow 
builds
..


Patch Set 4: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7115
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I80f1c8a0e634a3726c53ef7297c5b162dd57a3a2
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Brown 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3040 addendum: use specific build type timeout for slow builds

2017-06-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-3040 addendum: use specific_build_type_timeout for slow 
builds
..


IMPALA-3040 addendum: use specific_build_type_timeout for slow builds

IMPALA-3040 was initially fixed to use a timeout with HDFS caching
tests, however some test executions against slow-running builds such as
ASAN indicate this timeout may not be high enough.

Use the specific_build_type_timeout() method to set a much higher
timeout for slower builds such as ASAN. This allows us to virtually
ignore timeout values on slow builds, but doesn't force us to
unconditionally increase the timeout in a release or debug build.

Testing:

Ran all tests that use get_num_cache_requests() in a loop 100 times each
under an ASAN build.  All test iterations passed.

Change-Id: I80f1c8a0e634a3726c53ef7297c5b162dd57a3a2
Reviewed-on: http://gerrit.cloudera.org:8080/7115
Reviewed-by: Michael Brown 
Tested-by: Impala Public Jenkins
---
M tests/query_test/test_hdfs_caching.py
1 file changed, 3 insertions(+), 1 deletion(-)

Approvals:
  Impala Public Jenkins: Verified
  Michael Brown: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/7115
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I80f1c8a0e634a3726c53ef7297c5b162dd57a3a2
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Brown 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.

2017-06-21 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.
..


Patch Set 5:

> (1 comment)

Thanks Tim. This example makes things clearer.

@Henry: The reason I said that using an rvalue ref is discouraged can be seen 
in Tim's example code. The caller knows exactly what to expect after calling 
pass_by_value(), i.e. the ownership is predictable.

However, after pass_by_rvalue_ref(), the caller does not know who owns the 'p' 
member anymore, unless through other means; like explicitly checking the 
unique_ptr value, or getting a bool back that says if it was used or not, etc.

-- 
To view, visit http://gerrit.cloudera.org:8080/6527
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.

2017-06-21 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6527/5/be/src/exec/kudu-scan-node.cc
File be/src/exec/kudu-scan-node.cc:

Line 179:   if 
(materialized_row_batches_->AddBatchWithTimeout(move(row_batch), 100)) {
> You could add an output parameter to BlockingPutWithTimeout() (rather than 
My understand was that move() didn't actually modify the object, it just 
produced an rvalue ref to the object. Instead the destruction of the source 
object would happen when the move constructor or assign operator is invoked. 
This test program seems to confirm this understanding:

#include 
#include 

using std::cout;
using std::move;
using std::unique_ptr;

void pass_by_rvalue_ref(unique_ptr&& ref, bool do_assign) {
  if (do_assign) {
unique_ptr tmp = move(ref);
  }
}
void pass_by_value(unique_ptrref) { }


int main() {
  unique_ptr p(new int);
  cout << "Initial: " << p.get() << "\n";
  pass_by_rvalue_ref(move(p), false);
  cout << "After pass by rvalue ref no assign: " << p.get() << "\n";
  pass_by_value(move(p));
  cout << "After pass by value: " << p.get() << "\n";

  p.reset(new int);
  cout << "Initial: " << p.get() << "\n";
  pass_by_rvalue_ref(move(p), true);
  cout << "After pass by rvalue ref assign: " << p.get() << "\n";
  return 0;
}

The output is:

tarmstrong@tarmstrong-box:~$ g++ -std=c++14 uniq.cc -o uniq && ./uniq
Initial: 0x19d2c20
After pass by rvalue ref no assign: 0x19d2c20
After pass by value: 0
Initial: 0x19d2c20
After pass by rvalue ref assign: 0


-- 
To view, visit http://gerrit.cloudera.org:8080/6527
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.

2017-06-21 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6527/5/be/src/exec/kudu-scan-node.cc
File be/src/exec/kudu-scan-node.cc:

Line 179:   if 
(materialized_row_batches_->AddBatchWithTimeout(move(row_batch), 100)) {
> You could add an output parameter to BlockingPutWithTimeout() (rather than 
unique_ptr is already a move only class, which means it's limited to using move 
semantics and copying isn't allowed, something that rvalue refs try to provide 
for non-move-only classes/data types. That means that moving to an rvalue ref 
of a move only class is redundant, save for one subtle difference noted below. 
So the only difference between taking:

func(unique_ptr<> arg)

and

func(unique_ptr<>&& arg)

is that in the first case, the caller knows that after calling func, they're 
not supposed to use the unique_ptr member that was passed to arg.

In the second case, the value of the unique_ptr member passed is dependent on 
what func() does with arg. In the above code, we're relying on this behavior to 
conditionally relinquish or retain the ownership of row_batch.

Unless I got something wrong?


-- 
To view, visit http://gerrit.cloudera.org:8080/6527
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.

2017-06-21 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6527/5/be/src/exec/kudu-scan-node.cc
File be/src/exec/kudu-scan-node.cc:

Line 179:   if 
(materialized_row_batches_->AddBatchWithTimeout(move(row_batch), 100)) {
> Just passing through.
You could add an output parameter to BlockingPutWithTimeout() (rather than 
change the return type). Or the row batch queue should be of shared_ptr<>. I 
don't think we should rely on when move() is called in the blocking queue (and 
I'm not sure that analysis is quite right, as you would need to move() the row 
batch into AddBatchWithTimeout() to make it an rvalue-ref; i.e. before any 
timeout).

Sailesh - Where did you see discouraging using unique_ptr<> as an rvalue-ref? 
The idea of rvalues is that you move() into them, which means you are 
explicitly relinquishing ownership of the moved-from object.


-- 
To view, visit http://gerrit.cloudera.org:8080/6527
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5549: Remove deprecated fields from CatalogService API

2017-06-21 Thread Dimitris Tsirogiannis (Code Review)
Hello Bharath Vissapragada,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7248

to look at the new patch set (#2).

Change subject: IMPALA-5549: Remove deprecated fields from CatalogService API
..

IMPALA-5549: Remove deprecated fields from CatalogService API

Remove from TCatalogUpdateResult the deprecated fields that are no longer
used and modify the catalog to always populate the
'updated_catalog_objects' and 'removed_catalog_object' fields with the
updated/removed catalog objects.

Testing: Run core tests.

Change-Id: Ibc80e43392cdc85a841e670030e0988692bdf884
---
M be/src/service/impala-server.cc
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
3 files changed, 22 insertions(+), 103 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/7248/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7248
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibc80e43392cdc85a841e670030e0988692bdf884
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 


[Impala-ASF-CR] IMPALA-5549: Remove deprecated fields from CatalogService API

2017-06-21 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5549: Remove deprecated fields from CatalogService API
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7248/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

Line 2916: }
> May be add a Preconditions check on updatedPrivs.size() > 1 just so that it
Good point. Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7248
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc80e43392cdc85a841e670030e0988692bdf884
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5549: Remove deprecated fields from CatalogService API

2017-06-21 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-5549: Remove deprecated fields from CatalogService API
..


Patch Set 1: Code-Review+1

(1 comment)

Ok, Thanks for clarifying.

http://gerrit.cloudera.org:8080/#/c/7248/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

Line 2916: }
May be add a Preconditions check on updatedPrivs.size() > 1 just so that it 
doesn't show up as an NPE in some edge case? (or) Probably surround the whole 
block with if (!updatedPrivs.isEmpty())


-- 
To view, visit http://gerrit.cloudera.org:8080/7248
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc80e43392cdc85a841e670030e0988692bdf884
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.

2017-06-21 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6527/5/be/src/exec/kudu-scan-node.cc
File be/src/exec/kudu-scan-node.cc:

Line 179:   if 
(materialized_row_batches_->AddBatchWithTimeout(move(row_batch), 100)) {
> I think BlockingPutWithTimeout() actually already supports this fine since 
Just passing through.

I agree with Tim's approach, however, passing a unique_ptr as an rvalue-ref is 
usually discouraged since the lifetime of the member from the caller 
side('row_batch' here) completely depends on the implementation of the function 
it's being passed to, which may not be evident to the caller.

That being said, it's that ambiguity that helps us solve this problem and I 
don't see an easier way to do it, but leaving a comment about this would be 
good since this is so subtle, and would help from a readability perspective.


-- 
To view, visit http://gerrit.cloudera.org:8080/6527
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5549: Remove deprecated fields from CatalogService API

2017-06-21 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5549: Remove deprecated fields from CatalogService API
..


Patch Set 1:

> IIRC, this was introduced because the permanent UDF changes broke
 > some clients (ex: BDR) relying on the old API that returned a
 > single updated/removed object. Is it not a problem anymore? Just
 > want to be sure that we don't break them again.

Since 5.12, BDR stopped using our catalog API. Hence, there is no need to keep 
these deprecated logic anymore.

-- 
To view, visit http://gerrit.cloudera.org:8080/7248
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc80e43392cdc85a841e670030e0988692bdf884
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5549: Remove deprecated fields from CatalogService API

2017-06-21 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-5549: Remove deprecated fields from CatalogService API
..


Patch Set 1:

IIRC, this was introduced because the permanent UDF changes broke some clients 
(ex: BDR) relying on the old API that returned a single updated/removed object. 
Is it not a problem anymore? Just want to be sure that we don't break them 
again.

-- 
To view, visit http://gerrit.cloudera.org:8080/7248
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc80e43392cdc85a841e670030e0988692bdf884
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5532: Stack-allocate compressors in RowBatch (de)serialization

2017-06-21 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5532: Stack-allocate compressors in RowBatch 
(de)serialization
..


Patch Set 4:

Sorry that was a link to the wrong commit, this is the branch with the correct 
one https://github.com/timarmstrong/incubator-impala/commits/fix-table-writers

-- 
To view, visit http://gerrit.cloudera.org:8080/7226
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia4b5a8d2cc315db50e5d70b1191702206de3450d
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5549: Remove deprecated fields from CatalogService API

2017-06-21 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7248

Change subject: IMPALA-5549: Remove deprecated fields from CatalogService API
..

IMPALA-5549: Remove deprecated fields from CatalogService API

Remove from TCatalogUpdateResult the deprecated fields that are no longer
used and modify the catalog to always populate the
'updated_catalog_objects' and 'removed_catalog_object' fields with the
updated/removed catalog objects.

Testing: Run core tests.

Change-Id: Ibc80e43392cdc85a841e670030e0988692bdf884
---
M be/src/service/impala-server.cc
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
3 files changed, 28 insertions(+), 111 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/7248/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7248
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ibc80e43392cdc85a841e670030e0988692bdf884
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 


[Impala-ASF-CR] IMPALA-5532: Stack-allocate compressors in RowBatch (de)serialization

2017-06-21 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5532: Stack-allocate compressors in RowBatch 
(de)serialization
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7226/4/be/src/exec/hdfs-parquet-table-writer.cc
File be/src/exec/hdfs-parquet-table-writer.cc:

Line 327: Status status = FinalizeCurrentPage();
This doesn't seem right - why return a Status if we're going to assume that 
it's always ok.

I actually had a branch where I was going through and fixing all the dropped 
statuses found by GCC7's [[nodiscard]]. My fixes for propagating the table 
writer status are here:

https://github.com/timarmstrong/incubator-impala/commit/6a9cb5a1d79c52001597b58eb073ff5b70a239f4

I already ran a build with these fixes. Maybe we can just fold that into here.


-- 
To view, visit http://gerrit.cloudera.org:8080/7226
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia4b5a8d2cc315db50e5d70b1191702206de3450d
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5540: Revert Sentry version back to 5.13

2017-06-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5540: Revert Sentry version back to 5.13
..


Patch Set 1:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/774/

-- 
To view, visit http://gerrit.cloudera.org:8080/7247
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4c29a69c90b0c5d06e17b46a837c880290f3b17
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5540: Revert Sentry version back to 5.13

2017-06-21 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-5540: Revert Sentry version back to 5.13
..


Patch Set 1: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7247
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4c29a69c90b0c5d06e17b46a837c880290f3b17
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Henry Robinson 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5540: Revert Sentry version back to 5.13

2017-06-21 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7247

Change subject: IMPALA-5540: Revert Sentry version back to 5.13
..

IMPALA-5540: Revert Sentry version back to 5.13

Sentry has now fixed the problem on their end, so we can
return to using 5.13.

Change-Id: Ie4c29a69c90b0c5d06e17b46a837c880290f3b17
---
M bin/impala-config.sh
1 file changed, 1 insertion(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/7247/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7247
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ie4c29a69c90b0c5d06e17b46a837c880290f3b17
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-5532: Stack-allocate compressors in RowBatch (de)serialization

2017-06-21 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-5532: Stack-allocate compressors in RowBatch 
(de)serialization
..


Patch Set 4:

I fixed the placement of WARN_UNUSED_RESULT in the previous patchset, and also 
changed a method signature in the parquet writer so that the status of a 
compression operation could be checked. Please take another quick look at 
hdfs-parquet-table-writer.cc!

-- 
To view, visit http://gerrit.cloudera.org:8080/7226
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia4b5a8d2cc315db50e5d70b1191702206de3450d
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5532: Stack-allocate compressors in RowBatch (de)serialization

2017-06-21 Thread Henry Robinson (Code Review)
Hello Impala Public Jenkins, Michael Ho, Sailesh Mukil, Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7226

to look at the new patch set (#4).

Change subject: IMPALA-5532: Stack-allocate compressors in RowBatch 
(de)serialization
..

IMPALA-5532: Stack-allocate compressors in RowBatch (de)serialization

Change allocation pattern for Codec objects in RowBatch to be
stack-allocated. Make c'tors and Init() methods of codec implementations
publicly visible in order to do so.

Change signature of BaseColumnWriter::FinalizeCurrentPage() so that
result of ProcessBlock32() can be properly checked.

Change-Id: Ia4b5a8d2cc315db50e5d70b1191702206de3450d
---
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/experiments/compression-test.cc
M be/src/runtime/row-batch.cc
M be/src/util/codec.h
M be/src/util/compress.h
M be/src/util/decompress-test.cc
M be/src/util/decompress.h
7 files changed, 119 insertions(+), 122 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/7226/4
-- 
To view, visit http://gerrit.cloudera.org:8080/7226
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia4b5a8d2cc315db50e5d70b1191702206de3450d
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5497: spilling hash joins that output build rows hit OOM

2017-06-21 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5497: spilling hash joins that output build rows hit OOM
..


Patch Set 8: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7180
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I075388d348499c5692d044ac1bc38dd8dd0b10c7
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4866: Hash join node does not apply limits correctly

2017-06-21 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4866: Hash join node does not apply limits correctly
..


Patch Set 4:

Actually let's wait until https://gerrit.cloudera.org/#/c/7180 goes in then 
rebase on top of that. They should be independent but they're touching some of 
the same code so best to make sure they don't interfere with each other.

-- 
To view, visit http://gerrit.cloudera.org:8080/6778
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I414124f8bb6f8b2af2df468e1c23418d05a0e29f
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5497: spilling hash joins that output build rows hit OOM

2017-06-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5497: spilling hash joins that output build rows hit OOM
..


Patch Set 8:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/773/

-- 
To view, visit http://gerrit.cloudera.org:8080/7180
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I075388d348499c5692d044ac1bc38dd8dd0b10c7
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5497: spilling hash joins that output build rows hit OOM

2017-06-21 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5497: spilling hash joins that output build rows hit OOM
..


Patch Set 6:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7180/6/be/src/exec/partitioned-hash-join-node-ir.cc
File be/src/exec/partitioned-hash-join-node-ir.cc:

Line 238:   || JoinOp == TJoinOp::NULL_AWARE_LEFT_ANTI_JOIN) {
> the old formatting seemed more readable, but up to you.
Done


http://gerrit.cloudera.org:8080/#/c/7180/6/be/src/exec/partitioned-hash-join-node.cc
File be/src/exec/partitioned-hash-join-node.cc:

Line 609: if (!output_build_partitions_.empty()) continue;
> the control flow in this loop is still pretty hard to read, but not for thi
Agreed. I think it really needs to be completely restructured.


-- 
To view, visit http://gerrit.cloudera.org:8080/7180
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I075388d348499c5692d044ac1bc38dd8dd0b10c7
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5497: spilling hash joins that output build rows hit OOM

2017-06-21 Thread Tim Armstrong (Code Review)
Hello Dan Hecht,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7180

to look at the new patch set (#7).

Change subject: IMPALA-5497: spilling hash joins that output build rows hit OOM
..

IMPALA-5497: spilling hash joins that output build rows hit OOM

The bug is that the join tried to bring the next spilled partition into
memory while still holding onto memory from the current partition.
The fix is to return earlier if the output batch is at capacity so
that resources are flushed.

Also reduce some of the redundancy in the loop that drives the spilling
logic and catch some dropped statuses..

Testing:
The failure was originally reproduced by my IMPALA-4703 patch. I was
able to cause a query failure with the current code by reducing the
memory limit for an existing query. Before it failed with up to 12MB of
memory. Now it succeeds with 8MB or less.

Ran exhaustive build.

Change-Id: I075388d348499c5692d044ac1bc38dd8dd0b10c7
---
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/partitioned-hash-join-node-ir.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/partitioned-hash-join-node.h
M testdata/workloads/functional-query/queries/QueryTest/spilling.test
6 files changed, 85 insertions(+), 85 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/7180/7
-- 
To view, visit http://gerrit.cloudera.org:8080/7180
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I075388d348499c5692d044ac1bc38dd8dd0b10c7
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 


  1   2   >