[Impala-ASF-CR] IMPALA-4363: Add Parquet timestamp validation

2016-11-16 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4363: Add Parquet timestamp validation
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4968/6/testdata/bad_parquet_data/README
File testdata/bad_parquet_data/README:

PS6, Line 16: out-of-range-timestamp.parq
The parquet file is no longer in this folder, so I will remove this comment.


-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2494: Support for byte array-encoded decimals in Parquet scanner

2016-11-16 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-2494: Support for byte array-encoded decimals in Parquet 
scanner
..


Patch Set 1:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/5115/1//COMMIT_MSG
Commit Message:

Line 21:  * Tested computing SUM(col) for 1 billion distinct dictionary-encoded
> I'm assuming you are measuring response time. Since there is overall more w
Right, but I'm measuring the absolute difference in response time, not  the 
relative difference. So it doesn't totally matter what the rest of the query is 
doing, as long as it's stable (I agree that variance can drown out the signal). 
Thankfully all the runtimes I measured were remarkably stable. However, your 
next point is a very good one.


Line 23:  * No performance difference measured by introduction of extra
> I'm assuming you compared response times. With multi-threaded scans the los
Doh, of course you're right, and I wasn't properly measuring the overhead, 
because of parallelism.

With a single thread, decoding 1B decimals is about 1 second slower, which is 
about 1ns per decode. I've updated the commit message (still think that's not a 
perf difference worth chasing).


http://gerrit.cloudera.org:8080/#/c/5115/1/be/src/exec/parquet-common.h
File be/src/exec/parquet-common.h:

Line 203:   /// assumed to be BYTE_ARRAY, otherwise it is assumed to be 
FIXED_LEN_BYTE_ARRAY.
> Document return value?
Done


PS1, Line 328: fixed_len_size
> This name seems inaccurate since it can be variable. Maybe 'encoded_size'?
Hm, I think it's accurate - either it's set (> 0) and fixed, or it's not set 
(-1). 'encoded_size' wouldn't work well later in the method (because it's set 
to the size of the buffer, not the total encoded size). I could of course have 
another variable to manage that, but this seems ok to me.


PS1, Line 338: fixed_len_size
> What if fixed_len_size is negative?
Done.


http://gerrit.cloudera.org:8080/#/c/5115/1/testdata/bin/create-load-data.sh
File testdata/bin/create-load-data.sh:

Line 150:   # Decimal Parquet encodings written by Java Parquet library.
> If you're going to create the table as part of the test, might as well copy
Done.


-- 
To view, visit http://gerrit.cloudera.org:8080/5115
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If95171e65aa48f08b08b8e87f4555dc75e867977
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2494: Support for byte array-encoded decimals in Parquet scanner

2016-11-16 Thread Henry Robinson (Code Review)
Henry Robinson has uploaded a new patch set (#2).

Change subject: IMPALA-2494: Support for byte array-encoded decimals in Parquet 
scanner
..

IMPALA-2494: Support for byte array-encoded decimals in Parquet scanner

 * Extend metadata checks to allow more than one possible physical type
   for a given logical type.
 * Change decimal decoding to handle non-fixed-length format in same path
   as fixed-length encoding.

Testing:

 * Query test that decodes dictionary-encoded decimals using binary
   encoding.

Perf:

 * Tested computing SUM(col) for 1 billion distinct dictionary-encoded
   decimal(12,2) values using FIXED_BYTE_ARRAY physical type encoding.
 * The overhead of decoding with the extra branch was measured at 1s;
   i.e. the per-decode overhead is 1ns.

Change-Id: If95171e65aa48f08b08b8e87f4555dc75e867977
---
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-common.h
M be/src/exec/parquet-metadata-utils.cc
A testdata/data/byte_array_decimal_dict_encoded.parquet
A testdata/workloads/functional-query/queries/QueryTest/decimal-encodings.test
M tests/query_test/test_scanners.py
6 files changed, 119 insertions(+), 58 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/5115/2
-- 
To view, visit http://gerrit.cloudera.org:8080/5115
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If95171e65aa48f08b08b8e87f4555dc75e867977
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] PREVIEW: IMPALA-1788: Fold constant expressions during plan generation.

2016-11-16 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: PREVIEW: IMPALA-1788: Fold constant expressions during plan 
generation.
..


Patch Set 1:

(20 comments)

http://gerrit.cloudera.org:8080/#/c/5109/1/be/src/exprs/expr-context.cc
File be/src/exprs/expr-context.cc:

Line 155:   void* value = GetValue(NULL);
how does this handle/bail out of cases where exprctx is accidentally not a 
constant expr?


http://gerrit.cloudera.org:8080/#/c/5109/1/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

Line 1541: // The rewrite rule for extracting conjuncts simplifies these to 
NULL.
rather than special-casing it here, it would make more sense to special-case 
GetValue() so you don't need to compare a type.


http://gerrit.cloudera.org:8080/#/c/5109/1/be/src/exprs/literal.h
File be/src/exprs/literal.h:

Line 26: #include "runtime/string-value.h"
duplicated


http://gerrit.cloudera.org:8080/#/c/5109/1/common/thrift/Exprs.thrift
File common/thrift/Exprs.thrift:

Line 41:   TIMESTAMP_LITERAL
please group all the literals together, so we can easily see what's still 
missing.


http://gerrit.cloudera.org:8080/#/c/5109/1/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java:

Line 242:   fnName = path.get(path.size() - 1);
why is this necessary?


http://gerrit.cloudera.org:8080/#/c/5109/1/fe/src/main/java/org/apache/impala/analysis/LiteralExpr.java
File fe/src/main/java/org/apache/impala/analysis/LiteralExpr.java:

Line 204:   if (Double.isNaN(val.double_val) || 
Double.isInfinite(val.double_val)) return null;
nice catch!

long line


Line 226:   for (byte b: bytes) {
single line


Line 233: result = new StringLiteral(strVal, constExpr.getType(), 
false);
did you add a new test that blows up with the previous default unescaping?

also, this retains the type of the original expr, which seems like the right 
thing to do (and what we previously did was the wrong thing to do). does this 
warrant another test?


http://gerrit.cloudera.org:8080/#/c/5109/1/fe/src/main/java/org/apache/impala/analysis/TimestampLiteral.java
File fe/src/main/java/org/apache/impala/analysis/TimestampLiteral.java:

Line 34:  * produce, so it better to defer to a single source of truth (the BE 
implementation).
it -> it's


http://gerrit.cloudera.org:8080/#/c/5109/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

Line 164: if (analyzer.getQueryOptions().enable_expr_rewrites && rewriter 
!= null) {
pass in null for the rewriter if it's disabled, so you don't have to check that 
as well.

move the null check into rewriteExprs() so you don't have to check that either.


http://gerrit.cloudera.org:8080/#/c/5109/1/fe/src/main/java/org/apache/impala/planner/JoinNode.java
File fe/src/main/java/org/apache/impala/planner/JoinNode.java:

Line 523:   // constant into nested-loop joins.
refer to cross products, to avoid confusion with future index nlj.


http://gerrit.cloudera.org:8080/#/c/5109/1/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
File fe/src/main/java/org/apache/impala/planner/KuduScanNode.java:

Line 373
do we still need this function for anything?

there's a call in the partition pruner, but that should also go.


http://gerrit.cloudera.org:8080/#/c/5109/1/fe/src/main/java/org/apache/impala/planner/PlanNode.java
File fe/src/main/java/org/apache/impala/planner/PlanNode.java:

Line 434:   public void initNoRewrite(Analyzer analyzer) throws ImpalaException 
{
why not just init(Analyzer)? that would also make it clear that you're not 
applying transformation rules


http://gerrit.cloudera.org:8080/#/c/5109/1/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

Line 577:   emptySetNode.init(ctx_.getRewriter(), analyzer);
instead of passing this in, maybe each PlanNode should have access to the ctx_


http://gerrit.cloudera.org:8080/#/c/5109/1/fe/src/main/java/org/apache/impala/rewrite/ExprRewriter.java
File fe/src/main/java/org/apache/impala/rewrite/ExprRewriter.java:

Line 93:   public void rewriteList(List exprs, Analyzer analyzer) throws 
AnalysisException {
why rewrite as opposed to apply here?


http://gerrit.cloudera.org:8080/#/c/5109/1/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java:

Line 204:* Only contains very basic tests for a few interesting cases. A 
more thorough
". More thorough"


http://gerrit.cloudera.org:8080/#/c/5109/1/testdata/workloads/functional-planner/queries/PlannerTest/conjunct-ordering.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/conjunct-ordering.test:

Line 92:predicates: a.int_col = 0, a.times

[Impala-ASF-CR] IMPALA-4363: Add Parquet timestamp validation

2016-11-16 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4363: Add Parquet timestamp validation
..


Patch Set 6: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3812: Fix error message for unsupported types

2016-11-16 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-3812: Fix error message for unsupported types
..


Patch Set 3: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/4859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9019b4bfd219f94e554c795befd3ff5e39706ea9
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3812: Fix error message for unsupported types

2016-11-16 Thread Internal Jenkins (Code Review)
Internal Jenkins has submitted this change and it was merged.

Change subject: IMPALA-3812: Fix error message for unsupported types
..


IMPALA-3812: Fix error message for unsupported types

Before this patch an unclear error message was returned if DATE or
DATETIME appeared in the select list after a star expansion. This was
because DATE and DATETIME PrimitiveType was serialized as INVALID_TYPE.
This is fixed by serializing correctly.

Change-Id: I9019b4bfd219f94e554c795befd3ff5e39706ea9
Reviewed-on: http://gerrit.cloudera.org:8080/4859
Reviewed-by: Alex Behm 
Tested-by: Internal Jenkins
---
M fe/src/main/java/org/apache/impala/catalog/PrimitiveType.java
M fe/src/main/java/org/apache/impala/catalog/ScalarType.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
M testdata/UnsupportedTypes/data.csv
M testdata/datasets/functional/functional_schema_template.sql
M testdata/workloads/functional-query/queries/QueryTest/misc.test
6 files changed, 35 insertions(+), 12 deletions(-)

Approvals:
  Internal Jenkins: Verified
  Alex Behm: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/4859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I9019b4bfd219f94e554c795befd3ff5e39706ea9
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-4363: Add Parquet timestamp validation

2016-11-16 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4363: Add Parquet timestamp validation
..


Patch Set 5:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/4968/5/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

Line 445:   /// Writes the next value into the next slot in the *tuple using 
pool if necessary.
> ... into the destination slot in tuple ...
Done


Line 480: } else if (UNLIKELY(NeedsConversion() &&
> Is it possible that an otherwise invalid value becomes valid after conversi
No, because the input to conversion must be a valid date.


http://gerrit.cloudera.org:8080/#/c/4968/5/be/src/exec/parquet-column-readers.h
File be/src/exec/parquet-column-readers.h:

Line 512:   /// the next slot in *tuple.
> ... into the appropriate destination slot in 'tuple'.
Done


http://gerrit.cloudera.org:8080/#/c/4968/5/testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-continue-on-error.test
File 
testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-continue-on-error.test:

Line 5: DROP TABLE IF EXISTS invalid_parquet_timestamp;
> setting up the table is already done in the .py file, right?
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4363: Add Parquet timestamp validation

2016-11-16 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#6).

Change subject: IMPALA-4363: Add Parquet timestamp validation
..

IMPALA-4363: Add Parquet timestamp validation

Before this patch, we would simply read the INT96 Parquet timestamp
representation and assume that it's valid. However, not all bit
permutations represent a valid timestamp. One of the boost functions
raised an exception (that we did't catch) when passed an invalid
boost date object, which resulted in a crash. This patch fixes
problem by validating that the timestamp falls into 1400.. date
range as we are scanning Parquet.

Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
---
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M be/src/runtime/timestamp-value.h
M common/thrift/generate_error_codes.py
M testdata/bad_parquet_data/README
A testdata/data/out_of_range_timestamp.parquet
A 
testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-abort-on-error.test
A 
testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-continue-on-error.test
M tests/query_test/test_scanners.py
9 files changed, 133 insertions(+), 22 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/4968/6
-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-4392: restore PeakMemoryUsage to DataSink profiles

2016-11-16 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-4392: restore PeakMemoryUsage to DataSink profiles
..


Patch Set 3: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/4969
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa5db623a84c47d5904033ec26aece74f500a2c9
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4392: restore PeakMemoryUsage to DataSink profiles

2016-11-16 Thread Internal Jenkins (Code Review)
Internal Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4392: restore PeakMemoryUsage to DataSink profiles
..


IMPALA-4392: restore PeakMemoryUsage to DataSink profiles

The join build sink patches refactored the DataSink interface and
inadvertently removed this counter from the profile.
The problem was that the sink MemTracker was not initialized with the
sink's profile.

The fix is for the sink to create the MemTracker itself.

Testing:
Ran core tests. Manually checked profile to make sure the counter
appeared in HdfsTableSink, DataStreamSender, etc.

Change-Id: Iaa5db623a84c47d5904033ec26aece74f500a2c9
Reviewed-on: http://gerrit.cloudera.org:8080/4969
Reviewed-by: Tim Armstrong 
Tested-by: Internal Jenkins
---
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hbase-table-sink.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-table-sink.h
M be/src/exec/nested-loop-join-builder.cc
M be/src/exec/nested-loop-join-builder.h
M be/src/exec/nested-loop-join-node.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/plan-root-sink.cc
M be/src/exec/row-batch-cache.h
M be/src/runtime/data-stream-sender.cc
M be/src/runtime/data-stream-sender.h
M be/src/runtime/plan-fragment-executor.cc
M be/src/runtime/plan-fragment-executor.h
19 files changed, 64 insertions(+), 74 deletions(-)

Approvals:
  Internal Jenkins: Verified
  Tim Armstrong: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/4969
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Iaa5db623a84c47d5904033ec26aece74f500a2c9
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-3713,IMPALA-4439: Fix Kudu DML shell reporting

2016-11-16 Thread Internal Jenkins (Code Review)
Internal Jenkins has submitted this change and it was merged.

Change subject: IMPALA-3713,IMPALA-4439: Fix Kudu DML shell reporting
..


IMPALA-3713,IMPALA-4439: Fix Kudu DML shell reporting

Adds support in the shell to report the number of modified
rows for all DML operations, as well as the number of rows
with errors.

Testing: Added shell tests.

Change-Id: I3d3d7aa8d176e03ea58fb00f2a81fb3e34965aa1
Reviewed-on: http://gerrit.cloudera.org:8080/5103
Reviewed-by: Alex Behm 
Tested-by: Internal Jenkins
---
M be/src/service/impala-beeswax-server.cc
M common/thrift/ImpalaService.thrift
M shell/impala_client.py
M shell/impala_shell.py
M tests/shell/test_shell_commandline.py
5 files changed, 102 insertions(+), 25 deletions(-)

Approvals:
  Internal Jenkins: Verified
  Alex Behm: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/5103
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I3d3d7aa8d176e03ea58fb00f2a81fb3e34965aa1
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Matthew Jacobs 


[Impala-ASF-CR] IMPALA-3713,IMPALA-4439: Fix Kudu DML shell reporting

2016-11-16 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-3713,IMPALA-4439: Fix Kudu DML shell reporting
..


Patch Set 3: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/5103
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3d3d7aa8d176e03ea58fb00f2a81fb3e34965aa1
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4397,IMPALA-3259: reduce codegen time and memory

2016-11-16 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4397,IMPALA-3259: reduce codegen time and memory
..


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4956/4/be/src/exec/hdfs-scanner.h
File be/src/exec/hdfs-scanner.h:

PS4, Line 190: CODEGEN_WRITE_TUPLES_MAX_COLUMNS = 250;
> I'd just be picking an arbitrary threshold in that case too, since I don't 
I agree that it's more actionable if it's based on number of tuples.

However, isn't the longer compilation + optimization time with too many tuples 
a direct result of having large functions ? If so, instruction count seems to 
be a more robust estimate. For example, the generated functions can be 
different if we change the codegen function. In that case this threshold may 
not hold anymore.

This threshold really shouldn't be arbitrary. The ideal case is that the 
planner can estimate a threshold of instruction count in which the cost of 
codegen is less than interpretation.

Also, it seems that the planner should have all the information needed to make 
the call. Is there some unexpected difficulty for doing it in the planner ?


http://gerrit.cloudera.org:8080/#/c/4956/4/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

PS4, Line 823: if (!hasGrouping) {
> I disable it for this particular exchange. This is a proactive thing only r
This isn't exactly what I had in mind. If you look at ExchangeNode::Codegen(), 
we only codegen when is_mergeing_ is true. This is pretty much the same 
situation as the Agg node here. So, it would be nice to follow the same pattern 
for exchange node and convert if (is_merging_) to DCHECK(is_merging_);


-- 
To view, visit http://gerrit.cloudera.org:8080/4956
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id10015b49da182cb181a653ac8464b4a18b71091
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4466: Improve Kudu CRUD test coverage

2016-11-16 Thread Internal Jenkins (Code Review)
Internal Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4466: Improve Kudu CRUD test coverage
..


IMPALA-4466: Improve Kudu CRUD test coverage

The results in the test files were verified by hand.

This patch also introduces a new test section 'DML_RESULTS', which
takes the name of a table as a comment and the contents of the
table as its body and then verifies that the body matches the
actual contents of the table. This makes it easy to check that a
DML operation has the desired effect on the contents of a table,
rather than always having to add another test case that runs a
select on the table. For now, this section cannot be used in a
test along with the RESULTS or ERRORS sections.

TODO: Refactor the DML test case handling (IMPALA-4471)

Change-Id: Ib9e7afbef60186edb00a9d11fbe5a8c64931add6
Reviewed-on: http://gerrit.cloudera.org:8080/4953
Reviewed-by: Matthew Jacobs 
Tested-by: Internal Jenkins
---
M testdata/workloads/functional-query/queries/QueryTest/kudu_create.test
D testdata/workloads/functional-query/queries/QueryTest/kudu_crud.test
A testdata/workloads/functional-query/queries/QueryTest/kudu_delete.test
A testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test
A testdata/workloads/functional-query/queries/QueryTest/kudu_update.test
A testdata/workloads/functional-query/queries/QueryTest/kudu_upsert.test
M tests/common/impala_test_suite.py
M tests/common/test_result_verifier.py
M tests/query_test/test_kudu.py
M tests/util/test_file_parser.py
10 files changed, 1,624 insertions(+), 487 deletions(-)

Approvals:
  Matthew Jacobs: Looks good to me, approved
  Internal Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/4953
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ib9e7afbef60186edb00a9d11fbe5a8c64931add6
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4466: Improve Kudu CRUD test coverage

2016-11-16 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-4466: Improve Kudu CRUD test coverage
..


Patch Set 12: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/4953
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib9e7afbef60186edb00a9d11fbe5a8c64931add6
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4432: Handle internal codegen disabling properly

2016-11-16 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4432: Handle internal codegen disabling properly
..


Patch Set 2:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/5105/2/be/src/exec/aggregation-node.cc
File be/src/exec/aggregation-node.cc:

Line 161:   if (state->CodegenDisabled()) {
> Shouldn't this be COdegenDisabledByQueryOption() to match the message?
I refactored this into a function similar to what you did in the other patch 
and updated the profile string based on the condition.


http://gerrit.cloudera.org:8080/#/c/5105/2/be/src/exprs/scalar-fn-call.cc
File be/src/exprs/scalar-fn-call.cc:

PS2, Line 113: Interanl
> Internal
Done


PS2, Line 132: too_many_args
> too_many_args_to_interpret?
Done


PS2, Line 156: AddUdfToCodegen
> Udf seems like a bit of a misnomer, since it may be a builtin. Maybe "AddSc
Done


http://gerrit.cloudera.org:8080/#/c/5105/2/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

PS2, Line 318: disable_codegen
> It would be good if we had a different name to distinguish this from the ot
Renamed to disable_codegen_internal.


http://gerrit.cloudera.org:8080/#/c/5105/2/tests/query_test/test_udfs.py
File tests/query_test/test_udfs.py:

PS2, Line 42: exec_single_node_option=[100]
> It seems like we'd want to test with the single node option both enabled an
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5105
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0b6a9ed723c64ba21b861608583cc9b6607d3397
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4432: Handle internal codegen disabling properly

2016-11-16 Thread Michael Ho (Code Review)
Michael Ho has uploaded a new patch set (#3).

Change subject: IMPALA-4432: Handle internal codegen disabling properly
..

IMPALA-4432: Handle internal codegen disabling properly

There are some conditions in which codegen is disabled internally
even if it's enabled in the query option. For instance, the single
node optimization or the expression evaluation requests sent from
the FE to the BE. These internal disabling of codegen are advisory
as their purposes are to reduce the latency for tables with no or
very few rows. The internal disabling of codegen doesn't interact
well with UDFs which cannot be interpreted (e.g. IR UDF) as it
conflates with the 'disable_codegen' query option set by the user.
As a result, it's hard to differentiate between when codegen is
disabled explicitly by users and when it is disabled internally.

This change fixes the problem above by adding an explicit flag in
TQueryCtx to indicate that codegen is disabled internally. This flag
is only advisory. For cases in which codegen is needed to function,
this internal flag is ignored and if codegen is disabled via query
option, an error is thrown. For this new flag to work with ScalarFnCall,
codegen needs to happen after ScalarFnCall::Prepare() because it's
hard to tell if a fragment contains any UDF that cannot be interpreted
until after ScalarFnCall::Prepare() is called. However, Prepare() needs
the codegen object to codegen so it needs to be created before Prepare().
We can either always create the codegen module or defer codegen to a point
after ScalarFnCall::Prepare(). The former has the downside of introducing
unnecessary latency for say single-node optimization so the latter is
implemented. It is needed as part of IMPALA-4192 any way.

After this change, ScalarFnCall expressions which need to be codegen'd
are inserted into a vector in RuntimeState in ScalarFnCall::Prepare().
Later in the codegen phase, these expressions' GetCodegendComputeFn()
will be called after codegen for operators is done. If any of these
expressions are already codegen'd indirectly by the operators,
GetCodegendComputeFn() will be a no-op. This preserves the behavior
that ScalarFnCall will always be codegen'd even if the fragment
doesn't contain any codegen enabled operators.

Change-Id: I0b6a9ed723c64ba21b861608583cc9b6607d3397
---
M be/src/exec/aggregation-node.cc
M be/src/exec/exchange-node.cc
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hash-join-node.cc
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/sort-node.cc
M be/src/exec/topn-node.cc
M be/src/exprs/scalar-fn-call.cc
M be/src/exprs/scalar-fn-call.h
M be/src/runtime/plan-fragment-executor.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/service/fe-support.cc
M common/thrift/ImpalaInternalService.thrift
M fe/src/main/java/org/apache/impala/planner/Planner.java
M tests/query_test/test_udfs.py
24 files changed, 167 insertions(+), 109 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/5105/3
-- 
To view, visit http://gerrit.cloudera.org:8080/5105
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0b6a9ed723c64ba21b861608583cc9b6607d3397
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] Improve message output from run-step.sh

2016-11-16 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: Improve message output from run-step.sh
..


Patch Set 2: Code-Review+2

This was indeed annoying! Thanks for fixing it.

-- 
To view, visit http://gerrit.cloudera.org:8080/5116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iaced729f0ef6aa93174cd90b1516d3c34fe41a22
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Alex Behm 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4493: fix string-compare-test when using clang

2016-11-16 Thread Internal Jenkins (Code Review)
Internal Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4493: fix string-compare-test when using clang
..


IMPALA-4493: fix string-compare-test when using clang

Only the 0 value or sign bit is specified in the return
value for strncmp(), so fix up the test accordingly.

Testing:
- verified the new test still reproduces IMPALA-4436
- verify the new test passes under ASAN build

Change-Id: I5d82ac2bff33fdbf66275fcfc6558c4bc29de5e7
Reviewed-on: http://gerrit.cloudera.org:8080/5110
Reviewed-by: Jim Apple 
Tested-by: Internal Jenkins
---
M be/src/runtime/string-compare-test.cc
1 file changed, 9 insertions(+), 2 deletions(-)

Approvals:
  Jim Apple: Looks good to me, approved
  Internal Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/5110
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I5d82ac2bff33fdbf66275fcfc6558c4bc29de5e7
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4493: fix string-compare-test when using clang

2016-11-16 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-4493: fix string-compare-test when using clang
..


Patch Set 2: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/5110
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5d82ac2bff33fdbf66275fcfc6558c4bc29de5e7
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4490: Only generate runtime filters for hash join nodes.

2016-11-16 Thread Alex Behm (Code Review)
Alex Behm has uploaded a new patch set (#3).

Change subject: IMPALA-4490: Only generate runtime filters for hash join nodes.
..

IMPALA-4490: Only generate runtime filters for hash join nodes.

Change-Id: I167725e260bd0f91c2bfc164eb044321192d5b95
---
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
2 files changed, 34 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/5117/3
-- 
To view, visit http://gerrit.cloudera.org:8080/5117
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I167725e260bd0f91c2bfc164eb044321192d5b95
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Michael Brown 


[Impala-ASF-CR] IMPALA-4490: Only generate runtime filters for hash join nodes.

2016-11-16 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4490: Only generate runtime filters for hash join nodes.
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5117/2/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test:

PS2, Line 1405: # IMPALA-3809
> Mistyped Jira.
Oops, too many at once. Fixed.


-- 
To view, visit http://gerrit.cloudera.org:8080/5117
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I167725e260bd0f91c2bfc164eb044321192d5b95
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4490: Only generate runtime filters for hash join nodes.

2016-11-16 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change.

Change subject: IMPALA-4490: Only generate runtime filters for hash join nodes.
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5117/2/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test:

PS2, Line 1405: # IMPALA-3809
Mistyped Jira.


-- 
To view, visit http://gerrit.cloudera.org:8080/5117
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I167725e260bd0f91c2bfc164eb044321192d5b95
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4490: Only generate runtime filters for hash join nodes.

2016-11-16 Thread Alex Behm (Code Review)
Alex Behm has uploaded a new patch set (#2).

Change subject: IMPALA-4490: Only generate runtime filters for hash join nodes.
..

IMPALA-4490: Only generate runtime filters for hash join nodes.

Change-Id: I167725e260bd0f91c2bfc164eb044321192d5b95
---
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
2 files changed, 34 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/5117/2
-- 
To view, visit http://gerrit.cloudera.org:8080/5117
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I167725e260bd0f91c2bfc164eb044321192d5b95
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Henry Robinson 


[Impala-ASF-CR] IMPALA-4490: Only generate runtime filters for hash join nodes.

2016-11-16 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4490: Only generate runtime filters for hash join nodes.
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5117/1/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test:

PS1, Line 1405: # IMPALA-3809
> Can you extend this easily to have a HJ node below the NLJ that does create
Sure. Done.


-- 
To view, visit http://gerrit.cloudera.org:8080/5117
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I167725e260bd0f91c2bfc164eb044321192d5b95
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Henry Robinson 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4397,IMPALA-3259: reduce codegen time and memory

2016-11-16 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#7).

Change subject: IMPALA-4397,IMPALA-3259: reduce codegen time and memory
..

IMPALA-4397,IMPALA-3259: reduce codegen time and memory

A handful of fixes to codegen memory usage:
* Delete the IR module when we're done with it (it can be fairly large)
* Track the compiled code size (typically not that large, but it can add
  up if there are many fragments).
* Estimate optimisation memory requirements and track it in the memory
  tracker. This is very crude but much better than not tracking it.

A handful of fixes to improve codegen time/cost, particularly targeted
at compute stats workloads:
* Avoid over-inlining when there are many aggregate functions,
  conjuncts, etc by adding "NoInline" attributes.
* Bail of out text scanner codegen for wide tables.
* Don't codegen non-grouping merge aggregations. They will only process
  one row per Impala daemon, so codegen is not worth it.
* Make the Hll algorithm more efficient by specialising the hash function
  based on decimal width.

Limitations:
* This doesn't tackle over-inlining of large expr trees, but a similar
  approach will be used there in a follow-on patch.

Perf:
Compute stats on functional_parquet.widetable_1000_cols goes from 1min+
of codegen to < 1s codegen on my machine. Local perf runs of tpc-h
and targeted perf showed no regressions and some moderate improvements
(1-2%).

Also did an experiment to understand the perf consequences of disabling
inlining. I manually set CODEGEN_INLINE_EXPRS_THRESHOLD to 0, and ran:

  drop stats tpch_20_parquet.lineitem
  compute stats tpch_20_parquet.lineitem;

There was no difference in time spent in the agg node: 30.7s with
inlining, 30.5s without.

Change-Id: Id10015b49da182cb181a653ac8464b4a18b71091
---
M be/src/benchmarks/hash-benchmark.cc
M be/src/codegen/llvm-codegen-test.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/codegen/mcjit-mem-mgr.h
M be/src/exec/aggregation-node.cc
M be/src/exec/exchange-node.cc
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hash-join-node.cc
M be/src/exec/hash-table.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/sort-node.cc
M be/src/exec/topn-node.cc
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/anyval-util.h
M be/src/exprs/expr-codegen-test.cc
M be/src/exprs/expr.h
M be/src/runtime/lib-cache.cc
M be/src/runtime/runtime-state.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M testdata/workloads/functional-query/queries/QueryTest/compute-stats.test
M tests/query_test/test_aggregation.py
29 files changed, 1,465 insertions(+), 165 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/4956/7
-- 
To view, visit http://gerrit.cloudera.org:8080/4956
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id10015b49da182cb181a653ac8464b4a18b71091
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4397,IMPALA-3259: reduce codegen time and memory

2016-11-16 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4397,IMPALA-3259: reduce codegen time and memory
..


Patch Set 7:

PS7 is a rebase

-- 
To view, visit http://gerrit.cloudera.org:8080/4956
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id10015b49da182cb181a653ac8464b4a18b71091
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4490: Only generate runtime filters for hash join nodes.

2016-11-16 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-4490: Only generate runtime filters for hash join nodes.
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5117/1/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test:

PS1, Line 1405: # IMPALA-3809
Can you extend this easily to have a HJ node below the NLJ that does create a 
filter? This would confirm that filters are still created (otherwise if they're 
somehow disabled, this test still passes).

No need if it's too noisy.


-- 
To view, visit http://gerrit.cloudera.org:8080/5117
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I167725e260bd0f91c2bfc164eb044321192d5b95
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Henry Robinson 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4397,IMPALA-3259: reduce codegen time and memory

2016-11-16 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#6).

Change subject: IMPALA-4397,IMPALA-3259: reduce codegen time and memory
..

IMPALA-4397,IMPALA-3259: reduce codegen time and memory

A handful of fixes to codegen memory usage:
* Delete the IR module when we're done with it (it can be fairly large)
* Track the compiled code size (typically not that large, but it can add
  up if there are many fragments).
* Estimate optimisation memory requirements and track it in the memory
  tracker. This is very crude but much better than not tracking it.

A handful of fixes to improve codegen time/cost, particularly targeted
at compute stats workloads:
* Avoid over-inlining when there are many aggregate functions,
  conjuncts, etc by adding "NoInline" attributes.
* Bail of out text scanner codegen for wide tables.
* Don't codegen non-grouping merge aggregations. They will only process
  one row per Impala daemon, so codegen is not worth it.
* Make the Hll algorithm more efficient by specialising the hash function
  based on decimal width.

Limitations:
* This doesn't tackle over-inlining of large expr trees, but a similar
  approach will be used there in a follow-on patch.

Perf:
Compute stats on functional_parquet.widetable_1000_cols goes from 1min+
of codegen to < 1s codegen on my machine. Local perf runs of tpc-h
and targeted perf showed no regressions and some moderate improvements
(1-2%).

Also did an experiment to understand the perf consequences of disabling
inlining. I manually set CODEGEN_INLINE_EXPRS_THRESHOLD to 0, and ran:

  drop stats tpch_20_parquet.lineitem
  compute stats tpch_20_parquet.lineitem;

There was no difference in time spent in the agg node: 30.7s with
inlining, 30.5s without.

Change-Id: Id10015b49da182cb181a653ac8464b4a18b71091
---
M be/src/benchmarks/hash-benchmark.cc
M be/src/codegen/llvm-codegen-test.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/codegen/mcjit-mem-mgr.h
M be/src/exec/aggregation-node.cc
M be/src/exec/exchange-node.cc
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hash-join-node.cc
M be/src/exec/hash-table.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/sort-node.cc
M be/src/exec/topn-node.cc
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/anyval-util.h
M be/src/exprs/expr-codegen-test.cc
M be/src/exprs/expr.h
M be/src/runtime/lib-cache.cc
M be/src/runtime/runtime-state.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M testdata/workloads/functional-query/queries/QueryTest/compute-stats.test
M tests/query_test/test_aggregation.py
29 files changed, 1,465 insertions(+), 165 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/4956/6
-- 
To view, visit http://gerrit.cloudera.org:8080/4956
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id10015b49da182cb181a653ac8464b4a18b71091
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4397,IMPALA-3259: reduce codegen time and memory

2016-11-16 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4397,IMPALA-3259: reduce codegen time and memory
..


Patch Set 4:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/4956/4/be/src/codegen/llvm-codegen.cc
File be/src/codegen/llvm-codegen.cc:

Line 952: RETURN_IF_ERROR(OptimizeModule());
> Is there a reason why we don't call DestroyIRModule() when OptimizeModule()
No reason, it's inconsistent. I don't think it's really necessary to destroy 
the IR module on the error path, since it will be cleaned up shortly anyway, so 
I removed it on those paths to simplify things.


PS4, Line 986:  memory_manager_->bytes_allocated()),
 : memory_manager_->bytes_allocated()
> The code may be easier to read if this factored into a single variable inst
Done


Line 1039:   if (!mem_tracker_->TryConsume(estimated_memory)) {
> I wonder if it's better to skip optimization if we cannot reserve the memor
Compiling the code can also be pretty CPU/memory intensive so it probably isn't 
even safe to do that. I think the codegen memory will only be greater than the 
runtime memory requirement in some pretty extreme cases. E.g. in the queries I 
looked at the memory calculation was at most a few MB per fragment, and usually 
less.


PS4, Line 1041: Substitute("Codegen failed to reserve '$0' bytes for 
optimization",
  :   estimated_memory),
  : estimated_memory);
> These line wraps look a bit weird to me but not sure if it's a side effect 
I agree they look weird but clang-format really wants to format it like this.


http://gerrit.cloudera.org:8080/#/c/4956/4/be/src/codegen/llvm-codegen.h
File be/src/codegen/llvm-codegen.h:

PS4, Line 708: optimiser
> nit: optimizer
We generally use a mix of American and commonwealth English spellings in 
comments. I guess here we're inconsistent internally so I just changed it.


PS4, Line 709:  512 bytes
> Mind documenting how this is derived so we can update it accordingly in cas
Elaborated a bit. It's very approximate (and will become very inaccurate once 
functions get large).


http://gerrit.cloudera.org:8080/#/c/4956/4/be/src/codegen/mcjit-mem-mgr.h
File be/src/codegen/mcjit-mem-mgr.h:

PS4, Line 36: memory
> This is for tracking memory consumed by the compiled machine code, right ?
Reworded to clarify.


http://gerrit.cloudera.org:8080/#/c/4956/4/be/src/exec/exec-node.h
File be/src/exec/exec-node.h:

PS4, Line 187:  false
> true ?
Done


http://gerrit.cloudera.org:8080/#/c/4956/4/be/src/exec/hash-table.cc
File be/src/exec/hash-table.cc:

PS4, Line 762: if (ctxs.size() > Expr::CODEGEN_INLINE_EXPRS_THRESHOLD) {
 :   // Avoid bloating function by inlining too many exprs into 
it.
 :   expr_fn->addFnAttr(llvm::Attribute::NoInline);
 : }
> I wonder if this can be more generalized by putting it in LlvmCodegen::Fina
Not a bad idea, but doesn't address the main scenario this patch helps with, 
which is when many small functions are inlined into a large function.


http://gerrit.cloudera.org:8080/#/c/4956/4/be/src/exec/hdfs-scanner.h
File be/src/exec/hdfs-scanner.h:

PS4, Line 190: CODEGEN_WRITE_TUPLES_MAX_COLUMNS = 250;
> This seems a bit adhoc to me. May be better to track the size of IR functio
I'd just be picking an arbitrary threshold in that case too, since I don't 
really have a good way to accurately estimate the cost. The advantage of doing 
it this way is that it's easier to understand why it was disabled and gives a 
user an actionable way to work around it.


PS4, Line 190:  const
> constexpr
There isn't a semantic difference between static const and static constexpr 
when it's an integer literal. 
http://en.cppreference.com/w/cpp/language/constant_expression


http://gerrit.cloudera.org:8080/#/c/4956/4/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

PS4, Line 1911: 
  : 
> Is it necessary to remove it ? For comparison, please see FirstValUpdate().
I added it back in. It shouldn't be necessary to instantiate it, but this makes 
it more consistent.


http://gerrit.cloudera.org:8080/#/c/4956/4/be/src/exprs/expr.h
File be/src/exprs/expr.h:

PS4, Line 311: CODEGEN_INLINE_EXPRS_THRESHOLD
> Wouldn't instruction count be a more precise estimation ?
It's a pretty arbitrary threshold anyway (afaik optimisation time is only 
loosely correlated with instruction count) so I like that this is a little 
simpler and easier to understand.


PS4, Line 311: const
> constexpr ?
I don't feel strongly but I don't think there's any advantage to constexpr when 
it's a simple literal.


http://gerrit.cloudera.org:8080/#/c/4956/4/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

Line 494:   24: required bool disable_codegen
> please move up, into the common/non-union part of this struct (field 8?) a

[Impala-ASF-CR] Improve message output from run-step.sh

2016-11-16 Thread Henry Robinson (Code Review)
Henry Robinson has uploaded a new patch set (#2).

Change subject: Improve message output from run-step.sh
..

Improve message output from run-step.sh

run-step prints a message to tell the reader what it's doing. However,
that message wasn't flushed so that run-step could print OK or FAILED on
the same line. The result was that long-running steps wouldn't print
anything to the log until they were done, at least in Jenkins contexts.

This patch changes it so that the message is flushed, and then the
result is printed on a separate line (including the time it took to run
the step).

  $ run-step "Hello world!" helloworld.out sleep 5
  Hello world! (logging to /tmp/helloworld.out)...
  OK (Took: 0 min 5 sec)

Change-Id: Iaced729f0ef6aa93174cd90b1516d3c34fe41a22
---
M testdata/bin/run-step.sh
1 file changed, 7 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5116/2
-- 
To view, visit http://gerrit.cloudera.org:8080/5116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iaced729f0ef6aa93174cd90b1516d3c34fe41a22
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 


[Impala-ASF-CR] IMPALA-2494: Support for byte array-encoded decimals in Parquet scanner

2016-11-16 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-2494: Support for byte array-encoded decimals in Parquet 
scanner
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5115/1//COMMIT_MSG
Commit Message:

Line 21:  * Tested computing SUM(col) for 1 billion distinct dictionary-encoded
> Can do - but where would that extra slowness really come from? I would have
I'm assuming you are measuring response time. Since there is overall more work 
for the scanner to do in your dict-encoded   experiment, any difference in perf 
will be less pronounced because it affects a relatively smaller portion of the 
work. With plain encoded there is no "overhead" of decoding the dictionary 
indexes and fetching the values from the dictionary. For a single decimal 
column, the work of decoding the dict indexes and fetching their values should 
be in the same ball park as just populating the slot directly with plain 
encoding, so there is roughly 50% "noise" it seems.


Line 23:  * No performance difference measured by introduction of extra
> No, but I can do. What do you expect to change?
I'm assuming you compared response times. With multi-threaded scans the loss in 
perf might not be apparent.

With mt_dop=1 we're running the whole query in a single thread, so any slowdown 
along that critical path should prominently affect response time.


-- 
To view, visit http://gerrit.cloudera.org:8080/5115
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If95171e65aa48f08b08b8e87f4555dc75e867977
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4397,IMPALA-3259: reduce codegen time and memory

2016-11-16 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#5).

Change subject: IMPALA-4397,IMPALA-3259: reduce codegen time and memory
..

IMPALA-4397,IMPALA-3259: reduce codegen time and memory

A handful of fixes to codegen memory usage:
* Delete the IR module when we're done with it (it can be fairly large)
* Track the compiled code size (typically not that large, but it can add
  up if there are many fragments).
* Estimate optimisation memory requirements and track it in the memory
  tracker. This is very crude but much better than not tracking it.

A handful of fixes to improve codegen time/cost, particularly targeted
at compute stats workloads:
* Avoid over-inlining when there are many aggregate functions,
  conjuncts, etc by adding "NoInline" attributes.
* Bail of out text scanner codegen for wide tables.
* Don't codegen non-grouping merge aggregations. They will only process
  one row per Impala daemon, so codegen is not worth it.
* Make the Hll algorithm more efficient by specialising the hash function
  based on decimal width.

Limitations:
* This doesn't tackle over-inlining of large expr trees, but a similar
  approach will be used there in a follow-on patch.

Perf:
Compute stats on functional_parquet.widetable_1000_cols goes from 1min+
of codegen to < 1s codegen on my machine. Local perf runs of tpc-h
and targeted perf showed no regressions and some moderate improvements
(1-2%).

Also did an experiment to understand the perf consequences of disabling
inlining. I manually set CODEGEN_INLINE_EXPRS_THRESHOLD to 0, and ran:

  drop stats tpch_20_parquet.lineitem
  compute stats tpch_20_parquet.lineitem;

There was no difference in time spent in the agg node: 30.7s with
inlining, 30.5s without.

Change-Id: Id10015b49da182cb181a653ac8464b4a18b71091
---
M be/src/benchmarks/hash-benchmark.cc
M be/src/codegen/llvm-codegen-test.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/codegen/mcjit-mem-mgr.h
M be/src/exec/aggregation-node.cc
M be/src/exec/exchange-node.cc
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hash-join-node.cc
M be/src/exec/hash-table.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/sort-node.cc
M be/src/exec/topn-node.cc
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/anyval-util.h
M be/src/exprs/expr-codegen-test.cc
M be/src/exprs/expr.h
M be/src/runtime/lib-cache.cc
M be/src/runtime/runtime-state.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M testdata/workloads/functional-query/queries/QueryTest/compute-stats.test
M tests/query_test/test_aggregation.py
29 files changed, 1,458 insertions(+), 165 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/4956/5
-- 
To view, visit http://gerrit.cloudera.org:8080/4956
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id10015b49da182cb181a653ac8464b4a18b71091
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-2494: Support for byte array-encoded decimals in Parquet scanner

2016-11-16 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-2494: Support for byte array-encoded decimals in Parquet 
scanner
..


Patch Set 1:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/5115/1/be/src/exec/parquet-common.h
File be/src/exec/parquet-common.h:

Line 203:   /// assumed to be BYTE_ARRAY, otherwise it is assumed to be 
FIXED_LEN_BYTE_ARRAY.
Document return value?


PS1, Line 328: fixed_len_size
This name seems inaccurate since it can be variable. Maybe 'encoded_size'?


PS1, Line 338: fixed_len_size
What if fixed_len_size is negative?


http://gerrit.cloudera.org:8080/#/c/5115/1/testdata/bin/create-load-data.sh
File testdata/bin/create-load-data.sh:

Line 150:   # Decimal Parquet encodings written by Java Parquet library.
If you're going to create the table as part of the test, might as well copy the 
data as part of the test (and avoid the need to reload test data). E.g. like 
https://github.com/apache/incubator-impala/blob/master/tests/query_test/test_scanners.py#L248


http://gerrit.cloudera.org:8080/#/c/5115/1/testdata/workloads/functional-query/queries/QueryTest/decimal-encodings.test
File 
testdata/workloads/functional-query/queries/QueryTest/decimal-encodings.test:

PS1, Line 1: 
   :  QUERY
   : select col2 from decimal_encodings;
   :  RESULTS
   : 123.00
   : 123.00
   : 123.00
   : 123.00
   : 123.00
   :  TYPES
   : DECIMAL
> We're working on getting some richer test data - it's just a bit of a pain 
Sounds good, I think we need a bit more, especially boundary cases for 
different decimal sizes.


-- 
To view, visit http://gerrit.cloudera.org:8080/5115
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If95171e65aa48f08b08b8e87f4555dc75e867977
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4490: Only generate runtime filters for hash join nodes.

2016-11-16 Thread Alex Behm (Code Review)
Alex Behm has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5117

Change subject: IMPALA-4490: Only generate runtime filters for hash join nodes.
..

IMPALA-4490: Only generate runtime filters for hash join nodes.

Change-Id: I167725e260bd0f91c2bfc164eb044321192d5b95
---
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
2 files changed, 24 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/5117/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5117
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I167725e260bd0f91c2bfc164eb044321192d5b95
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 


[Impala-ASF-CR] Improve message output from run-step.sh

2016-11-16 Thread Henry Robinson (Code Review)
Henry Robinson has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5116

Change subject: Improve message output from run-step.sh
..

Improve message output from run-step.sh

run-step prints a message to tell the reader what it's doing. However,
that message wasn't flushed so that run-step could print OK or FAILED on
the same line. The result was that long-running steps wouldn't print
anything to the log until they were done, at least in Jenkins contexts.

This patch changes it so that the message is flushed, and then the
result is printed on a separate line (including the time it took to run
the step).

  $ run-step "Hello world!" helloworld.out sleep 5
  Hello world! (logging to /tmp/helloworld.out)...
  OK (Took: 0 min 5 sec)

Change-Id: Iaced729f0ef6aa93174cd90b1516d3c34fe41a22
---
M testdata/bin/run-step.sh
1 file changed, 7 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5116/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Iaced729f0ef6aa93174cd90b1516d3c34fe41a22
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 


[Impala-ASF-CR] IMPALA-2494: Support for byte array-encoded decimals in Parquet scanner

2016-11-16 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-2494: Support for byte array-encoded decimals in Parquet 
scanner
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5115/1//COMMIT_MSG
Commit Message:

Line 21:  * Tested computing SUM(col) for 1 billion distinct dictionary-encoded
> Can also verify on plain encoding? Seems like that would be the extreme cas
Can do - but where would that extra slowness really come from? I would have 
thought decoding 1 billion values would represent a large portion of the 
execution time of such a simple query, and if the proportion is really that 
small I'm inclined to worry even less about the perf differential.


Line 23:  * No performance difference measured by introduction of extra
> Did you run this with num_scanner_threads=1 or even better mt_dop=1?
No, but I can do. What do you expect to change?


-- 
To view, visit http://gerrit.cloudera.org:8080/5115
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If95171e65aa48f08b08b8e87f4555dc75e867977
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Henry Robinson 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2494: Support for byte array-encoded decimals in Parquet scanner

2016-11-16 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-2494: Support for byte array-encoded decimals in Parquet 
scanner
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5115/1//COMMIT_MSG
Commit Message:

Line 23:  * No performance difference measured by introduction of extra
Did you run this with num_scanner_threads=1 or even better mt_dop=1?


-- 
To view, visit http://gerrit.cloudera.org:8080/5115
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If95171e65aa48f08b08b8e87f4555dc75e867977
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Henry Robinson 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2494: Support for byte array-encoded decimals in Parquet scanner

2016-11-16 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-2494: Support for byte array-encoded decimals in Parquet 
scanner
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5115/1//COMMIT_MSG
Commit Message:

Line 21:  * Tested computing SUM(col) for 1 billion distinct dictionary-encoded
Can also verify on plain encoding? Seems like that would be the extreme case. 
The perf difference may not be pronounced enough with dict-encoding due to the 
general slowness of a dictionary-encoded column with only distinct values.


-- 
To view, visit http://gerrit.cloudera.org:8080/5115
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If95171e65aa48f08b08b8e87f4555dc75e867977
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Henry Robinson 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4278: Don't abort Catalog startup quickly if HMS is not present

2016-11-16 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-4278: Don't abort Catalog startup quickly if HMS is not 
present
..


Patch Set 2:

> Thanks!Do you think I should add some code to catalog to validate the
> value of 'initial_hms_cnxn_timeout_s'? What would be the acceptable
> range?

You could validate that it was > 0, but otherwise I don't think you should put 
a bound on it. Users may want to wait effectively forever, and I can see them 
setting 99 as a value to simulate that.

> Adding this config parameter to the CM UI is another issue. Do you think it 
> should  be added for the 5.10 release or it is not that urgent? I guess, the 
> default value is reasonable for most users and they can always set it as a 
> safety valve if they have to.

Sounds like you're talking about a Cloudera release. Reminder that this is the 
Apache Impala project - vendor specific concerns should be discussed elsewhere.

-- 
To view, visit http://gerrit.cloudera.org:8080/5095
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I546d8fe9836004832ae40110c9fe22b3e704e11b
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Henry Robinson 
Gerrit-HasComments: No


[Impala-ASF-CR] Add a build flag for the undefined behavior sanitizer, aka "ubsan".

2016-11-16 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: Add a build flag for the undefined behavior sanitizer, aka 
"ubsan".
..


Patch Set 1:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

Line 157: if (LIKELY(dst->ptr && src.ptr)) memcpy(dst->ptr, src.ptr, 
src.len);
Prefer explicit NULL checks.

I also don't believe this change is necessary and could mask other bugs. If 
src.ptr is NULL, that means src.len is 0 (unless there's a bug), which means 
that dst->ptr should have been set to NULL by the StringVal constructor.


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

PS1, Line 1684: accumulators::count(completion_times)
> accumulators::count(completion_times) can be zero, making mean and variance
Can you do an explicit comparison against 0 here?


Line 1701:   if (accumulators::count(rates)) {
And here


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/runtime/row-batch-serialize-test.cc
File be/src/runtime/row-batch-serialize-test.cc:

PS1, Line 164: len
> int buf[x] where x is a 0-length variable at run-time is UB
How about using a vector or string here instead to avoid the extra branch.


http://gerrit.cloudera.org:8080/#/c/5082/1/bin/run-backend-tests.sh
File bin/run-backend-tests.sh:

Line 41: export 
PATH="${IMPALA_TOOLCHAIN}/llvm-${IMPALA_LLVM_VERSION}/bin:${PATH}"
> http://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html#stack-traces-and
It's a bit unfortunately there isn't a cleaner way to do this.


-- 
To view, visit http://gerrit.cloudera.org:8080/5082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I88c7234bd7c5eb7404490a0913d90470c10835e7
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2494: Support for byte array-encoded decimals in Parquet scanner

2016-11-16 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-2494: Support for byte array-encoded decimals in Parquet 
scanner
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5115/1/testdata/workloads/functional-query/queries/QueryTest/decimal-encodings.test
File 
testdata/workloads/functional-query/queries/QueryTest/decimal-encodings.test:

PS1, Line 1: 
   :  QUERY
   : select col2 from decimal_encodings;
   :  RESULTS
   : 123.00
   : 123.00
   : 123.00
   : 123.00
   : 123.00
   :  TYPES
   : DECIMAL
We're working on getting some richer test data - it's just a bit of a pain to 
write it out. This will include > 1 dict-encoded value, and some values that 
are plain encoded.


-- 
To view, visit http://gerrit.cloudera.org:8080/5115
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If95171e65aa48f08b08b8e87f4555dc75e867977
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2494: Support for byte array-encoded decimals in Parquet scanner

2016-11-16 Thread Henry Robinson (Code Review)
Henry Robinson has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5115

Change subject: IMPALA-2494: Support for byte array-encoded decimals in Parquet 
scanner
..

IMPALA-2494: Support for byte array-encoded decimals in Parquet scanner

* Extend metadata checks to allow more than one possible physical type
  for a given logical type.
* Change decimal decoding to handle non-fixed-length format in same path
  as fixed-length encoding.

Testing:

 * Query test that decodes dictionary-encoded decimals using binary
   encoding.

Perf:

 * Tested computing SUM(col) for 1 billion distinct dictionary-encoded
   decimal(12,2) values using FIXED_BYTE_ARRAY physical type encoding.
 * No performance difference measured by introduction of extra
   predictable branch to Decode() path.

Change-Id: If95171e65aa48f08b08b8e87f4555dc75e867977
---
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-common.h
M be/src/exec/parquet-metadata-utils.cc
M testdata/bin/create-load-data.sh
A testdata/data/byte_array_decimal_dict_encoded.parquet
A testdata/workloads/functional-query/queries/QueryTest/decimal-encodings.test
M tests/query_test/test_scanners.py
7 files changed, 118 insertions(+), 48 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/5115/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5115
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: If95171e65aa48f08b08b8e87f4555dc75e867977
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 


[Impala-ASF-CR] IMPALA-4470: Avoid creating a NumericLiteral from NaN/infinity/-0.

2016-11-16 Thread Internal Jenkins (Code Review)
Internal Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4470: Avoid creating a NumericLiteral from 
NaN/infinity/-0.
..


IMPALA-4470: Avoid creating a NumericLiteral from NaN/infinity/-0.

Our NumericLiteral is backed by a BigDecimal which cannot
represent the special float values NaN, infinity or negative zero.
As a result, when evaluating constant expressions from the FE we
hit an exception when trying to create a NumericLiteral from
a NaN or infinity value. Before, negative zero would silently
get converted to zero which is dangerous.

The fix is to treat the expr evaluation as a failure and not
replace the constant Expr with a LiteralExpr.

Change-Id: I8243b2ee9fa9c470d078b385583f2f48b606a230
Reviewed-on: http://gerrit.cloudera.org:8080/5050
Reviewed-by: Alex Behm 
Tested-by: Internal Jenkins
---
M fe/src/main/java/org/apache/impala/analysis/LiteralExpr.java
M fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java
M testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test
3 files changed, 32 insertions(+), 3 deletions(-)

Approvals:
  Internal Jenkins: Verified
  Alex Behm: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/5050
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I8243b2ee9fa9c470d078b385583f2f48b606a230
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Marcel Kornacker 


[Impala-ASF-CR] IMPALA-4470: Avoid creating a NumericLiteral from NaN/infinity/-0.

2016-11-16 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-4470: Avoid creating a NumericLiteral from 
NaN/infinity/-0.
..


Patch Set 5: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/5050
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8243b2ee9fa9c470d078b385583f2f48b606a230
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3713,IMPALA-4439: Fix Kudu DML shell reporting

2016-11-16 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-3713,IMPALA-4439: Fix Kudu DML shell reporting
..


Patch Set 3: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/5103
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3d3d7aa8d176e03ea58fb00f2a81fb3e34965aa1
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3713,IMPALA-4439: Fix Kudu DML shell reporting

2016-11-16 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has uploaded a new patch set (#3).

Change subject: IMPALA-3713,IMPALA-4439: Fix Kudu DML shell reporting
..

IMPALA-3713,IMPALA-4439: Fix Kudu DML shell reporting

Adds support in the shell to report the number of modified
rows for all DML operations, as well as the number of rows
with errors.

Testing: Added shell tests.

Change-Id: I3d3d7aa8d176e03ea58fb00f2a81fb3e34965aa1
---
M be/src/service/impala-beeswax-server.cc
M common/thrift/ImpalaService.thrift
M shell/impala_client.py
M shell/impala_shell.py
M tests/shell/test_shell_commandline.py
5 files changed, 102 insertions(+), 25 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/5103/3
-- 
To view, visit http://gerrit.cloudera.org:8080/5103
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3d3d7aa8d176e03ea58fb00f2a81fb3e34965aa1
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 


[Impala-ASF-CR] IMPALA-3713,IMPALA-4439: Fix Kudu DML shell reporting

2016-11-16 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3713,IMPALA-4439: Fix Kudu DML shell reporting
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5103/2/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

Line 260:   // Number of row operations attempted but not modified due to 
errors reported by the
> Suggest some minor rephrasing:
Done


http://gerrit.cloudera.org:8080/#/c/5103/2/shell/impala_shell.py
File shell/impala_shell.py:

Line 928: error_report = ", %d row error(s)" % (num_row_errors)
> Don't we need to check somewhere whether num_row_errors was set in the resp
Yes, this wasn't handled yet, thx. I added a test.


-- 
To view, visit http://gerrit.cloudera.org:8080/5103
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3d3d7aa8d176e03ea58fb00f2a81fb3e34965aa1
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3812: Fix error message for unsupported types

2016-11-16 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-3812: Fix error message for unsupported types
..


Patch Set 3: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/4859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9019b4bfd219f94e554c795befd3ff5e39706ea9
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4363: Add Parquet timestamp validation

2016-11-16 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4363: Add Parquet timestamp validation
..


Patch Set 5:

(4 comments)

Almost ready for +1

http://gerrit.cloudera.org:8080/#/c/4968/5/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

Line 445:   /// Writes the next value into the next slot in the *tuple using 
pool if necessary.
... into the destination slot in tuple ...

(it's not the "next" slot, it is always the same slot for a particular column 
reader)


Line 480: } else if (UNLIKELY(NeedsConversion() &&
Is it possible that an otherwise invalid value becomes valid after conversion?


http://gerrit.cloudera.org:8080/#/c/4968/5/be/src/exec/parquet-column-readers.h
File be/src/exec/parquet-column-readers.h:

Line 512:   /// the next slot in *tuple.
... into the appropriate destination slot in 'tuple'.

(it's not the "next" slot, it is always the same slot for a particular column 
reader)


http://gerrit.cloudera.org:8080/#/c/4968/5/testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-continue-on-error.test
File 
testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-continue-on-error.test:

Line 5: DROP TABLE IF EXISTS invalid_parquet_timestamp;
setting up the table is already done in the .py file, right?


-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3342: Add thread counters to monitor plan fragment execution

2016-11-16 Thread anujphadke (Code Review)
anujphadke has posted comments on this change.

Change subject: IMPALA-3342: Add thread counters to monitor plan fragment 
execution
..


Patch Set 9:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/4633/8/be/src/runtime/plan-fragment-executor.cc
File be/src/runtime/plan-fragment-executor.cc:

PS8, Line 42: #include "runtime/runtime-state.
> what do you need this for?
This came from an older patch. Removed.


http://gerrit.cloudera.org:8080/#/c/4633/8/be/src/runtime/runtime-state.cc
File be/src/runtime/runtime-state.cc:

PS8, Line 137: rage_wait_timer_ = 
> Does it work if you pass an empty string? I think "PlanFragmentThreads" is 
Done


http://gerrit.cloudera.org:8080/#/c/4633/8/be/src/runtime/runtime-state.h
File be/src/runtime/runtime-state.h:

PS8, Line 323: total_thread_statistics
> how about calling this total_thread_statistics_ or similar?
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4633
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifa88aa6f3371fa42d11ecc122f43c7d83623c300
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Yonghyun Hwang
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3342: Add thread counters to monitor plan fragment execution

2016-11-16 Thread anujphadke (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4633

to look at the new patch set (#9).

Change subject: IMPALA-3342: Add thread counters to monitor plan fragment 
execution
..

IMPALA-3342: Add thread counters to monitor plan fragment execution

This change removes the use of total_cpu_timer which incorrectly
monitors the CPU time. Adding THREAD_COUNTERS to measure the user
and sys time in plan fragment execution. This also accounts for the
time spent in the hdfs/kudu scanner and in a blocking join.

Snippet of a query plan with the newly added PlanFragment
THREAD_COUNTERS:
  Instance 2b40b101e2626e7a:a3d8f23
 - PeakMemoryUsage: 32.02 KB (32784)
 - PerHostPeakMemUsage: 430.52 MB (451431312)
 - RowsProduced: 1 (1)
 - TotalNetworkReceiveTime: 10s379ms
 - TotalNetworkSendTime: 0.000ns
 - TotalStorageWaitTime: 0.000ns
 - TotalWallClockTime: 10s577ms
   - SysTime: 8.000ms
   - UserTime: 8.000ms
 - VoluntaryContextSwitches: 80 (80)

Change-Id: Ifa88aa6f3371fa42d11ecc122f43c7d83623c300
---
M be/src/exec/blocking-join-node.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/kudu-scan-node.cc
M be/src/runtime/plan-fragment-executor.cc
M be/src/runtime/plan-fragment-executor.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
7 files changed, 17 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/4633/9
-- 
To view, visit http://gerrit.cloudera.org:8080/4633
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifa88aa6f3371fa42d11ecc122f43c7d83623c300
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Yonghyun Hwang
Gerrit-Reviewer: anujphadke 


[Impala-ASF-CR] IMPALA-3812: Fix error message for unsupported types

2016-11-16 Thread Taras Bobrovytsky (Code Review)
Hello Internal Jenkins, Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4859

to look at the new patch set (#3).

Change subject: IMPALA-3812: Fix error message for unsupported types
..

IMPALA-3812: Fix error message for unsupported types

Before this patch an unclear error message was returned if DATE or
DATETIME appeared in the select list after a star expansion. This was
because DATE and DATETIME PrimitiveType was serialized as INVALID_TYPE.
This is fixed by serializing correctly.

Change-Id: I9019b4bfd219f94e554c795befd3ff5e39706ea9
---
M fe/src/main/java/org/apache/impala/catalog/PrimitiveType.java
M fe/src/main/java/org/apache/impala/catalog/ScalarType.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
M testdata/UnsupportedTypes/data.csv
M testdata/datasets/functional/functional_schema_template.sql
M testdata/workloads/functional-query/queries/QueryTest/misc.test
6 files changed, 35 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/4859/3
-- 
To view, visit http://gerrit.cloudera.org:8080/4859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9019b4bfd219f94e554c795befd3ff5e39706ea9
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] Add a build flag for the undefined behavior sanitizer, aka "ubsan".

2016-11-16 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: Add a build flag for the undefined behavior sanitizer, aka 
"ubsan".
..


Patch Set 1:

(22 comments)

what's the plan for keeping this compiling?

http://gerrit.cloudera.org:8080/#/c/5082/1/be/CMakeLists.txt
File be/CMakeLists.txt:

PS1, Line 95: synbol
typo


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/exec/data-source-scan-node.cc
File be/src/exec/data-source-scan-node.cc:

PS1, Line 150: )
add "!= 0" as we try to avoid implicit comparisons against zero conversion to 
bool.


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/exec/hdfs-avro-scanner-ir.cc
File be/src/exec/hdfs-avro-scanner-ir.cc:

Line 276:   *decimal >>= bytes_to_fill * 8;
> If bytes_to_fill >= sizeof(decimal), this is UB
since len comes from the data file, we could also have bytes_to_fill < 0, no? 
isn't that also undefined?


Line 278:   *decimal = 0;
can this happen with well-formed avro? if not, this be a return false case.


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/exec/partitioned-aggregation-node.cc
File be/src/exec/partitioned-aggregation-node.cc:

Line 1379: double percent = total_rows * 100 / std::max(1L, num_input_rows);
> num_input_rows is sometimes 0.
if that were the case, we would have seen crashes.
In any case, when num_input_rows == 0, shouldn't we set precent to either 0 or 
100? or i guess that happens since it implies total_rows == 0, but would 
probably be more clear what's going on to just write:

percent = num_input_rows == 0 ? 0 : total_rows * 100 / num_input_rows;


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/exec/read-write-util.cc
File be/src/exec/read-write-util.cc:

PS1, Line 97: *
nit: we put the * next to type.


Line 100:   memcpy(&uinteger, &integer, sizeof(integer));
> type punning is UB.
using memcpy doesn't seem necessary. can't we just write:

uint32_t uinteger = integer;
uinteger = (uinteger << 1) | (uinteger >> 31);

neither of those stmts are undefined.


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/exprs/bit-byte-functions-ir.cc
File be/src/exprs/bit-byte-functions-ir.cc:

PS1, Line 151: v * 
why is this necessary?  and does it generate the same code (let's make sure 
there is no multiply)?


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

Line 820:   if (!haystack) return NULL;
okay but this check is still unnecessary - no loop iterations will be taken. 
let's avoid the extraneous cmp/branch.


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/runtime/buffered-tuple-stream.cc
File be/src/runtime/buffered-tuple-stream.cc:

PS1, Line 804: t
nit: "t != nullptr" per our style.


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/runtime/raw-value.cc
File be/src/runtime/raw-value.cc:

Line 161: if (dest->len) memcpy(dest->ptr, src->ptr, dest->len);
Don't we return a non-null ptr (zero_length_region_) in this case to avoid the 
need for this check?


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

Line 975: if (string_val->ptr) memcpy(dest, string_val->ptr, 
string_val->len);
!= nullptr


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/runtime/string-value.inline.h
File be/src/runtime/string-value.inline.h:

Line 58:   const int result = len ? strncmp(s1, s2, len) : 0;
> if either s1 or s2 is nullptr, this is UB
len != 0

though, we may define StringCompare() to have the same semantics, making this 
unnecessary. e.g. Compare() wrapper, though  maybe other callsites can pass 
nulls?


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/runtime/tuple.h
File be/src/runtime/tuple.h:

Line 75: if (tuple_desc.num_null_bytes()) {
!= 0


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/udf/udf.cc
File be/src/udf/udf.cc:

Line 487:   if (LIKELY(len && !result.is_null)) {
len != 0

though this may be another place we expect to always have a valid pointer (as 
long as i!s_null).  We may need to switch the order of the 
FLAGS_stress_free_pool_alloc code in free-pool and its check for 0, however.  

That said, I'm okay with adding the len check here since we don't make the 
non-null assumption in the StringVal::StringVal(FunctionContext* context, int 
len) constructor itself either.


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/util/bit-packing-test.cc
File be/src/util/bit-packing-test.cc:

Line 36:   if (bit_width >= CHAR_BIT * sizeof(uint32_t)) return ~0;
> lshift past the end is UB
But where would we ever do that?  i.e. shouldn't this be a precondition to this 
routine (DCHECK)?


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/util/bitmap.h
File be/src/util/bitmap.h:

Line 78: if (buffer_.size()) memset(buffer_.data(), 255 * b, buffer_.size() 
* sizeof(uint64_t));
!= 0


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/util/coding-util-test.cc
File be/src/util/coding-util-test.cc:

Lin

[Impala-ASF-CR] IMPALA-3713,IMPALA-4439: Fix Kudu DML shell reporting

2016-11-16 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-3713,IMPALA-4439: Fix Kudu DML shell reporting
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5103/2/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

Line 260:   // Number of row operations attempted but not modified due to 
errors reported by the
Suggest some minor rephrasing:

Number of row operations attempted but not completed due to non-fatal errors 
reported by the storage engine that Impala treats as warnings.


http://gerrit.cloudera.org:8080/#/c/5103/2/shell/impala_shell.py
File shell/impala_shell.py:

Line 928: error_report = ", %d row error(s)" % (num_row_errors)
Don't we need to check somewhere whether num_row_errors was set in the response 
thrift struct? This still seems to add the error report for all DML statements, 
but maybe I'm missing something.


-- 
To view, visit http://gerrit.cloudera.org:8080/5103
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3d3d7aa8d176e03ea58fb00f2a81fb3e34965aa1
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4466: Improve Kudu CRUD test coverage

2016-11-16 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-4466: Improve Kudu CRUD test coverage
..


Patch Set 12: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/4953
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib9e7afbef60186edb00a9d11fbe5a8c64931add6
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4397,IMPALA-3259: reduce codegen time and memory

2016-11-16 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4397,IMPALA-3259: reduce codegen time and memory
..


Patch Set 4:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/4956/4/be/src/codegen/llvm-codegen.cc
File be/src/codegen/llvm-codegen.cc:

Line 952: RETURN_IF_ERROR(OptimizeModule());
Is there a reason why we don't call DestroyIRModule() when OptimizeModule() 
fails ?


PS4, Line 986:  memory_manager_->bytes_allocated()),
 : memory_manager_->bytes_allocated()
The code may be easier to read if this factored into a single variable instead.


Line 1039:   if (!mem_tracker_->TryConsume(estimated_memory)) {
I wonder if it's better to skip optimization if we cannot reserve the memory to 
do so than punting back to interpretation ?


PS4, Line 1041: Substitute("Codegen failed to reserve '$0' bytes for 
optimization",
  :   estimated_memory),
  : estimated_memory);
These line wraps look a bit weird to me but not sure if it's a side effect of 
clang-format ?


http://gerrit.cloudera.org:8080/#/c/4956/4/be/src/codegen/llvm-codegen.h
File be/src/codegen/llvm-codegen.h:

PS4, Line 708: optimiser
nit: optimizer


PS4, Line 709:  512 bytes
Mind documenting how this is derived so we can update it accordingly in case we 
change the optimization level in the future ?


http://gerrit.cloudera.org:8080/#/c/4956/4/be/src/codegen/mcjit-mem-mgr.h
File be/src/codegen/mcjit-mem-mgr.h:

PS4, Line 36: memory
This is for tracking memory consumed by the compiled machine code, right ?


http://gerrit.cloudera.org:8080/#/c/4956/4/be/src/exec/exec-node.h
File be/src/exec/exec-node.h:

PS4, Line 187:  false
true ?


http://gerrit.cloudera.org:8080/#/c/4956/4/be/src/exec/hash-table.cc
File be/src/exec/hash-table.cc:

PS4, Line 762: if (ctxs.size() > Expr::CODEGEN_INLINE_EXPRS_THRESHOLD) {
 :   // Avoid bloating function by inlining too many exprs into 
it.
 :   expr_fn->addFnAttr(llvm::Attribute::NoInline);
 : }
I wonder if this can be more generalized by putting it in 
LlvmCodegen::FinalizeFunction() instead and use instruction count to determine 
whether the function is too large to be worth being inlined.

Of course, this may be a problem if we intentionally inline a big function  
into the cross-compiled code and rely on constant folding to eliminate 90% of 
the instructions in the function. In which case, we may need to do a quick 
constant propagation pass first in FinalizeFunction(). Just a thought.


http://gerrit.cloudera.org:8080/#/c/4956/4/be/src/exec/hdfs-scanner.h
File be/src/exec/hdfs-scanner.h:

PS4, Line 190: CODEGEN_WRITE_TUPLES_MAX_COLUMNS = 250;
This seems a bit adhoc to me. May be better to track the size of IR functions 
and don't use it if it gets too large. Empirically, the time to generate the IR 
is rather small compared to the optimization time so the wasted work may not be 
too costly and hopefully, the large IR function is not a common case.


PS4, Line 190:  const
constexpr


http://gerrit.cloudera.org:8080/#/c/4956/4/be/src/exec/topn-node.cc
File be/src/exec/topn-node.cc:

PS4, Line 85: 
Oops. My bad.


http://gerrit.cloudera.org:8080/#/c/4956/4/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

PS4, Line 1911: 
  : 
Is it necessary to remove it ? For comparison, please see FirstValUpdate().


http://gerrit.cloudera.org:8080/#/c/4956/4/be/src/exprs/expr.h
File be/src/exprs/expr.h:

PS4, Line 311: const
constexpr ?


PS4, Line 311: CODEGEN_INLINE_EXPRS_THRESHOLD
Wouldn't instruction count be a more precise estimation ?


-- 
To view, visit http://gerrit.cloudera.org:8080/4956
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id10015b49da182cb181a653ac8464b4a18b71091
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3882: Simplify some query exec state locking

2016-11-16 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3882: Simplify some query exec state locking
..


Patch Set 6:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/4935/6/be/src/service/impala-beeswax-server.cc
File be/src/service/impala-beeswax-server.cc:

Line 40: using namespace strings;
This shouldn't be needed, we have 'using strings::Substitute' in names.h now.


http://gerrit.cloudera.org:8080/#/c/4935/4/be/src/service/impala-server.h
File be/src/service/impala-server.h:

Line 82: /// 2. session_state_map_lock_
> Similar-ish, but not as acute. We don't have the problem where we have this
Sounds like we should file a JIRA for this.


http://gerrit.cloudera.org:8080/#/c/4935/6/be/src/service/query-exec-state.h
File be/src/service/query-exec-state.h:

Line 79:   void WaitForMetadataAvailable() { metadata_available_.Get(); }
I think the new name is a bit misleading, because the metadata may not be 
available if an error is encountered in the meantime.

There's an undocumented requirement to check the query status after calling 
this function.


-- 
To view, visit http://gerrit.cloudera.org:8080/4935
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I516357d2b5e9eb83e8209872cbe4c078c778a629
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4466: Improve Kudu CRUD test coverage

2016-11-16 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-4466: Improve Kudu CRUD test coverage
..


Patch Set 12:

(1 comment)

Yes, the only changes since the last time it was +2ed are the three new test 
files (kudu_insert.test, kudu_delete.test, and kudu_update.test), the changes 
to add these and remove kudu_crud.test from test_kudu.py, and moving some tests 
from kudu_crud.test into kudu_create.test.

http://gerrit.cloudera.org:8080/#/c/4953/11/testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test
File testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test:

Line 39: insert into tdata values
> wrap long line
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4953
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib9e7afbef60186edb00a9d11fbe5a8c64931add6
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4466: Improve Kudu CRUD test coverage

2016-11-16 Thread Thomas Tauber-Marshall (Code Review)
Hello Matthew Jacobs, Internal Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4953

to look at the new patch set (#12).

Change subject: IMPALA-4466: Improve Kudu CRUD test coverage
..

IMPALA-4466: Improve Kudu CRUD test coverage

The results in the test files were verified by hand.

This patch also introduces a new test section 'DML_RESULTS', which
takes the name of a table as a comment and the contents of the
table as its body and then verifies that the body matches the
actual contents of the table. This makes it easy to check that a
DML operation has the desired effect on the contents of a table,
rather than always having to add another test case that runs a
select on the table. For now, this section cannot be used in a
test along with the RESULTS or ERRORS sections.

TODO: Refactor the DML test case handling (IMPALA-4471)

Change-Id: Ib9e7afbef60186edb00a9d11fbe5a8c64931add6
---
M testdata/workloads/functional-query/queries/QueryTest/kudu_create.test
D testdata/workloads/functional-query/queries/QueryTest/kudu_crud.test
A testdata/workloads/functional-query/queries/QueryTest/kudu_delete.test
A testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test
A testdata/workloads/functional-query/queries/QueryTest/kudu_update.test
A testdata/workloads/functional-query/queries/QueryTest/kudu_upsert.test
M tests/common/impala_test_suite.py
M tests/common/test_result_verifier.py
M tests/query_test/test_kudu.py
M tests/util/test_file_parser.py
10 files changed, 1,624 insertions(+), 487 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/4953/12
-- 
To view, visit http://gerrit.cloudera.org:8080/4953
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib9e7afbef60186edb00a9d11fbe5a8c64931add6
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4477: Upgrade Kudu version to latest master

2016-11-16 Thread Internal Jenkins (Code Review)
Internal Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4477: Upgrade Kudu version to latest master
..


IMPALA-4477: Upgrade Kudu version to latest master

Change the toolchain build and Kudu version to use
the latest master, using Kudu commit e836ac.

Change-Id: I49f8582cc3c0f776167fe3decf4236345ba78bd3
Reviewed-on: http://gerrit.cloudera.org:8080/5106
Reviewed-by: Matthew Jacobs 
Tested-by: Internal Jenkins
---
M bin/impala-config.sh
1 file changed, 5 insertions(+), 4 deletions(-)

Approvals:
  Matthew Jacobs: Looks good to me, approved
  Internal Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/5106
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I49f8582cc3c0f776167fe3decf4236345ba78bd3
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Matthew Jacobs 


[Impala-ASF-CR] IMPALA-4477: Upgrade Kudu version to latest master

2016-11-16 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-4477: Upgrade Kudu version to latest master
..


Patch Set 2: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/5106
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I49f8582cc3c0f776167fe3decf4236345ba78bd3
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4493: fix string-compare-test when using clang

2016-11-16 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-4493: fix string-compare-test when using clang
..


Patch Set 2: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/5110
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5d82ac2bff33fdbf66275fcfc6558c4bc29de5e7
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4493: fix string-compare-test when using clang

2016-11-16 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-4493: fix string-compare-test when using clang
..


Patch Set 1: Verified-1

Build failed: 
http://sandbox.jenkins.cloudera.com/job/impala-external-gerrit-verify-merge-ASF/490/

-- 
To view, visit http://gerrit.cloudera.org:8080/5110
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5d82ac2bff33fdbf66275fcfc6558c4bc29de5e7
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4493: fix string-compare-test when using clang

2016-11-16 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4493: fix string-compare-test when using clang
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5110/1/be/src/runtime/string-compare-test.cc
File be/src/runtime/string-compare-test.cc:

Line 34: EXPECT_EQ(stringcompare_r, 0);
> But the args to EXPECT_EQ don't include the args to RunTestCase: the actual
Oops, sorry, didn't read that carefully enough.  Done.


-- 
To view, visit http://gerrit.cloudera.org:8080/5110
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5d82ac2bff33fdbf66275fcfc6558c4bc29de5e7
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4493: fix string-compare-test when using clang

2016-11-16 Thread Dan Hecht (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5110

to look at the new patch set (#2).

Change subject: IMPALA-4493: fix string-compare-test when using clang
..

IMPALA-4493: fix string-compare-test when using clang

Only the 0 value or sign bit is specified in the return
value for strncmp(), so fix up the test accordingly.

Testing:
- verified the new test still reproduces IMPALA-4436
- verify the new test passes under ASAN build

Change-Id: I5d82ac2bff33fdbf66275fcfc6558c4bc29de5e7
---
M be/src/runtime/string-compare-test.cc
1 file changed, 9 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/5110/2
-- 
To view, visit http://gerrit.cloudera.org:8080/5110
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5d82ac2bff33fdbf66275fcfc6558c4bc29de5e7
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4493: fix string-compare-test when using clang

2016-11-16 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-4493: fix string-compare-test when using clang
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5110/1/be/src/runtime/string-compare-test.cc
File be/src/runtime/string-compare-test.cc:

Line 34: EXPECT_EQ(stringcompare_r, 0);
> EXPECT_EQ already prints its args (stringcompare_r), and the exact value of
But the args to EXPECT_EQ don't include the args to RunTestCase: the actual 
strings being compared.


-- 
To view, visit http://gerrit.cloudera.org:8080/5110
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5d82ac2bff33fdbf66275fcfc6558c4bc29de5e7
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4493: fix string-compare-test when using clang

2016-11-16 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4493: fix string-compare-test when using clang
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5110/1/be/src/runtime/string-compare-test.cc
File be/src/runtime/string-compare-test.cc:

Line 34: EXPECT_EQ(stringcompare_r, 0);
> Failures here would be easier to diagnose with the addition of " << l << ' 
EXPECT_EQ already prints its args (stringcompare_r), and the exact value of 
strncmp_r isn't interesting (only whether it is 0, neg, pos) which is implied 
by the EXPECT case.


-- 
To view, visit http://gerrit.cloudera.org:8080/5110
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5d82ac2bff33fdbf66275fcfc6558c4bc29de5e7
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4397,IMPALA-3259: reduce codegen time and memory

2016-11-16 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4397,IMPALA-3259: reduce codegen time and memory
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4956/4/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

PS4, Line 823: if (!hasGrouping) mergeAggNode.setDisableCodegen(true);
> We could probably do it in the backend but the planner seemed like the appr
Sure. This seems to set things up so the planner can selectively disable 
codegen in exec node the future. While you are at it, would you mind using the 
same infrastructure to disable codegen for non-merging exchange ?


-- 
To view, visit http://gerrit.cloudera.org:8080/4956
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id10015b49da182cb181a653ac8464b4a18b71091
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4493: fix string-compare-test when using clang

2016-11-16 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-4493: fix string-compare-test when using clang
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5110/1/be/src/runtime/string-compare-test.cc
File be/src/runtime/string-compare-test.cc:

Line 34: EXPECT_EQ(stringcompare_r, 0);
Failures here would be easier to diagnose with the addition of " << l << ' ' << 
r"


-- 
To view, visit http://gerrit.cloudera.org:8080/5110
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5d82ac2bff33fdbf66275fcfc6558c4bc29de5e7
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4397,IMPALA-3259: reduce codegen time and memory

2016-11-16 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4397,IMPALA-3259: reduce codegen time and memory
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4956/4/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

Line 494:   24: required bool disable_codegen
please move up, into the common/non-union part of this struct (field 8?) and 
renumber the rest.


http://gerrit.cloudera.org:8080/#/c/4956/4/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

PS4, Line 823: if (!hasGrouping) mergeAggNode.setDisableCodegen(true);
> We could probably do it in the backend but the planner seemed like the appr
i agree, the planner is the right place for making static decisions about 
execution.


-- 
To view, visit http://gerrit.cloudera.org:8080/4956
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id10015b49da182cb181a653ac8464b4a18b71091
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4397,IMPALA-3259: reduce codegen time and memory

2016-11-16 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4397,IMPALA-3259: reduce codegen time and memory
..


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/4956/1/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

Line 1919: template void AggregateFunctions::HllUpdate(
Reinstate this.


http://gerrit.cloudera.org:8080/#/c/4956/1/be/src/exprs/expr.h
File be/src/exprs/expr.h:

Line 310:   /// evaluation into the calling function.
Remove partially implemented stuff.


http://gerrit.cloudera.org:8080/#/c/4956/4/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

PS4, Line 823: if (!hasGrouping) mergeAggNode.setDisableCodegen(true);
> A high level question: is there a reason why this decision cannot be made i
We could probably do it in the backend but the planner seemed like the 
appropriate place since the decision is based on the plan shape and expected 
cost.


-- 
To view, visit http://gerrit.cloudera.org:8080/4956
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id10015b49da182cb181a653ac8464b4a18b71091
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4397,IMPALA-3259: reduce codegen time and memory

2016-11-16 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4397,IMPALA-3259: reduce codegen time and memory
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4956/4/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

PS4, Line 823: if (!hasGrouping) mergeAggNode.setDisableCodegen(true);
A high level question: is there a reason why this decision cannot be made in 
PartitionedAggregationNode::Codegen() ? I suppose there is way for an Agg node 
to know whether it's a merging Agg node, right ?


-- 
To view, visit http://gerrit.cloudera.org:8080/4956
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id10015b49da182cb181a653ac8464b4a18b71091
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4493: fix string-compare-test when using clang

2016-11-16 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4493: fix string-compare-test when using clang
..


Patch Set 1: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/5110
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5d82ac2bff33fdbf66275fcfc6558c4bc29de5e7
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4432: Handle internal codegen disabling properly

2016-11-16 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4432: Handle internal codegen disabling properly
..


Patch Set 2:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/5105/2/be/src/exec/aggregation-node.cc
File be/src/exec/aggregation-node.cc:

Line 161:   if (state->CodegenDisabled()) {
Shouldn't this be COdegenDisabledByQueryOption() to match the message?


http://gerrit.cloudera.org:8080/#/c/5105/2/be/src/exprs/scalar-fn-call.cc
File be/src/exprs/scalar-fn-call.cc:

PS2, Line 113: Interanl
Internal


PS2, Line 132: too_many_args
too_many_args_to_interpret?


PS2, Line 156: AddUdfToCodegen
Udf seems like a bit of a misnomer, since it may be a builtin. Maybe 
"AddScalarFnToCodegen()"?


http://gerrit.cloudera.org:8080/#/c/5105/2/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

PS2, Line 318: disable_codegen
It would be good if we had a different name to distinguish this from the other 
disable_codegen flag. E.g. hint_disable_codegen?


http://gerrit.cloudera.org:8080/#/c/5105/2/tests/query_test/test_udfs.py
File tests/query_test/test_udfs.py:

PS2, Line 42: exec_single_node_option=[100]
It seems like we'd want to test with the single node option both enabled and 
disabled


-- 
To view, visit http://gerrit.cloudera.org:8080/5105
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0b6a9ed723c64ba21b861608583cc9b6607d3397
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4493: fix string-compare-test when using clang

2016-11-16 Thread Dan Hecht (Code Review)
Dan Hecht has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5110

Change subject: IMPALA-4493: fix string-compare-test when using clang
..

IMPALA-4493: fix string-compare-test when using clang

Only the 0 value or sign bit is specified in the return
value for strncmp(), so fix up the test accordingly.

Testing:
- verified the new test still reproduces IMPALA-4436
- verify the new test passes under ASAN build

Change-Id: I5d82ac2bff33fdbf66275fcfc6558c4bc29de5e7
---
M be/src/runtime/string-compare-test.cc
1 file changed, 9 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/5110/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5110
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I5d82ac2bff33fdbf66275fcfc6558c4bc29de5e7
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dan Hecht 


[Impala-ASF-CR] IMPALA-4466: Improve Kudu CRUD test coverage

2016-11-16 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-4466: Improve Kudu CRUD test coverage
..


Patch Set 11: Code-Review+2

(1 comment)

Reviewed the new test cases that were added. I assume there weren't other 
changes to the UPSERT tests or the python infra code, right?

http://gerrit.cloudera.org:8080/#/c/4953/11/testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test
File testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test:

Line 39: insert into tdata values (3, cast('nan' as float), max_bigint(), '', 
true, min_tinyint(), max_smallint(), cast('-inf' as double))
wrap long line


-- 
To view, visit http://gerrit.cloudera.org:8080/4953
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib9e7afbef60186edb00a9d11fbe5a8c64931add6
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] PREVIEW: IMPALA-1788: Fold constant expressions during plan generation.

2016-11-16 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: PREVIEW: IMPALA-1788: Fold constant expressions during plan 
generation.
..


Patch Set 1:

There are still some failing tests because of expected warnings from expr 
evaluations. I still need to address those, probably by bailing on constant 
folding if there is any warning/error.

-- 
To view, visit http://gerrit.cloudera.org:8080/5109
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If672b703db1ba0bfc26e5b9130161798b40a69e9
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-HasComments: No


[Impala-ASF-CR] PREVIEW: IMPALA-1788: Fold constant expressions during plan generation.

2016-11-16 Thread Alex Behm (Code Review)
Alex Behm has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5109

Change subject: PREVIEW: IMPALA-1788: Fold constant expressions during plan 
generation.
..

PREVIEW: IMPALA-1788: Fold constant expressions during plan generation.

Adds a new ExprRewriteRule for replacing constant expressions
with their literal equivalent via BE evaluation. Applies the
rewrite during part of plan generation.

After this patch the state of expression rewriting is:
1. A first phase of rewriting is performed on the analyzed
   parse tree. The modified parse tree is reset() and
   analyzed again. We need to apply some rewrites at this
   stage because the transformations may affect conjunct
   registration (e.g., may generate new conjuncts).
   The rewrites in this phase are applied to the unresolved
   expressions of the parse tree.
2. A second phase of rewriting is performed during plan
   generation (now only constant folding). Doing rewrites
   at this stage has the benefit of being able to transform
   the fully resolved exprs. It also implies that we can
   perform rewrites without changing the output type of exprs,
   as opposed to the the first phase where types may change
   due to the re-analysis. This ensures that the optimizations
   have no user-visible effect.

This patch includes the following interesting changes:
- Introduces a timestamp literal that can only be produced
  by constant folding (not expressible directly via SQL).
- Modifies PlanNode.init() and all its implementations. The
  rewrites are performed in this function because predicate
  pushdown depends on constant folding for some scan nodes,
  e.g., pushing predicates to Kudu/HBase.
- Fixes an existing issue with converting strings between
  the FE/BE. String produced in the BE that have characters
  with a value > 127 are not correctly deserialized into a
  Java String via thrift. We detect this case during constant
  folding and abandon folding of such exprs.
- Cleans up ExprContext::GetValue() into
  ExprContext::GetConstantValue() which clarifies its only use
  of evaluating exprs from the FE.

Testing:
- Modifies expr-test.cc to run all tests through the constant
  folding path.
- Adds basic planner and rewrite rule tests.

Change-Id: If672b703db1ba0bfc26e5b9130161798b40a69e9
---
M be/src/exprs/expr-context.cc
M be/src/exprs/expr-context.h
M be/src/exprs/expr-test.cc
M be/src/exprs/expr.cc
M be/src/exprs/literal.cc
M be/src/exprs/literal.h
M be/src/service/fe-support.cc
M common/thrift/Exprs.thrift
M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java
M fe/src/main/java/org/apache/impala/analysis/AggregateInfoBase.java
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/BoolLiteral.java
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/analysis/LiteralExpr.java
M fe/src/main/java/org/apache/impala/analysis/NullLiteral.java
M fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/analysis/StringLiteral.java
A fe/src/main/java/org/apache/impala/analysis/TimestampLiteral.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java
M fe/src/main/java/org/apache/impala/planner/DataSink.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/EmptySetNode.java
M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/PlannerContext.java
M fe/src/main/java/org/apache/impala/planner/SelectNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/SingularRowSrcNode.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M fe/src/main/java/org/apache/impala/planner/SubplanNode.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/main/java/org/apache/impala/planner/UnnestNode.java
M fe/src/main/java/org/apache/impal

[Impala-ASF-CR] IMPALA-4216: Test became flaky: TestTpchMemLimitError.test low mem limit q20

2016-11-16 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4216: Test became flaky: 
TestTpchMemLimitError.test_low_mem_limit_q20
..


Patch Set 1:

Tim, maybe this was one of the dropped scanner status that you've fixed?

-- 
To view, visit http://gerrit.cloudera.org:8080/4572
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I471a90c8e3bdecb4fed08fbae3436c50b8618471
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4216: Test became flaky: TestTpchMemLimitError.test low mem limit q20

2016-11-16 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4216: Test became flaky: 
TestTpchMemLimitError.test_low_mem_limit_q20
..


Patch Set 1:

> > But why do we no longer get the MEM_LIMIT_EXCEEDED error status
 > > returned by this query?
 > 
 > I could not reproduce the issue. My guess is that we get the
 > MEM_LIMIT_EXCEEDED error status but it doesn't get added to the
 > error_log. Probably when we call LogError(status.msg), ErrorCount()
 > is greater than max_errors:
 > 
 > bool RuntimeState::LogError(const ErrorMsg& message, int
 > vlog_level) {
 > lock_guard l(error_log_lock_);
 > // All errors go to the log, unreported_error_count_ is counted
 > independently of the
 > // size of the error_log to account for errors that were already
 > reported to the
 > // coordinator
 > VLOG(vlog_level) << "Error from query " << query_id() << ": " <<
 > message.msg();
 > if (ErrorCount(error_log_) < query_options().max_errors) {
 > AppendError(&error_log_, message);
 > return true;
 > }
 > return false;
 > }
 > 
 > I could be mistaken, I'm not really familiar with how error
 > handling is implemented.

The query status doesn't come from the error log, so I don't think the limit 
should have an impact here.  It seems we are dropping a returned mem limit 
exceeded status somewhere, and we should fix that.

Do we still have the log file for a failed run?  If so, the log should show the 
callstack in which the MEM_LIMIT_EXCEEDED status object was created, which may 
give us a clue as to where it's getting dropped.  If we don't have the log, we 
may need to wait for this to repro.

-- 
To view, visit http://gerrit.cloudera.org:8080/4572
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I471a90c8e3bdecb4fed08fbae3436c50b8618471
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4216: Test became flaky: TestTpchMemLimitError.test low mem limit q20

2016-11-16 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change.

Change subject: IMPALA-4216: Test became flaky: 
TestTpchMemLimitError.test_low_mem_limit_q20
..


Patch Set 1:

> But why do we no longer get the MEM_LIMIT_EXCEEDED error status
 > returned by this query?

I could not reproduce the issue. My guess is that we get the MEM_LIMIT_EXCEEDED 
error status but it doesn't get added to the error_log. Probably when we call 
LogError(status.msg), ErrorCount() is greater than max_errors:

bool RuntimeState::LogError(const ErrorMsg& message, int vlog_level) {
  lock_guard l(error_log_lock_);
  // All errors go to the log, unreported_error_count_ is counted independently 
of the
  // size of the error_log to account for errors that were already reported to 
the
  // coordinator
  VLOG(vlog_level) << "Error from query " << query_id() << ": " << 
message.msg();
  if (ErrorCount(error_log_) < query_options().max_errors) {
AppendError(&error_log_, message);
return true;
  }
  return false;
}

I could be mistaken, I'm not really familiar with how error handling is 
implemented.

-- 
To view, visit http://gerrit.cloudera.org:8080/4572
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I471a90c8e3bdecb4fed08fbae3436c50b8618471
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3838: Codegen EvalRuntimeFilters().

2016-11-16 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3838: Codegen EvalRuntimeFilters().
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4833/1/be/src/exec/hdfs-parquet-scanner-ir.cc
File be/src/exec/hdfs-parquet-scanner-ir.cc:

Line 73: if (!filter_ctxs_[i]->Eval(row)) {
> Actually, RuntimeFilter::Eval() will return true if the filter hasn't arriv
I had it reversed - as far as I can see, before filter arrival, it will 
increment 'considered' and not 'rejected' for every row processed even though 
the filter is not available yet.

I couldn't convince myself that the code was right, so I added some logging and 
it looks like it's doing something that doesn't make sense: IMPALA-4495


http://gerrit.cloudera.org:8080/#/c/4833/2/be/src/runtime/runtime-filter.h
File be/src/runtime/runtime-filter.h:

PS2, Line 61:  tuple row
Is it a tuple row? I though it was a raw value pointer?


-- 
To view, visit http://gerrit.cloudera.org:8080/4833
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I27114869840e268d17e91d6e587ef811628e3837
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4477: Upgrade Kudu version to latest master

2016-11-16 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-4477: Upgrade Kudu version to latest master
..


Patch Set 2: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/5106
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I49f8582cc3c0f776167fe3decf4236345ba78bd3
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4477: Upgrade Kudu version to latest master

2016-11-16 Thread Matthew Jacobs (Code Review)
Hello Dimitris Tsirogiannis,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5106

to look at the new patch set (#2).

Change subject: IMPALA-4477: Upgrade Kudu version to latest master
..

IMPALA-4477: Upgrade Kudu version to latest master

Change the toolchain build and Kudu version to use
the latest master, using Kudu commit e836ac.

Change-Id: I49f8582cc3c0f776167fe3decf4236345ba78bd3
---
M bin/impala-config.sh
1 file changed, 5 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/5106/2
-- 
To view, visit http://gerrit.cloudera.org:8080/5106
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I49f8582cc3c0f776167fe3decf4236345ba78bd3
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 


[Impala-ASF-CR] IMPALA-4477: Upgrade Kudu version to latest master

2016-11-16 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-4477: Upgrade Kudu version to latest master
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5106/1/bin/impala-config.sh
File bin/impala-config.sh:

PS1, Line 55: generated by the
: # native-toolchain build when publishing its build artifacts
> Can we expand this comment to indicate where this generated id can be found
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5106
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I49f8582cc3c0f776167fe3decf4236345ba78bd3
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4477: Upgrade Kudu version to latest master

2016-11-16 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4477: Upgrade Kudu version to latest master
..


Patch Set 1: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5106/1/bin/impala-config.sh
File bin/impala-config.sh:

PS1, Line 55: generated by the
: # native-toolchain build when publishing its build artifacts
Can we expand this comment to indicate where this generated id can be found?


-- 
To view, visit http://gerrit.cloudera.org:8080/5106
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I49f8582cc3c0f776167fe3decf4236345ba78bd3
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4477: Upgrade Kudu version to latest master

2016-11-16 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5106

Change subject: IMPALA-4477: Upgrade Kudu version to latest master
..

IMPALA-4477: Upgrade Kudu version to latest master

Change the toolchain build and Kudu version to use
the latest master, using Kudu commit e836ac.

Change-Id: I49f8582cc3c0f776167fe3decf4236345ba78bd3
---
M bin/impala-config.sh
1 file changed, 3 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/5106/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5106
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I49f8582cc3c0f776167fe3decf4236345ba78bd3
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 


[Impala-ASF-CR] IMPALA-3713,IMPALA-4439: Fix Kudu DML shell reporting

2016-11-16 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has uploaded a new patch set (#2).

Change subject: IMPALA-3713,IMPALA-4439: Fix Kudu DML shell reporting
..

IMPALA-3713,IMPALA-4439: Fix Kudu DML shell reporting

Adds support in the shell to report the number of modified
rows for all DML operations, as well as the number of rows
with errors.

Testing: Added shell tests.

Change-Id: I3d3d7aa8d176e03ea58fb00f2a81fb3e34965aa1
---
M be/src/service/impala-beeswax-server.cc
M common/thrift/ImpalaService.thrift
M shell/impala_client.py
M shell/impala_shell.py
M tests/shell/test_shell_commandline.py
5 files changed, 92 insertions(+), 25 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/5103/2
-- 
To view, visit http://gerrit.cloudera.org:8080/5103
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3d3d7aa8d176e03ea58fb00f2a81fb3e34965aa1
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 


[Impala-ASF-CR] IMPALA-3713,IMPALA-4439: Fix Kudu DML shell reporting

2016-11-16 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3713,IMPALA-4439: Fix Kudu DML shell reporting
..


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/5103/1/be/src/service/impala-beeswax-server.cc
File be/src/service/impala-beeswax-server.cc:

Line 532: insert_result->__set_num_row_errors(num_row_errors);
> Can we leave this unset for DML where reporting the num_row_errors doesn't 
Done


http://gerrit.cloudera.org:8080/#/c/5103/1/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

Line 260:   // Number of rows that weren't modified due to errors. Only applies 
to Kudu tables.
> Can you qualify errors more precisely? These are constraint violations corr
Done


http://gerrit.cloudera.org:8080/#/c/5103/1/shell/impala_shell.py
File shell/impala_shell.py:

Line 928: errors_stmt = ", %d row error(s)" % (num_row_errors)
> Let's avoid "stmt" since that is a somewhat overloaded abbreviation (aka SQ
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5103
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3d3d7aa8d176e03ea58fb00f2a81fb3e34965aa1
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4392: restore PeakMemoryUsage to DataSink profiles

2016-11-16 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-4392: restore PeakMemoryUsage to DataSink profiles
..


Patch Set 3:

Build failed: 
http://sandbox.jenkins.cloudera.com/job/impala-external-gerrit-verify-merge-ASF/487/

-- 
To view, visit http://gerrit.cloudera.org:8080/4969
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa5db623a84c47d5904033ec26aece74f500a2c9
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4392: restore PeakMemoryUsage to DataSink profiles

2016-11-16 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-4392: restore PeakMemoryUsage to DataSink profiles
..


Patch Set 3:

Build failed: 
http://sandbox.jenkins.cloudera.com/job/impala-external-gerrit-verify-merge-ASF/486/

-- 
To view, visit http://gerrit.cloudera.org:8080/4969
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa5db623a84c47d5904033ec26aece74f500a2c9
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4392: restore PeakMemoryUsage to DataSink profiles

2016-11-16 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-4392: restore PeakMemoryUsage to DataSink profiles
..


Patch Set 3:

Build failed: 
http://sandbox.jenkins.cloudera.com/job/impala-external-gerrit-verify-merge-ASF/485/

-- 
To view, visit http://gerrit.cloudera.org:8080/4969
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa5db623a84c47d5904033ec26aece74f500a2c9
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4392: restore PeakMemoryUsage to DataSink profiles

2016-11-16 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-4392: restore PeakMemoryUsage to DataSink profiles
..


Patch Set 3: Verified-1

Build failed: 
http://sandbox.jenkins.cloudera.com/job/impala-external-gerrit-verify-merge-ASF/483/

-- 
To view, visit http://gerrit.cloudera.org:8080/4969
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa5db623a84c47d5904033ec26aece74f500a2c9
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4470: Avoid creating a NumericLiteral from NaN/infinity/-0.

2016-11-16 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-4470: Avoid creating a NumericLiteral from 
NaN/infinity/-0.
..


Patch Set 5:

Build failed: 
http://sandbox.jenkins.cloudera.com/job/impala-external-gerrit-verify-merge-ASF/484/

-- 
To view, visit http://gerrit.cloudera.org:8080/5050
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8243b2ee9fa9c470d078b385583f2f48b606a230
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4392: restore PeakMemoryUsage to DataSink profiles

2016-11-16 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4392: restore PeakMemoryUsage to DataSink profiles
..


Patch Set 3: Code-Review+2

Rebase

-- 
To view, visit http://gerrit.cloudera.org:8080/4969
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa5db623a84c47d5904033ec26aece74f500a2c9
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4392: restore PeakMemoryUsage to DataSink profiles

2016-11-16 Thread Tim Armstrong (Code Review)
Hello Dan Hecht,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4969

to look at the new patch set (#3).

Change subject: IMPALA-4392: restore PeakMemoryUsage to DataSink profiles
..

IMPALA-4392: restore PeakMemoryUsage to DataSink profiles

The join build sink patches refactored the DataSink interface and
inadvertently removed this counter from the profile.
The problem was that the sink MemTracker was not initialized with the
sink's profile.

The fix is for the sink to create the MemTracker itself.

Testing:
Ran core tests. Manually checked profile to make sure the counter
appeared in HdfsTableSink, DataStreamSender, etc.

Change-Id: Iaa5db623a84c47d5904033ec26aece74f500a2c9
---
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hbase-table-sink.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-table-sink.h
M be/src/exec/nested-loop-join-builder.cc
M be/src/exec/nested-loop-join-builder.h
M be/src/exec/nested-loop-join-node.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/plan-root-sink.cc
M be/src/exec/row-batch-cache.h
M be/src/runtime/data-stream-sender.cc
M be/src/runtime/data-stream-sender.h
M be/src/runtime/plan-fragment-executor.cc
M be/src/runtime/plan-fragment-executor.h
19 files changed, 64 insertions(+), 74 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/4969/3
-- 
To view, visit http://gerrit.cloudera.org:8080/4969
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iaa5db623a84c47d5904033ec26aece74f500a2c9
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4470: Avoid creating a NumericLiteral from NaN/infinity/-0.

2016-11-16 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-4470: Avoid creating a NumericLiteral from 
NaN/infinity/-0.
..


Patch Set 5: Verified-1

Build failed: 
http://sandbox.jenkins.cloudera.com/job/impala-external-gerrit-verify-merge-ASF/482/

-- 
To view, visit http://gerrit.cloudera.org:8080/5050
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8243b2ee9fa9c470d078b385583f2f48b606a230
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4278: Don't abort Catalog startup quickly if HMS is not present

2016-11-16 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change.

Change subject: IMPALA-4278: Don't abort Catalog startup quickly if HMS is not 
present
..


Patch Set 2:

> Thanks!
 > 
 > Do you think I should add some code to catalog to validate the
 > value of 'initial_hms_cnxn_timeout_s'? What would be the acceptable
 > range?

Also, what is the recommended way to validate a config parameter in catalog? A 
simple Preconditions.checkArgument() call would be sufficient?

Adding this config parameter to the CM UI is another issue. Do you think it 
should be added for the 5.10 release or it is not that urgent? I guess, the 
default value is reasonable for most users and they can always set it as a 
safety valve if they have to.

-- 
To view, visit http://gerrit.cloudera.org:8080/5095
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I546d8fe9836004832ae40110c9fe22b3e704e11b
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Henry Robinson 
Gerrit-HasComments: No


  1   2   >