[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.

2016-11-07 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-1286: Extract common conjuncts from disjunctions.
..


Patch Set 4:

I added a new query option ENABLE_EXPR_REWRITES and set it to false for 
expr-test.cc

I investigated the NULL type change and it turns out there is no easy way to 
preserve the type with out current approach because after rewriting we wipe all 
analysis state and re-analyze fresh. At that point, there is no way for us to 
know what type a NullLiteral should have. I don't see any harm in it.

-- 
To view, visit http://gerrit.cloudera.org:8080/4877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.

2016-11-07 Thread Alex Behm (Code Review)
Hello Marcel Kornacker, Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4877

to look at the new patch set (#5).

Change subject: IMPALA-1286: Extract common conjuncts from disjunctions.
..

IMPALA-1286: Extract common conjuncts from disjunctions.

Adds a new ExprRewriteRule to extract common conjuncts from
disjunctions.

Examples:
(a AND b AND c) OR (b AND d) ==> b AND ((a AND c) OR (d))
(a AND b) OR (a AND b) ==> a AND b
(a AND b AND c) OR (c) ==> c

Adds a new query option ENABLE_EXPR_REWRITES to enable/disable
non-essential expr rewrites in the FE. Note that some rewrites
are required, e.g., BetweenToCompoundRule. Disabling the rewrites
is useful for testing, in particular, to make sure that the exprs
specified in expr-test.cc are executed as written.

Testing: Added a new unit test in ExprRewriteRulesTest.

Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
---
M be/src/exprs/expr-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/BetweenPredicate.java
M fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java
M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java
M fe/src/main/java/org/apache/impala/rewrite/ExprRewriteRule.java
A fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-views.test
17 files changed, 281 insertions(+), 22 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/4877/5
-- 
To view, visit http://gerrit.cloudera.org:8080/4877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-3552: Make incremental stats max serialized size configurable

2016-11-07 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-3552: Make incremental stats max serialized size 
configurable
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4867/7/be/src/catalog/catalog.cc
File be/src/catalog/catalog.cc:

Line 151: Status Catalog::GetCatalogConfigBytes(JNIEnv* jni_env, jbyteArray* 
cfg_bytes) const {
> Makes sense, but since impalad/catalogd are the same binary they can see al
Or maybe even add a new file for this like backend-config.h in util.


-- 
To view, visit http://gerrit.cloudera.org:8080/4867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I33684725a61eabc67237503e61178305d37d3cb5
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Yonghyun Hwang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Huaisi Xu 
Gerrit-Reviewer: Yonghyun Hwang
Gerrit-Reviewer: Yonghyun Hwang 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Preview: IMPALA-4363: Add timestamp validation

2016-11-07 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: Preview: IMPALA-4363: Add timestamp validation
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4968/2/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

Line 473: if (NeedsValidation() && !ValidateSlot(val_ptr).ok()) {
> single line if it fits
I don't think this will behave quite right with abort_on_error=false. Returning 
false here will terminate the scan immediately.


Line 599:   return Status::OK();
> It's set to NULL in the implementation of 
Setting a slot to NULL means flipping the null bit in the containing tuple. I 
don't think you're doing that from within TimestampValue::Validate(), and it 
wouldn't make sense to do it there.

Looking a little closer at the code, I think we may have to move this check 
into ReadValue() instead of ReadSlot(). 
We need to do the following if validation fails:
tuple->SetNull(null_indicator_offset_);


-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Improve Kudu UPSERT test coverage

2016-11-07 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: Improve Kudu UPSERT test coverage
..


Patch Set 2:

(7 comments)

Thanks! This is great.

I didn't look as closely as I did in revision 1, but I'll check again tomorrow 
and but can probably give a +1. I'll see if anyone wants to take a look at the 
python code. Otherwise I'll give it a +2 after I look again.

http://gerrit.cloudera.org:8080/#/c/4953/2//COMMIT_MSG
Commit Message:

PS2, Line 9: lease
release
... though I guess it seems odd to _re_ lease the Impala Kudu integration which 
has never been leased before (at least not GA).


PS2, Line 10: 5.10
2.8


PS2, Line 9: In preparation for the public lease of Kudu integration in
   : 5.10, we need to make sure that we've covered as much of the
   : Kudu related functionality with tests as possible. This patch
   : covers UPSERT.
Not really needed in the commit.


PS2, Line 14: It also introduces a new test section 'DML_RESULTS', which
: takes the name of a table as a comment and the contents of the
: table as its body and then verifies that the body matches the
: actual contents of the table. This makes it easy to check that a
: DML operation has the desired effect on the contents of a table,
: rather than always having to add another test case that runs a
: select on the table.
nice


http://gerrit.cloudera.org:8080/#/c/4953/1/testdata/workloads/functional-query/queries/QueryTest/kudu_crud.test
File testdata/workloads/functional-query/queries/QueryTest/kudu_crud.test:

Line 433
> How about adding the 'LABELS' section?
That's great, I didn't know it existed.


PS1, Line 680: 
 : 
 : 
 : 
> Good point - there appears to only actually be 10 distinct values of int_co
Ah, makes sense. Thanks for looking into that, I think it's fine. We can think 
about changing the wording in the future (not something to pile on now).


http://gerrit.cloudera.org:8080/#/c/4953/2/tests/common/impala_test_suite.py
File tests/common/impala_test_suite.py:

PS2, Line 356: select * from %s
might be good to put a limit on here for safety, e.g. the number of rows 
specified in the "expected rows" section, or even just some big constant like 
1000. Even if it's kinda hacky/random, it'd be crazy to have a test case where 
the DML_RESULTS section even got close but would prevent the test from trying 
to do something dumb.


-- 
To view, visit http://gerrit.cloudera.org:8080/4953
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib9e7afbef60186edb00a9d11fbe5a8c64931add6
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4266: Java udf returning string can give incorrect results

2016-11-07 Thread Internal Jenkins (Code Review)
Internal Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4266: Java udf returning string can give incorrect 
results
..


IMPALA-4266: Java udf returning string can give incorrect results

The memory management of string results was wrong: strings returned from
Exprs must live until the next time FreeLocalAllocations() is called.
Otherwise the buffer holding the string is freed or reused by the next
UDF call. The fix is to copy string values into a buffer with the
right lifetime.

Testing:
Added a regression test based on Bharath's example that reproduced the
bug reliably.

Change-Id: I705d271814cb1143f67d8a12f4fd87bab7a8e161
Reviewed-on: http://gerrit.cloudera.org:8080/4941
Reviewed-by: Tim Armstrong 
Tested-by: Internal Jenkins
---
M be/src/exprs/hive-udf-call.cc
M testdata/workloads/functional-query/queries/QueryTest/java-udf.test
M testdata/workloads/functional-query/queries/QueryTest/load-java-udfs.test
A tests/test-hive-udfs/src/main/java/org/apache/impala/ReplaceStringUdf.java
4 files changed, 70 insertions(+), 1 deletion(-)

Approvals:
  Internal Jenkins: Verified
  Tim Armstrong: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/4941
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I705d271814cb1143f67d8a12f4fd87bab7a8e161
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4266: Java udf returning string can give incorrect results

2016-11-07 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-4266: Java udf returning string can give incorrect 
results
..


Patch Set 5: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/4941
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I705d271814cb1143f67d8a12f4fd87bab7a8e161
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4302,IMPALA-2379: constant expr arg fixes

2016-11-07 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-4302,IMPALA-2379: constant expr arg fixes
..


Patch Set 10: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/4838
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I45c3ed8c9d7a61e94a9b9d6c316e8a53d9ff6c24
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4365: Enabling end-to-end tests on a remote cluster

2016-11-07 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4365: Enabling end-to-end tests on a remote cluster
..


Patch Set 14: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/4769
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1f443a1728a1d28168090c6f54e82dec2cb073e9
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Harrison Sheinblatt 
Gerrit-Reviewer: Martin Grund 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3552: Make incremental stats max serialized size configurable

2016-11-07 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has uploaded a new patch set (#7).

Change subject: IMPALA-3552: Make incremental stats max serialized size 
configurable
..

IMPALA-3552: Make incremental stats max serialized size configurable

The fix "IMPALA-2648/IMPALA-2664" introduced a conservative limitation
on the maximum serialized size of incremental stats. As a side effect,
some users with very large tables are experiencing regressions
especially when they upgrade impala and the serialized size goes
beyond 200MB.

To mitigate the issue, the change introduces a new gflag,
'inc_stats_size_limit_bytes' to make the max serialized size
configurable, which allows impala users to specify their own maximum
serialized size. Default value for inc_stats_size_limit_bytes is
200MB.

The change amends the TBackendConfig class to pass the gflags from
backend to the Frontend and the Catalog via thrift.  This also revamps
existing query options to use the TBackendConfig.

Change-Id: I33684725a61eabc67237503e61178305d37d3cb5
---
M be/src/catalog/catalog.cc
M be/src/catalog/catalog.h
M be/src/common/global-flags.cc
M be/src/service/fe-support.cc
M be/src/service/frontend.cc
M be/src/service/frontend.h
M common/thrift/Frontend.thrift
M common/thrift/Types.thrift
M fe/src/main/java/org/apache/impala/analysis/AlterViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateViewStmt.java
M fe/src/main/java/org/apache/impala/authorization/User.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/common/RuntimeEnv.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/FeSupport.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M fe/src/main/java/org/apache/impala/util/KuduUtil.java
21 files changed, 195 insertions(+), 163 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/4867/7
-- 
To view, visit http://gerrit.cloudera.org:8080/4867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I33684725a61eabc67237503e61178305d37d3cb5
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Yonghyun Hwang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Huaisi Xu 
Gerrit-Reviewer: Yonghyun Hwang
Gerrit-Reviewer: Yonghyun Hwang 


[Impala-ASF-CR] IMPALA-3872: allow providing PyPi mirror for python packages

2016-11-07 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-3872: allow providing PyPi mirror for python packages
..


Patch Set 6: Verified-1

Build failed: 
http://sandbox.jenkins.cloudera.com/job/impala-external-gerrit-verify-merge-ASF/442/

-- 
To view, visit http://gerrit.cloudera.org:8080/4770
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc11f010332c0225121c86c9930e35c7ac01409c
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3552: Make incremental stats max serialized size configurable

2016-11-07 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has uploaded a new patch set (#6).

Change subject: IMPALA-3552: Make incremental stats max serialized size 
configurable
..

IMPALA-3552: Make incremental stats max serialized size configurable

The fix "IMPALA-2648/IMPALA-2664" introduced a conservative limitation
on the maximum serialized size of incremental stats. As a side effect,
some users with very large tables are experiencing regressions
especially when they upgrade impala and the serialized size goes
beyond 200MB.

To mitigate the issue, the change introduces a new gflag,
'inc_stats_size_limit_bytes' to make the max serialized size
configurable, which allows impala users to specify their own maximum
serialized size. Default value for inc_stats_size_limit_bytes is
200MB.

The change amends the TBackendConfig class to pass the gflags from
backend to the Frontend and the Catalog via thrift.  This also revamps
existing query options to use the TBackendConfig.

Change-Id: I33684725a61eabc67237503e61178305d37d3cb5
---
M be/src/catalog/catalog.cc
M be/src/catalog/catalog.h
M be/src/common/global-flags.cc
M be/src/service/fe-support.cc
M be/src/service/frontend.cc
M be/src/service/frontend.h
M common/thrift/Frontend.thrift
M common/thrift/Types.thrift
M fe/src/main/java/org/apache/impala/analysis/AlterViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateViewStmt.java
M fe/src/main/java/org/apache/impala/authorization/User.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/common/RuntimeEnv.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/FeSupport.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M fe/src/main/java/org/apache/impala/util/KuduUtil.java
21 files changed, 196 insertions(+), 160 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/4867/6
-- 
To view, visit http://gerrit.cloudera.org:8080/4867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I33684725a61eabc67237503e61178305d37d3cb5
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Yonghyun Hwang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Huaisi Xu 
Gerrit-Reviewer: Yonghyun Hwang
Gerrit-Reviewer: Yonghyun Hwang 


[Impala-ASF-CR] IMPALA-3202: refactor scratch file management into TmpFileMgr

2016-11-07 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-3202: refactor scratch file management into TmpFileMgr
..


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/4898/4/be/src/runtime/tmp-file-mgr.cc
File be/src/runtime/tmp-file-mgr.cc:

Line 292: " or see log for previous errors that prevented use of 
provided directories");
does the log contain the errors, or do we need to log the error status at line 
286?


Line 336: } else if (status.code() == TErrorCode::SCRATCH_LIMIT_EXCEEDED) {
seems like dead code now that the check is handled at the file group level, no?


http://gerrit.cloudera.org:8080/#/c/4898/4/be/src/runtime/tmp-file-mgr.h
File be/src/runtime/tmp-file-mgr.h:

PS4, Line 72: allocation would exceed the allocation limit of its associated 
FileGrou
this check doesn't seem to be done by this function anymore.


-- 
To view, visit http://gerrit.cloudera.org:8080/4898
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0c56c195f3f28d520034f8c384494e566635fc62
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions

2016-11-07 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#13).

Change subject: IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions
..

IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions

This change enables codegen for all builtin aggregate functions,
e.g. timestamp functions and group_concat.

There are several parts to the change:
* Adding support for generic UDAs. Previous the codegen code did not
  handle multiple input arguments or NULL return values.
* Defaulting to using the UDA interface when there is not a special
  codegen path (we have implementations of all builtin aggregate
  functions for the interpreted path).
* Remove all the logic to disable codegen for the special cases that now
  are supported.

Also fix the generation of code to get/set NULL bits since I needed
to add functionality there anyway.

Testing:
Add tests that check that codegen was enabled for builtin aggregate
functions. Also fix some gaps in the preexisting tests.

Also add tests for UDAs that check input/output nulls are handled
correctly, in anticipation of enabling codegen for arbitrary UDAs.
The tests are run with both codegen enabled and disabled. To avoid
flaky tests, we switch the UDF tests to use "unique_database".

Perf:
Ran local TPC-H and targeted perf. Spent a lot of time on TPC-H Q1,
since my original approach regressed it ~5%. In the end the problem was
to do with the ordering of loads/stores to the slot and null bit in the
generated code: the previous version of the code exploited some
properties of the particular aggregate function. I ended up replicating
this behaviour to avoid regressing perf.

Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260
---
M be/src/benchmarks/hash-benchmark.cc
M be/src/codegen/codegen-anyval.cc
M be/src/codegen/codegen-anyval.h
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/llvm-codegen-test.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/exec/aggregation-node.cc
M be/src/exec/exec-node.cc
M be/src/exec/hash-join-node.cc
M be/src/exec/hash-table.cc
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/old-hash-table.cc
M be/src/exec/partitioned-aggregation-node-ir.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-aggregation-node.h
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/text-converter.cc
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/case-expr.cc
M be/src/exprs/compound-predicates.cc
M be/src/exprs/expr-codegen-test.cc
M be/src/exprs/expr.cc
M be/src/exprs/literal.cc
M be/src/exprs/null-literal.cc
M be/src/exprs/scalar-fn-call.cc
M be/src/exprs/slot-ref.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/tuple.cc
M be/src/runtime/types.h
M be/src/testutil/test-udas.cc
M be/src/testutil/test-udfs.cc
M be/src/util/tuple-row-compare.cc
M testdata/workloads/functional-query/queries/QueryTest/java-udf.test
M 
testdata/workloads/functional-query/queries/QueryTest/libs_with_same_filenames.test
M testdata/workloads/functional-query/queries/QueryTest/load-java-udfs.test
M testdata/workloads/functional-query/queries/QueryTest/uda-mem-limit.test
M testdata/workloads/functional-query/queries/QueryTest/uda.test
A 
testdata/workloads/functional-query/queries/QueryTest/udf-codegen-required.test
M testdata/workloads/functional-query/queries/QueryTest/udf-errors.test
M testdata/workloads/functional-query/queries/QueryTest/udf-mem-limit.test
M testdata/workloads/functional-query/queries/QueryTest/udf.test
M tests/common/test_result_verifier.py
M tests/query_test/test_aggregation.py
M tests/query_test/test_udfs.py
47 files changed, 1,003 insertions(+), 762 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/4655/13
-- 
To view, visit http://gerrit.cloudera.org:8080/4655
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions

2016-11-07 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions
..


Patch Set 12:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/4655/12/be/src/codegen/codegen-anyval.cc
File be/src/codegen/codegen-anyval.cc:

Line 103:   DCHECK(false) << "NYI:" << type.DebugString();
> why do we not need these still with the removal of the bail outs for CHAR?
The char bailout was still there, but more implicit.


Line 462: void CodegenAnyVal::SetFromRawPtr(Value* raw_ptr) {
> this looks unused. remove?
Done


http://gerrit.cloudera.org:8080/#/c/4655/12/be/src/codegen/codegen-anyval.h
File be/src/codegen/codegen-anyval.h:

Line 53: /// TYPE_TIMESTAMP/TimestampVal: { i64, i64 }
> decimal?
Done


http://gerrit.cloudera.org:8080/#/c/4655/12/be/src/exec/partitioned-aggregation-node.cc
File be/src/exec/partitioned-aggregation-node.cc:

Line 1071: // eliminate a branch per value.
> what code is this paragraph referring to? should it precede the Init() call
It probably makes sense to move it up, since it's implemented by both Init() 
and the below code.


Line 1079: && evaluator->intermediate_type().type != TYPE_TIMESTAMP) {
> IsTimestampType() for consistency now that most types have that?
Done


PS12, Line 1454: :
> maybe explain that we hand-generate a code sequence in this case.
Done


PS12, Line 1670: .type != TYPE_TIMESTAMP
> IsTimestampType()?
Done


PS12, Line 1742: We must use the unlowered type
> would be good to briefly explain why rather than just the what.
Done


Line 1811:   "intermediate tuple desc");
> when would this happen?
If there's a char field in the tuple. I think this is a step back in terms of 
error messages, so I added back an explicit check for CHAR fields in the 
intermediate tuple.

I actually noticed there are a couple of places in the codebase that don't 
check for a null result from this function, so I fixed those.


http://gerrit.cloudera.org:8080/#/c/4655/12/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

PS12, Line 630: ;
> remove
Done


http://gerrit.cloudera.org:8080/#/c/4655/12/be/src/runtime/descriptors.h
File be/src/runtime/descriptors.h:

PS12, Line 143: the
> "a boolean value represented ..." ? like above comment.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4655
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4441: Divide-by-zero in RuntimeProfile::SummaryStatsCounter::SetStats

2016-11-07 Thread Internal Jenkins (Code Review)
Internal Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4441: Divide-by-zero in 
RuntimeProfile::SummaryStatsCounter::SetStats
..


IMPALA-4441: Divide-by-zero in RuntimeProfile::SummaryStatsCounter::SetStats

This patch anticipates the case where total_num_values_ can be 0 and
makes sure a divide-by-zero is not possible.

Change-Id: I33f1e6fb45505dce7d79497d1632c5f63a409151
Reviewed-on: http://gerrit.cloudera.org:8080/4975
Reviewed-by: Henry Robinson 
Tested-by: Internal Jenkins
---
M be/src/util/runtime-profile.cc
1 file changed, 3 insertions(+), 1 deletion(-)

Approvals:
  Henry Robinson: Looks good to me, approved
  Internal Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/4975
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I33f1e6fb45505dce7d79497d1632c5f63a409151
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-4441: Divide-by-zero in RuntimeProfile::SummaryStatsCounter::SetStats

2016-11-07 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-4441: Divide-by-zero in 
RuntimeProfile::SummaryStatsCounter::SetStats
..


Patch Set 2: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/4975
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I33f1e6fb45505dce7d79497d1632c5f63a409151
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: No


[Impala-ASF-CR] Preview: IMPALA-4363: Add timestamp validation

2016-11-07 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: Preview: IMPALA-4363: Add timestamp validation
..


Patch Set 1:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/4968/1/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

Line 44: 
> ?
Done


Line 491:   inline bool NeedsValidation() const {
> Can you add a brief comment?
Done


Line 502:   /// Verifies that data is acceptable
> Can you elaborate what "acceptable" means?
Rewrote the comment


http://gerrit.cloudera.org:8080/#/c/4968/1/be/src/runtime/timestamp-value.cc
File be/src/runtime/timestamp-value.cc:

Line 101: void TimestampValue::Validate() {
> Can you move the exception handling into DebugString()? This way it would b
I rewrote this so that it does not rely on catching an exception as Alex 
suggested.


PS1, Line 103:
> Nit: tab. git-clang-format should fix this.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Preview: IMPALA-4363: Add timestamp validation

2016-11-07 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#2).

Change subject: Preview: IMPALA-4363: Add timestamp validation
..

Preview: IMPALA-4363: Add timestamp validation

Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
---
M be/src/exec/parquet-column-readers.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M common/thrift/generate_error_codes.py
4 files changed, 60 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/4968/2
-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-4442: Fix FE ParserTests UnsatisfiedLinkError

2016-11-07 Thread Internal Jenkins (Code Review)
Internal Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4442: Fix FE ParserTests UnsatisfiedLinkError
..


IMPALA-4442: Fix FE ParserTests UnsatisfiedLinkError

In some development environments, the ParserTests may always fail with an
java.lang.UnsatisfiedLinkError:

org.apache.impala.service.FeSupport.NativeGetStartupOptions()[B
  at o.a.i.service.FeSupport.NativeGetStartupOptions(Native Method)
  at o.a.i.service.FeSupport.GetStartupOptions(FeSupport.java:268)
  at o.a.i.common.RuntimeEnv.(RuntimeEnv.java:47)
  at o.a.i.common.RuntimeEnv.(RuntimeEnv.java:34)
  at o.a.i.testutil.TestUtils.assumeKuduIsSupported(TestUtils.java:288)
  at o.a.i.analysis.ParserTest.TestKuduUpdate(ParserTest.java:1697)

I believe the issue is related to some static loading of
classes and/or libraries in Java because changing the
ParserTest to initialize the Frontend makes the error go
away. I haven't been able to pin-point the exact issue with
loading, but it makes sense that the ParserTest should
initialize the Frontend static state if it will be called by
libfesupport later since it seems to be an issue affecting
some environments and not others, i.e. subject to
environmental factors.

This fixes the issue by changing ParserTest to extend
FrontendTestBase which initializes the Frontend class
statically.

Change-Id: I1828504f79c51679f9ca07176bffbe248d450e87
Reviewed-on: http://gerrit.cloudera.org:8080/4976
Reviewed-by: Alex Behm 
Tested-by: Internal Jenkins
---
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
1 file changed, 2 insertions(+), 20 deletions(-)

Approvals:
  Internal Jenkins: Verified
  Alex Behm: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/4976
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I1828504f79c51679f9ca07176bffbe248d450e87
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Internal Jenkins


[Impala-ASF-CR] IMPALA-4442: Fix FE ParserTests UnsatisfiedLinkError

2016-11-07 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-4442: Fix FE ParserTests UnsatisfiedLinkError
..


Patch Set 1: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/4976
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1828504f79c51679f9ca07176bffbe248d450e87
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Internal Jenkins
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3552: Make incremental stats max serialized size configurable

2016-11-07 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-3552: Make incremental stats max serialized size 
configurable
..


Patch Set 5:

(15 comments)

I've taken over this. Please check the next PS.

http://gerrit.cloudera.org:8080/#/c/4867/5//COMMIT_MSG
Commit Message:

PS5, Line 21:  set of
: gflags, to frontend, 
> nit: replace with "to the"
Done


http://gerrit.cloudera.org:8080/#/c/4867/5/be/src/catalog/catalog.cc
File be/src/catalog/catalog.cc:

Line 150: const {
> move into line above (90 chars per line max)
Done


Line 152:   cfg.load_catalog_in_background = FLAGS_load_catalog_in_background;
> use the thrift setter functions
Done


http://gerrit.cloudera.org:8080/#/c/4867/5/be/src/catalog/catalog.h
File be/src/catalog/catalog.h:

Line 104:   Status GetCatalogConfigBytes(JNIEnv* jni_env, jbyteArray* 
cfg_bytes) const;
> Document a line or two on what this does.
Done


http://gerrit.cloudera.org:8080/#/c/4867/5/be/src/service/frontend.h
File be/src/service/frontend.h:

Line 176:   Status GetFrontendConfigBytes(JNIEnv* jni_env, jbyteArray* 
cfg_bytes) const;
> Document.
Done


http://gerrit.cloudera.org:8080/#/c/4867/5/common/thrift/Types.thrift
File common/thrift/Types.thrift:

Line 236:   1: required string sentry_config
> Why are some of these required and some optional? I couldn't make out a pat
Made everything optional.


Line 242:   3: required i32 other_log_lvl
> non_impala_java_vlog
Done


Line 246:   5: required i64 inc_stats_size_limit_bytes
> your gflag is a uint64, I suggest making the gflag signed as well
Done


Line 249:   6: required bool compute_lineage
> if possible, I'd prefer to pass the gflags directly, i.e., instead of a boo
Isn't it better if we let the backend drive the decision here and just pass the 
flags the fe? I'm fine either way but if you feel strongly about this, I'll 
make the change. Please let me know.


http://gerrit.cloudera.org:8080/#/c/4867/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 1601: (statsSizeEstimate < 
BackendConfig.INSTANCE.getIncStatMaxSize());
> getIncStatsMaxSize() (Stats vs. Stat)
Done


http://gerrit.cloudera.org:8080/#/c/4867/5/fe/src/main/java/org/apache/impala/common/RuntimeEnv.java
File fe/src/main/java/org/apache/impala/common/RuntimeEnv.java:

Line 30: public class RuntimeEnv {
> much better!
:)


http://gerrit.cloudera.org:8080/#/c/4867/5/fe/src/main/java/org/apache/impala/service/BackendConfig.java
File fe/src/main/java/org/apache/impala/service/BackendConfig.java:

Line 41:   }
> How about adding initializing the singleton this way:
Done


Line 43:   public void setBackendConfig(TBackendConfig cfg) {
> indentation
Done


http://gerrit.cloudera.org:8080/#/c/4867/5/fe/src/main/java/org/apache/impala/service/JniCatalog.java
File fe/src/main/java/org/apache/impala/service/JniCatalog.java:

Line 83:   public JniCatalog(byte[] thriftBEConfig) throws InternalException,
> thriftBackendConfig or thriftBeConfig
Done


http://gerrit.cloudera.org:8080/#/c/4867/5/fe/src/main/java/org/apache/impala/service/JniFrontend.java
File fe/src/main/java/org/apache/impala/service/JniFrontend.java:

Line 120:   public JniFrontend(byte[] thriftBEConfig) throws InternalException,
> remove InternalException because that's already covered by ImpalaException
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I33684725a61eabc67237503e61178305d37d3cb5
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Yonghyun Hwang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Huaisi Xu 
Gerrit-Reviewer: Yonghyun Hwang
Gerrit-Reviewer: Yonghyun Hwang 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4172: Switch to BlockLocation methods for disk IDs

2016-11-07 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4172: Switch to BlockLocation methods for disk IDs
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4914/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 379:   private static int getDiskId(String storageId) {
> How can we separate them though? The new API returns a UUID.
oops, i didn't see that. in that case, shipping those around would cause an 
immediate size regression. so we need to translate them into ordinals.


-- 
To view, visit http://gerrit.cloudera.org:8080/4914
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibbff94cef9a9db7b3945f8e7b0286866d2cc3b61
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Laszlo Gaal 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3126: Conservative assignment of inner-join On-clause predicates.

2016-11-07 Thread Alex Behm (Code Review)
Alex Behm has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/4982

Change subject: IMPALA-3126: Conservative assignment of inner-join On-clause 
predicates.
..

IMPALA-3126: Conservative assignment of inner-join On-clause predicates.

Implements the following conservative but correct policy for assigning
predicates from the On-clause of an inner join:
If the predicate references an outer-joined tuple, then evaluate it at
the inner join that the On-clause belongs to.

Cleans up Analyzer.canEvalPredicate().

Change-Id: Idf45323ed9102ffb45c9d94a130ea3692286f215
---
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test
2 files changed, 143 insertions(+), 61 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/4982/1
-- 
To view, visit http://gerrit.cloudera.org:8080/4982
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Idf45323ed9102ffb45c9d94a130ea3692286f215
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 


[Impala-ASF-CR] IMPALA-1169: Admission control info on the queries debug webpage

2016-11-07 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-1169: Admission control info on the queries debug webpage
..


Patch Set 9: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/4756
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I504e3c829a14318721e3a42de6281bcc578f7283
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4266: Java udf returning string can give incorrect results

2016-11-07 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4266: Java udf returning string can give incorrect 
results
..


Patch Set 5: Code-Review+2

Rebase

-- 
To view, visit http://gerrit.cloudera.org:8080/4941
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I705d271814cb1143f67d8a12f4fd87bab7a8e161
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3872: allow providing PyPi mirror for python packages

2016-11-07 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3872: allow providing PyPi mirror for python packages
..


Patch Set 6: Code-Review+2

carry +2

-- 
To view, visit http://gerrit.cloudera.org:8080/4770
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc11f010332c0225121c86c9930e35c7ac01409c
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3872: allow providing PyPi mirror for python packages

2016-11-07 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3872: allow providing PyPi mirror for python packages
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4770/4//COMMIT_MSG
Commit Message:

Line 11: the mirrors).
> Commit should perhaps say how the mirror is provided to the script.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4770
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc11f010332c0225121c86c9930e35c7ac01409c
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3872: allow providing PyPi mirror for python packages

2016-11-07 Thread Tim Armstrong (Code Review)
Hello Michael Brown, Henry Robinson,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4770

to look at the new patch set (#5).

Change subject: IMPALA-3872: allow providing PyPi mirror for python packages
..

IMPALA-3872: allow providing PyPi mirror for python packages

We still rely on the python.org json API, which doesn't seem to be
mirrored (instead there's a html-based index format implemented by
the mirrors).

The mirror can be provided by setting the PYPI_MIRROR environment
variable. The default is "https://pypi.python.org;.

Change-Id: Ibc11f010332c0225121c86c9930e35c7ac01409c
---
M infra/python/deps/pip_download.py
1 file changed, 8 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/4770/5
-- 
To view, visit http://gerrit.cloudera.org:8080/4770
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibc11f010332c0225121c86c9930e35c7ac01409c
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-3872: allow providing PyPi mirror for python packages

2016-11-07 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-3872: allow providing PyPi mirror for python packages
..


Patch Set 4: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4770/4//COMMIT_MSG
Commit Message:

Line 11: the mirrors).
Commit should perhaps say how the mirror is provided to the script.


-- 
To view, visit http://gerrit.cloudera.org:8080/4770
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc11f010332c0225121c86c9930e35c7ac01409c
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4365: Enabling end-to-end tests on a remote cluster

2016-11-07 Thread David Knupp (Code Review)
David Knupp has posted comments on this change.

Change subject: IMPALA-4365: Enabling end-to-end tests on a remote cluster
..


Patch Set 13:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/4769/13//COMMIT_MSG
Commit Message:

Line 51: still problems to work out with the remote data load script itself.
> Did you try loading data on a remote cluster and running tests on in with t
Yes, many times. I should update this sentence to be more clear.

This is mainly a references to the several "clean up" changes that Harrison 
suggested earlier, for which JIRA's have been opened. We can address those when 
there's time. More pressing than cleaning up all those things is fact that we 
need to have this checked in order to validate Impala running against a remote 
CDH 5.10 cluster, and time is getting short. We have less than two weeks now.

There were some other actual problems that were mysterious to me initially. 
E.g., Kudu related failures started appearing once recent Kudu changes were 
submitted -- until I realized that this issue was breaking things: 

https://jira.cloudera.com/browse/OPSAPS-37322

But after tweaking the cluster, data loading works, and tests run -- though 
many tests may need to be tweaked to work remotely.


http://gerrit.cloudera.org:8080/#/c/4769/13/bin/remote_data_load.py
File bin/remote_data_load.py:

Line 534: sys.exit(1)
> In general, I think it's a bad practice to call sys.exit inside functions. 
OK, I'll move this. I'd seen this pattern used here in other scripts here 
(e.g., load-data.py that we use for local data loading), so didn't know it was 
a frowned upon practice.


http://gerrit.cloudera.org:8080/#/c/4769/13/testdata/datasets/functional/schema_constraints.csv
File testdata/datasets/functional/schema_constraints.csv:

PS13, Line 120: Wide tables fail due to the SERDEPROPERTIES limits
> is this a new issue? Is it specific to remote data loading?
For our mini-cluster, we work around this problem here:

https://github.com/apache/incubator-impala/blob/master/bin/create-test-configuration.sh#L99

However, create-test-configuration.sh is part of our local build process. It 
doesn't get called when CDH is deployed to a remote cluster. Besides, that 
script assumes that the metastore database will always be postgres, which is 
not the case when testing against a remote cluster.

Before the change to this file, I had been using another hand-rolled script to 
configure the property separately after deployment. With this, I can drop that 
step.

I've also tested the local data load after this change, and it's unaffected.


-- 
To view, visit http://gerrit.cloudera.org:8080/4769
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1f443a1728a1d28168090c6f54e82dec2cb073e9
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Harrison Sheinblatt 
Gerrit-Reviewer: Martin Grund 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4302,IMPALA-2379: constant expr arg fixes

2016-11-07 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4302,IMPALA-2379: constant expr arg fixes
..


Patch Set 10:

Hit IMPALA-3040

-- 
To view, visit http://gerrit.cloudera.org:8080/4838
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I45c3ed8c9d7a61e94a9b9d6c316e8a53d9ff6c24
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4437: fix crash in disk-io-mgr

2016-11-07 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/4979

Change subject: IMPALA-4437: fix crash in disk-io-mgr
..

IMPALA-4437: fix crash in disk-io-mgr

This fixes another issue where the 'buffer_' field was not set to NULL
on an error, triggering a DCHECK.

Testing:
Added a unit test that triggers the bug on the two different codepaths
that I fixed.

Change-Id: Ib76cf5ba8d368b2b37bdc1d2133b8ddcb39f9e00
---
M be/src/runtime/disk-io-mgr-test.cc
M be/src/runtime/disk-io-mgr.cc
2 files changed, 40 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/4979/1
-- 
To view, visit http://gerrit.cloudera.org:8080/4979
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ib76cf5ba8d368b2b37bdc1d2133b8ddcb39f9e00
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 


[Impala-ASF-CR] Minor fixes to remove more "cloudera"s from the code.

2016-11-07 Thread Internal Jenkins (Code Review)
Internal Jenkins has submitted this change and it was merged.

Change subject: Minor fixes to remove more "cloudera"s from the code.
..


Minor fixes to remove more "cloudera"s from the code.

Change-Id: I27687a3b677def72f1867df54c8e50c93d5cab3a
Reviewed-on: http://gerrit.cloudera.org:8080/4774
Reviewed-by: Jim Apple 
Tested-by: Internal Jenkins
---
M tests/benchmark/report_benchmark_results.py
M tests/test-hive-udfs/src/main/java/org/apache/impala/UnresolvedUdf.java
2 files changed, 15 insertions(+), 12 deletions(-)

Approvals:
  Jim Apple: Looks good to me, approved
  Internal Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/4774
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I27687a3b677def72f1867df54c8e50c93d5cab3a
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4260: Alter table add column drops all the column stats

2016-11-07 Thread Internal Jenkins (Code Review)
Internal Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4260: Alter table add column drops all the column stats
..


IMPALA-4260: Alter table add column drops all the column stats

Hive expects types for column stats to be specified as all lower
case. For some reason, it doesn't check this when the stats are
first written, but it does check when performing an 'alter table'.
This causes it to drop stats that Impala wrote because we specify
type names in upper case.

This patch converts the types that Impala sends to Hive for the
column stats to all lower case and adds a regression test.

I also filed HIVE-15061 to track the issue from the Hive end.

Change-Id: Ia373ec917efa7ab9f2a59b8a870b7ebc30175dda
Reviewed-on: http://gerrit.cloudera.org:8080/4845
Reviewed-by: Matthew Jacobs 
Tested-by: Internal Jenkins
---
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M testdata/workloads/functional-query/queries/QueryTest/compute-stats.test
2 files changed, 83 insertions(+), 1 deletion(-)

Approvals:
  Matthew Jacobs: Looks good to me, approved
  Internal Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/4845
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ia373ec917efa7ab9f2a59b8a870b7ebc30175dda
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4260: Alter table add column drops all the column stats

2016-11-07 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-4260: Alter table add column drops all the column stats
..


Patch Set 2: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/4845
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia373ec917efa7ab9f2a59b8a870b7ebc30175dda
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4392: restore PeakMemoryUsage to DataSink profiles

2016-11-07 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/4969

Change subject: IMPALA-4392: restore PeakMemoryUsage to DataSink profiles
..

IMPALA-4392: restore PeakMemoryUsage to DataSink profiles

The join build sink patches refactored the DataSink interface and
inadvertently removed this counter from the profile.
The problem was that the sink MemTracker was not initialized with the
sink's profile.

The fix is to replumb things so that the profile is created in the
constructor and can be used when constructing the MemTracker.

Testing:
Ran core tests. Manually checked profile to make sure the counter
appeared in HdfsTableSink, DataStreamSender, etc.

Change-Id: Iaa5db623a84c47d5904033ec26aece74f500a2c9
---
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hbase-table-sink.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-table-sink.h
M be/src/exec/nested-loop-join-builder.cc
M be/src/exec/nested-loop-join-builder.h
M be/src/exec/nested-loop-join-node.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/plan-root-sink.cc
M be/src/exec/plan-root-sink.h
M be/src/runtime/data-stream-sender.cc
M be/src/runtime/data-stream-sender.h
M be/src/runtime/plan-fragment-executor.cc
19 files changed, 85 insertions(+), 98 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/4969/1
-- 
To view, visit http://gerrit.cloudera.org:8080/4969
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Iaa5db623a84c47d5904033ec26aece74f500a2c9
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 


[Impala-ASF-CR] IMPALA-2523: Make HdfsTableSink aware of clustered input

2016-11-07 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-2523: Make HdfsTableSink aware of clustered input
..


Patch Set 11: Code-Review+1

Assuming the S3 run passes.

-- 
To view, visit http://gerrit.cloudera.org:8080/4863
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibeda0bdabbfe44c8ac95bf7c982a75649e1b82d0
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3710: Kudu DML should ignore conflicts by default

2016-11-07 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3710: Kudu DML should ignore conflicts by default
..


Patch Set 4:

Updated patch to resolve merge conflicts with:
1) UPSERT patch
2) Support for non-covering range partitioning

-- 
To view, visit http://gerrit.cloudera.org:8080/4911
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I83b5beaa982d006da4997a2af061ef7c22cad3f1
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3710: Kudu DML should ignore conflicts by default

2016-11-07 Thread Matthew Jacobs (Code Review)
Hello Internal Jenkins, Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4911

to look at the new patch set (#4).

Change subject: IMPALA-3710: Kudu DML should ignore conflicts by default
..

IMPALA-3710: Kudu DML should ignore conflicts by default

Removes the non-standard IGNORE syntax that was allowed for
DML into Kudu tables to indicate that certain errors should
be ignored, i.e. not fail the query and continue. However,
because there is no way to 'roll back' mutations that
occurred before an error occurs, tables are left in an
inconsistent state and it's difficult to know what rows were
successfully modified vs which rows were not. Instead, this
change makes it so that we always 'ignore' these conflicts,
i.e. a 'best effort'. In the future, when Kudu will provide
the mechanisms Impala needs to provide a notion of isolation
levels, then Impala will be able to provide options for more
traditional semantics.

After this change, the following errors are ignored:
* INSERT where the PK already exists
* UPDATE/DELETE where the PK doesn't exist

Another follow-up patch will change other violations to be
handled in this way as well, e.g. nulls inserted in
non-nullable cols.

Reporting:
The number of rows inserted is reported to the coordinator,
which makes the aggregate available to the shell and via the
profile.
TODO: Return rows modified for INSERT via HS2 (IMPALA-1789).
TODO: Return rows modified for other CRUD (beeswax+hs2) (IMPALA-3713).
TODO: Return error counts for specific warnings (IMPALA-4416).

Testing:
Updated tests. Ran all functional tests. More tests will be
needed when other conflicts are handled in the same way.

Change-Id: I83b5beaa982d006da4997a2af061ef7c22cad3f1
---
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-table-sink.h
M common/thrift/DataSinks.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/DeleteStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java
M fe/src/main/java/org/apache/impala/analysis/UpdateStmt.java
M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeModifyStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-delete.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_crud.test
17 files changed, 66 insertions(+), 200 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/4911/4
-- 
To view, visit http://gerrit.cloudera.org:8080/4911
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I83b5beaa982d006da4997a2af061ef7c22cad3f1
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Matthew Jacobs 


[Impala-ASF-CR] IMPALA-2523: Make HdfsTableSink aware of clustered input

2016-11-07 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-2523: Make HdfsTableSink aware of clustered input
..


Patch Set 9:

(3 comments)

Thanks for the reviews. I'm running another private job to test the change on 
S3 and will update the Jira once it's done.

http://gerrit.cloudera.org:8080/#/c/4863/9/tests/query_test/test_insert_behaviour.py
File tests/query_test/test_insert_behaviour.py:

Line 488: import os
> why?
Sry, leftover from trying to get the tests to work.


Line 491: table_location = DEFAULT_FS + "/" + table_path
> I'm confused. Doesn't get_fs_path() do the right thing? Why do we have to a
It does. I couldn't figure out why FILESYSTEM_PREFIX works the way it does on 
the Jenkins host I ran this on, but not on my local dev machine. Sailesh helped 
me, now I'm convinced it should work. I'm running another private build and 
will update this change again once I have results.


Line 550:   table, table_location)
> indent 4
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4863
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibeda0bdabbfe44c8ac95bf7c982a75649e1b82d0
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4048: Misc. improvements to /sessions

2016-11-07 Thread Henry Robinson (Code Review)
Henry Robinson has submitted this change and it was merged.

Change subject: IMPALA-4048: Misc. improvements to /sessions
..


IMPALA-4048: Misc. improvements to /sessions

* Make table searchable and sortable
* Fix 'last accessed time' being printed in UTC
* Made table contents more compact so that it mostly fits on screen
* Clarify summary text re: active and inactive sessions
* Include fix in mustache-cpp required to print 64-bit integers
  correctly (see
  
https://github.com/henryr/cpp-mustache/commit/29768bf0e84f5a1e95e006fc64996d375499dbda)

Testing: Visual inspection and manual sorting, searching etc.

Change-Id: I14edcb6d60cf031a62c5a20b2d2b4d23248633a3
Reviewed-on: http://gerrit.cloudera.org:8080/4880
Reviewed-by: Thomas Tauber-Marshall 
Tested-by: Internal Jenkins
Reviewed-by: Tim Armstrong 
---
M be/src/service/impala-hs2-server.cc
M be/src/service/impala-http-handler.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/thirdparty/mustache/mustache.cc
M www/sessions.tmpl
6 files changed, 78 insertions(+), 46 deletions(-)

Approvals:
  Internal Jenkins: Verified
  Thomas Tauber-Marshall: Looks good to me, but someone else must approve
  Tim Armstrong: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/4880
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I14edcb6d60cf031a62c5a20b2d2b4d23248633a3
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-2523: Make HdfsTableSink aware of clustered input

2016-11-07 Thread Lars Volker (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4863

to look at the new patch set (#11).

Change subject: IMPALA-2523: Make HdfsTableSink aware of clustered input
..

IMPALA-2523: Make HdfsTableSink aware of clustered input

IMPALA-2521 introduced clustering for insert statements. This change
makes the HdfsTableSink aware of clustered inputs, so that partitions
are opened, written, and closed one by one.

This change also adds/modifies tests in several ways:

- clustered insert tests switch from selecting all rows from
  alltypessmall to alltypes. Together with varying settings for
  batch_size, this results in a larger number of row batches being
  written.
- clustered insert tests select from alltypes instead of
  functional.alltypes to make sure we also select from various input
  formats.
- clustered insert tests have been added to select from alltypestiny to
  create inserts with 1 and 2 rows per partition respectively.
- exhaustive insert tests now use different values for batch_size: 1,
  16, 0 (meaning default, 1024). This is limited to uncompressed parquet
  files, to maintain a reasonable runtime. On my machine execution of
  test.insert took 1778 seconds, compared to 1002 seconds with the just
  default row batch size.
- There is additional testing in test_insert_behaviour.py to make sure
  that insertion over several row batches only creates one file per
  partition.
- It renames the test_insert method to make it unique in the file and
  allow for effective filtering with -k.

Change-Id: Ibeda0bdabbfe44c8ac95bf7c982a75649e1b82d0
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/hdfs-table-writer.cc
M bin/impala-config.sh
M common/thrift/DataSinks.thrift
M fe/src/main/java/org/apache/impala/analysis/DeleteStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/UpdateStmt.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M testdata/workloads/functional-query/queries/QueryTest/insert.test
M tests/query_test/test_insert.py
M tests/query_test/test_insert_behaviour.py
M tests/util/filesystem_utils.py
14 files changed, 344 insertions(+), 99 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/4863/11
-- 
To view, visit http://gerrit.cloudera.org:8080/4863
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibeda0bdabbfe44c8ac95bf7c982a75649e1b82d0
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4441: Divide-by-zero in RuntimeProfile::SummaryStatsCounter::SetStats

2016-11-07 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-4441: Divide-by-zero in 
RuntimeProfile::SummaryStatsCounter::SetStats
..


Patch Set 2: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/4975
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I33f1e6fb45505dce7d79497d1632c5f63a409151
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4365: Enabling end-to-end tests on a remote cluster

2016-11-07 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4365: Enabling end-to-end tests on a remote cluster
..


Patch Set 13:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/4769/13//COMMIT_MSG
Commit Message:

Line 51: still problems to work out with the remote data load script itself.
Did you try loading data on a remote cluster and running tests on in with the 
scripts in this patch?


http://gerrit.cloudera.org:8080/#/c/4769/13/bin/remote_data_load.py
File bin/remote_data_load.py:

Line 534: sys.exit(1)
In general, I think it's a bad practice to call sys.exit inside functions. It's 
better to raise an exception and if it does not get caught on the outside, then 
exit will be triggered.


http://gerrit.cloudera.org:8080/#/c/4769/13/testdata/datasets/functional/schema_constraints.csv
File testdata/datasets/functional/schema_constraints.csv:

PS13, Line 120: Wide tables fail due to the SERDEPROPERTIES limits
is this a new issue? Is it specific to remote data loading?

I looked at HIVE-1364 it says it got fixed a few years ago.


-- 
To view, visit http://gerrit.cloudera.org:8080/4769
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1f443a1728a1d28168090c6f54e82dec2cb073e9
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Harrison Sheinblatt 
Gerrit-Reviewer: Martin Grund 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3202: refactor scratch file management into TmpFileMgr

2016-11-07 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-3202: refactor scratch file management into TmpFileMgr
..


Patch Set 3: Code-Review+1

(4 comments)

http://gerrit.cloudera.org:8080/#/c/4898/3/be/src/runtime/buffered-block-mgr.h
File be/src/runtime/buffered-block-mgr.h:

Line 508:   /// is
nit: line wrapping


http://gerrit.cloudera.org:8080/#/c/4898/1/be/src/runtime/tmp-file-mgr.cc
File be/src/runtime/tmp-file-mgr.cc:

PS1, Line 296: x_ = ran
> Good point, users have been confused by this before when the mixed terminol
I agree that replacing Tmp with Scratch might not be worth it. Having those two 
works for me.


Line 336: } else if (status.code() == TErrorCode::SCRATCH_LIMIT_EXCEEDED) {
> I prefer it this way since it's more obvious that it's an "successful" retu
Good point.


http://gerrit.cloudera.org:8080/#/c/4898/1/be/src/runtime/tmp-file-mgr.h
File be/src/runtime/tmp-file-mgr.h:

PS1, Line 136: unique_i
> I don't think it will ever be anything asides from the query id, but maybe 
Thanks. When I first saw query_id I assumed there might be some side effects 
that require query id semantics, so unique_id seems clearer to me.


-- 
To view, visit http://gerrit.cloudera.org:8080/4898
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0c56c195f3f28d520034f8c384494e566635fc62
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Preview: IMPALA-4363: Add timestamp validation

2016-11-07 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: Preview: IMPALA-4363: Add timestamp validation
..


Patch Set 1:

Overall structure looks good ok to me, minus Lars' comments.

Ideally, we'd have a cheaper way to validate other than doing DebugString() 
with a try-catch. It should be possible for us to determine the acceptable 
integer value range and check that.

-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3202: refactor scratch file management into TmpFileMgr

2016-11-07 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3202: refactor scratch file management into TmpFileMgr
..


Patch Set 3:

PS3 is a rebase

-- 
To view, visit http://gerrit.cloudera.org:8080/4898
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0c56c195f3f28d520034f8c384494e566635fc62
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3202: refactor scratch file management into TmpFileMgr

2016-11-07 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#3).

Change subject: IMPALA-3202: refactor scratch file management into TmpFileMgr
..

IMPALA-3202: refactor scratch file management into TmpFileMgr

This is a pure refactoring patch that moves all of the logic
for allocating scratch file ranges into TmpFileMgr in anticipation of
this logic being used by the new BufferPool.

There should be no behavioural changes.

Also remove a bunch of TODOs that we're not going to fix.

Change-Id: I0c56c195f3f28d520034f8c384494e566635fc62
---
M be/src/runtime/buffered-block-mgr-test.cc
M be/src/runtime/buffered-block-mgr.cc
M be/src/runtime/buffered-block-mgr.h
M be/src/runtime/tmp-file-mgr-test.cc
M be/src/runtime/tmp-file-mgr.cc
M be/src/runtime/tmp-file-mgr.h
6 files changed, 439 insertions(+), 432 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/4898/3
-- 
To view, visit http://gerrit.cloudera.org:8080/4898
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0c56c195f3f28d520034f8c384494e566635fc62
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4442: Fix FE ParserTests UnsatisfiedLinkError

2016-11-07 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4442: Fix FE ParserTests UnsatisfiedLinkError
..


Patch Set 1: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/4976
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1828504f79c51679f9ca07176bffbe248d450e87
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2523: Make HdfsTableSink aware of clustered input

2016-11-07 Thread Lars Volker (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4863

to look at the new patch set (#10).

Change subject: IMPALA-2523: Make HdfsTableSink aware of clustered input
..

IMPALA-2523: Make HdfsTableSink aware of clustered input

IMPALA-2521 introduced clustering for insert statements. This change
makes the HdfsTableSink aware of clustered inputs, so that partitions
are opened, written, and closed one by one.

This change also adds/modifies tests in several ways:

- clustered insert tests switch from selecting all rows from
  alltypessmall to alltypes. Together with varying settings for
  batch_size, this results in a larger number of row batches being
  written.
- clustered insert tests select from alltypes instead of
  functional.alltypes to make sure we also select from various input
  formats.
- clustered insert tests have been added to select from alltypestiny to
  create inserts with 1 and 2 rows per partition respectively.
- exhaustive insert tests now use different values for batch_size: 1,
  16, 0 (meaning default, 1024). This is limited to uncompressed parquet
  files, to maintain a reasonable runtime. On my machine execution of
  test.insert took 1778 seconds, compared to 1002 seconds with the just
  default row batch size.
- There is additional testing in test_insert_behaviour.py to make sure
  that insertion over several row batches only creates one file per
  partition.
- It renames the test_insert method to make it unique in the file and
  allow for effective filtering with -k.

Change-Id: Ibeda0bdabbfe44c8ac95bf7c982a75649e1b82d0
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/hdfs-table-writer.cc
M bin/impala-config.sh
M common/thrift/DataSinks.thrift
M fe/src/main/java/org/apache/impala/analysis/DeleteStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/UpdateStmt.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M testdata/workloads/functional-query/queries/QueryTest/insert.test
M tests/query_test/test_insert.py
M tests/query_test/test_insert_behaviour.py
M tests/util/filesystem_utils.py
14 files changed, 344 insertions(+), 99 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/4863/10
-- 
To view, visit http://gerrit.cloudera.org:8080/4863
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibeda0bdabbfe44c8ac95bf7c982a75649e1b82d0
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-3202: refactor scratch file management into TmpFileMgr

2016-11-07 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#2).

Change subject: IMPALA-3202: refactor scratch file management into TmpFileMgr
..

IMPALA-3202: refactor scratch file management into TmpFileMgr

This is a pure refactoring patch that moves all of the logic
for allocating scratch file ranges into TmpFileMgr in anticipation of
this logic being used by the new BufferPool.

There should be no behavioural changes.

Also remove a bunch of TODOs that we're not going to fix.

Change-Id: I0c56c195f3f28d520034f8c384494e566635fc62
---
M be/src/runtime/buffered-block-mgr-test.cc
M be/src/runtime/buffered-block-mgr.cc
M be/src/runtime/buffered-block-mgr.h
M be/src/runtime/tmp-file-mgr-test.cc
M be/src/runtime/tmp-file-mgr.cc
M be/src/runtime/tmp-file-mgr.h
6 files changed, 441 insertions(+), 434 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/4898/2
-- 
To view, visit http://gerrit.cloudera.org:8080/4898
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0c56c195f3f28d520034f8c384494e566635fc62
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-3202: refactor scratch file management into TmpFileMgr

2016-11-07 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3202: refactor scratch file management into TmpFileMgr
..


Patch Set 1:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/4898/1/be/src/runtime/buffered-block-mgr-test.cc
File be/src/runtime/buffered-block-mgr-test.cc:

PS1, Line 308: EXPECT_TRUE
> ASSERT_TRUE? Otherwise we will hit a NPE in DeleteBlocks().
Makes sense. This pattern of always using EXPECT is unfortunately everywhere in 
the unit tests. Mostly I think ASSERT makes the most sense. I just did a 
search-and-replace on this file to change it everywhere in this file where I 
could (it's only allowed in functions that return void).


http://gerrit.cloudera.org:8080/#/c/4898/1/be/src/runtime/buffered-block-mgr.cc
File be/src/runtime/buffered-block-mgr.cc:

Line 246:   ->Init(state->io_mgr(), tmp_file_mgr, profile, parent, 
mem_limit, scratch_limit);
> This line break looks odd. If it was done by clang-format I'd keep it, but 
I agree that it looks weird. clang-format did it.


http://gerrit.cloudera.org:8080/#/c/4898/1/be/src/runtime/tmp-file-mgr.cc
File be/src/runtime/tmp-file-mgr.cc:

Line 282:   DCHECK_EQ(tmp_files_.size(), 0);
> nit: DCHECK(tmp_files_.empty());
Done


PS1, Line 290: by ignoring the return status of
 : // NewFile().
> we don't seem to actually ignore it, the comment looks wrong.
True, made the comment more accurate.


PS1, Line 296: spilling
> "No scratch directories..."? We seem to use 'tmp', 'scratch', and 'spilling
Good point, users have been confused by this before when the mixed terminology 
leaked into error messages: IMPALA-3866.

We need to keep scratch as the user-facing term to match the command-line 
option, so I fixed the "spilling" term here.

We could rename TmpFileMgr, etc to ScratchFileMgr, I'm unsure if it's worth the 
code churn though - what do you think?


Line 335:   scratch_space_bytes_used_counter_->Add(num_bytes);
> can we handle current_bytes_allocated_ here, too? That's also the only dire
That works out cleaner, thanks.


Line 336:   return Status::OK();
> nit: this could just be "return status".
I prefer it this way since it's more obvious that it's an "successful" return 
path. I don't feel that strongly but I find this easier to parse.


Line 351: err_status.MergeStatus(errs[i]);
> nit: single line
Done


http://gerrit.cloudera.org:8080/#/c/4898/1/be/src/runtime/tmp-file-mgr.h
File be/src/runtime/tmp-file-mgr.h:

PS1, Line 133:  
> nit: double space
Done


PS1, Line 136: query_id
> does this actually have to be the query_id or could we rename it to somethi
I don't think it will ever be anything asides from the query id, but maybe 
"unique_id" makes more sense.


-- 
To view, visit http://gerrit.cloudera.org:8080/4898
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0c56c195f3f28d520034f8c384494e566635fc62
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4441: Divide-by-zero in RuntimeProfile::SummaryStatsCounter::SetStats

2016-11-07 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-4441: Divide-by-zero in 
RuntimeProfile::SummaryStatsCounter::SetStats
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4975/1/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

PS1, Line 1034: DCHECK(total_num_values_ > 0);
I don't quite see the reason for this dcheck. If total_num_values_ becomes < 0, 
we should catch that at the point it happens (rather than here). So in 
SetStats(), I think you should check for total_num_values_ >= 0, and set it to 
0 - or just drop the update completely - if it's not. (Rather than DCHECK - 
because here we're dealing with input from an RPC I think, and it's better to 
be robust where possible in those situations).


-- 
To view, visit http://gerrit.cloudera.org:8080/4975
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I33f1e6fb45505dce7d79497d1632c5f63a409151
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Henry Robinson 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.

2016-11-07 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-1286: Extract common conjuncts from disjunctions.
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4877/4/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

Line 1539:   // TODO: Add option to disable expr rewrites for this test. This 
disjunction gets
> Agree. I think we really want a query option to disable these kind of "opti
How about we add a query test that tests || on a column with some NULLs in it 
so we're exercising the runtime code path?

I don't really like the fact that the rewrite changes the type of the 
expression but it seems mostly harmless - how hard is it to retain the original 
type of the expression though?


-- 
To view, visit http://gerrit.cloudera.org:8080/4877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.

2016-11-07 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-1286: Extract common conjuncts from disjunctions.
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4877/4/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

Line 1539:   // TODO: Add option to disable expr rewrites for this test. This 
disjunction gets
> We're losing real test coverage here. Can you rewrite the test in a way tha
Agree. I think we really want a query option to disable these kind of 
"optimizing" rewrites.

Adding a test that circumvents the rewrite is going to be hard, i.e. we won't 
be able to go through SQL at all.


-- 
To view, visit http://gerrit.cloudera.org:8080/4877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4278: Don't abort Catalog startup quickly if HMS is not present

2016-11-07 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-4278: Don't abort Catalog startup quickly if HMS is not 
present
..


Patch Set 4:

(9 comments)

Looks pretty close to me.

http://gerrit.cloudera.org:8080/#/c/4842/4/fe/src/main/java/org/apache/impala/catalog/Catalog.java
File fe/src/main/java/org/apache/impala/catalog/Catalog.java:

PS4, Line 105: if (numClients > 0) {
 :   metaStoreClientPool_.addClients(1, initialCnxnTimeoutSec);
 :   metaStoreClientPool_.addClients(numClients - 1, 0);
 : }
I think you can make this logic a method of MetaStoreClientPool - it's already 
called from MSCP's constructor, so a chance for sharing one method between both 
palces.


http://gerrit.cloudera.org:8080/#/c/4842/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

Line 110:   private static final int META_STORE_CLIENT_POOL_SIZE = 10;
is this the max size, or the initial size, or both?


http://gerrit.cloudera.org:8080/#/c/4842/4/fe/src/main/java/org/apache/impala/catalog/MetaStoreClientPool.java
File fe/src/main/java/org/apache/impala/catalog/MetaStoreClientPool.java:

PS4, Line 76: private
Add a comment to this method describing the behaviour.


PS4, Line 83: IMetaStoreClient hiveClient = null;
What's the point of this variable - why not assign to hiveClient_ directly?


PS4, Line 99: catch (InterruptedException ignore) {}
on this path you might end up not sleeping for long enough if you get 
interrupted. Instead, suggest you have something roughly like:

  long delayUntil = System.currentTimeMillis() + retryDelayMillis;
  if (delayUntil > endTimeMillis) throw...
  while (delayUntil > System.currentTimeMillis()) {
try {
  Thread.sleep(delayUntil - System.currentTimeMillis());
} catch (...) { }
  }


PS4, Line 161: initialCnxnTimeoutSec
this is not the 'initial' timeout in the same sense that it is elsewhere, I 
think. Elsewhere we use it to mean "first client", but here it's used to mean 
"first connection". I think it's better just to call it cnxnTimeoutSec here.


http://gerrit.cloudera.org:8080/#/c/4842/4/tests/experiments/test_catalog_hms_failures.py
File tests/experiments/test_catalog_hms_failures.py:

PS4, Line 81: 10
just from experience, suggest you make this 30s. Timeouts are surprisingly 
flaky in EC2-based build infrastructure.


Line 114: 
Perhaps wait 5s or so here to be sure that the catalog is in the 'trying to 
connect' phase of its startup.


Line 122
How about a test that confirms that if the HMS does not start within 
initial_hms_cnxn_timeout_s, then the catalogd fails?


-- 
To view, visit http://gerrit.cloudera.org:8080/4842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I83c70f939429e1d0d20284a1307f3ee1278ae047
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Henry Robinson 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2523: Make HdfsTableSink aware of clustered input

2016-11-07 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-2523: Make HdfsTableSink aware of clustered input
..


Patch Set 9:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/4863/9/tests/query_test/test_insert_behaviour.py
File tests/query_test/test_insert_behaviour.py:

Line 488: import os
why?


Line 491: table_location = DEFAULT_FS + "/" + table_path
I'm confused. Doesn't get_fs_path() do the right thing? Why do we have to add 
this DEFAULT_FS prefix?


Line 550:   table, table_location)
indent 4


-- 
To view, visit http://gerrit.cloudera.org:8080/4863
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibeda0bdabbfe44c8ac95bf7c982a75649e1b82d0
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1169: Admission control info on the queries debug webpage

2016-11-07 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-1169: Admission control info on the queries debug webpage
..


Patch Set 9: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/4756
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I504e3c829a14318721e3a42de6281bcc578f7283
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3167: Fix assignment of WHERE-clause predicate through grouping agg + outer join.

2016-11-07 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-3167: Fix assignment of WHERE-clause predicate through 
grouping agg + outer join.
..


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/4960/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

PS1, Line 1426: hasOjTest
> Where is this function used?
Sorry, this was an experiment. Not needed. Removed.


PS1, Line 1558: evalByJoin
> This variable name is a bit confusing to me. Is "evalAfterJoin" better?
Done


http://gerrit.cloudera.org:8080/#/c/4960/1/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

PS1, Line 851: public static  void removeDuplicates(List l)
> Suggest to refactor this function a bit like below (if possible):
Exprs are not hashable, i.e. do not implement hashCode(). Fixing that is beyond 
the scope of this change and also dangerous (wrong results, if not done 
properly).

I'd rather not change the API of this function, because that will force changes 
in many parts of the code which could introduce new bugs and make backporting 
more difficult.

We don't really need thread safety here, so I don't think we need to go out of 
out way to add it.


PS1, Line 853: List origList = Lists.newArrayList(l);
 : l.clear();
 : for (C expr: origList) if (!l.contains(expr)) l.add(expr);
> Would it be better to use a Set instead of List?
See comment above. Not hashable. Also a Set is not good because it does not 
preserve insertion order and may lead to tests with non-deterministic results.


-- 
To view, visit http://gerrit.cloudera.org:8080/4960
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I774d13a13ad1e8fe82512df98dc29983bdd232eb
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Anonymous Coward #27
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3167: Fix assignment of WHERE-clause predicate through grouping agg + outer join.

2016-11-07 Thread Alex Behm (Code Review)
Alex Behm has uploaded a new patch set (#2).

Change subject: IMPALA-3167: Fix assignment of WHERE-clause predicate through 
grouping agg + outer join.
..

IMPALA-3167: Fix assignment of WHERE-clause predicate through grouping agg + 
outer join.

Background: We generally allow the assignment of predicates below the nullable 
side
of a left/right outer join, explained as follows using an example:

SELECT * FROM t1 LEFT OUTER JOIN t2 ON t1.id = t2.id
WHERE t2.int_col < 10

The scan of 't2' will pick up 't2.int_col < 10' via 
Analyzer.getBoundPredicates()
and recognizes that the predicate must also be evaluated by a join later, so the
predicate is not marked as assigned. The join then picks up the unassigned
predicate via Analyzer.getUnassignedConjuncts().

The bug was that our logic for detecting whether a bound predicate must also be
evaluated at a join node was flawed because it only considered whether the 
tuples
of the source or destination predicate were outer joined (plus other 
conditions).
The underlying assumption is that either the source or destination tuple are 
bound
by a tuple produced by a TableRef, but in the buggy query the source predicate
is bound by an aggregation tuple, so we incorrectly marked the bound predicate
as assigned in Analyzer.getBoundPredicates().

The fix is to conservatively not mark bound predicates as assigned if there 
exists
equivalent tuples that are outer joined. This conservative fix leads to some
duplicate assignments of predicates. Those are simply deduped now.

Change-Id: I774d13a13ad1e8fe82512df98dc29983bdd232eb
---
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test
6 files changed, 48 insertions(+), 27 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/4960/2
-- 
To view, visit http://gerrit.cloudera.org:8080/4960
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I774d13a13ad1e8fe82512df98dc29983bdd232eb
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Anonymous Coward #27


[Impala-ASF-CR] IMPALA-4172: Switch to BlockLocation methods for disk IDs

2016-11-07 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4172: Switch to BlockLocation methods for disk IDs
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4914/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 379:   private static int getDiskId(String storageId) {
> in this patch, let's focus on using the new api, rather than shrinking the 
How can we separate them though? The new API returns a UUID.


-- 
To view, visit http://gerrit.cloudera.org:8080/4914
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibbff94cef9a9db7b3945f8e7b0286866d2cc3b61
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Laszlo Gaal 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1169: Admission control info on the queries debug webpage

2016-11-07 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-1169: Admission control info on the queries debug webpage
..


Patch Set 9:

The reason the GVO failed was the change in this patch that always calls 
QuerySchedule::set_is_admitted(true) for query schedules that don't go through 
AdmissionController::AdmitQuery. This was causing 
AdmissionController::ReleaseQuery to try to release queries that had never been 
subject to admission control, resulting in a crash.

Originally, this was fine because the only condition where AdmitQuery wouldn't 
be called was when FLAG_disable_admission_control was false, which would also 
prevent ReleaseQuery from being called. However, a recent change added 
'is_mt_execution' as a condition for skipping AdmitQuery but not as a condition 
for skipping ReleaseQuery, leading to the crash.

Also originally, setting is_admitted to true when 
FLAG_disable_admission_control to true was necessary as we were relying on 
QuerySchedule::is_admitted to differentiate queries that were not 
scheduled/queued/etc, even when admission control is off. With the change to 
just using the query event log for differentiating not scheduled/queued/etc. 
states, this is no longer necessary, so to fix the crash I just removed it.

-- 
To view, visit http://gerrit.cloudera.org:8080/4756
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I504e3c829a14318721e3a42de6281bcc578f7283
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] Minor fixes to remove more "cloudera"s from the code.

2016-11-07 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: Minor fixes to remove more "cloudera"s from the code.
..


Patch Set 2: Code-Review+2

Carry Tim's +2

-- 
To view, visit http://gerrit.cloudera.org:8080/4774
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I27687a3b677def72f1867df54c8e50c93d5cab3a
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] Minor fixes to remove more "cloudera"s from the code.

2016-11-07 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: Minor fixes to remove more "cloudera"s from the code.
..


Patch Set 1: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/4774
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I27687a3b677def72f1867df54c8e50c93d5cab3a
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4302,IMPALA-2379: constant expr arg fixes

2016-11-07 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4302,IMPALA-2379: constant expr arg fixes
..


Patch Set 10: Code-Review+2

(1 comment)

Carry +2

http://gerrit.cloudera.org:8080/#/c/4838/8/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

Line 1030:   // significant bits overlap in memory.
> I guess these should really have the LITTLE_ENDIAN checks too, right?
Reworded the comment to be less little-endian specific.

I think if we were to port this to big-endian, we'd add padding to DecimalVal 
so that the least significant bits lined up (which means the field offsets 
would be different). I don't think this code would have to change.


-- 
To view, visit http://gerrit.cloudera.org:8080/4838
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I45c3ed8c9d7a61e94a9b9d6c316e8a53d9ff6c24
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3200 (buffer pool): warn if Status is ignored

2016-11-07 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3200 (buffer pool): warn if Status is ignored
..


Patch Set 2:

I was going to poll a few people but this got bumped down my priority list a 
bit - will come back to it probably in a week or so.

-- 
To view, visit http://gerrit.cloudera.org:8080/4878
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8c26e7532b5f2c7fe167accc73179e8b72b192bc
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3200 (buffer pool): warn if Status is ignored

2016-11-07 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-3200 (buffer pool): warn if Status is ignored
..


Patch Set 2:

> (1 comment)

Any updates no this?

-- 
To view, visit http://gerrit.cloudera.org:8080/4878
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8c26e7532b5f2c7fe167accc73179e8b72b192bc
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4302,IMPALA-2379: constant expr arg fixes

2016-11-07 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4302,IMPALA-2379: constant expr arg fixes
..


Patch Set 8: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4838/8/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

Line 1030:   // of the val16 should be initialized to all 0's.
I guess these should really have the LITTLE_ENDIAN checks too, right?


-- 
To view, visit http://gerrit.cloudera.org:8080/4838
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I45c3ed8c9d7a61e94a9b9d6c316e8a53d9ff6c24
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4302,IMPALA-2379: constant expr arg fixes

2016-11-07 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4302,IMPALA-2379: constant expr arg fixes
..


Patch Set 8:

Ran into a bug in the patch where histogram was interpreting a val4 or val8 as 
a val16, and seeing uninitialised bits. This seems to be an expected way to use 
DecimalVal in the UDF interface.

The fix is to memset() the whole value to 0 when allocating it. Documented this 
behaviour a bit better.

-- 
To view, visit http://gerrit.cloudera.org:8080/4838
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I45c3ed8c9d7a61e94a9b9d6c316e8a53d9ff6c24
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4302,IMPALA-2379: constant expr arg fixes

2016-11-07 Thread Tim Armstrong (Code Review)
Hello Internal Jenkins, Dan Hecht,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4838

to look at the new patch set (#7).

Change subject: IMPALA-4302,IMPALA-2379: constant expr arg fixes
..

IMPALA-4302,IMPALA-2379: constant expr arg fixes

This patch fixes two issues around handling of constant expr args.
The patches are combined because they touch some of the same code
and depend on some of the same memory management cleanup.

First, it fixes IMPALA-2379, where constant expr args were not visible
to UDAFs. The issue is that the input exprs need to be opened before
calling the UDAF Init() function.

Second, it avoids overhead from repeated evaluation of constant
arguments for ScalarFnCall expressions on both the codegen'd and
interpreted paths. A common example is an IN predicate with a
long list of constant values.

The interpreted path was inefficient because it always evaluated all
children expressions. Instead in this patch constant args are
evaluated once and cached. The memory management of the AnyVal*
objects was somewhat nebulous - adjusted it so that they're allocated
from ExprContext::mem_pool_, which has the correct lifetime.

The codegen'd path was inefficient only with varargs - with fixed
arguments the LLVM optimiser is able to infer after inlining that
the expressions are constant and remove all evaluation. However,
for varargs it stores the vararg values into a heap-allocated buffer.
The LLVM optimiser is unable to remove these stores because they
have a side-effect that is visible to code outside the function.

The codegen'd path is improved by evaluating varargs into an automatic
buffer that can be optimised out. We also make a small related change
to bake the string constants into the codegen'd code.

Testing:
Ran exhaustive build.

Added regression test for IMPALA-2379 and MemPool test for aligned
allocation. Added a test for in predicates with constant strings.

Perf:
Added a targeted query that demonstrates the improvement. Also manually
validated the non-codegend perf. Also ran TPC-H and targeted perf
queries locally - didn't see any significant changes.

++---+---++-++---++-+---+
| Workload   | Query | File Format   | 
Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Num Clients | 
Iters |
++---+---++-++---++-+---+
| TARGETED-PERF(_20) | primitive_filter_in_predicate | parquet / none / none | 
1.19   | 9.82| I -87.85%  |   3.82%   |   0.71%| 1   | 
10|
++---+---++-++---++-+---+

(I) Improvement: TARGETED-PERF(_20) primitive_filter_in_predicate [parquet / 
none / none] (9.82s -> 1.19s [-87.85%])
+--++--+--++---+--+--++++---+
| Operator | % of Query | Avg  | Base Avg | Delta(Avg) | StdDev(%) | 
Max  | Base Max | Delta(Max) | #Hosts | #Rows  | Est #Rows |
+--++--+--++---+--+--++++---+
| 01:AGGREGATE | 14.39% | 155.88ms | 214.61ms | -27.37%|   2.68%   | 
163.38ms | 227.53ms | -28.19%| 1  | 1  | 1 |
| 00:SCAN HDFS | 85.60% | 927.46ms | 9.43s| -90.16%|   4.49%   | 
1.01s| 9.50s| -89.42%| 1  | 13.77K | 14.05K|
+--++--+--++---+--+--++++---+

Change-Id: I45c3ed8c9d7a61e94a9b9d6c316e8a53d9ff6c24
---
M be/src/benchmarks/in-predicate-benchmark.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exprs/agg-fn-evaluator.cc
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/anyval-util.cc
M be/src/exprs/anyval-util.h
M be/src/exprs/case-expr.cc
M be/src/exprs/expr-context.h
M be/src/exprs/expr.cc
M be/src/exprs/expr.h
M be/src/exprs/hive-udf-call.cc
M be/src/exprs/in-predicate.h
M be/src/exprs/literal.cc
M be/src/exprs/scalar-fn-call.cc
M be/src/exprs/scalar-fn-call.h
M be/src/exprs/utility-functions-ir.cc
M be/src/runtime/free-pool-test.cc
M be/src/runtime/free-pool.h
M be/src/runtime/mem-pool-test.cc
M be/src/runtime/mem-pool.cc
M be/src/runtime/mem-pool.h
M be/src/testutil/test-udas.cc
M be/src/udf/udf-internal.h
M be/src/udf/udf-test-harness.cc
M be/src/udf/udf.cc
M 

[Impala-ASF-CR](asf-site) Initial commit of the blog section of the Impala ASF website.

2016-11-07 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: Initial commit of the blog section of the Impala ASF website.
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4944/1/pelicanconf.py
File pelicanconf.py:

Line 1: #!/usr/bin/env python
> I wasn't sure what to do about this. I noticed that while some of our third
The licenses in the master branch are spelled out in 
https://github.com/apache/incubator-impala/blob/master/LICENSE.txt

I suspect AGPL would prevent us form using Pelican unless we author all of the 
templates ourselves:

https://www.apache.org/legal/resolved

We had to take to comments out of our Doxygen file, for instance, because they 
were included in Doxygen source files and not given an Apache-compatible 
license.

Does Pelican license the templates differently?


-- 
To view, visit http://gerrit.cloudera.org:8080/4944
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa578a70237fcc97589c667c17a70d3d6dad5ae1
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-Owner: David Knupp 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Jim Apple 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3167: Fix assignment of WHERE-clause predicate through grouping agg + outer join.

2016-11-07 Thread Anonymous Coward (Code Review)
Anonymous Coward #27 has posted comments on this change.

Change subject: IMPALA-3167: Fix assignment of WHERE-clause predicate through 
grouping agg + outer join.
..


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/4960/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

PS1, Line 1426: hasOjTest
Where is this function used?


http://gerrit.cloudera.org:8080/#/c/4960/1/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

PS1, Line 851: public static  void removeDuplicates(List l)
Suggest to refactor this function a bit like below (if possible):

public static  List removeDuplicates(List l) {
  if (l == null) {
return l;
  } else {
return Lists.newArrayList(Sets.newHashSet(l));
  }
}

This makes this function thread-safe. The passed in list won't be modified, and 
thus can be safely shared among threads.


PS1, Line 853: List origList = Lists.newArrayList(l);
 : l.clear();
 : for (C expr: origList) if (!l.contains(expr)) l.add(expr);
Would it be better to use a Set instead of List?
Set s = Sets.newHashSet(l);
l.clear();
l.addAll(s);


-- 
To view, visit http://gerrit.cloudera.org:8080/4960
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I774d13a13ad1e8fe82512df98dc29983bdd232eb
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Anonymous Coward #27
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4172: Switch to BlockLocation methods for disk IDs

2016-11-07 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4172: Switch to BlockLocation methods for disk IDs
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4914/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 379:   private static int getDiskId(String storageId) {
> Agree, that's even better. We need to group the UUIDs by host and then assi
in this patch, let's focus on using the new api, rather than shrinking the disk 
ids down to their minimum size. that's probably best handled in a follow-on.


-- 
To view, visit http://gerrit.cloudera.org:8080/4914
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibbff94cef9a9db7b3945f8e7b0286866d2cc3b61
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Laszlo Gaal 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4437: hit DCHECK in buffered-block-mgr-test

2016-11-07 Thread Internal Jenkins (Code Review)
Internal Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4437: hit DCHECK in buffered-block-mgr-test
..


IMPALA-4437: hit DCHECK in buffered-block-mgr-test

The issue was introduced by "IMPALA-3202: DiskIoMgr improvements
for new buffer pool". The bug is pretty straightforward - I forgot
to update some error/cancellation paths to handle client-provided
buffers.

Testing:
Was able to reproduce locally before the fix, have run it in a loop for
a while with the fix and not seen any crashes.

Change-Id: Ib7e8ffdbec341f6ffbc310fb29df63ab45fed545
Reviewed-on: http://gerrit.cloudera.org:8080/4973
Reviewed-by: Henry Robinson 
Tested-by: Internal Jenkins
---
M be/src/runtime/disk-io-mgr.cc
1 file changed, 2 insertions(+), 2 deletions(-)

Approvals:
  Henry Robinson: Looks good to me, approved
  Internal Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/4973
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ib7e8ffdbec341f6ffbc310fb29df63ab45fed545
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Internal Jenkins


[Impala-ASF-CR] IMPALA-4437: hit DCHECK in buffered-block-mgr-test

2016-11-07 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-4437: hit DCHECK in buffered-block-mgr-test
..


Patch Set 1: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/4973
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib7e8ffdbec341f6ffbc310fb29df63ab45fed545
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Internal Jenkins
Gerrit-HasComments: No