[Impala-ASF-CR] DRAFT PREVIEW: Prereqs for load test system testing

2017-01-31 Thread Harrison Sheinblatt (Code Review)
Harrison Sheinblatt has abandoned this change.

Change subject: DRAFT PREVIEW: Prereqs for load test system testing
..


Abandoned

New one submitted

-- 
To view, visit http://gerrit.cloudera.org:8080/3785
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: Ibd330d8a19ae5d968b669a4258cf826d0db28cdd
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Harrison Sheinblatt 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Internal Jenkins


[Impala-ASF-CR] IMPALA-4808: old hash join can reference invalid memory

2017-01-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4808: old hash join can reference invalid memory
..


IMPALA-4808: old hash join can reference invalid memory

The bug was that 'probe_rows_exist' could be true even if
there was no current probe row. The node can get into this
state if it takes the branch at line 390.

I tried to reproduce the crash but was unable to after a
few attempts.

Change-Id: Ic068bbc3e029264d1ce814d286e372391639fa68
Reviewed-on: http://gerrit.cloudera.org:8080/5850
Reviewed-by: Matthew Jacobs 
Reviewed-by: Dan Hecht 
Tested-by: Impala Public Jenkins
---
M be/src/exec/hash-join-node.cc
1 file changed, 1 insertion(+), 1 deletion(-)

Approvals:
  Impala Public Jenkins: Verified
  Matthew Jacobs: Looks good to me, but someone else must approve
  Dan Hecht: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/5850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ic068bbc3e029264d1ce814d286e372391639fa68
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4808: old hash join can reference invalid memory

2017-01-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4808: old hash join can reference invalid memory
..


Patch Set 1: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/5850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic068bbc3e029264d1ce814d286e372391639fa68
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4738: STDDEV SAMP should return NULL for single record input

2017-01-31 Thread anujphadke (Code Review)
anujphadke has posted comments on this change.

Change subject: IMPALA-4738: STDDEV_SAMP should return NULL for single record 
input
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5800/1/testdata/workloads/functional-query/queries/QueryTest/aggregation.test
File testdata/workloads/functional-query/queries/QueryTest/aggregation.test:

PS1, Line 26: SELECT variance(tinyint_col), 
stddev(smallint_col),stddev_samp(smallint_col),
: variance_pop(int_col),
> please add a col for stddev_samp as well. even though they're aliased now, 
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5800
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ide8af752cd8a2e554a2cd5a1ec948967a80de1fe
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4738: STDDEV SAMP should return NULL for single record input

2017-01-31 Thread anujphadke (Code Review)
anujphadke has uploaded a new patch set (#2).

Change subject: IMPALA-4738: STDDEV_SAMP should return NULL for single record 
input
..

IMPALA-4738: STDDEV_SAMP should return NULL for single record input

In calculating the STDDEV_SAMP of N rows a divion by N-1 rows is involved.
Hence STDDEV_SAMP for a single row  involves a division by 0. This change
returns a NULL instead of a 0 when calculating STDDEV_SAMP for a single row.
STDDEV_POP for single row will still return a 0 since this does not
involve  a division by 0.

Change-Id: Ide8af752cd8a2e554a2cd5a1ec948967a80de1fe
---
M be/src/exprs/aggregate-functions-ir.cc
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
2 files changed, 7 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/5800/2
-- 
To view, visit http://gerrit.cloudera.org:8080/5800
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ide8af752cd8a2e554a2cd5a1ec948967a80de1fe
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: anujphadke 


[Impala-ASF-CR] IMPALA-4729: Implement REPLACE()

2017-01-31 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change.

Change subject: IMPALA-4729: Implement REPLACE()
..


Patch Set 9:

Michael, yeah, I will fix those.  I still need to plug in the additional tests 
that will be required now that we do constant based stuff.

Dan - replace ' ' -> '' was the initial test.  This was much faster, 2x so.  
Regexp compilation starts to be more competitive on longer expressions because 
it has more advanced state.  Still it can't beat a cached StringSearch - 
'complicated' actually isn't that complicated, but it (due to the repeated 'c') 
allows the search to advance faster, better illustrating the difference on 
longer strings.

-- 
To view, visit http://gerrit.cloudera.org:8080/5776
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3586 (Part 1): Implement Union Pass Through

2017-01-31 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#4).

Change subject: IMPALA-3586 (Part 1): Implement Union Pass Through
..

IMPALA-3586 (Part 1): Implement Union Pass Through

The union node acts as pass through operator and forwards row batches
from it's children without materializing. This is done in the case
when the child's tuple layout is identical to union node tuple layout
and no functions need to be applied to the child row batches.

Testing:
Verified that existing tests cover the case where no/some/all union
children of the union node can be passed through.

Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/descriptors.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
20 files changed, 435 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/4
-- 
To view, visit http://gerrit.cloudera.org:8080/5816
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4705, IMPALA-4779, IMPALA-4780: Fix some Expr bugs with codegen

2017-01-31 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4705, IMPALA-4779, IMPALA-4780: Fix some Expr bugs with 
codegen
..


Patch Set 4:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/5732/4//COMMIT_MSG
Commit Message:

Line 50: defined in the boost library.
is there some verification code we can add to ensure that this problem won't 
happen again (or isn't happening elsewhere)?


http://gerrit.cloudera.org:8080/#/c/5732/4/be/src/codegen/llvm-codegen.cc
File be/src/codegen/llvm-codegen.cc:

PS4, Line 326: which are also defined in the main module
this doesn't seem to match what the code is doing.  Isn't it materializing 
functions that are referenced by functions of the new module?

But why is this needed? why don't they get materialized when we follow the 
chain of references in MaterializeFunctionHelper()?


http://gerrit.cloudera.org:8080/#/c/5732/4/be/src/codegen/llvm-codegen.h
File be/src/codegen/llvm-codegen.h:

PS4, Line 85: their callee functions
what is a "callee" for a global variable? do you mean reference?
If so, how about:

// Map from global variables/functions to the functions that they reference.


PS4, Line 86: CalleeMap
again, callee sounds to me like only functions would be keys in this map.  
FunctionRefsMap?


PS4, Line 160: f 'eager' is false, the
 :   /// functions in the module are not materialized.
this seems to contradict the class header comment that says "llvm::Function 
objects in the module are materialized lazily". i.e. the behavior you are 
documented is the expected behavior, no?

So are you adding 'eager' so that the default behavior can be overridden? if 
so, it should say something like: 

If 'eager' is true, the functions are materalized. Otherwise they are 
materialized lazily via GetFunction().

And if that's what's going on, why not call the parameter 'materialize"?

But, as a caller, how do I choose whether I'm suppose to materialize or not at 
this point? The need for this seems to contradict the whole explanation about 
lazy materialization in the header comment.

Finally, rather than threading through this parameter, why not just have the 
caller do module->MaterializeAll() with the returned module?


PS4, Line 617:  which also need to be materialized if
 :   /// the input global variable or function is materialized
is this really specific to materialization? wouldn't it be enough to just say:

... all the functions referenced by it.

The materialization code itself can explain (or be self documenting) about why 
references need to be followed when materializing.


PS4, Line 625: fns_to_materialize_
why this name change?  does this now store functions that need to be 
materialized for various reasons? don't we materialize functions that aren't in 
this set?


http://gerrit.cloudera.org:8080/#/c/5732/4/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

since this takes so long to execute, should we either move expr-test.cc to run 
only in exhaustive, or run only codegen tests when exhaustive?


-- 
To view, visit http://gerrit.cloudera.org:8080/5732
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I40fdb035a565ae2f9c9fbf4db48a548653ef7608
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3586 (Part 1): Implement Union Pass Through

2017-01-31 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#4).

Change subject: IMPALA-3586 (Part 1): Implement Union Pass Through
..

IMPALA-3586 (Part 1): Implement Union Pass Through

The union node acts as pass through operator and forwards row batches
from it's children without materializing. This is done in the case
when the child's tuple layout is identical to union node tuple layout
and no functions need to be applied to the child row batches.

Testing:
Verified that existing tests cover the case where no/some/all union
children of the union node can be passed through.

Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/descriptors.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
20 files changed, 435 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/4
-- 
To view, visit http://gerrit.cloudera.org:8080/5816
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4808: old hash join can reference invalid memory

2017-01-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4808: old hash join can reference invalid memory
..


Patch Set 1:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/224/

-- 
To view, visit http://gerrit.cloudera.org:8080/5850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic068bbc3e029264d1ce814d286e372391639fa68
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3586 (Part 1): Implement Union Pass Through

2017-01-31 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#4).

Change subject: IMPALA-3586 (Part 1): Implement Union Pass Through
..

IMPALA-3586 (Part 1): Implement Union Pass Through

The union node acts as pass through operator and forwards row batches
from it's children without materializing. This is done in the case
when the child's tuple layout is identical to union node tuple layout
and no functions need to be applied to the child row batches.

Testing:
Verified that existing tests cover the case where no/some/all union
children of the union node can be passed through.

Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/descriptors.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
20 files changed, 435 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/4
-- 
To view, visit http://gerrit.cloudera.org:8080/5816
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4808: old hash join can reference invalid memory

2017-01-31 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4808: old hash join can reference invalid memory
..


Patch Set 1: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/5850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic068bbc3e029264d1ce814d286e372391639fa68
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4809: Add Stability for codegen constants

2017-01-31 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4809: Add Stability for codegen constants
..


Patch Set 1:

IMPALA-4809 isn't about inlining an expression related value, it's just that 
the way the code is currently factored, Expr routines deal with doing codegen 
tricks involving FunctionContext's.  They really don't belong in Expr in the 
first place.  Michael is in the process of cleaning that up.  So, let's not 
make IMPALA-4809 dependent on this change.

I agree a concept like this may be useful and needed in the future, but until 
we finish cleaning up some surrounding code and have a need for this, I don't 
think we should introduce this new concept (and it seems more of a codegen 
related concept than an Expr related concept).

As an aside, see Codegen::ReplaceCallSites() and other related code for 
non-expr places we do constant injection.  That's another way you could do 
4809, but ultimately you'll need to get at the query options through 
FunctionContext anyway.

-- 
To view, visit http://gerrit.cloudera.org:8080/5848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0431bcd84b908bcf130a6d618f57f8f7c5498428
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4729: Implement REPLACE()

2017-01-31 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4729: Implement REPLACE()
..


Patch Set 9:

What do you mean by "complicated" queries?
And how much did computing the StringSearch only once help?  From your numbers 
below, REPLACE is not much faster than REGEX_REPLACE, but I thought earlier you 
were seeing a 2x speedup.

-- 
To view, visit http://gerrit.cloudera.org:8080/5776
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4729: Implement REPLACE()

2017-01-31 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change.

Change subject: IMPALA-4729: Implement REPLACE()
..


Patch Set 9:

Added support for saving StringSearcher for complicated queries.

***
[localhost:21000] > select count(regexp_replace(l_comment, 'complicated', '')) 
from tpch.lineitem;
Query: select count(regexp_replace(l_comment, 'complicated', '')) from 
tpch.lineitem
Query submitted at: 2017-02-01 03:21:19 (Coordinator: http://impala-dev:25000)
Query progress can be monitored at: 
http://impala-dev:25000/query_plan?query_id=a49fad015129539:afa30da
+-+
| count(regexp_replace(l_comment, 'complicated', '')) |
+-+
| 6001215 |
+-+
Fetched 1 row(s) in 7.55s
[localhost:21000] > select count(regexp_replace(l_comment, 'complicated', '')) 
from tpch.lineitem;
Query: select count(regexp_replace(l_comment, 'complicated', '')) from 
tpch.lineitem
Query submitted at: 2017-02-01 03:21:31 (Coordinator: http://impala-dev:25000)
Query progress can be monitored at: 
http://impala-dev:25000/query_plan?query_id=3946a55a95c4529f:67394350
+-+
| count(regexp_replace(l_comment, 'complicated', '')) |
+-+
| 6001215 |
+-+
Fetched 1 row(s) in 1.54s
[localhost:21000] > select count(regexp_replace(l_comment, 'complicated', '')) 
from tpch.lineitem;
Query: select count(regexp_replace(l_comment, 'complicated', '')) from 
tpch.lineitem
Query submitted at: 2017-02-01 03:21:35 (Coordinator: http://impala-dev:25000)
Query progress can be monitored at: 
http://impala-dev:25000/query_plan?query_id=3f41480452d08ce1:21ebac35
+-+
| count(regexp_replace(l_comment, 'complicated', '')) |
+-+
| 6001215 |
+-+
Fetched 1 row(s) in 1.53s
[localhost:21000] > select count(replace(l_comment, 'complicated', '')) from 
tpch.lineitem;
Query: select count(replace(l_comment, 'complicated', '')) from tpch.lineitem
Query submitted at: 2017-02-01 03:21:44 (Coordinator: http://impala-dev:25000)
Query progress can be monitored at: 
http://impala-dev:25000/query_plan?query_id=d24d13228c8c8833:110700e3
+--+
| count(replace(l_comment, 'complicated', '')) |
+--+
| 6001215  |
+--+
Fetched 1 row(s) in 1.33s
[localhost:21000] > select count(replace(l_comment, 'complicated', '')) from 
tpch.lineitem;
Query: select count(replace(l_comment, 'complicated', '')) from tpch.lineitem
Query submitted at: 2017-02-01 03:21:48 (Coordinator: http://impala-dev:25000)
Query progress can be monitored at: 
http://impala-dev:25000/query_plan?query_id=ea4d6cd631e38ce2:75984bb6

-- 
To view, visit http://gerrit.cloudera.org:8080/5776
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4729: Implement REPLACE()

2017-01-31 Thread Zach Amsden (Code Review)
Zach Amsden has uploaded a new patch set (#9).

Change subject: IMPALA-4729: Implement REPLACE()
..

IMPALA-4729: Implement REPLACE()

This turned out to be slightly non-trivial as REPLACE is already a
keyword, and thus the parser needs to be tweaked to allow this,
since function names act as bare identifiers.

Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329
---
M be/src/exprs/expr-test.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
M be/src/gutil/bits.h
M be/src/udf/udf-internal.h
M be/src/udf/udf.cc
M be/src/udf/udf.h
M common/function-registry/impala_functions.py
M fe/src/main/cup/sql-parser.cup
9 files changed, 201 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/76/5776/9
-- 
To view, visit http://gerrit.cloudera.org:8080/5776
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] Prereqs for load test system testing

2017-01-31 Thread Harrison Sheinblatt (Code Review)
Harrison Sheinblatt has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5851

Change subject: Prereqs for load test system testing
..

Prereqs for load test system testing

Additional py.test parameters to control load testing, include
externally linked  test modules in python path, add CmService
methods to query timeseries and Impala query data from CM if
CM is part of SUT, as well as update admission control
configuration parameters.

Also added a dependency for pyjavaproperties and updated the
version of python_dateutil to the virtual env.

Change-Id: I38b5b4fe8c30d1c5f591bbe37b6be8bb8a5398ab
---
M bin/impala-python-common.sh
M bin/set-pythonpath.sh
M infra/python/deps/requirements.txt
M tests/comparison/cluster.py
M tests/conftest.py
5 files changed, 649 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/5851/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5851
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I38b5b4fe8c30d1c5f591bbe37b6be8bb8a5398ab
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Harrison Sheinblatt 


[Impala-ASF-CR] IMPALA-4809: Add Stability for codegen constants

2017-01-31 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4809: Add Stability for codegen constants
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5848/1//COMMIT_MSG
Commit Message:

PS1, Line 16: overloaded
> I think this would be useful once we want to do codegen caching and need to
Good point about its usability for codegen caching but it seems a bit premature 
at this point IMHO.


http://gerrit.cloudera.org:8080/#/c/5848/1/be/src/exprs/expr.h
File be/src/exprs/expr.h:

PS1, Line 266: RUNTIME_CONSTANT
> I think it would make sense to move this into a thrift enum - see my other 
If this is done solely for codegen, it may make sense to move this to codegen 
related header. It appears to me the type of constant we are talking about is 
more like CpuInfo::IsSupported(CpuInfo::SSE4_2) in cross-compiled code.


-- 
To view, visit http://gerrit.cloudera.org:8080/5848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0431bcd84b908bcf130a6d618f57f8f7c5498428
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Release note updates for Impala 2.8

2017-01-31 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: Release note updates for Impala 2.8
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5668/7/docs/topics/impala_incompatible_changes.xml
File docs/topics/impala_incompatible_changes.xml:

Line 63:   
You can remove this. It was only a temporary bug in 2.8. We never shipped a 
version that had DESCRIBE FORMATTED in uppercase.


-- 
To view, visit http://gerrit.cloudera.org:8080/5668
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7c47f422e509cec6d3eb8aaa82294b584f393aed
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Laurel Hale 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Silvius Rus 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4822: Dynamic log level changes to Catalog and Frontend JVMs

2017-01-31 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-4822: Dynamic log level changes to Catalog and Frontend 
JVMs
..


Patch Set 3:

@reviewers: Please don't review PS2/3. My next PS includes a bunch of new 
stuff. I'll send it out for review soon.

-- 
To view, visit http://gerrit.cloudera.org:8080/5792
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I588418e9bcb0b66d33138baf96207a5a35bfbd63
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4809: Add Stability for codegen constants

2017-01-31 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4809: Add Stability for codegen constants
..


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/5848/1//COMMIT_MSG
Commit Message:

PS1, Line 16: overloaded
> From the codegen perspective, we only care if a value is compilation time c
I think this would be useful once we want to do codegen caching and need to 
reason about whether codegen output is re-usable.

Agree we can solve IMPALA-4809 with Expr::GetConstant() or the mechanism in 
this patch (if that ever makes it in):
https://gerrit.cloudera.org/#/c/3843/


http://gerrit.cloudera.org:8080/#/c/5848/1/be/src/exprs/expr.h
File be/src/exprs/expr.h:

Line 266: RUNTIME_CONSTANT, // Does not change over an execution
Concretely, does this mean that it doesn't change within the scope of a process 
- it can be different if you start another daemon or are on a different system?


PS1, Line 266: RUNTIME_CONSTANT
> This seems odd to be in Expr class if an example of this class of constants
I think it would make sense to move this into a thrift enum - see my other 
comment about doing the analysis in the FE.


Line 267: QUERY_CONSTANT, // Does not change over a single query
We may also want to think about how it varies across different Impala daemons. 
E.g. does QUERY_CONSTANT mean it's the same on all Impala daemons for that 
query, or just on the current daemon?

I think that is effectively a different dimension. I don't think we necessarily 
need to add more complexity, just that we should document the assumptions.

E.g. one thing we've thought about hypothetically is compiling on one daemon 
then distributing it others.


-- 
To view, visit http://gerrit.cloudera.org:8080/5848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0431bcd84b908bcf130a6d618f57f8f7c5498428
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4792: Fix number of distinct values for a CASE with constant outputs

2017-01-31 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4792: Fix number of distinct values for a CASE with 
constant outputs
..


Patch Set 3:

(3 comments)

regarding testing, let's discuss that.

http://gerrit.cloudera.org:8080/#/c/5768/3/fe/src/main/java/org/apache/impala/analysis/CaseExpr.java
File fe/src/main/java/org/apache/impala/analysis/CaseExpr.java:

Line 393: long numOutputConstants = 0;
int is fine here, if you overflow that we have other problems.


Line 402:   if ((i - loopStart) % 2 == 1 || (i == children_.size() - 1 && 
hasElseExpr_)) {
check the negation and continue (to save one indentation level).


Line 409: if (constLiteralSet.add(outputLiteral)) 
++numOutputConstants;
nice optimization. is there a test for that?


-- 
To view, visit http://gerrit.cloudera.org:8080/5768
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I21dbdaad8452b7e58c477612b47847dccd9d98d2
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4735: Upgrade pytest in python env to version 2.9.2.

2017-01-31 Thread David Knupp (Code Review)
David Knupp has posted comments on this change.

Change subject: IMPALA-4735: Upgrade pytest in python env to version 2.9.2.
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5640/7/tests/run-tests.py
File tests/run-tests.py:

PS7, Line 144: other_tests
> Maybe use the name explicit_tests ?
Done. (And, with hindsight, changed 'params' to 'config_options'.)


-- 
To view, visit http://gerrit.cloudera.org:8080/5640
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I40d129e0e63ca5bee126bac6ac923abb3c7e0a67
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4735: Upgrade pytest in python env to version 2.9.2.

2017-01-31 Thread David Knupp (Code Review)
Hello Lars Volker,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5640

to look at the new patch set (#8).

Change subject: IMPALA-4735: Upgrade pytest in python env to version 2.9.2.
..

IMPALA-4735: Upgrade pytest in python env to version 2.9.2.

The current version of pytest in the Impala python environment is
quite old (2.7.2) and there have been bug fixes in later versions
that we could benefit from.

Also, since the passing of params to pytest.main() as a string will
be deprecated in upcoming versions of pytest, edit run-tests.py to
instead pass params as a list. (This also means we don't need to
worry about esoteric bash limitations re: single quotes in strings.)

While working on this file, the filtering of commandline args when
running the verfier tests was made a little more robust.

Tested by doing a standard (non-exhaustive) test run on centos 6.4
and ubuntu 14.04, plus an exhaustive test run on RHEL7.

Change-Id: I40d129e0e63ca5bee126bac6ac923abb3c7e0a67
---
M infra/python/deps/requirements.txt
M tests/run-tests.py
2 files changed, 82 insertions(+), 67 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/5640/8
-- 
To view, visit http://gerrit.cloudera.org:8080/5640
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I40d129e0e63ca5bee126bac6ac923abb3c7e0a67
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Brown 


[Impala-ASF-CR] IMPALA-4809: Add Stability for codegen constants

2017-01-31 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4809: Add Stability for codegen constants
..


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/5848/1//COMMIT_MSG
Commit Message:

PS1, Line 16: overloaded
>From the codegen perspective, we only care if a value is compilation time 
>constant (compilation in the LlvmCodeGen::FinalizeModule()). Not entirely 
>clear to me the benefit of distinguishing between these various types of 
>constant values ? I could be missing something.


PS1, Line 17: is_constant()
I believe this is a concept confined to Expr only. This is a different notion 
from say other runtime constants such as the constants in a hash table (e.g. 
stores_duplicates).


http://gerrit.cloudera.org:8080/#/c/5848/1/be/src/exprs/expr.h
File be/src/exprs/expr.h:

PS1, Line 266: RUNTIME_CONSTANT
This seems odd to be in Expr class if an example of this class of constants is 
CPUInfo's flags.


-- 
To view, visit http://gerrit.cloudera.org:8080/5848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0431bcd84b908bcf130a6d618f57f8f7c5498428
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4808: old hash join can reference invalid memory

2017-01-31 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-4808: old hash join can reference invalid memory
..


Patch Set 1: Code-Review+1

Yeah, I guess it wasn't harmless after all. I think it makes sense to remove 
it, thanks.

-- 
To view, visit http://gerrit.cloudera.org:8080/5850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic068bbc3e029264d1ce814d286e372391639fa68
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4808: old hash join can reference invalid memory

2017-01-31 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4808: old hash join can reference invalid memory
..


Patch Set 1:

It looks like on the original CR there was some discussion about whether this 
was right/wrong https://gerrit.cloudera.org/#/c/1068/

-- 
To view, visit http://gerrit.cloudera.org:8080/5850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic068bbc3e029264d1ce814d286e372391639fa68
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4808: old hash join can reference invalid memory

2017-01-31 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5850

Change subject: IMPALA-4808: old hash join can reference invalid memory
..

IMPALA-4808: old hash join can reference invalid memory

The bug was that 'probe_rows_exist' could be true even if
there was no current probe row. The node can get into this
state if it takes the branch at line 390.

I tried to reproduce the crash but was unable to after a
few attempts.

Change-Id: Ic068bbc3e029264d1ce814d286e372391639fa68
---
M be/src/exec/hash-join-node.cc
1 file changed, 1 insertion(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/5850/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic068bbc3e029264d1ce814d286e372391639fa68
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4809: Add Stability for codegen constants

2017-01-31 Thread Zach Amsden (Code Review)
Zach Amsden has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5848

Change subject: IMPALA-4809: Add Stability for codegen constants
..

IMPALA-4809: Add Stability for codegen constants

Having two boolean parameters to a constructor is error prone, so
I thought instead we could express the 'constance' of values by
using an enum class.  I chose 'Stability' as a name since one can
imagine extending this to other semi-stable things in a spectrum
between Literal expressed constant, runtime constant, query-local
constant, memoizable constexpr, dynamic variable.

Right now those terms may be overloaded into a single notion in the
sense of is_constant() but they are quite distinct concepts.  For
example, CPU flags are RUNTIME_CONSTANT, while query options are
QUERY_CONSTANT.

Testing: Going to run standard test suite.  There should be no
side effects at all from this change.

Change-Id: I0431bcd84b908bcf130a6d618f57f8f7c5498428
---
M be/src/exprs/expr.cc
M be/src/exprs/expr.h
M be/src/exprs/literal.cc
M be/src/exprs/null-literal.h
M be/src/exprs/slot-ref.cc
5 files changed, 38 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/5848/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I0431bcd84b908bcf130a6d618f57f8f7c5498428
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 


[Impala-ASF-CR] IMPALA-4729: Implement REPLACE()

2017-01-31 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4729: Implement REPLACE()
..


Patch Set 8:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/5776/8/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

PS8, Line 236: (UNLIKELY(result.is_null))
May help to add a comment stating that result can be null if buffer_space 
exceeds StringVal::MAX_LENGTH at this point.


PS8, Line 242: bytes
may be this can use a more meaningful name such as unmatched_len.


PS8, Line 257: check_space)
We want to skip this doubling behavior if bytes_remaining == 0.

The code could be slightly simpler if we avoid having check_space and directly 
check the size of the buffer_space (like line 260). The branch should be pretty 
predictable for the case in which the replacement string is shorter than the 
pattern string.


PS8, Line 260: replace.len
nit: replace.len - pattern.len ?


PS8, Line 265:  std::numeric_limits::max() / 2);
nit: indent 4.


PS8, Line 267:  ptr - result.ptr
Isn't this bytes_produced ?


PS8, Line 268: if (UNLIKELY(!result.Resize(context, buffer_space))) {
A one line comment about this already handles case in which the buffer_space 
exceeds StringVal::MAX_LENGTH ?


PS8, Line 282: remaining_bytes
'bytes_remaining' for consistency with the code above.


PS8, Line 289: context->Reallocate
As discussed in person, this will not shrink the buffer given the way 
FreePool::Reallocate() is implemented.


-- 
To view, visit http://gerrit.cloudera.org:8080/5776
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4359: qgen: add UPSERT support

2017-01-31 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4359: qgen: add UPSERT support
..


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/5795/1/tests/comparison/model_translator.py
File tests/comparison/model_translator.py:

PS1, Line 534: 1)
Why 1 here? Are we expecting there to be several "INSERT" keywords in the query 
sql?


http://gerrit.cloudera.org:8080/#/c/5795/1/tests/comparison/query.py
File tests/comparison/query.py:

Line 705:   (CONFLICT_ACTION_DEFAULT,
I see that you explained what these mean below, but it's not to perfectly clear 
to me. Maybe explain what these mean exactly here too?

Conflict is when there is a row with the same primary key as you're trying to 
insert, right? Ignore means insert anyways (so duplicate the primary key?). 
What is default exactly?


PS1, Line 727: ng a K
inserting into a Kudu table


http://gerrit.cloudera.org:8080/#/c/5795/1/tests/comparison/statement_generator.py
File tests/comparison/statement_generator.py:

Line 70: if dml_table.primary_keys:
Add a comment that if the table has primary keys then it's a Kudu table and 
Kudu supports UPSERT. (Otherwise upsert is not generated because it's not 
supported)


-- 
To view, visit http://gerrit.cloudera.org:8080/5795
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6382f6ab22ba29c117e39a5d90592d3637df4b25
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Brown 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Improvements to "Unknown disk-ID" warning

2017-01-31 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: Improvements to "Unknown disk-ID" warning
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5828/1//COMMIT_MSG
Commit Message:

Line 7: Improvements to "Unknown disk-ID" warning
IMPALA-1427?


Line 9: This commit,
This patch seems fine. 

Another idea to consider:
I'm not really sure how useful it is to return these missing disk id warnings 
from scans. An alternative might be to report the number of missing disk ids in 
the explain plan of a query and throw away this runtime checking and warning 
through the existing path.


-- 
To view, visit http://gerrit.cloudera.org:8080/5828
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iddb132ff7ad66f3291b93bf9d8061bd0525ef1b2
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-HasComments: Yes


[Impala-ASF-CR] [DOCS] Major update to Impala + Kudu page

2017-01-31 Thread John Russell (Code Review)
John Russell has posted comments on this change.

Change subject: [DOCS] Major update to Impala + Kudu page
..


Patch Set 13:

(8 comments)

New patch set coming any second.

http://gerrit.cloudera.org:8080/#/c/5649/13/docs/topics/impala_alter_table.xml
File docs/topics/impala_alter_table.xml:

PS13, Line 64: ADD [IF NOT EXISTS] | DROP [IF EXISTS] } PARTITION
Add another line here for ADD RANGE PARTITION.

(The ADD/DROP lines are already getting split apart as part of a different code 
review. That will happen here as part of resolving a merge conflict.)


PS13, Line 86: kudu_partition_spec
Take this out of there because it doesn't apply to the basic ADD PARTITION and 
DROP PARTITION clauses.


http://gerrit.cloudera.org:8080/#/c/5649/13/docs/topics/impala_create_table.xml
File docs/topics/impala_create_table.xml:

PS13, Line 85: [PRIMARY KEY (col_name[, ...]]
> This doesn't belong here.
Done


PS13, Line 234: codeblock
Sprinkle some rev="kudu" eyecatchers on these Kudu-only codeblocks.


PS13, Line 244: CREATE TABLE [IF NOT EXISTS] 
db_name.]table_name
  :   [COMMENT 'table_comment']
  :   STORED AS KUDU
  :   [TBLPROPERTIES 
('key1'='value1', 
'key2'='value2', ...)]
  : AS
  :   select_statement
> This is not correct. You're missing the PRIMARY KEY and PARTITION BY clause
Done


PS13, Line 411: impala::username.kudu_t1
> nit: I don't think this is a good example. I would be really surprised if s
Done


http://gerrit.cloudera.org:8080/#/c/5649/13/docs/topics/impala_kudu.xml
File docs/topics/impala_kudu.xml:

PS13, Line 93: kudu.master_addresses
> That's the name of the table property. Can you verify that name of the impa
Done. Right, the config option is -kudu_master_hosts


PS13, Line 101: kudu.master_addresses
Same change to -kudu_master_hosts as in previous paragraph. Clean up the 
TBLPROPERTIES wording in this paragraph to reflect better the 
kudu.master_addresses param for that clause.


-- 
To view, visit http://gerrit.cloudera.org:8080/5649
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read

2017-01-31 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5840

Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read
..

IMPALA-4828: Alter Kudu schema outside Impala may crash on read

Creating a table in Impala, changing the column schema
outside of Impala, and then reading again in Impala may
result in a crash. Neither Impala nor the Kudu client
validates the schema immediately before reading, so Impala
may attempt to dereference pointers that aren't there. This
happens if a string column is dropped and then a new, non
string column is added with the old string column's name.

This change adds validation after opening a scan token to
ensure the projection schema matches the expected schema.
The scan is guaranteed to be valid while the KuduScanner is
open, even if the schema changes concurrently.

Testing: Adds a test case that previously would crash Impala.

Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a
---
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/exec/kudu-util.cc
M be/src/exec/kudu-util.h
M be/src/runtime/descriptors.h
M common/thrift/generate_error_codes.py
M tests/query_test/test_kudu.py
7 files changed, 99 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/5840/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5840
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 


[Impala-ASF-CR](asf-site) Remove "Git repository" from navigation bar.

2017-01-31 Thread Jim Apple (Code Review)
Jim Apple has submitted this change and it was merged.

Change subject: Remove "Git repository" from navigation bar.
..


Remove "Git repository" from navigation bar.

It is not in the navigation bar of any other page; it remained here
only by an oversight.

Change-Id: I6e6f612fdd49d8f826efeb01fb05f9334da2fbf6
Reviewed-on: http://gerrit.cloudera.org:8080/5785
Reviewed-by: Jim Apple 
Tested-by: Jim Apple 
---
M downloads.html
1 file changed, 0 insertions(+), 1 deletion(-)

Approvals:
  Jim Apple: Looks good to me, approved; Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/5785
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I6e6f612fdd49d8f826efeb01fb05f9334da2fbf6
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR](asf-site) Remove "Git repository" from navigation bar.

2017-01-31 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: Remove "Git repository" from navigation bar.
..


Patch Set 2: Code-Review+2 Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/5785
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6e6f612fdd49d8f826efeb01fb05f9334da2fbf6
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3586 (Part 1): Implement Union Pass Through

2017-01-31 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#3).

Change subject: IMPALA-3586 (Part 1): Implement Union Pass Through
..

IMPALA-3586 (Part 1): Implement Union Pass Through

The union node acts as pass through operator and forwards row batches
from it's children without materializing. This is done in the case
when the child's tuple layout is identical to union node tuple layout
and no functions need to be applied to the child row batches.

Testing:
Verified that existing tests cover the case where no/some/all union
children of the union node can be passed through.

Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/descriptors.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
19 files changed, 425 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/3
-- 
To view, visit http://gerrit.cloudera.org:8080/5816
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-3586 (Part 1): Implement Union Pass Through

2017-01-31 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-3586 (Part 1): Implement Union Pass Through
..


Patch Set 3:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/5816/2/be/src/exec/union-node.cc
File be/src/exec/union-node.cc:

PS2, Line 122: IsInSubplan()
> I think we would either pass child_idx_ into isPassThrough() or call it som
Done


Line 123: 
> Move this into the Open() branch so we don't execute it unnecessarily.
Done


Line 126:   DCHECK(!IsInSubplan());
> We need to think about the implications of this. This could increase resour
Done. Disabled passthrough in subplans. The last row batch of each child is 
marked with the DeepCopy flag.


http://gerrit.cloudera.org:8080/#/c/5816/2/be/src/runtime/descriptors.cc
File be/src/runtime/descriptors.cc:

PS2, Line 438: Verify
> Verify (caps)
Done


PS2, Line 442: >
> Make these references with & to avoid copying the vector.
Done


PS2, Line 442: const
> Don't need std::, we automatically import it in common/names.h.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5816
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3586 (Part 1): Implement Union Pass Through

2017-01-31 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#3).

Change subject: IMPALA-3586 (Part 1): Implement Union Pass Through
..

IMPALA-3586 (Part 1): Implement Union Pass Through

The union node acts as pass through operator and forwards row batches
from it's children without materializing. This is done in the case
when the child's tuple layout is identical to union node tuple layout
and no functions need to be applied to the child row batches.

Testing:
Verified that existing tests cover the case where no/some/all union
children of the union node can be passed through.

Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/descriptors.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
19 files changed, 425 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/3
-- 
To view, visit http://gerrit.cloudera.org:8080/5816
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR](asf-site) Remove "Git repository" from navigation bar.

2017-01-31 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change.

Change subject: Remove "Git repository" from navigation bar.
..


Patch Set 1: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/5785
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6e6f612fdd49d8f826efeb01fb05f9334da2fbf6
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3989: Display skew warning for poorly formatted Parquet files

2017-01-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-3989: Display skew warning for poorly formatted Parquet 
files
..


IMPALA-3989: Display skew warning for poorly formatted Parquet files

Parquet files are scanned in the granularity of row groups. Each row
group belongs to one or more splits and each split is scanned by a
separate parquet scanner.

If some row groups span multiple splits, then we will most likely end
up seeing some scanners having remote reads and some scanners not
performing scans at all. This will attribute to skew across the
cluster where distribution of scans is uneven.

This change adds a counter (NumScannersWithNoReads) to the scan node's
runtime profile to track the number of parquet scanners that end up
doing no reads becuse of poor formatting.

Change-Id: Ibf48d978383d73efdade733a892e795ebd53c76a
Reviewed-on: http://gerrit.cloudera.org:8080/5400
Reviewed-by: Dan Hecht 
Tested-by: Impala Public Jenkins
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M tests/query_test/test_scanners.py
3 files changed, 107 insertions(+), 9 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Dan Hecht: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/5400
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ibf48d978383d73efdade733a892e795ebd53c76a
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-3989: Display skew warning for poorly formatted Parquet files

2017-01-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-3989: Display skew warning for poorly formatted Parquet 
files
..


Patch Set 11: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/5400
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibf48d978383d73efdade733a892e795ebd53c76a
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] Patch references to Cloudera and CDH in Impala tutorial

2017-01-31 Thread John Russell (Code Review)
John Russell has posted comments on this change.

Change subject: Patch references to Cloudera and CDH in Impala tutorial
..


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/5663/1//COMMIT_MSG
Commit Message:

Line 22: I marked with rev="upstream" some  tags
> I don't see upstream in any ditaval file. Where is it defined?
It's a purely user-defined attribute that we use for bookkeeping purposes, e.g. 
we use rev="IMPALA-abcd" to indicate the relevant JIRA associated with a 
change, and rev="upstream" to mark spots requiring future cleanup as part of 
the Apache cleanup.


http://gerrit.cloudera.org:8080/#/c/5663/1/docs/topics/impala_tutorial.xml
File docs/topics/impala_tutorial.xml:

Line 60: If you already have a  
environment set up and just need to add Impala to it, follow the installation
> This doesn't seem to work. It's showing up as "CDH" in the build.
Done


PS1, Line 698: from the sample TPC-DS kit for Impala.
> The only one of these is Cloudera's and it was very broken last I checked.
It may be that this section has to get removed due to bitrot on the TPC-DS 
stuff. If so, I'll suggest we do that in a subsequent CR and restrict ourselves 
to getting rid of explicit Cloudera / CDH references in this one.


PS1, Line 1700: (Currently, this technique only works for Parquet files.)
> Parentheses not necessary.
In this case, the parens are like a short form ot the "Note: this technique 
doesn't work for non-Parquet files" as we might do it in another section. I 
relax some of the rules a little bit (allowing royal 'we' etc.) in 
tutorial-style content.


-- 
To view, visit http://gerrit.cloudera.org:8080/5663
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I44245b65ce6f247ae8771f582f4b33c3712145ae
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Laurel Hale 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Patch references to Cloudera and CDH in Impala tutorial

2017-01-31 Thread John Russell (Code Review)
Hello Laurel Hale,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5663

to look at the new patch set (#2).

Change subject: Patch references to Cloudera and CDH in Impala tutorial
..

Patch references to Cloudera and CDH in Impala tutorial

There was one tutorial that actually ran under the
'cloudera' user and so repeated that name over and
over in directory and HDFS paths. I switched that to
'username'.

I suppressed some  and  tags with Cloudera
Manager-specific details. Will physically remove those
from the source in a subsequent iteration.

I left several instances of audience="Cloudera" because
those will be changed to audience="hidden" as part of
a separate change request.

I marked with rev="upstream" some  tags
containing impala-shell banners with a Cloudera copyright
statement. Will decide on a convention to handle those
(elide those lines, or use a conref to consistently
substitute the generic equivalent) and do that in a
followup patch set.

Change-Id: I44245b65ce6f247ae8771f582f4b33c3712145ae
---
M docs/impala_keydefs.ditamap
M docs/topics/impala_tutorial.xml
2 files changed, 26 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/5663/2
-- 
To view, visit http://gerrit.cloudera.org:8080/5663
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I44245b65ce6f247ae8771f582f4b33c3712145ae
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Laurel Hale 


[Impala-ASF-CR] [DOCS] List SCHEDULE RANDOM REPLICA in alphabetical order.

2017-01-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: [DOCS] List SCHEDULE_RANDOM_REPLICA in alphabetical order.
..


Patch Set 1: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/5835
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1e5ad55b588b668c4b4dfc700f375e39b2453e28
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Lars Volker 
Gerrit-HasComments: No


[Impala-ASF-CR] [DOCS] List SCHEDULE RANDOM REPLICA in alphabetical order.

2017-01-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: [DOCS] List SCHEDULE_RANDOM_REPLICA in alphabetical order.
..


[DOCS] List SCHEDULE_RANDOM_REPLICA in alphabetical order.

Change-Id: I1e5ad55b588b668c4b4dfc700f375e39b2453e28
Reviewed-on: http://gerrit.cloudera.org:8080/5835
Reviewed-by: Lars Volker 
Reviewed-by: John Russell 
Tested-by: Impala Public Jenkins
---
M docs/impala.ditamap
1 file changed, 1 insertion(+), 1 deletion(-)

Approvals:
  Impala Public Jenkins: Verified
  Lars Volker: Looks good to me, but someone else must approve
  John Russell: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/5835
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I1e5ad55b588b668c4b4dfc700f375e39b2453e28
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Lars Volker 


[Impala-ASF-CR] [DOCS] List SCHEDULE RANDOM REPLICA in alphabetical order.

2017-01-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: [DOCS] List SCHEDULE_RANDOM_REPLICA in alphabetical order.
..


Patch Set 1:

Build started: http://jenkins.impala.io:8080/job/gerrit-docs-submit/33/

-- 
To view, visit http://gerrit.cloudera.org:8080/5835
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1e5ad55b588b668c4b4dfc700f375e39b2453e28
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Lars Volker 
Gerrit-HasComments: No


[Impala-ASF-CR] [DOCS] List SCHEDULE RANDOM REPLICA in alphabetical order.

2017-01-31 Thread John Russell (Code Review)
John Russell has posted comments on this change.

Change subject: [DOCS] List SCHEDULE_RANDOM_REPLICA in alphabetical order.
..


Patch Set 1: Code-Review+2

I'll finalize this minor change.

-- 
To view, visit http://gerrit.cloudera.org:8080/5835
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1e5ad55b588b668c4b4dfc700f375e39b2453e28
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Lars Volker 
Gerrit-HasComments: No


[Impala-ASF-CR] Release note updates for Impala 2.8

2017-01-31 Thread Greg Rahn (Code Review)
Greg Rahn has posted comments on this change.

Change subject: Release note updates for Impala 2.8
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5668/3/docs/topics/impala_new_features.xml
File docs/topics/impala_new_features.xml:

PS3, Line 100: IMPALA-2522
This should be IMPALA-2521.  IMPALA-2522 is an Epic that covers much more than 
just this.


-- 
To view, visit http://gerrit.cloudera.org:8080/5668
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7c47f422e509cec6d3eb8aaa82294b584f393aed
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Laurel Hale 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Silvius Rus 
Gerrit-HasComments: Yes


[Impala-ASF-CR] [DOCS] List SCHEDULE RANDOM REPLICA in alphabetical order.

2017-01-31 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: [DOCS] List SCHEDULE_RANDOM_REPLICA in alphabetical order.
..


Patch Set 1: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/5835
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1e5ad55b588b668c4b4dfc700f375e39b2453e28
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Lars Volker 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4729: Implement REPLACE()

2017-01-31 Thread Zach Amsden (Code Review)
Zach Amsden has uploaded a new patch set (#8).

Change subject: IMPALA-4729: Implement REPLACE()
..

IMPALA-4729: Implement REPLACE()

This turned out to be slightly non-trivial as REPLACE is already a
keyword, and thus the parser needs to be tweaked to allow this,
since function names act as bare identifiers.

Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329
---
M be/src/exprs/expr-test.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
M be/src/gutil/bits.h
M be/src/udf/udf-internal.h
M be/src/udf/udf.cc
M be/src/udf/udf.h
M common/function-registry/impala_functions.py
M fe/src/main/cup/sql-parser.cup
9 files changed, 181 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/76/5776/8
-- 
To view, visit http://gerrit.cloudera.org:8080/5776
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-4729: Implement REPLACE()

2017-01-31 Thread Zach Amsden (Code Review)
Zach Amsden has uploaded a new patch set (#7).

Change subject: IMPALA-4729: Implement REPLACE()
..

IMPALA-4729: Implement REPLACE()

This turned out to be slightly non-trivial as REPLACE is already a
keyword, and thus the parser needs to be tweaked to allow this,
since function names act as bare identifiers.

Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329
---
M be/src/exprs/expr-test.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
M be/src/gutil/bits.h
M be/src/udf/udf-internal.h
M be/src/udf/udf.cc
M be/src/udf/udf.h
M common/function-registry/impala_functions.py
M fe/src/main/cup/sql-parser.cup
9 files changed, 180 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/76/5776/7
-- 
To view, visit http://gerrit.cloudera.org:8080/5776
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-4822: Dynamic log level changes to Catalog and Frontend JVMs

2017-01-31 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has uploaded a new patch set (#3).

Change subject: IMPALA-4822: Dynamic log level changes to Catalog and Frontend 
JVMs
..

IMPALA-4822: Dynamic log level changes to Catalog and Frontend JVMs

Very often we have to change the log4j logging levels
of Impalads and Catalog server for debugging purposes.
Currently, there is no way to do that without a restart
of the daemons, which is not a viable option in production
deployments.

This patch addresses this supportability gap by exposing
the ability to set dynamic log4j logging levels using
a simple web endpoint on Impalad and Catalog web pages.

Change-Id: I588418e9bcb0b66d33138baf96207a5a35bfbd63
---
M be/src/catalog/catalog-server.cc
M be/src/service/impala-http-handler.cc
M be/src/util/jni-util.cc
M be/src/util/jni-util.h
M be/src/util/logging-support.cc
M be/src/util/logging-support.h
M common/thrift/Logging.thrift
M fe/src/main/java/org/apache/impala/util/GlogAppender.java
A www/log_level.tmpl
9 files changed, 226 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/5792/3
-- 
To view, visit http://gerrit.cloudera.org:8080/5792
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I588418e9bcb0b66d33138baf96207a5a35bfbd63
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 


[Impala-ASF-CR] IMPALA-4829: Change default Kudu read behavior for "RYW"

2017-01-31 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4829: Change default Kudu read behavior for "RYW"
..


Patch Set 1: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/5802
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4011f8277083982aee2c6c2bfca2f4ae2f8cb31e
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3909: Populate min/max statistics in Parquet writer

2017-01-31 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3909: Populate min/max statistics in Parquet writer
..


Patch Set 11: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/5611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8368ee58daa50c07a3b8ef65be70203eb941f619
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4822: Dynamic log level changes to Catalog and Frontend JVMs

2017-01-31 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has uploaded a new patch set (#2).

Change subject: IMPALA-4822: Dynamic log level changes to Catalog and Frontend 
JVMs
..

IMPALA-4822: Dynamic log level changes to Catalog and Frontend JVMs

Very often we have to change the log4j logging levels
of Impalads and Catalog server for debugging purposes.
Currently, there is no way to do that without a restart
of the daemons, which is not a viable option in production
deployments.

This patch addresses this supportability gap by exposing
the ability to set dynamic log4j logging levels using
a simple web endpoint on Impalad and Catalog web pages.

Change-Id: I588418e9bcb0b66d33138baf96207a5a35bfbd63
---
M be/src/catalog/catalog-server.cc
M be/src/service/impala-http-handler.cc
M be/src/util/jni-util.cc
M be/src/util/jni-util.h
M be/src/util/logging-support.cc
M be/src/util/logging-support.h
M common/thrift/Logging.thrift
M fe/src/main/java/org/apache/impala/util/GlogAppender.java
A www/log_level.tmpl
9 files changed, 228 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/5792/2
-- 
To view, visit http://gerrit.cloudera.org:8080/5792
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I588418e9bcb0b66d33138baf96207a5a35bfbd63
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 


[Impala-ASF-CR] IMPALA-4822: Dynamic log level changes to Catalog and Frontend JVMs

2017-01-31 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-4822: Dynamic log level changes to Catalog and Frontend 
JVMs
..


Patch Set 1:

(6 comments)

Thanks Alex for the reviews. Very helpful. I agree this needs some discussion 
on usability aspect. PS2 made some changes.

1. Changed the title to "Java log level (log4j)" to make it look more 
non-technical.
2. The log level input is now a drop down to make things more clearer to the 
end user.
3. It now has an example in the text box itself. Looks like this [1]
4. Regarding point (6), I'm not totally sure if we should something about it, 
I've added a comment in the CR. May be we can discuss it further.
5. Still working on a single button to restore log levels.

@Henry: Still looking into the glog dynamic changes. Looks like it is possible 
and is as simple as FLAGS_v = , but I'm still digging into it.

[1] http://www.dumpt.com/img/viewer.php?file=8undlklgs7wt9q939hq9.png

http://gerrit.cloudera.org:8080/#/c/5792/1/be/src/util/logging-support.cc
File be/src/util/logging-support.cc:

Line 26: #include "rpc/jni-thrift-util.h"
> move before common/names.h
Moved. For my understanding, whats special about names.h include?


http://gerrit.cloudera.org:8080/#/c/5792/1/be/src/util/logging-support.h
File be/src/util/logging-support.h:

Line 21: #include "rapidjson/rapidjson.h"
> I think we should be able to forward declare instead of include.
Done. This one has some nested typedefs with some dependencies. Let me know if 
you think we should still retain it.


http://gerrit.cloudera.org:8080/#/c/5792/1/common/thrift/Logging.thrift
File common/thrift/Logging.thrift:

Line 34: // Helper structs for GetLogLevel and SetLogLevel methods
> GetLogLevel() and SetLogLevel()
Done


Line 35: struct TGetLogLevelParams {
> Instead of this GetLogLevel() I think we should instead list all custom log
Per my understanding log4j doesn't provide any easy way to get that list. So we 
need to manually track all the classes that were overriden via web UI. IMO that 
is some additional book keeping overhead on the Catalog server and some 
additional lines of code for a feature we don't often use unless we are 
debugging something. Thoughts?


Line 41: 
> remove blank line
Done


http://gerrit.cloudera.org:8080/#/c/5792/1/fe/src/main/java/org/apache/impala/util/GlogAppender.java
File fe/src/main/java/org/apache/impala/util/GlogAppender.java:

Line 155:* Get the log4j log level corresponding to a seriazlied 
TGetLogLevelParams.
> typo: serialized
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5792
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I588418e9bcb0b66d33138baf96207a5a35bfbd63
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4729: Implement REPLACE()

2017-01-31 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4729: Implement REPLACE()
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5776/5/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

PS5, Line 266: UNLIKELY(check_space))
> Well it was unlikely when I over-allocated the space.  Now it is still prob
Yes. I usually leave UNLIKELY() to error cases only. The general consensus 
(from my past experience) is that we use LIKELY/UNLIKELY on critical paths only.


-- 
To view, visit http://gerrit.cloudera.org:8080/5776
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1670,IMPALA-4141: Support multiple partitions in ALTER TABLE ADD PARTITION

2017-01-31 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-1670,IMPALA-4141: Support multiple partitions in ALTER 
TABLE ADD PARTITION
..


Patch Set 23: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/4144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iddbc951f2931f488f7048c9780260f6b49100750
Gerrit-PatchSet: 23
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4729: Implement REPLACE()

2017-01-31 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change.

Change subject: IMPALA-4729: Implement REPLACE()
..


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5776/5/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

PS5, Line 266: UNLIKELY(check_space))
> Not sure if UNLIKELY() is really unlikely here. Isn't it dependent on the i
Well it was unlikely when I over-allocated the space.  Now it is still probably 
unlikely, since many replaces either will be stripping characters or replacing 
equal size strings.  Doesn't really matter much I suppose.  Should I remove it?


PS5, Line 273: DCHECK(
> DCHECK_EQ().
will fix.  originally was using math on the ptr val and DCHECK_EQ didn't like 
the types involved.


-- 
To view, visit http://gerrit.cloudera.org:8080/5776
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4729: Implement REPLACE()

2017-01-31 Thread Zach Amsden (Code Review)
Zach Amsden has uploaded a new patch set (#6).

Change subject: IMPALA-4729: Implement REPLACE()
..

IMPALA-4729: Implement REPLACE()

This turned out to be slightly non-trivial as REPLACE is already a
keyword, and thus the parser needs to be tweaked to allow this,
since function names act as bare identifiers.

Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329
---
M be/src/exprs/expr-test.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
M be/src/gutil/bits.h
M be/src/udf/udf-internal.h
M be/src/udf/udf.cc
M be/src/udf/udf.h
M common/function-registry/impala_functions.py
M fe/src/main/cup/sql-parser.cup
9 files changed, 171 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/76/5776/6
-- 
To view, visit http://gerrit.cloudera.org:8080/5776
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-4729: Implement REPLACE()

2017-01-31 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4729: Implement REPLACE()
..


Patch Set 5:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/5776/5/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

PS5, Line 232: if (UNLIKELY(out_max > 1ULL << 18)) {
> This is why I was asking earlier about what is considered a small / large m
Sounds good.


PS5, Line 236: haystack.len + replace.len - pattern.len
> max string size is 1 << 30
Oops. Bad example. I suppose line 244 below will catch the case in which the 
buffer space exceeds StringVal::MAX_LENGTH.


PS5, Line 266: UNLIKELY(check_space))
Not sure if UNLIKELY() is really unlikely here. Isn't it dependent on the 
inputs ?


PS5, Line 272: buffer_space <<= 1;
Please see comments in StringVal::Resize(). I think this can potentially go to 
1 << 31 but it should be fine as long as StringVal::Resize() catches it. May 
want to DCHECK_LE(buffer_space, 1ULL << 30);


PS5, Line 273: DCHECK(
DCHECK_EQ().


http://gerrit.cloudera.org:8080/#/c/5776/5/be/src/udf/udf.cc
File be/src/udf/udf.cc:

PS5, Line 519:  auto* new_ptr = ctx->impl()->ReallocateLocal(ptr, new_len);
This should return StringVal::null() if new_len exceeds StringVal::MAX_LENGTH.


-- 
To view, visit http://gerrit.cloudera.org:8080/5776
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] [DOCS] Major update to Impala + Kudu page

2017-01-31 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: [DOCS] Major update to Impala + Kudu page
..


Patch Set 13:

John, when do you plan to post a new patch that addresses the last comments? I 
think after that, we should be able to +2 it.

-- 
To view, visit http://gerrit.cloudera.org:8080/5649
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[Impala-ASF-CR] [DOCS] List SCHEDULE RANDOM REPLICA in alphabetical order.

2017-01-31 Thread John Russell (Code Review)
John Russell has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5835

Change subject: [DOCS] List SCHEDULE_RANDOM_REPLICA in alphabetical order.
..

[DOCS] List SCHEDULE_RANDOM_REPLICA in alphabetical order.

Change-Id: I1e5ad55b588b668c4b4dfc700f375e39b2453e28
---
M docs/impala.ditamap
1 file changed, 1 insertion(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/5835/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5835
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I1e5ad55b588b668c4b4dfc700f375e39b2453e28
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 


[Impala-ASF-CR] IMPALA-3909: Populate min/max statistics in Parquet writer

2017-01-31 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3909: Populate min/max statistics in Parquet writer
..


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5611/9/be/src/exec/parquet-column-stats.h
File be/src/exec/parquet-column-stats.h:

Line 127:   // statistics behavior from any implicit behavior of the types?
> If Parquet's and Impala's ordering were "roughly the same", then we would n
My concern is that min()/max() use overloaded operators, e.g. like the ones in 
timestamp-value.h - so any changes or additions get automatically picked up 
here via an implicit mechanism.

It seems like it would make it easy for someone to accidentally break our 
Parquet implementation by changing a seemingly-unrelated bit of code. I think 
the static check above helps a bit but my instinct is to be as explicit as 
possible.

Another way I was thinking about it is that the conceptual distinction between 
Parquet's ordering and ours is important and the code should reflect that in 
some way.

We'll have to see what happens with parquet-format. I agree that it makes 
things tricky if the ordering diverges at all.


-- 
To view, visit http://gerrit.cloudera.org:8080/5611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8368ee58daa50c07a3b8ef65be70203eb941f619
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4729: Implement REPLACE()

2017-01-31 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change.

Change subject: IMPALA-4729: Implement REPLACE()
..


Patch Set 5:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/5776/5/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

PS5, Line 232: if (UNLIKELY(out_max > 1ULL << 18)) {
> Did you measure the performance gain for that ? Any hardcoded bound still l
This is why I was asking earlier about what is considered a small / large 
memory size in Impala.  How about we just start with the next power of two 
above something that can fit one replacement?


PS5, Line 236: haystack.len + replace.len - pattern.len
> Wouldn't this be negative if haystack.len == (1 << 31) - 4 and replace.len 
max string size is 1 << 30


PS5, Line 240: buffer_space = out_max;
> an excellent idea
But makes the test 


-- 
To view, visit http://gerrit.cloudera.org:8080/5776
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Release note updates for Impala 2.8

2017-01-31 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: Release note updates for Impala 2.8
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5668/6/docs/topics/impala_incompatible_changes.xml
File docs/topics/impala_incompatible_changes.xml:

Line 71: and query options (such as V_CPU_CORES) 
remain but do not have any effect. 
> OK. I wasn't sure in the race condition between "Reply" and "upload new pat
They are usually done within seconds of each other.


-- 
To view, visit http://gerrit.cloudera.org:8080/5668
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7c47f422e509cec6d3eb8aaa82294b584f393aed
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Laurel Hale 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Silvius Rus 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Release note updates for Impala 2.8

2017-01-31 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: Release note updates for Impala 2.8
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5668/7/docs/shared/impala_common.xml
File docs/shared/impala_common.xml:

PS7, Line 3297: CDH
Please do not add new CDH references.


http://gerrit.cloudera.org:8080/#/c/5668/7/docs/topics/impala_new_features.xml
File docs/topics/impala_new_features.xml:

PS7, Line 724:  
New spaces at end of lines


-- 
To view, visit http://gerrit.cloudera.org:8080/5668
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7c47f422e509cec6d3eb8aaa82294b584f393aed
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Laurel Hale 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Silvius Rus 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Release note updates for Impala 2.8

2017-01-31 Thread John Russell (Code Review)
John Russell has posted comments on this change.

Change subject: Release note updates for Impala 2.8
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5668/6/docs/topics/impala_incompatible_changes.xml
File docs/topics/impala_incompatible_changes.xml:

Line 71: and query options (such as V_CPU_CORES) 
remain but do not have any effect. 
> It is difficult to review changes when the author replies "Done" before the
OK. I wasn't sure in the race condition between "Reply" and "upload new patch 
set" which should come first. In a different review I got a request not to 
upload the patch set before the responses.


-- 
To view, visit http://gerrit.cloudera.org:8080/5668
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7c47f422e509cec6d3eb8aaa82294b584f393aed
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Laurel Hale 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Silvius Rus 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4729: Implement REPLACE()

2017-01-31 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4729: Implement REPLACE()
..


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5776/5/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

PS5, Line 232: if (UNLIKELY(out_max > 1ULL << 18)) {
> I dropped this down to 64k.  Seems like the benefit for not re-allocating i
Did you measure the performance gain for that ? Any hardcoded bound still looks 
a bit arbitrary unless there is justification for choosing it. It'd be nice to 
keep the decision simple while still getting reasonable speedup over 
regex_replace(). Starting with the next power of 2 of the current string length 
will work well with how FreePool allocates memory internally.


PS5, Line 236: haystack.len + replace.len - pattern.len
> should not be able to overflow.  preferred indent is
Wouldn't this be negative if haystack.len == (1 << 31) - 4 and replace.len = 6 
and pattern.len = 2 ?

Correct. Indentation is 4 for line continuation.


-- 
To view, visit http://gerrit.cloudera.org:8080/5776
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool

2017-01-31 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool
..


Patch Set 4:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/5811/4/be/src/runtime/buffered-tuple-stream-v2.cc
File be/src/runtime/buffered-tuple-stream-v2.cc:

Line 326:   LOG(INFO) << "Failed to get reservation for " << 
pages_.front().len();
I need to remove this log message.


http://gerrit.cloudera.org:8080/#/c/5811/4/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

Line 100:   if (ExecEnv::GetInstance()->buffer_pool() != nullptr) {
Use exec-env directly.


http://gerrit.cloudera.org:8080/#/c/5811/4/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

Line 29: #include "runtime/tmp-file-mgr.h" // TODO: can we avoid this?
Remove TODO


Line 121:   Status CheckSpillingEnabled();
This is currently dead code - can remove it.


http://gerrit.cloudera.org:8080/#/c/5811/4/be/src/runtime/row-batch.h
File be/src/runtime/row-batch.h:

Line 442:   /// TODO: ownership semantics
Need to fix this TODO


-- 
To view, visit http://gerrit.cloudera.org:8080/5811
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7bb47a818564ac19a6be4ca88db0578f4ea0b709
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Release note updates for Impala 2.8

2017-01-31 Thread John Russell (Code Review)
John Russell has uploaded a new patch set (#7).

Change subject: Release note updates for Impala 2.8
..

Release note updates for Impala 2.8

First cut at 'new features' topic.

Includes the Incompatible Changes subtopic for Impala 2.8.

Also did some cleanup throughout the Incompatible
Changes page:

- Took out references to Cloudera release numbers
  from titles.
- Suppressed the display of ancient subtopics from
  the Impala beta days, which are intertwined with
  things like what version of Cloudera Manager was
  supported.

Patch set 3:

More on MT_DOP for COMPUTE STATS.
Address comments from Greg and MJ.

Change-Id: I7c47f422e509cec6d3eb8aaa82294b584f393aed
---
M docs/topics/impala_incompatible_changes.xml
M docs/topics/impala_new_features.xml
2 files changed, 457 insertions(+), 177 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/5668/7
-- 
To view, visit http://gerrit.cloudera.org:8080/5668
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7c47f422e509cec6d3eb8aaa82294b584f393aed
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Laurel Hale 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Silvius Rus 


[Impala-ASF-CR] [DOCS] Apply conditionalization to tags.

2017-01-31 Thread John Russell (Code Review)
John Russell has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5834

Change subject: [DOCS] Apply conditionalization to  tags.
..

[DOCS] Apply conditionalization to  tags.

Housekeeping change. Apply same conditional settings for 
as for corresponding  for some pages that are hidden
or otherwise conditionalized. Doesn't actually have an effect on
the DITA-OT HTML output, but provides some future-proofing if we
did more elaborate linking in the future.

Change-Id: Ia9f0517599b6cab77eaca95b9c7ce7262583dfb9
---
M docs/impala_keydefs.ditamap
1 file changed, 3 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/34/5834/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5834
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia9f0517599b6cab77eaca95b9c7ce7262583dfb9
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 


[Impala-ASF-CR] Add .pep8rc for Impala's Python style

2017-01-31 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: Add .pep8rc for Impala's Python style
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5829/1/.pep8rc
File .pep8rc:

PS1, Line 5: # E251 - Remove whitespace around parameter '=' sign.
   : # E301 - Add missing blank line.
> Yes, that was my intention, too. Such a tool sounds very useful. Is it in a
https://github.com/jbapple-cloudera/clang-format-infer

It uses hill-climbing with random restarts to search the state space.

The config is pretty brittle, I must admit, but I think that's mainly a clang 
problem, so you might be able to repurpose it.


-- 
To view, visit http://gerrit.cloudera.org:8080/5829
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I65a99fc6e364eb1e49e21d0ba97546bc25f65a27
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3909: Populate min/max statistics in Parquet writer

2017-01-31 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-3909: Populate min/max statistics in Parquet writer
..


Patch Set 9:

(8 comments)

Thanks for the review. Please see my inline comments and PS11.

http://gerrit.cloudera.org:8080/#/c/5611/9/be/src/exec/hdfs-parquet-table-writer.cc
File be/src/exec/hdfs-parquet-table-writer.cc:

Line 339:   int64_t encoded_value_size_;
> sounds good
Done


http://gerrit.cloudera.org:8080/#/c/5611/10/be/src/exec/hdfs-parquet-table-writer.cc
File be/src/exec/hdfs-parquet-table-writer.cc:

Line 139:   }
> why not just pass in metadata->statistics and then set the __isset flag by 
Done


Line 652:   // Add the size of the data page header
> avoid copy by passing in header.data_page_header.statistics
Done


http://gerrit.cloudera.org:8080/#/c/5611/10/be/src/exec/hdfs-parquet-table-writer.h
File be/src/exec/hdfs-parquet-table-writer.h:

Line 103:   /// Maximum statistics size. If the combined size of the min and 
max values of
> qualify as 'parquet.Statistics' so it's clearer
Done. I used :: since that is the class name in its namespace. Do you prefer 
"."?


http://gerrit.cloudera.org:8080/#/c/5611/9/be/src/exec/parquet-column-stats.h
File be/src/exec/parquet-column-stats.h:

Line 127:   // statistics behavior from any implicit behavior of the types?
> i understand how that may not be the case today, but in order for them to b
If Parquet's and Impala's ordering were "roughly the same", then we would need 
some translation between our min values and the ones in Parquet. For our 
current types, I don't see that as a problem either, but I think Tim was 
concerned about adding types in the future and preventing potential bugs.

I'll let Tim add his thoughts to the discussion, personally I'm good with using 
min/max for now. The comment was there to facilitate this discussion, since it 
came up in reviews of previous patch sets. I will remove it.


http://gerrit.cloudera.org:8080/#/c/5611/10/be/src/exec/parquet-column-stats.h
File be/src/exec/parquet-column-stats.h:

Line 84: 
> remove
Without these, clang-format will undo all manual changes to the style on lines 
modified by this change. I added it as a TODO to the commit message to remove 
those once the change has a +2, when I will have to rebase it anyways.


Line 87: class ColumnStats : public ColumnStatsBase {
> indent the subsequent lines belonging to the logical expr two more spaces (
Done


Line 157:   /// Returns the number of bytes needed to encode value 'v'.
> this is very verbose. why needed?
See my previous comment.


-- 
To view, visit http://gerrit.cloudera.org:8080/5611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8368ee58daa50c07a3b8ef65be70203eb941f619
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3909: Populate min/max statistics in Parquet writer

2017-01-31 Thread Lars Volker (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5611

to look at the new patch set (#11).

Change subject: IMPALA-3909: Populate min/max statistics in Parquet writer
..

IMPALA-3909: Populate min/max statistics in Parquet writer

TODO after +2 and before merging/rebasing:
- Remove clang-format markers in the code
- Replace NULL with nullptr in modified cc files

Change-Id: I8368ee58daa50c07a3b8ef65be70203eb941f619
---
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-parquet-table-writer.h
A be/src/exec/parquet-column-stats.h
M be/src/exec/parquet-common.h
M be/src/runtime/string-value.h
M be/src/util/dict-test.cc
M tests/query_test/test_insert_parquet.py
M tests/util/get_parquet_metadata.py
8 files changed, 760 insertions(+), 79 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/5611/11
-- 
To view, visit http://gerrit.cloudera.org:8080/5611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8368ee58daa50c07a3b8ef65be70203eb941f619
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Ivanfi 


[Impala-ASF-CR] IMPALA-3989: Display skew warning for poorly formatted Parquet files

2017-01-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-3989: Display skew warning for poorly formatted Parquet 
files
..


Patch Set 11:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/223/

-- 
To view, visit http://gerrit.cloudera.org:8080/5400
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibf48d978383d73efdade733a892e795ebd53c76a
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] Add .pep8rc for Impala's Python style

2017-01-31 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: Add .pep8rc for Impala's Python style
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5829/1/.pep8rc
File .pep8rc:

PS1, Line 5: # E251 - Remove whitespace around parameter '=' sign.
   : # E301 - Add missing blank line.
> I can't speak for Lars, but I worked on our initial .clang-format, and I tr
Yes, that was my intention, too. Such a tool sounds very useful. Is it in any 
way re-usable and would you be able to share it?


-- 
To view, visit http://gerrit.cloudera.org:8080/5829
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I65a99fc6e364eb1e49e21d0ba97546bc25f65a27
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Add .pep8rc for Impala's Python style

2017-01-31 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: Add .pep8rc for Impala's Python style
..


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/5829/1/.pep8rc
File .pep8rc:

PS1, Line 2: # E101 - Reindent all lines.
> I moderately disagree with this being ignored. Can you talk about your reas
Without E101 and E111, autopep8 will re-indent the whole file to use 4 spaces 
instead of 2, and I couldn't find a way around this.

https://github.com/hhatto/autopep8/blob/master/autopep8.py#L3095


PS1, Line 5: # E251 - Remove whitespace around parameter '=' sign.
   : # E301 - Add missing blank line.
> I strongly disagree with these being ignored. Can you talk about your reaso
I tried to make this match the style that I found, but I don't feel strongly 
about it and would be glad if we could get as close to PEP8 as possible. I'll 
comment more on this on dev@.


http://gerrit.cloudera.org:8080/#/c/5829/1//COMMIT_MSG
Commit Message:

PS1, Line 7: Add .pep8rc for Impala's Python style
> What tools get affected by the presence of this? Can it stay in the top-lev
No tools are automatically affected by this, but several can use it if pointed 
to it with a command line flag. Should we point this out here or would the wiki 
page be sufficient?


-- 
To view, visit http://gerrit.cloudera.org:8080/5829
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I65a99fc6e364eb1e49e21d0ba97546bc25f65a27
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4729: Implement REPLACE()

2017-01-31 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change.

Change subject: IMPALA-4729: Implement REPLACE()
..


Patch Set 5:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/5776/5/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

PS5, Line 211: pattern.is_null
> I believe pattern.is_null should also return StringVal::null(). pattern.is_
Will change this semantic.


PS5, Line 232: if (UNLIKELY(out_max > 1ULL << 18)) {
> It seems a bit arbitrary to choose 256K. Why not just start with a buffer o
I dropped this down to 64k.  Seems like the benefit for not re-allocating is 
pretty high and we only want to pay costs on very large replaces, which is what 
we'll get.


PS5, Line 236: haystack.len + replace.len - pattern.len
> I believe this can still overflow a 32-bit value, right ? Btw, please keep 
should not be able to overflow.  preferred indent is
   in 4 
   and continuing, correct?


PS5, Line 240: buffer_space = out_max;
> May help to have a test case in which the result string is longer than Stri
an excellent idea


PS5, Line 249:  while (match_pos + needle.len <= haystack.len) {
 :   // Copy in original string
 :   const int bytes = match_pos - consumed;
 :   memcpy(ptr, &haystack.ptr[consumed], bytes);
 :   DCHECK_LE(ptr - result.ptr + bytes, buffer_space);
 :   ptr += bytes;
 : 
 :   // Copy in replacement - always safe since we always leave 
room for one more replace
 :   DCHECK_LE(ptr - result.ptr + replace.len, buffer_space);
 :   memcpy(ptr, replace.ptr, replace.len);
 :   ptr += replace.len;
 : 
 :   // Don't want to re-match within already replaced pattern
 :   match_pos += needle.len;
 :   consumed = match_pos;
 : 
 :   // If we had an enlarging pattern, we may need more space
 :   if (UNLIKELY(check_space)) {
 : const int bytes_produced = ptr - result.ptr;
 : const int bytes_remaining = haystack.len - consumed;
 : if (UNLIKELY(bytes_produced + bytes_remaining + 
replace.len > buffer_space)) {
 :   // Ran out of space, double the size.  See the note 
above regarding the choice
 :   // of a power of two sized buffer.
 :   buffer_space <<= 1;
 :   DCHECK((buffer_space & (buffer_space - 1)) == 0);
 :   const auto ofs = ptr - result.ptr;
 :   if (UNLIKELY(!result.Resize(context, buffer_space))) {
 : return StringVal::null();
 :   }
 :   ptr = result.ptr + ofs;
 : }
 :   }
 : 
 :   StringValue haystack_substring = 
haystack.Substring(match_pos);
 :   int match_pos_in_substring = 
search.Search(&haystack_substring);
 :   if (match_pos_in_substring < 0) break;
 :   match_pos += match_pos_in_substring;
 :   }
> Please ignore my previous comment. We need to keep the memory allocations i
I looked at StringBuffer but it uses MemPool directly.  I'd rather keep this 
change locally confined to using the current allocator rather than adapting a 
new interface.


-- 
To view, visit http://gerrit.cloudera.org:8080/5776
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4729: Implement REPLACE()

2017-01-31 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4729: Implement REPLACE()
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5776/5/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

PS5, Line 249:  while (match_pos + needle.len <= haystack.len) {
 :   // Copy in original string
 :   const int bytes = match_pos - consumed;
 :   memcpy(ptr, &haystack.ptr[consumed], bytes);
 :   DCHECK_LE(ptr - result.ptr + bytes, buffer_space);
 :   ptr += bytes;
 : 
 :   // Copy in replacement - always safe since we always leave 
room for one more replace
 :   DCHECK_LE(ptr - result.ptr + replace.len, buffer_space);
 :   memcpy(ptr, replace.ptr, replace.len);
 :   ptr += replace.len;
 : 
 :   // Don't want to re-match within already replaced pattern
 :   match_pos += needle.len;
 :   consumed = match_pos;
 : 
 :   // If we had an enlarging pattern, we may need more space
 :   if (UNLIKELY(check_space)) {
 : const int bytes_produced = ptr - result.ptr;
 : const int bytes_remaining = haystack.len - consumed;
 : if (UNLIKELY(bytes_produced + bytes_remaining + 
replace.len > buffer_space)) {
 :   // Ran out of space, double the size.  See the note 
above regarding the choice
 :   // of a power of two sized buffer.
 :   buffer_space <<= 1;
 :   DCHECK((buffer_space & (buffer_space - 1)) == 0);
 :   const auto ofs = ptr - result.ptr;
 :   if (UNLIKELY(!result.Resize(context, buffer_space))) {
 : return StringVal::null();
 :   }
 :   ptr = result.ptr + ofs;
 : }
 :   }
 : 
 :   StringValue haystack_substring = 
haystack.Substring(match_pos);
 :   int match_pos_in_substring = 
search.Search(&haystack_substring);
 :   if (match_pos_in_substring < 0) break;
 :   match_pos += match_pos_in_substring;
 :   }
> Is this more performant than simply using string::append() or string += ope
Please ignore my previous comment. We need to keep the memory allocations in a 
freepool.  Ideally, we should use StringBuffer if possible but that interface 
works with MemPool only. May be we can consider adapting its interface to take 
a Resize() function pointer instead so it can be shared by both FreePool and 
MemPool.


-- 
To view, visit http://gerrit.cloudera.org:8080/5776
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Add .pep8rc for Impala's Python style

2017-01-31 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: Add .pep8rc for Impala's Python style
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5829/1/.pep8rc
File .pep8rc:

PS1, Line 5: # E251 - Remove whitespace around parameter '=' sign.
   : # E301 - Add missing blank line.
> I strongly disagree with these being ignored. Can you talk about your reaso
I can't speak for Lars, but I worked on our initial .clang-format, and I tried 
to make it match our existing style as closely as possible. I even wrote a 
little tool to explore the state space of config options and find the ones that 
match our codebase with as small a diff as possible. As a result of this 
matching, the options are a bit odd, but it will hopefully encourage new code 
to match the old style as closely as possible, which might be easier to read 
than having new code be in the accepted pep8 style and old code in the 
idiosyncratic Impala style.


-- 
To view, visit http://gerrit.cloudera.org:8080/5829
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I65a99fc6e364eb1e49e21d0ba97546bc25f65a27
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-{4670,4672,4784}: Add RpcMgr and port Statestore services to KRPC

2017-01-31 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-{4670,4672,4784}: Add RpcMgr and port Statestore 
services to KRPC
..


Patch Set 4:

(3 comments)

looks good, but let's talk about the nscd requirement today.

http://gerrit.cloudera.org:8080/#/c/5720/4/be/src/rpc/rpc-mgr-test.cc
File be/src/rpc/rpc-mgr-test.cc:

Line 166:.ok());
> Not that gets propagated into our Status object - there is room for improve
is there a todo somewhere?


http://gerrit.cloudera.org:8080/#/c/5720/5/be/src/rpc/rpc.h
File be/src/rpc/rpc.h:

Line 49: template 
move them into the class so they're naturally qualified?


http://gerrit.cloudera.org:8080/#/c/5720/4/be/src/statestore/statestore-test.cc
File be/src/statestore/statestore-test.cc:

Line 379:   subscribers_[0]->WaitForUpdates(1);
?


-- 
To view, visit http://gerrit.cloudera.org:8080/5720
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8dbf12b9ecd71d26d239d31c19b487175194c766
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Juan Yu 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Add .pep8rc for Impala's Python style

2017-01-31 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change.

Change subject: Add .pep8rc for Impala's Python style
..


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/5829/1/.pep8rc
File .pep8rc:

PS1, Line 2: # E101 - Reindent all lines.
I moderately disagree with this being ignored. Can you talk about your 
reasonings for wanting to ignore it?


PS1, Line 5: # E251 - Remove whitespace around parameter '=' sign.
   : # E301 - Add missing blank line.
I strongly disagree with these being ignored. Can you talk about your 
reasonings for wanting to ignore them?


http://gerrit.cloudera.org:8080/#/c/5829/1//COMMIT_MSG
Commit Message:

PS1, Line 7: Add .pep8rc for Impala's Python style
What tools get affected by the presence of this? Can it stay in the top-level 
Impala directory, or do I need to copy it to my homedir?


-- 
To view, visit http://gerrit.cloudera.org:8080/5829
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I65a99fc6e364eb1e49e21d0ba97546bc25f65a27
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3909: Populate min/max statistics in Parquet writer

2017-01-31 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3909: Populate min/max statistics in Parquet writer
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5611/6/be/src/exec/hdfs-parquet-table-writer.cc
File be/src/exec/hdfs-parquet-table-writer.cc:

PS6, Line 178: ProcessValue
> I tend to favor ProcessValue, since Append (like Encode) leaves out the sta
either way. i don't think consume communicates more than process.


-- 
To view, visit http://gerrit.cloudera.org:8080/5611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8368ee58daa50c07a3b8ef65be70203eb941f619
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3909: Populate min/max statistics in Parquet writer

2017-01-31 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3909: Populate min/max statistics in Parquet writer
..


Patch Set 10:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/5611/9/be/src/exec/hdfs-parquet-table-writer.cc
File be/src/exec/hdfs-parquet-table-writer.cc:

Line 136: if (row_group_stats_base_->has_values()
oops, i missed that bug


Line 339: 
> Yes, the dictionary encoders use it as the size of the encoded values. Shou
sounds good


http://gerrit.cloudera.org:8080/#/c/5611/10/be/src/exec/hdfs-parquet-table-writer.cc
File be/src/exec/hdfs-parquet-table-writer.cc:

Line 139:   row_group_stats_base_->EncodeToThrift(&encoded);
why not just pass in metadata->statistics and then set the __isset flag by hand?


Line 652: page_stats_base_->EncodeToThrift(&encoded);
avoid copy by passing in header.data_page_header.statistics


http://gerrit.cloudera.org:8080/#/c/5611/10/be/src/exec/hdfs-parquet-table-writer.h
File be/src/exec/hdfs-parquet-table-writer.h:

Line 103:   /// Maximum statistics size. If the size of a single thrift 
Statistics struct for a page
qualify as 'parquet.Statistics' so it's clearer


http://gerrit.cloudera.org:8080/#/c/5611/9/be/src/exec/parquet-column-stats.h
File be/src/exec/parquet-column-stats.h:

Line 127: }
> The stats should follow the ordering semantics defined in parquet-format. T
i understand how that may not be the case today, but in order for them to be 
useful, they need to mirror the ordering semantics of the underlying type. 
meaning i still don't see why there should be stats-specific comparisons and 
why we want this todo.


http://gerrit.cloudera.org:8080/#/c/5611/10/be/src/exec/parquet-column-stats.h
File be/src/exec/parquet-column-stats.h:

Line 84:   // clang-format off
remove


Line 87:   || std::is_same::value
indent the subsequent lines belonging to the logical expr two more spaces 
(style guide)


Line 157: // clang-format off
this is very verbose. why needed?


-- 
To view, visit http://gerrit.cloudera.org:8080/5611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8368ee58daa50c07a3b8ef65be70203eb941f619
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1670,IMPALA-4141: Support multiple partitions in ALTER TABLE ADD PARTITION

2017-01-31 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#23).

Change subject: IMPALA-1670,IMPALA-4141: Support multiple partitions in ALTER 
TABLE ADD PARTITION
..

IMPALA-1670,IMPALA-4141: Support multiple partitions in ALTER TABLE
ADD PARTITION

Just like Hive, Implala should support multiple partitions in ALTER
TABLE ADD PARTITION statements. The syntax is as follows:

ALTER TABLE table_name ADD [IF NOT EXISTS]
PARTITION partition_spec1 [location_spec1] [cache_spec1]
PARTITION partition_spec2 [location_spec2] [cache_spec2]
...

Grammar was modified to handle the new syntax. Introduced PartitionDef
class to capture the repeatable part of the statement. TPartitionDef
is the name of the corresponding thrift class.

AlterTableAddPartitionStmt and CatalogOpExecutor classes were also
modified to work with a list of partitions. Duplicate partition specs
are rejected in AlterTableAddPartitionStmt.analyze().

Added FE, E2E and integration tests.

Change-Id: Iddbc951f2931f488f7048c9780260f6b49100750
---
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AlterTableAddPartitionStmt.java
A fe/src/main/java/org/apache/impala/analysis/PartitionDef.java
M fe/src/main/java/org/apache/impala/analysis/PartitionKeyValue.java
M fe/src/main/java/org/apache/impala/analysis/PartitionSpec.java
M fe/src/main/java/org/apache/impala/analysis/PartitionSpecBase.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
M testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test
M tests/common/impala_test_suite.py
M tests/metadata/test_hms_integration.py
M tests/metadata/test_refresh_partition.py
17 files changed, 858 insertions(+), 207 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/4144/23
-- 
To view, visit http://gerrit.cloudera.org:8080/4144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iddbc951f2931f488f7048c9780260f6b49100750
Gerrit-PatchSet: 23
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 


[Impala-ASF-CR] IMPALA-1670,IMPALA-4141: Support multiple partitions in ALTER TABLE ADD PARTITION

2017-01-31 Thread Attila Jeges (Code Review)
Hello Dimitris Tsirogiannis,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4144

to look at the new patch set (#23).

Change subject: IMPALA-1670,IMPALA-4141: Support multiple partitions in ALTER 
TABLE ADD PARTITION
..

IMPALA-1670,IMPALA-4141: Support multiple partitions in ALTER TABLE
ADD PARTITION

Just like Hive, Implala should support multiple partitions in ALTER
TABLE ADD PARTITION statements. The syntax is as follows:

ALTER TABLE table_name ADD [IF NOT EXISTS]
PARTITION partition_spec1 [location_spec1] [cache_spec1]
PARTITION partition_spec2 [location_spec2] [cache_spec2]
...

Grammar was modified to handle the new syntax. Introduced PartitionDef
class to capture the repeatable part of the statement. TPartitionDef
is the name of the corresponding thrift class.

AlterTableAddPartitionStmt and CatalogOpExecutor classes were also
modified to work with a list of partitions. Duplicate partition specs
are rejected in AlterTableAddPartitionStmt.analyze().

Added FE, E2E and integration tests.

Change-Id: Iddbc951f2931f488f7048c9780260f6b49100750
---
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AlterTableAddPartitionStmt.java
A fe/src/main/java/org/apache/impala/analysis/PartitionDef.java
M fe/src/main/java/org/apache/impala/analysis/PartitionKeyValue.java
M fe/src/main/java/org/apache/impala/analysis/PartitionSpec.java
M fe/src/main/java/org/apache/impala/analysis/PartitionSpecBase.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
M testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test
M tests/common/impala_test_suite.py
M tests/metadata/test_hms_integration.py
M tests/metadata/test_refresh_partition.py
17 files changed, 858 insertions(+), 207 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/4144/23
-- 
To view, visit http://gerrit.cloudera.org:8080/4144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iddbc951f2931f488f7048c9780260f6b49100750
Gerrit-PatchSet: 23
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 


[Impala-ASF-CR] IMPALA-1670,IMPALA-4141: Support multiple partitions in ALTER TABLE ADD PARTITION

2017-01-31 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change.

Change subject: IMPALA-1670,IMPALA-4141: Support multiple partitions in ALTER 
TABLE ADD PARTITION
..


Patch Set 22:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/4144/22/fe/src/main/java/org/apache/impala/analysis/AlterTableAddPartitionStmt.java
File 
fe/src/main/java/org/apache/impala/analysis/AlterTableAddPartitionStmt.java:

Line 92:   throw new AnalysisException(
> Add a test for this in AnalyzeDDLTest
Done


http://gerrit.cloudera.org:8080/#/c/4144/22/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

PS22, Line 221: Used when persisting the results of COMPUTE STATS statements.
> Update the comment (now it is used for the add partition statement)
Done


Line 1859:* 2. If a partition exists in HMS but not in catalog cache, 
reload partition from HMS.
> mention that caching directives are only applied to new partitions that wer
Done


Line 1861:   private Table alterTableAddPartitions(Table tbl,
> Since there are some interesting details with existing partitions (in cache
Filed IMPALA-4844


http://gerrit.cloudera.org:8080/#/c/4144/22/testdata/workloads/functional-query/queries/QueryTest/alter-table.test
File testdata/workloads/functional-query/queries/QueryTest/alter-table.test:

Line 1023: # IMPALA-1670: Cannot add duplicate partition specs.
> remove, this is covered by an analyzer test
Done


Line 1033: # IMPALA-1670: Cannot add duplicate partition specs, not even if 'IF 
NOT EXISTS' is used.
> remove, this is covered by an analyzer test
Done


Line 1056: # IMPALA-1670: If 'IF NOT EXISTS' is not used, ALTER TABLE ADD 
PARTITION cannot add a
> This should be covered by an analyzer test instead
Done


Line 1066: # IMPALA-1670: If adding one partition fails, the entire statement 
fails.
> Not really accurate. There are also modes of partial failure/success. If an
Removed this section. The test case between L1056-1063 was moved to 
AnalyzeDDLTest.


Line 1150: # IMPALA-1670: After REFRESH Impala can access previously added 
partitions and partition
> Seems a little excessive to me, not sure if we need this test.
I agree, we don't need to test the same scenario both with INVALIDATE METADATA 
and REFRESH. Removed this section.


http://gerrit.cloudera.org:8080/#/c/4144/22/testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test
File testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test:

Line 242: # IMPALA-1670: Revoke privileges, so that user would not have 
privileges to run ALTER
> I agree. Not all of them are needed. I'd like to keep the ones where we gra
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iddbc951f2931f488f7048c9780260f6b49100750
Gerrit-PatchSet: 22
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4617: remove IsConstant() analysis from be

2017-01-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4617: remove IsConstant() analysis from be
..


IMPALA-4617: remove IsConstant() analysis from be

This change avoids the need to duplicate the logic in Expr.getConstant()
in the frontend and Expr::GetConstant() in the backend. Instead it is
plumbed through from the frontend.

To make it easier to reason about the state of isAnalyzed_ and
isConstant_, made isAnalyzed_ private and refactored
analyze() so that isAnalyzed_ is only set at the end of analysis, not in
the middle of it.

I considered going further and only allowing isConstant() to be called
only on analyzed expressions, but there are multiple places in the code
that depend on this working in non-trivial ways, so that would be a lot
more invasive.

There should be no behavioural changes as a result of this patch, aside
from a minor fix for "limit count(*)" where an internal error message
was produced instead of the expected one about constant expressions.

Testing:
Ran exhaustive tests. Added a regression test for "limit count(*)".

Change-Id: I24b4c9d6e8e57fd73d906a9c36200e1a84497b90
Reviewed-on: http://gerrit.cloudera.org:8080/5415
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exprs/expr.cc
M be/src/exprs/expr.h
M be/src/exprs/literal.cc
M be/src/exprs/null-literal.h
M be/src/exprs/scalar-fn-call.cc
M be/src/exprs/scalar-fn-call.h
M be/src/exprs/slot-ref.cc
M be/src/exprs/slot-ref.h
M be/src/exprs/tuple-is-null-predicate.h
M common/thrift/Exprs.thrift
M fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java
M fe/src/main/java/org/apache/impala/analysis/AnalyticInfo.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java
M fe/src/main/java/org/apache/impala/analysis/BetweenPredicate.java
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/CaseExpr.java
M fe/src/main/java/org/apache/impala/analysis/CastExpr.java
M fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java
M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java
M fe/src/main/java/org/apache/impala/analysis/ExtractFromExpr.java
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/analysis/InPredicate.java
M fe/src/main/java/org/apache/impala/analysis/IsNotEmptyPredicate.java
M fe/src/main/java/org/apache/impala/analysis/IsNullPredicate.java
M fe/src/main/java/org/apache/impala/analysis/LikePredicate.java
M fe/src/main/java/org/apache/impala/analysis/LimitElement.java
M fe/src/main/java/org/apache/impala/analysis/LiteralExpr.java
M fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java
M fe/src/main/java/org/apache/impala/analysis/Predicate.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/analysis/Subquery.java
M fe/src/main/java/org/apache/impala/analysis/TimestampArithmeticExpr.java
M fe/src/main/java/org/apache/impala/analysis/TupleIsNullPredicate.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
38 files changed, 202 insertions(+), 218 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Tim Armstrong: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/5415
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I24b4c9d6e8e57fd73d906a9c36200e1a84497b90
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4617: remove IsConstant() analysis from be

2017-01-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4617: remove IsConstant() analysis from be
..


Patch Set 10: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/5415
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I24b4c9d6e8e57fd73d906a9c36200e1a84497b90
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4729: Implement REPLACE()

2017-01-31 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4729: Implement REPLACE()
..


Patch Set 5:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/5776/5/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

PS5, Line 211: pattern.is_null
I believe pattern.is_null should also return StringVal::null(). pattern.is_null 
== true means pattern is "undefined".


PS5, Line 231: int64_t out_max = match_pos + replace.len * ((haystack.len - 
match_pos) / pattern.len);
This may potentially result in a huge overestimation if there are few matches 
and the replacement string is a lot longer than pattern string.


PS5, Line 232: if (UNLIKELY(out_max > 1ULL << 18)) {
It seems a bit arbitrary to choose 256K. Why not just start with a buffer of 
about the same length as the original string and keep appending to it as we 
find matches ?


PS5, Line 236: haystack.len + replace.len - pattern.len
I believe this can still overflow a 32-bit value, right ? Btw, please keep the 
line to be within 90 characters.


PS5, Line 240: buffer_space = out_max;
May help to have a test case in which the result string is longer than 
StringVal::MAX_LENGTH and verifies that we actually get StringVal::null() back.


PS5, Line 249:  while (match_pos + needle.len <= haystack.len) {
 :   // Copy in original string
 :   const int bytes = match_pos - consumed;
 :   memcpy(ptr, &haystack.ptr[consumed], bytes);
 :   DCHECK_LE(ptr - result.ptr + bytes, buffer_space);
 :   ptr += bytes;
 : 
 :   // Copy in replacement - always safe since we always leave 
room for one more replace
 :   DCHECK_LE(ptr - result.ptr + replace.len, buffer_space);
 :   memcpy(ptr, replace.ptr, replace.len);
 :   ptr += replace.len;
 : 
 :   // Don't want to re-match within already replaced pattern
 :   match_pos += needle.len;
 :   consumed = match_pos;
 : 
 :   // If we had an enlarging pattern, we may need more space
 :   if (UNLIKELY(check_space)) {
 : const int bytes_produced = ptr - result.ptr;
 : const int bytes_remaining = haystack.len - consumed;
 : if (UNLIKELY(bytes_produced + bytes_remaining + 
replace.len > buffer_space)) {
 :   // Ran out of space, double the size.  See the note 
above regarding the choice
 :   // of a power of two sized buffer.
 :   buffer_space <<= 1;
 :   DCHECK((buffer_space & (buffer_space - 1)) == 0);
 :   const auto ofs = ptr - result.ptr;
 :   if (UNLIKELY(!result.Resize(context, buffer_space))) {
 : return StringVal::null();
 :   }
 :   ptr = result.ptr + ofs;
 : }
 :   }
 : 
 :   StringValue haystack_substring = 
haystack.Substring(match_pos);
 :   int match_pos_in_substring = 
search.Search(&haystack_substring);
 :   if (match_pos_in_substring < 0) break;
 :   match_pos += match_pos_in_substring;
 :   }
Is this more performant than simply using string::append() or string += 
operator ?


-- 
To view, visit http://gerrit.cloudera.org:8080/5776
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1780a7d8fee6d0db9dad148217fb6eb10f773329
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes