[Impala-ASF-CR] IMPALA-3203: Part 2: per-core free lists in buffer pool

2017-04-05 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-3203: Part 2: per-core free lists in buffer pool
..


Patch Set 16:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/6414/16/be/src/runtime/bufferpool/buffer-allocator.cc
File be/src/runtime/bufferpool/buffer-allocator.cc:

PS16, Line 58: The caller can claim up to 'max_bytes_to_claim' to allocate a 
buffer from the
 :   /// system
hmm is that true? what if the number of bytes freed is less than that? oh, 
reading on, I see that the actual number of bytes the caller can claim is 
returned as well.

So, how about specifying as the input is 
target_bytes_to_free/target_bytes_to_claim and the output is the actual bytes 
freed and bytes claimed? And rewording the comment in those terms?


Line 451:   DCHECK_GT(target_bytes_to_free, 0);
DCHECK_GE(target_bytes_to_free, max_bytes_to_claim)?


Line 512:   int64_t bytes_claimed = bytes_freed;
why is that right? shouldn't it be:

bytes_claimed = min(bytes_freed, max_bytes_to_claim);
if (bytes_freed > bytes_claimed) ... add to system_bytes_remaining_...


Line 544: lists->num_free_buffers.Add(1);
it would be good to move AddFreeBuffer() and RemoveCleanPage() to be adjacent, 
so it's more obvious that their code has similarities (in case they need 
updating, etc).


PS16, Line 555: two Maintenance(
does this need updating?


http://gerrit.cloudera.org:8080/#/c/6414/16/be/src/runtime/bufferpool/buffer-allocator.h
File be/src/runtime/bufferpool/buffer-allocator.h:

PS16, Line 59: stored inside the buffer allocator.
"stored" is kinda confusing and ambiguous. maybe say similar to #2: in the 
BufferAllocator's clean page lists.


PS16, Line 59: clean pages
clean unpinned pages
just to clarify


PS16, Line 65: concurrently
delete


PS16, Line 66: allocate
this is from the BufferAllocator, not the SystemAllocator, right?


PS16, Line 66: buffer
... buffer from the BufferAllocator's free or clean page lists ...
(i.e. we're not talking about releasing a 2MB buffer from the client to fit in 
the reservation, right?)


PS16, Line 164: .
I think we should also explicitly say:
If 'slow_but_sure' is false, then this functional optimistically tries to 
reclaim the memory but may not reclaim all 'target_bytes' of memory.


PS16, Line 164: is slower and
nit: redundant with "slower strategy" stated earlier.


http://gerrit.cloudera.org:8080/#/c/6414/16/be/src/runtime/bufferpool/buffer-pool.cc
File be/src/runtime/bufferpool/buffer-pool.cc:

PS16, Line 374: write
how about: "write handle", since the write I/O is done already. or really, I 
think you can delete this first clause.


http://gerrit.cloudera.org:8080/#/c/6414/15/be/src/runtime/exec-env.cc
File be/src/runtime/exec-env.cc:

PS15, Line 315: ion(
> I agree it would probably be better to design the AddGcFunction() interface
What I'm suggesting is to just get rid of AddGcFunction() and just have the 
mem-tracker.cc code call the functions in order.  Or maybe you're saying it's 
not so simple since not all the objects of the called methods are singletons?

In any case, if it is a bunch of work, I'm fine with just clarifying the order. 
This code should hopefully go away / simplify as we move more things under 
buffer-pool, right?


http://gerrit.cloudera.org:8080/#/c/6414/16/be/src/runtime/mem-tracker.h
File be/src/runtime/mem-tracker.h:

PS16, Line 300: if none of the other
  :   /// previously-added GC
don't we need to still call the tc-malloc GC code?  Otherwise, we won't make 
progress against lowering the process consumption and so still can't make new 
allocations, no?


-- 
To view, visit http://gerrit.cloudera.org:8080/6414
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I612bd1cd0f0e87f7d8186e5bedd53a22f2d80832
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan nodes

2017-04-05 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change.

Change subject: IMPALA-5003: Constant propagation in scan nodes
..


Patch Set 16:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6389/16/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

Line 749:   // conjuncts because NULLs are handled differently for 
CompoundPredicate.AND
This is the comment to which I was referring.


http://gerrit.cloudera.org:8080/#/c/6389/16/testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test:

Line 245: (coalesce(NULL, T.int_col) + random() * T.tinyint_col = 100 OR
> Agree we can address the issue if we want to. I suggest we leave for now be
I am terrified at how invasive that diff could become.  Totally agree it should 
be in another change.


Line 335: 00:SCAN HDFS [functional.alltypes]
> I think this is a minor oversight in optimizeConjuncts(). If one of the con
Easy fix, will check to make sure this is consistent with existing behavior.


-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4669: [KUTIL] Add kudu util library to the build.

2017-04-05 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-4669: [KUTIL] Add kudu_util library to the build.
..


Patch Set 6:

Re: gflags - agree we need to look hard at a solution. Do you think we could do 
that in a follow-on patch, or would you prefer it's done here before we put 
kutil in?

-- 
To view, visit http://gerrit.cloudera.org:8080/5715
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8da798ee55506d6f969416b17c191eb03cb215f5
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4643: [DOCS] Change URLs / set up keydefs for JIRA reports

2017-04-05 Thread Laurel Hale (Code Review)
Laurel Hale has posted comments on this change.

Change subject: IMPALA-4643: [DOCS] Change URLs / set up keydefs for JIRA 
reports
..


Patch Set 3: Code-Review+1

All good on my side.

-- 
To view, visit http://gerrit.cloudera.org:8080/6515
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I007e634f9da57289674683dd5bf64e3e3ca8f525
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Laurel Hale 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5154: Handle 'unpartitioned' Kudu tables

2017-04-05 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5154: Handle 'unpartitioned' Kudu tables
..


IMPALA-5154: Handle 'unpartitioned' Kudu tables

The catalogd was hanging trying to load an unpartitioned
Kudu table created outside of Impala. This fixes an
assumption made in KuduTable.java that the list of
'partition by' expressions is not empty. Regardless, the
list on the thrift structure must be created because the
field is marked required.

Change-Id: I40926bf6ea46cfca518bba6d4ca13fb5b0de358d
Reviewed-on: http://gerrit.cloudera.org:8080/6560
Reviewed-by: Alex Behm 
Tested-by: Impala Public Jenkins
---
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M tests/query_test/test_kudu.py
2 files changed, 35 insertions(+), 0 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Alex Behm: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/6560
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I40926bf6ea46cfca518bba6d4ca13fb5b0de358d
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 


[Impala-ASF-CR] IMPALA-5154: Handle 'unpartitioned' Kudu tables

2017-04-05 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5154: Handle 'unpartitioned' Kudu tables
..


Patch Set 2: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6560
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I40926bf6ea46cfca518bba6d4ca13fb5b0de358d
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] Allow BlockingQueue and ThreadPool to accept rvalue args

2017-04-05 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: Allow BlockingQueue and ThreadPool to accept rvalue args
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6442/2/be/src/util/blocking-queue.h
File be/src/util/blocking-queue.h:

Line 110:   /// false.
can you concisely explain what V is?


Line 133:   /// put the element, returns false.
same here


-- 
To view, visit http://gerrit.cloudera.org:8080/6442
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1791870576cb269e86495034f92555de48f92f10
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan nodes

2017-04-05 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5003: Constant propagation in scan nodes
..


Patch Set 16:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6389/16/testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test:

Line 245: (coalesce(NULL, T.int_col) + random() * T.tinyint_col = 100 OR
> It is fixable.  I think this is because the resulting Expr ends up with no 
Agree we can address the issue if we want to. I suggest we leave for now 
because the fix is possibly invasive and the gain is minimal. It's also 
unrelated to your change.


Line 335: 00:SCAN HDFS [functional.alltypes]
> bool_col = null is stripped from conjunct evaluation.  bool_col IS null is 
I think this is a minor oversight in optimizeConjuncts(). If one of the 
conjuncts becomes NULL, then it has the same as a FALSE literal, but in that 
case optimizeConjuncts() will return true.


-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR](asf-site) Impala and Hive have slightly different SQL they accept.

2017-04-05 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: Impala and Hive have slightly different SQL they accept.
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6567/1/index.html
File index.html:

PS1, Line 154:  the same
> maybe this should be 'compatible'. Technically we have some different metad
I don't really understand the state of ODBC in Impala.

Do you have a wording in mind that would reflect reality? Would something like 
"Impala utilizes Hive-compatible metadata and a Hive-compatible ODBC driver" 
work?


-- 
To view, visit http://gerrit.cloudera.org:8080/6567
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4e38dc15c31dbb3e297e1d3f9e8251a1fa22d1a7
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4318: Kudu support for CREATE EXTERNAL TABLE AS SELECT

2017-04-05 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4318:  Kudu support for CREATE EXTERNAL TABLE AS SELECT
..


Patch Set 12:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6261/12/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

Line 2265: AnalyzesOk("create external table t (x int primary key) 
partition by hash (x) " +
What does this statement do after your changes? Will it create a new Kudu table 
since column definitions are given?


-- 
To view, visit http://gerrit.cloudera.org:8080/6261
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9aa82809a6c0c5e6386827314b7e5b520c1a6633
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR](asf-site) Impala and Hive have slightly different SQL they accept.

2017-04-05 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: Impala and Hive have slightly different SQL they accept.
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6567/1/index.html
File index.html:

PS1, Line 154:  the same
maybe this should be 'compatible'. Technically we have some different metadata, 
and the ODBC driver is similar but not exactly the same (we just support the 
HS2 interface, but that's kind of an implementation detail)


-- 
To view, visit http://gerrit.cloudera.org:8080/6567
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4e38dc15c31dbb3e297e1d3f9e8251a1fa22d1a7
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: Yes


[Impala-ASF-CR](asf-site) Impala and Hive have slightly different SQL they accept.

2017-04-05 Thread Jim Apple (Code Review)
Jim Apple has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6567

Change subject: Impala and Hive have slightly different SQL they accept.
..

Impala and Hive have slightly different SQL they accept.

https://lists.apache.org/thread.html/ad17fb6e145ecb082c529f2f21b4661ad1047021dd579117635ce501@%3Cdev.impala.apache.org%3E

Change-Id: I4e38dc15c31dbb3e297e1d3f9e8251a1fa22d1a7
---
M index.html
1 file changed, 1 insertion(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/6567/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6567
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I4e38dc15c31dbb3e297e1d3f9e8251a1fa22d1a7
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-Owner: Jim Apple 


[Impala-ASF-CR] IMPALA-4669: [KUTIL] Add kudu util library to the build.

2017-04-05 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-4669: [KUTIL] Add kudu_util library to the build.
..


Patch Set 5:

> (5 comments)
 > 
 > > 1) Brining in all the gflags is maybe concerning. will any
 > conflict with ours? even if not, they will crowd our already large
 > number of gflags. perhaps we need to look more closely at them.
 > 
 > I agree, it's a lot of new (unused) gflags. I filed
 > https://issues.apache.org/jira/browse/IMPALA-5174 to see if
 > 
 > > 2) I'm curious to know how this (and more particularly the actual
 > code import change) affects the binary size.
 > 
 > Before: 391865544
 > After: 391795120
 > 
 > So somehow it's smaller (?). I'll take a closer look, but nothing
 > to worry about right now.
 > 
 > > 3) It'd be good if we could push some of these changes upstream
 > if possible.
 > 
 > Will do. The issue is that every upstream update brings in all the
 > changes since the last rebase, which leads to more compilation or
 > even functionality issues to resolve. So rather than try and make
 > that process converge, I figured I'd draw a line under a known good
 > version of kutil, and add the few changes needed to Impala.
 > 
 > I should upstream these changes soon, though, so that next upgrade
 > we bring them all in.

Thanks for looking into these. Weird that the binary got smaller.

I do think the gflags thing is a problem we need to address though, there are 
86 flags defined in util/ alone. Maybe we can augment the gflag macros in the 
directories we import to prepend "_kudu_import" or such - that wouldn't get rid 
of them but at least group them and keep them from making Impala's flags hard 
to find.

Perhaps we should run this by some other folks to see if (a) others maybe 
aren't concerned about this and (b) anyone has some ideas for dealing with this 
gracefully.

-- 
To view, visit http://gerrit.cloudera.org:8080/5715
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8da798ee55506d6f969416b17c191eb03cb215f5
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan nodes

2017-04-05 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change.

Change subject: IMPALA-5003: Constant propagation in scan nodes
..


Patch Set 16:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/6389/16/testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test:

Line 159:  select count(*) from
> There's still a question as to whether we should impose a limit on the size
I'll probably just re-use the same rule.


Line 245: (coalesce(NULL, T.int_col) + random() * T.tinyint_col = 100 OR
> This is the current intended behavior. See the bottom of hdfs.test planner 
It is fixable.  I think this is because the resulting Expr ends up with no 
SlotRefs, and as a result conjunct.isBoundBySlotIds(partitionSlots_) returns 
true, which is questionable.  Changing that would have deep consequences, but 
we can always build something to collect all the SlotRefs from an Expr and 
validate they are a non-empty subset of partitionSlots_


Line 289: where l_partkey < l_suppkey and c.c_nationkey = 10 and o_orderkey = 
o_shippriority and l_suppkey = 10 and o_shippriority = c_nationkey
> long line
Done


Line 335: 00:SCAN HDFS [functional.alltypes]
> Why not an EmptySetNode?
bool_col = null is stripped from conjunct evaluation.  bool_col IS null is not. 
 I am guessing because 10 = null => null and 10 IS NULL => false.

Is this behavior incorrect?  It certainly is suspect, but there are some 
comments about how conjunct evaluation treats null conditions specially and I 
wasn't sure which was the correct behavior here.


-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan nodes

2017-04-05 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5003: Constant propagation in scan nodes
..


Patch Set 16:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/6389/16/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

Line 418:   if 
(analyzer.getQueryCtx().client_request.getQuery_options().enable_expr_rewrites) 
{
use analyzer.getQueryOptions()


http://gerrit.cloudera.org:8080/#/c/6389/16/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

Line 1274: if 
(analyzer.getQueryCtx().client_request.query_options.enable_expr_rewrites) {
use analyzer.getQueryOptions()


http://gerrit.cloudera.org:8080/#/c/6389/16/testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test:

Line 159:  select count(*) from
There's still a question as to whether we should impose a limit on the size of 
the conjunct list we'll optimize (due to N^2 cost), similar to what we do in 
ExtractCommonConjunctsRule.

I'd suggest running perf on some extreme examples to see what a reasonable 
cutoff might be. Or you can also adopt the ExtractCommonConjunctsRule threshold.


Line 245: (coalesce(NULL, T.int_col) + random() * T.tinyint_col = 100 OR
> In testing this, I discovered the following:
This is the current intended behavior. See the bottom of hdfs.test planner 
test. The behavior is questionable, but it certainly has nothing to do with 
your change.


Line 289: where l_partkey < l_suppkey and c.c_nationkey = 10 and o_orderkey = 
o_shippriority and l_suppkey = 10 and o_shippriority = c_nationkey
long line


Line 335: 00:SCAN HDFS [functional.alltypes]
Why not an EmptySetNode?


-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4669: [KUTIL] Add kudu util library to the build.

2017-04-05 Thread Henry Robinson (Code Review)
Henry Robinson has uploaded a new patch set (#6).

Change subject: IMPALA-4669: [KUTIL] Add kudu_util library to the build.
..

IMPALA-4669: [KUTIL] Add kudu_util library to the build.

A few miscellaneous changes to allow kudu_util to compile with Impala.

Add kudu_version.cc to substitute for the version.cc file that is
automatically built during the full Kudu build.

Set LZ4_DISABLE_DEPRECATE_WARNINGS to allow Kudu's compressor utility to
use deprecated names for LZ4 methods.

Add NO_NVM_SUPPORT flag to Kudu build (plan to upstream this later) to
disable building with nvm support, removing a library dependency.

Also remove imported FindOpenSSL.cmake in favour of the standard one provided
by cmake itself.

Change-Id: I8da798ee55506d6f969416b17c191eb03cb215f5
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/common/CMakeLists.txt
A be/src/common/kudu_version.cc
M be/src/common/logging.cc
A be/src/kudu/gutil
M be/src/kudu/util/CMakeLists.txt
M be/src/kudu/util/cache.cc
M be/src/kudu/util/compression/compression_codec.cc
M be/src/kudu/util/env.cc
M be/src/kudu/util/flags.cc
A be/src/kudu/util/kudu_export.h
M be/src/kudu/util/logging.cc
M be/src/kudu/util/minidump.cc
M be/src/kudu/util/status.h
D cmake_modules/FindOpenSSL.cmake
16 files changed, 134 insertions(+), 87 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/5715/6
-- 
To view, visit http://gerrit.cloudera.org:8080/5715
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8da798ee55506d6f969416b17c191eb03cb215f5
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Matthew Jacobs 


[Impala-ASF-CR] IMPALA-4669: [KUTIL] Add kudu util library to the build.

2017-04-05 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-4669: [KUTIL] Add kudu_util library to the build.
..


Patch Set 5:

(5 comments)

> 1) Brining in all the gflags is maybe concerning. will any conflict with 
> ours? even if not, they will crowd our already large number of gflags. 
> perhaps we need to look more closely at them.

I agree, it's a lot of new (unused) gflags. I filed 
https://issues.apache.org/jira/browse/IMPALA-5174 to see if

> 2) I'm curious to know how this (and more particularly the actual code import 
> change) affects the binary size.

Before: 391865544
After: 391795120

So somehow it's smaller (?). I'll take a closer look, but nothing to worry 
about right now.

> 3) It'd be good if we could push some of these changes upstream if possible.

Will do. The issue is that every upstream update brings in all the changes 
since the last rebase, which leads to more compilation or even functionality 
issues to resolve. So rather than try and make that process converge, I figured 
I'd draw a line under a known good version of kutil, and add the few changes 
needed to Impala.

I should upstream these changes soon, though, so that next upgrade we bring 
them all in.

http://gerrit.cloudera.org:8080/#/c/5715/5/be/CMakeLists.txt
File be/CMakeLists.txt:

Line 315:   maintenance_manager_proto
> It might be nice to keep kudu specific libs in a separate list and merge th
Turns out this isn't necessary any more.


http://gerrit.cloudera.org:8080/#/c/5715/5/be/src/kudu/util/CMakeLists.txt
File be/src/kudu/util/CMakeLists.txt:

Line 208: if(NOT NO_NVM_SUPPORT)
> I think the NO_NVM_SUPPORT flag would probably be useful to add to Kudu, an
I agree - I have tried upstreaming it but the reviewers (understandably) wanted 
a slightly more general solution that I didn't have time to do. 

I would expect this to show up in future upgrades of the kutil library.


Line 331: target_link_libraries(protoc-gen-insertions gutil glog gflags 
protobuf protoc ${KUDU_BASE_LIBS})
> can this be added upstream?
Will look into it.


http://gerrit.cloudera.org:8080/#/c/5715/5/be/src/kudu/util/env.cc
File be/src/kudu/util/env.cc:

Line 10: #include "kudu/util/status.h"
> seems like they're missing this include, can this be added upstream?
Yes, I think so.


http://gerrit.cloudera.org:8080/#/c/5715/5/be/src/kudu/util/logging.cc
File be/src/kudu/util/logging.cc:

Line 47: DECLARE_string(log_filename);
> what does this mean for our log files? will this code start writing to our 
Not AFAICT. We don't call kudu's log initialization code.


-- 
To view, visit http://gerrit.cloudera.org:8080/5715
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8da798ee55506d6f969416b17c191eb03cb215f5
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan nodes

2017-04-05 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change.

Change subject: IMPALA-5003: Constant propagation in scan nodes
..


Patch Set 16:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6389/16/testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test:

Line 245: (coalesce(NULL, T.int_col) + random() * T.tinyint_col = 100 OR
In testing this, I discovered the following:

Query: explain select count(*) from
  (select * from functional.alltypes where int_col = 10) T
where (coalesce(NULL, T.int_col) + random() * T.int_col = 100 OR coalesce(NULL, 
T.int_col) + T.int_col = 20)
ERROR: NotImplementedException: Unsupported non-deterministic predicate: TRUE 
OR 10 + random() * 10 = 100

This looks like a bug, but is it a regression?


-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3742: Partitions and sort INSERTs for Kudu tables

2017-04-05 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6559/2/be/src/exprs/kudu-partition-expr.cc
File be/src/exprs/kudu-partition-expr.cc:

Line 87: if (!s.ok()) {
> Actually I think we may want to have a parameter indicating if strings shou
I checked with Alexey on the Kudu team, this would only fail if we: a) wrote 
null to a non nullable col, b) tried to set a value of the wrong type, e.g. 
calling SetString*() on a non string column

I think we can avoid (a) as mentioned above, and (b) would be a bug in our code.

Therefore I think we can DCHECK


-- 
To view, visit http://gerrit.cloudera.org:8080/6559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan nodes

2017-04-05 Thread Zach Amsden (Code Review)
Zach Amsden has uploaded a new patch set (#16).

Change subject: IMPALA-5003: Constant propagation in scan nodes
..

IMPALA-5003: Constant propagation in scan nodes

When conjuncts are pushed into table refs from inline views, they can
be considered for constant progagation within that node.  In certain
cases, we might end up with a FALSE conditional and now we can
convert ScanNodes to EmptySet nodes when that occurs, and in other
cases, we might be able to prune partitions based on expressions which
have now become constant.

Testing: Expanded the test cases for the planner to achieve constant
propagation.  Added Kudu, datasource, Hdfs and HBase tests to validate
we can create EmptySetNodes.

Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
---
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/SelectList.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/conjunct-ordering.test
A 
testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/data-source-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
18 files changed, 602 insertions(+), 88 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/6389/16
-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-04-05 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
File fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java:

Line 304: "TBLPROPERTIES ('sort.by.columns'='a')", true);
> I commented on the decision to keep the sort.by.columns property visible in
i think a 'show create table t' should print the sort-by clause, not a table 
property.


-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan nodes

2017-04-05 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change.

Change subject: IMPALA-5003: Constant propagation in scan nodes
..


Patch Set 15:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/6389/15/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

Line 923:* Propagates constant expressions of the form  = 
 to
> Explain candidate param
Done


Line 924:* other uses of slot ref in all expressions in conjuncts; returns 
a BitSet with
> in all expressions in conjuncts -> in the given conjnucts
Done


Line 942: // Don't rewrite with our own substitution!  We may have 
already processed
> Explanation seems more complicated than necessary. Clearly it would be wron
Done


Line 960:* Returns true if the conjuncts may be true and false if a 
contradiction has
> Shorter: Returns false if a contradiction has been implied, true otherwise.
Done


Line 980:   // applied by the rewriter.  We must also re-analyze result 
exprs to
> I think we can remove everything after "We must also re-analyze  ..." to ke
Done


Line 985:   if (index < 0) continue;
> Agree. Makes sense.
Done


Line 988:   // Reset Expr to force updating cost
> Re-analyze expr to add implicit casts, resolve function signatures, and com
I made this slightly simpler due to required indentation level


Line 998:   // because they can't be evaluated if expr rewriting is 
turned off.
> You should be able to do this with a PlannerTest. For example, look at Plan
Done


http://gerrit.cloudera.org:8080/#/c/6389/11/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

Line 1266: 
> Looks like the constant propagation is applied even when expr rewrites are 
Done


http://gerrit.cloudera.org:8080/#/c/6389/15/testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test:

Line 101: # XXX - this seems correct, but is this a legal transformation?
> I agree that implicit casts on both sides seem ok.
Done


Line 161: # We can optimize in some cases
> Instead of testing various predicate assignment scenarios, we should focus 
Done.  I haven't tried an extreme example, but I added a few crazy ones to the 
test.


http://gerrit.cloudera.org:8080/#/c/6389/15/testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
File testdata/workloads/functional-planner/queries/PlannerTest/kudu.test:

Line 127:  SCANRANGELOCATIONS
> Don't think we need the SCANRANGELOCATIONS and DISTRIBUTEDPLAN sections.
nope


-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 15
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3742: Partitions and sort INSERTs for Kudu tables

2017-04-05 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6559/2/be/src/exprs/kudu-partition-expr.cc
File be/src/exprs/kudu-partition-expr.cc:

Line 87: if (!s.ok()) {
> I'm not sure if this will return an error we need to handle at runtime, i.e
Actually I think we may want to have a parameter indicating if strings should 
be copied or not, because in this case I don't think we need to do a string 
copy. Then you can use SetStringNoCopy() instead of SetString(). I'm asking 
someone on the Kudu team if SetStringNoCopy() has any errors we need to handle.


-- 
To view, visit http://gerrit.cloudera.org:8080/6559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4086: Add benchmark for simple scheduler

2017-04-05 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4086: Add benchmark for simple scheduler
..


Patch Set 5:

(3 comments)

This is still just benchmarking ComputeScanRangeAssignment(), right, not the 
whole Schedule() method. That seems reasonable to me, but it seems like there 
was some back-and-forth earlier.

http://gerrit.cloudera.org:8080/#/c/4554/5/be/src/benchmarks/scheduler-benchmark.cc
File be/src/benchmarks/scheduler-benchmark.cc:

Line 18: #include 
Nit: I personally think we should treat gutil as being part of our source tree, 
so it should go below and be quoted with ""


PS5, Line 46: Blocks and block
This doesn't seem right.


http://gerrit.cloudera.org:8080/#/c/4554/5/be/src/scheduling/scheduler.h
File be/src/scheduling/scheduler.h:

Line 346:   Status ComputeScanRangeAssignment(QuerySchedule* schedule);
What do you think about adding a comment here, or maybe in the .cc pointing to 
the benchmark, so that it's more discoverable for new people working on the 
code.


-- 
To view, visit http://gerrit.cloudera.org:8080/4554
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I89ec1c6c1828bb0b86d1e13ce4dfc5a8df865c2e
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4817: Populate Parquet Statistics for Strings

2017-04-05 Thread Lars Volker (Code Review)
Lars Volker has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6563

Change subject: IMPALA-4817: Populate Parquet Statistics for Strings
..

IMPALA-4817: Populate Parquet Statistics for Strings

This change adds functionality to populate the new parquet::Statistics
fields 'min_value' and 'max_value', that were added in parquet-format PR
change, Impala will stop populating the deprecated 'min' and 'max'
fields.

Keeping track of StringValue statistics requires some memory management
code to materialize values that reside in memory owned by row batches.

The HdfsParquetScanner will preferably read the new fields if they are
populated. For tables with only the old fields populated, it will read
them only if they are of simple numeric type, i.e. boolean, integer, or
floating point.

This change removes the comparison of the Parquet Statistics we write to
Hive from the tests, since Hive does not write the new fields.

TODO: This change still needs tests reading statistics from files which
use the old 'min' and 'max' fields, such as those written by Hive. I'll
add these tests in a subsequent patch set.

Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/parquet-column-stats.cc
M be/src/exec/parquet-column-stats.h
M be/src/exec/parquet-column-stats.inline.h
M be/src/exec/parquet-metadata-utils.cc
M be/src/exec/parquet-metadata-utils.h
M common/thrift/parquet.thrift
M tests/query_test/test_insert_parquet.py
9 files changed, 411 insertions(+), 201 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/6563/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6563
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 


[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-04-05 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..


Patch Set 6:

Is this superseded by the other review?

-- 
To view, visit http://gerrit.cloudera.org:8080/5483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5171: update RAT excluded files list

2017-04-05 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5171: update RAT excluded files list
..


IMPALA-5171: update RAT excluded files list

This commit

IMPALA-5140: improve docs building guidelines

renamed a file, but the RAT exclusion list wasn't update. This patch
fixes the exclusion.

Change-Id: I07c64ba2fda7a996f24a49417e3e5485872589f7
Reviewed-on: http://gerrit.cloudera.org:8080/6561
Reviewed-by: Jim Apple 
Tested-by: Impala Public Jenkins
---
M bin/rat_exclude_files.txt
1 file changed, 1 insertion(+), 1 deletion(-)

Approvals:
  Impala Public Jenkins: Verified
  Jim Apple: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/6561
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I07c64ba2fda7a996f24a49417e3e5485872589f7
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Brown 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 


[Impala-ASF-CR] IMPALA-5171: update RAT excluded files list

2017-04-05 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5171: update RAT excluded files list
..


Patch Set 1: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6561
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I07c64ba2fda7a996f24a49417e3e5485872589f7
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Brown 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5171: update RAT excluded files list

2017-04-05 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5171: update RAT excluded files list
..


Patch Set 1:

Build started: http://jenkins.impala.io:8080/job/gerrit-docs-submit/108/

-- 
To view, visit http://gerrit.cloudera.org:8080/6561
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I07c64ba2fda7a996f24a49417e3e5485872589f7
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Brown 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5140: improve docs building guidelines

2017-04-05 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change.

Change subject: IMPALA-5140: improve docs building guidelines
..


Patch Set 10:

> > Patch Set 9: Verified+1
 > 
 > The docs build job will prevent RAT-breaking changes from getting
 > submitted, and it only takes about 10 minutes.
 > 
 > This change breaks RAT: 
 > http://jenkins.impala.io:8080/job/rat-check/769/console

Sorry and thanks for letting me know. https://gerrit.cloudera.org/#/c/6561/

-- 
To view, visit http://gerrit.cloudera.org:8080/6512
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I71ae79ecd346045697fe225140ee9a317c5a337f
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Brown 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Laurel Hale 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5171: update RAT excluded files list

2017-04-05 Thread Michael Brown (Code Review)
Michael Brown has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6561

Change subject: IMPALA-5171: update RAT excluded files list
..

IMPALA-5171: update RAT excluded files list

This commit

IMPALA-5140: improve docs building guidelines

renamed a file, but the RAT exclusion list wasn't update. This patch
fixes the exclusion.

Change-Id: I07c64ba2fda7a996f24a49417e3e5485872589f7
---
M bin/rat_exclude_files.txt
1 file changed, 1 insertion(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/61/6561/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6561
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I07c64ba2fda7a996f24a49417e3e5485872589f7
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Brown 


[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

2017-04-05 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file 
metadata
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6406/3/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

Line 218:   1: required binary file_desc_data
> Nice
Leave a todo to put this in a KRPC sidecar :) No need to pay serialization for 
the string itself!


-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3742: Partitions and sort INSERTs for Kudu tables

2017-04-05 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables
..


Patch Set 2:

(6 comments)

Quick pass with some minor comments. I am still trying to wrap my head around 
the concept of KuduPartitionExpr...

http://gerrit.cloudera.org:8080/#/c/6559/1/be/src/exec/kudu-util.h
File be/src/exec/kudu-util.h:

PS1, Line 62: int col, PrimitiveType type
Comment on these params.


http://gerrit.cloudera.org:8080/#/c/6559/1/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

PS1, Line 1845: f (fragment.__isset.output_sink && 
fragment.output_sink.__isset.stream_sink
  :   && fragment.output_sink.type == 
TDataSinkType::DATA_STREAM_SINK
  :   && fragment.output_sink.stream_sink.output_partition.type 
== TPartitionType::KUDU)
Add a brief comment.


http://gerrit.cloudera.org:8080/#/c/6559/1/common/thrift/Exprs.thrift
File common/thrift/Exprs.thrift:

PS1, Line 134: 4
2?


PS1, Line 137: 5
3? :)


http://gerrit.cloudera.org:8080/#/c/6559/1/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java:

PS1, Line 113: // Set in analyze(). Maps exprs in partitionKeyExprs_ to their 
column's position in the
 :   // table, eg. partitionKeyExprs_[i] = 
table_.columns(partitionKeyIdx_[i])
 :   private List partitionKeyIdxs_ = Lists.newArrayList();
Just for my understanding, can you explain why this is needed for this change?


http://gerrit.cloudera.org:8080/#/c/6559/1/fe/src/main/java/org/apache/impala/planner/DataPartition.java
File fe/src/main/java/org/apache/impala/planner/DataPartition.java:

PS1, Line 48: // For kudu partition: expr that calls into the kudu client to 
get the partition number.
:   private Expr partitionExpr_;
: 
:   // Should be called only by the static factory method for Kudu 
partitioned tables.
:   private DataPartition(TPartitionType type, Expr partitionExpr) {
: Preconditions.checkNotNull(partitionExpr);
: Preconditions.checkState(type == TPartitionType.KUDU);
: type_ = type;
: partitionExpr_ = partitionExpr;
: partitionExprs_ = Lists.newArrayList();
:   }
I am not sure I get this. Why not passing a single element list of the case of 
Kudu partitioned?


-- 
To view, visit http://gerrit.cloudera.org:8080/6559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5140: improve docs building guidelines

2017-04-05 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-5140: improve docs building guidelines
..


Patch Set 10:

> Patch Set 9: Verified+1

The docs build job will prevent RAT-breaking changes from getting submitted, 
and it only takes about 10 minutes.

This change breaks RAT: http://jenkins.impala.io:8080/job/rat-check/769/console

-- 
To view, visit http://gerrit.cloudera.org:8080/6512
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I71ae79ecd346045697fe225140ee9a317c5a337f
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Brown 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Laurel Hale 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3742: Partitions and sort INSERTs for Kudu tables

2017-04-05 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has uploaded a new patch set (#2).

Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables
..

IMPALA-3742: Partitions and sort INSERTs for Kudu tables

Bulk DMLs (INSERT, UPSERT, UPDATE, and DELETE) for Kudu
are currently painful because we just send rows randomly,
which creates a lot of work for Kudu since it partitions
and sorts data before writing, causing writes to be slow
and leading to timeouts.

We can alleviate this by sending the rows to Kudu already
partitioned and sorted. This patch partitions and sorts
rows according to Kudu's partitioning scheme for INSERTs
and UPSERTs. A followup patch will handle UPDATE and DELETE.

It accomplishes this by inserting an exchange node and a sort
node into the plan before the operation. Both the exchange and
the sort are given a KuduPartitionExpr which takes a row and
calls into the Kudu client to return its partition number.

Testing:
- Updated planner tests.
- Ran the Kudu functional tests.
- Ran performance tests demonstrating that we can now handle much
  larger inserts without having timeouts.

Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4
---
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-util.cc
M be/src/exec/kudu-util.h
M be/src/exprs/CMakeLists.txt
M be/src/exprs/expr-context.h
M be/src/exprs/expr.cc
A be/src/exprs/kudu-partition-expr.cc
A be/src/exprs/kudu-partition-expr.h
M be/src/runtime/coordinator.cc
M be/src/runtime/data-stream-sender.cc
M be/src/runtime/data-stream-sender.h
M be/src/scheduling/scheduler.cc
M bin/impala-config.sh
M common/thrift/Exprs.thrift
M common/thrift/Partitions.thrift
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
A fe/src/main/java/org/apache/impala/analysis/KuduPartitionExpr.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/planner/DataPartition.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-upsert.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
24 files changed, 593 insertions(+), 86 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/6559/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Matthew Jacobs 


[Impala-ASF-CR] IMPALA-5154: Handle 'unpartitioned' Kudu tables

2017-04-05 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5154: Handle 'unpartitioned' Kudu tables
..


Patch Set 2:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/440/

-- 
To view, visit http://gerrit.cloudera.org:8080/6560
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I40926bf6ea46cfca518bba6d4ca13fb5b0de358d
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5154: Handle 'unpartitioned' Kudu tables

2017-04-05 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5154: Handle 'unpartitioned' Kudu tables
..


Patch Set 2: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6560
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I40926bf6ea46cfca518bba6d4ca13fb5b0de358d
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-04-05 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 10:

(8 comments)

Thank you for your reviews. Please see my inline comments and PS10.

http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java:

PS9, Line 376: analyzeSortByColumn
> Instead of checking this here, it's a bit cleaner to alway call analyzeSort
Done


http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java
File fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java:

PS9, Line 49: roperties.containsKe
> The caller is not supposed to modify the return values, correct? You may ju
Done, I ended up using ImmutableList, since I already use it below.


PS9, Line 77: 
> Same comment as above.
Done


PS9, Line 89: t by column in the li
> You can still construct and return an ImmutableList. See Guava's ImmutableL
I needed the contains() method in line 105, as well as size(), so I ended up 
using a LinkedHashSet.


http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

Line 515: AnalyzesOk("alter table functional.alltypes set tblproperties(" +
> why allow this in addition to the '... sort by ()' clause?
Alex and I discussed this and Alex suggested that we allow this, since it is 
easier than explicitly disabling it, and we'll always have the possibility that 
someone changes the values in Hive. The alternative would be to try and hide 
the table property completely. If Alex doesn't object and you think it's worth 
the effort to hide this from the users, I can implement it.


Line 1905: "tblproperties ('sort.by.columns'='i')", "Table definition 
must not contain " +
> same here, why allow the sort.by.columns table property?
Please see my reply above.


http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/test/java/org/apache/impala/analysis/ParserTest.java
File fe/src/test/java/org/apache/impala/analysis/ParserTest.java:

PS9, Line 2374: // SORT BY must be the first table option
  : ParserError("CREATE 
> You may want to comment why this results in a parsing error.
Done


http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
File fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java:

Line 304: "TBLPROPERTIES ('sort.by.columns'='a')", true);
> why not print that as a 'sort by' clause?
I commented on the decision to keep the sort.by.columns property visible in 
AnalyzeDDLTest.java. I discussed ToSqlTest() with Alex, too, and IIRC he 
explained that this is only really relevant for views, in which create table 
statements cannot occur anyways. Therefore, this should never be visible to a 
user and is tested here only for completeness. Should I add a comment to 
explain this?


-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-04-05 Thread Lars Volker (Code Review)
Hello Dimitris Tsirogiannis,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6495

to look at the new patch set (#10).

Change subject: IMPALA-4166: Add SORT BY sql clause
..

IMPALA-4166: Add SORT BY sql clause

This change adds support for adding SORT BY (...) clauses to CREATE
TABLE and ALTER TABLE statements. Examples are:

CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j);
CREATE TABLE t (i INT, x INT PRIMARY KEY) PARTITION BY HASH(x)
PARTITIONS 8 SORT BY(i) STORED AS KUDU;
CREATE TABLE t SORT BY (int_col,id) LIKE u;
CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip);

ALTER TABLE t SORT BY (int_col,id);
ALTER TABLE t SORT BY ();

SORT BY columns can be specified for all table types, but effectiveness
may vary based on table type, for example TEXT tables will not see
improved compression. The SORT BY clause must not contain clustering
columns for HDFS tables, and must not contain primary key columns for
Kudu tables. The columns in the SORT BY clause are stored in the
'sort.by.columns' table property and will result in an additional SORT
node being added to the plan before the final table sink. Specifying
SORT BY columns also enables clustering during inserts, so the SORT node
will contain all partitioning columns first, followed by the SORT BY
columns. We do this because SORT BY columns add a SORT node to the plan
and adding the clustering columns to the SORT node is cheap.

SORT BY columns supersede the sortby() hint, which we will remove in a
subsequent change (IMPALA-5144).

This change introduces a new TablePropertyAnalyzer class to collect
various methods used to analyze table properties.

Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
---
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
A fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
A fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test
M testdata/workloads/functional-planner/queries/PlannerTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
30 files changed, 1,121 insertions(+), 59 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/10
-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4893: Efficiently update the rows read counter for sequence file

2017-04-05 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4893: Efficiently update the rows read counter for 
sequence file
..


Patch Set 3:

I ran an exhaustive build and it looks like the test failed on other file 
formats. I think the problems I saw were:

* If the data is split between multiple files we get values < 10k (i.e. we 
probably need to have a separate test where we set num_nodes=1).
* Some scanners, e.g. Kudu, don't have this counter

-- 
To view, visit http://gerrit.cloudera.org:8080/6522
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie42c97a36e46172884cc497aa645036c2c11f541
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5154: Handle 'unpartitioned' Kudu tables

2017-04-05 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has uploaded a new patch set (#2).

Change subject: IMPALA-5154: Handle 'unpartitioned' Kudu tables
..

IMPALA-5154: Handle 'unpartitioned' Kudu tables

The catalogd was hanging trying to load an unpartitioned
Kudu table created outside of Impala. This fixes an
assumption made in KuduTable.java that the list of
'partition by' expressions is not empty. Regardless, the
list on the thrift structure must be created because the
field is marked required.

Change-Id: I40926bf6ea46cfca518bba6d4ca13fb5b0de358d
---
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M tests/query_test/test_kudu.py
2 files changed, 35 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/6560/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6560
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I40926bf6ea46cfca518bba6d4ca13fb5b0de358d
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 


[Impala-ASF-CR] IMPALA-4883: Union Codegen

2017-04-05 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4883: Union Codegen
..


Patch Set 6: Code-Review+1

(4 comments)

Looks good. I'll let Michael finish his review - he can probably do the +2.

http://gerrit.cloudera.org:8080/#/c/6459/6/be/src/exec/union-node.cc
File be/src/exec/union-node.cc:

Line 98: 
Extra blank lines.


Line 235: if (ReachedLimit()) {
I don't think this can be taken now, right? Since we don't increment 
num_rows_returned_ until later.


http://gerrit.cloudera.org:8080/#/c/6459/6/be/src/exec/union-node.h
File be/src/exec/union-node.h:

Line 116:   /// call on the child.
Maybe mention that these GetNext* functions don't apply the limit?


PS6, Line 127:  Increments 'num_rows_returned_' and
 :   /// 'child_row_idx_'
I think this needs an update.


-- 
To view, visit http://gerrit.cloudera.org:8080/6459
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib4107d27582ff5416172810364a6e76d3d93c439
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5154: Handle 'unpartitioned' Kudu tables

2017-04-05 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5154: Handle 'unpartitioned' Kudu tables
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6560/1/tests/query_test/test_kudu.py
File tests/query_test/test_kudu.py:

Line 483:it can be created directly in Kudu and then loaded as an 
external table.
> if this is harmless/useful functionality, should impala also support it?
I had talked to DanB about it previously, and he said they want people to move 
away from creating tables like this but that they didn't want to break 
backwards compatibility in the client.


Line 500: cursor.execute("SELECT COUNT(*) FROM %s" % name)
> Just curious: Can we insert into the table? Is this table expected to suppo
It should behave normally after it's created, we don't know anything about the 
partitioning. I'll modify this test to insert as well.


-- 
To view, visit http://gerrit.cloudera.org:8080/6560
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I40926bf6ea46cfca518bba6d4ca13fb5b0de358d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4623: Enable file handle cache

2017-04-05 Thread Joe McDonnell (Code Review)
Joe McDonnell has uploaded a new patch set (#2).

Change subject: IMPALA-4623: Enable file handle cache
..

IMPALA-4623: Enable file handle cache

Currently, every scan range maintains a file handle, even
when multiple scan ranges are accessing the same file.
Open the file handles causes load on the NameNode, which
can lead to scaling issues.

There are two parts to this transaction:
1. Enable file handle caching by default
2. Share the file handle between scan ranges from the same
file

The scan range no longer maintains its own Hdfs file
handle. On each read, the io thread will get the Hdfs file
handle from the cache (opening it if necessary) and use
that for the read. This allows multiple scan ranges on the
same file to use the same file handle. Since the file
offsets are no longer consistent for an individual scan
range, all Hdfs reads need to either use hdfsPread or do
a seek before reading. Additionally, since Hdfs read
statistics are maintained on the file handle, the read
statistics must be retrieved and cleared after each read.

Scan ranges that are accessing data cached by Hdfs
will get a file handle from the cache, but the file
handle will be kept on the scan range for the time
that the scan range is in use. This prevents the
cache from closing the file handle while the data
buffer is in use.

To manage contention, the file handle cache is now
partitioned by a hash of the key into independent
caches with independent locks. The allowed capacity
of the file handle cache is split evenly among the
partitions. File handles are evicted independently
for each partition.

If max_cached_file_handles is set to 0, file handle
caching is off and the previous behavior applies.

TODO:
1. Determine appropriate defaults.
2. Write tests
  a. Delete test
  b. Block rebalance test?
3. For scan ranages that use Hdfs caching, should there
be some sharing at the scanner level?

Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/runtime/buffered-block-mgr.h
M be/src/runtime/disk-io-mgr-internal.h
M be/src/runtime/disk-io-mgr-reader-context.cc
M be/src/runtime/disk-io-mgr-scan-range.cc
M be/src/runtime/disk-io-mgr.cc
M be/src/runtime/disk-io-mgr.h
M be/src/util/lru-cache-test.cc
M be/src/util/lru-cache.h
M be/src/util/lru-cache.inline.h
11 files changed, 387 insertions(+), 124 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/6478/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6478
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4623: Thread level file handle caching

2017-04-05 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change.

Change subject: IMPALA-4623: Thread level file handle caching
..


Patch Set 1:

(19 comments)

http://gerrit.cloudera.org:8080/#/c/6478/1/be/src/runtime/disk-io-mgr.cc
File be/src/runtime/disk-io-mgr.cc:

Line 70: DEFINE_uint64(max_cached_file_handles, 1, "Maximum number of HDFS 
file handles "
> on further consideration, esp. in light of what tim brought up (separate re
Comment out of date.


Line 388:   std::vector num_threads_per_disk;
> don't use std:: in .cc files
Done


Line 1050: void DiskIoMgr::WorkLoop(DiskQueue* disk_queue, int 
thread_file_handle_cache_size) {
> you use 'fh' elsewhere for 'file handle', feel free to use that elsewhere i
removed


Line 1052:   if (thread_file_handle_cache_size > 0) {
> dcheck that it's not < 0
removed


Line 1081:   Write(worker_context, static_cast(range));
> what about writing?
We currently don't do writes to HDFS via the IO manager. The writes here are to 
local files.

When we are writing tables, we open an Hdfs file handle in hdfs-table-sink.cc. 
I don't think these file handles can be reused, because we open it in write 
only mode. The file handle cache opens in read only mode. I don't see any way 
to convert them.


Line 1337:   : capacity_(capacity) {}
> move to .h
removed


Line 1339: DiskIoMgr::ThreadFileHandleCache::~ThreadFileHandleCache() {}
> remove
removed


Line 1343:   std::string file_key = std::string(fname) + std::to_string(mtime);
> is this guaranteed to be unique w/o some sort of separator?
removed


Line 1351:   EvictFileHandle();
> you only call this once, and it's a small function, you might as well inlin
removed


http://gerrit.cloudera.org:8080/#/c/6478/1/be/src/runtime/disk-io-mgr.h
File be/src/runtime/disk-io-mgr.h:

Line 233:   /// This is a single-threaded LRU cache for Hdfs file handles. The 
cache creates and
> I guess I was thinking that each thread would temporarily check out a file 
The approach changed to using a single cache at the DiskIoMgr level like the 
existing cache.


Line 235:   class ThreadFileHandleCache {
> 'thread' is generic. LruFileHandleCache? or simply FileHandleCache?
Removed.


Line 240: /// Gets an Hdfs file handle for the specified file. It will 
lookup and use a
> "look up"
Removed


Line 245: HdfsCachedFileHandle* GetFileHandle(const hdfsFS& fs, const char* 
fname, int64_t mtime);
> long line
Removed


Line 248: 
> remove blank line
Removed


Line 256: class LruListElement {
> make it a struct, class member vars are supposed to be private.
Removed


Line 260:   ~LruListElement() {}
> remove
Removed


Line 533: /// in the reader context.
> "reader context" = reader_ member. referencing the member directly is more 
N/A in new code


Line 544: /// the DiskIoRequestContext. This clears the statistics on this 
file handle.
> why not make that a function of diskiorequestcontext
The two statistics that need input from the ScanRange are 
unexpected_remote_bytes_ and num_remote_ranges_.

The unexpected_remote_bytes_ requires knowing the value of expected_local_ for 
the ScanRange. There is some tracing that prints when a ScanRange has 
unexpected remote bytes (see ScanRange::Close for where it is now). The tracing 
is the only thing that would prevent this from being put directly into 
DiskIoMgrContext if we passed in expected_local_ or the ScanRange to the 
function.

num_remote_ranges_ is impacted by the fact that we can call GetStatistics 
multiple times per scan range. It needs to know if this range has already been 
counted at the DiskIoMgrContext level. This can be worked around.


Line 574: /// at read time, so hdfs_file_ is null.
> is there a need to have it both ways for hdfs files, instead of always open
We need to keep the hdfs file open as long as the cached buffer is around so 
that the hdfs file doesn't get closed. We also use this when file handle 
caching is off (max_cached_file_handles=0).


-- 
To view, visit http://gerrit.cloudera.org:8080/6478
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5154: Handle 'unpartitioned' Kudu tables

2017-04-05 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5154: Handle 'unpartitioned' Kudu tables
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6560/1/tests/query_test/test_kudu.py
File tests/query_test/test_kudu.py:

Line 500: cursor.execute("SELECT COUNT(*) FROM %s" % name)
Just curious: Can we insert into the table? Is this table expected to support 
all operations?


-- 
To view, visit http://gerrit.cloudera.org:8080/6560
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I40926bf6ea46cfca518bba6d4ca13fb5b0de358d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5154: Handle 'unpartitioned' Kudu tables

2017-04-05 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-5154: Handle 'unpartitioned' Kudu tables
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6560/1/tests/query_test/test_kudu.py
File tests/query_test/test_kudu.py:

Line 483:it can be created directly in Kudu and then loaded as an 
external table.
if this is harmless/useful functionality, should impala also support it?


-- 
To view, visit http://gerrit.cloudera.org:8080/6560
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I40926bf6ea46cfca518bba6d4ca13fb5b0de358d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5140: improve docs building guidelines

2017-04-05 Thread Michael Brown (Code Review)
Michael Brown has submitted this change and it was merged.

Change subject: IMPALA-5140: improve docs building guidelines
..


IMPALA-5140: improve docs building guidelines

Move docs/generatingImpalaDoc.md to docs/README.md. This will
automatically render the document inline at places like:

https://github.com/apache/incubator-impala/tree/master/docs

under the directory listing.

Fix existing markdown which wasn't always rendering properly. Remove
unneeded HTML and backslashes. Add a mention of make, and add one
troubleshooting tip. Wrap most lines at 90 chars. This does not change
how Github renders the markdown, and it makes reading the source easier
as well.

Explain how to get dita into PATH.

Emphasize make and remove some of the dita advanced usage.

Change-Id: I71ae79ecd346045697fe225140ee9a317c5a337f
Reviewed-on: http://gerrit.cloudera.org:8080/6512
Reviewed-by: Jim Apple 
Tested-by: Michael Brown 
---
A docs/README.md
D docs/generatingImpalaDoc.md
2 files changed, 204 insertions(+), 73 deletions(-)

Approvals:
  Michael Brown: Verified
  Jim Apple: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/6512
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I71ae79ecd346045697fe225140ee9a317c5a337f
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Brown 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Laurel Hale 
Gerrit-Reviewer: Michael Brown 


[Impala-ASF-CR] IMPALA-5140: improve docs building guidelines

2017-04-05 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change.

Change subject: IMPALA-5140: improve docs building guidelines
..


Patch Set 9: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6512
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I71ae79ecd346045697fe225140ee9a317c5a337f
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Brown 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Laurel Hale 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-04-05 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 9:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

Line 515: AnalyzesOk("alter table functional.alltypes set tblproperties(" +
why allow this in addition to the '... sort by ()' clause?


Line 1905: "tblproperties ('sort.by.columns'='i')", "Table definition 
must not contain " +
same here, why allow the sort.by.columns table property?


http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
File fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java:

Line 304: "TBLPROPERTIES ('sort.by.columns'='a')", true);
why not print that as a 'sort by' clause?


-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5154: Handle 'unpartitioned' Kudu tables

2017-04-05 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6560

Change subject: IMPALA-5154: Handle 'unpartitioned' Kudu tables
..

IMPALA-5154: Handle 'unpartitioned' Kudu tables

The catalogd was hanging trying to load an unpartitioned
Kudu table created outside of Impala. This fixes an
assumption made in KuduTable.java that the list of
'partition by' expressions is not empty. Regardless, the
list on the thrift structure must be created because the
field is marked required.

Change-Id: I40926bf6ea46cfca518bba6d4ca13fb5b0de358d
---
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M tests/query_test/test_kudu.py
2 files changed, 34 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/6560/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6560
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I40926bf6ea46cfca518bba6d4ca13fb5b0de358d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 


[Impala-ASF-CR] IMPALA-3742: partitions DMLs for Kudu tables

2017-04-05 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has abandoned this change.

Change subject: IMPALA-3742: partitions DMLs for Kudu tables
..


Abandoned

Replaced by https://gerrit.cloudera.org/#/c/6559/

-- 
To view, visit http://gerrit.cloudera.org:8080/6037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: Ic10b3295159354888efcde3df76b0edb24161515
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-3742: Partitions and sort INSERTs for Kudu tables

2017-04-05 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6559

Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables
..

IMPALA-3742: Partitions and sort INSERTs for Kudu tables

Bulk DMLs (INSERT, UPSERT, UPDATE, and DELETE) for Kudu
are currently painful because we just send rows randomly,
which creates a lot of work for Kudu since it partitions
and sorts data before writing, causing writes to be slow
and leading to timeouts.

We can alleviate this by sending the rows to Kudu already
partitioned and sorted. This patch partitions and sorts
rows according to Kudu's partitioning scheme for INSERTs
and UPSERTs. A followup patch will handle UPDATE and DELETE.

It accomplishes this by inserting an exchange node and a sort
node into the plan before the operation. Both the exchange and
the sort are given a KuduPartitionExpr which takes a row and
calls into the Kudu client to return its partition number.

Testing:
- Updated planner tests.
- Ran the Kudu functional tests.
- Ran performance tests demonstrating that we can now handle much
  larger inserts without having timeouts.

Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4
---
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-util.cc
M be/src/exec/kudu-util.h
M be/src/exprs/CMakeLists.txt
M be/src/exprs/expr-context.h
M be/src/exprs/expr.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/data-stream-sender.cc
M be/src/runtime/data-stream-sender.h
M be/src/scheduling/scheduler.cc
M bin/impala-config.sh
M common/thrift/Exprs.thrift
M common/thrift/Partitions.thrift
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/planner/DataPartition.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-upsert.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
21 files changed, 312 insertions(+), 86 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/6559/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

2017-04-05 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file 
metadata
..


Patch Set 3:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/6406/3/common/fbs/CatalogObjects.fbs
File common/fbs/CatalogObjects.fbs:

Line 36:   offset: long = 0 (id: 0);
Why a default value? Seems potentially dangerous.

Not for this change, but I'm thinking we don't even need to store the offset. 
If we know the block size, than we can derive the offset (assuming the list of 
file blocks is ordered by offset). Might be worth adding a TODO or recording 
that idea somewhere.


Line 40:   length: long = -1 (id: 1);
Seems redundant, why keep it?


Line 65:   compression: FbCompression (id: 3);
Will FlatBuffers add padding to align members? Ideally, we'd optimize for space 
and not access speed.


http://gerrit.cloudera.org:8080/#/c/6406/3/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

Line 218:   1: required binary file_desc_data
Nice


Line 296:   10: optional list file_name_prefixes
Is this required for the move to flat buffers? I think we should consider an 
HdfsPathSet abstraction that assigns path ids and internally compresses the 
underlying strings. There's a lot of manual lookups and stitching in the 
current code. I don't feel too strongly about whether we should do that now, or 
clean up the code later.


http://gerrit.cloudera.org:8080/#/c/6406/3/fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java
File fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java:

Line 78:   public byte toFbCompression() {
toFlatBuffer()?


http://gerrit.cloudera.org:8080/#/c/6406/3/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

Line 126: locations = fileSystem.getFileBlockLocations(fileStatus, 0, 
fileStatus.getLen());
I think it would be better for now to keep all the code that fetches 
information from external systems in one place (HdfsTable). Splitting up the 
loading and delegating to several classes may make sense, but that probably 
requires significant surgery, and the current fetching code is very much 
centralized (we iterate over all files of in a table).

The loading code in this patch is more confusing to me than before. The meaning 
of some verbs like load/create is less clear.

If you agree with that direction, we may not need a FileDescriptor class at 
all, and can only rely on the FB to hold the data. It may still make sense to 
have a FileDescBuilder which you can use to construct a FbFileDesc.


Line 227: loc.getNames().length);
Another case where we might be calling out to the NN.


Line 321: private static int REPLICA_HOST_CACHE_MASK = 0x8000;
Less code and more readable to have one var with (1 << 15). The places that 
need to & the mask can bit-wise invert, i.e. (x & ~MASK).


http://gerrit.cloudera.org:8080/#/c/6406/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 189:   // List of file name prefixes.
Sorted list of file name prefixes?


Line 309: Pair compressedFileName =
How predictable is the compression behavior? Do we iterate over the files in 
lexicographical order for both HDFS and S3?
I'm worried about the case where an "invalidate metadata" suddenly leads to a 
higher memory requirement even if no files/partitions have changed.


Line 340:* the suffix is equal to 'fileName'.
Should mention that this function expects the fileNames to be sorted.


Line 346: String commonPrefix = Strings.commonPrefix(prevFileName, 
fileName);
I'm wondering if it's better to first check the last common prefix instead of 
the prefix between the current and prev file name. In the current version it 
seems like the list of prefixes could contain strings which are a common prefix 
of another one.


-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5140: improve docs building guidelines

2017-04-05 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-5140: improve docs building guidelines
..


Patch Set 9: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6512
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I71ae79ecd346045697fe225140ee9a317c5a337f
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Brown 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Laurel Hale 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5140: improve docs building guidelines

2017-04-05 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change.

Change subject: IMPALA-5140: improve docs building guidelines
..


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6512/8/docs/README.md
File docs/README.md:

PS8, Line 168: 5
> not 8?
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6512
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I71ae79ecd346045697fe225140ee9a317c5a337f
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Brown 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Laurel Hale 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5140: improve docs building guidelines

2017-04-05 Thread Michael Brown (Code Review)
Hello Laurel Hale,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6512

to look at the new patch set (#9).

Change subject: IMPALA-5140: improve docs building guidelines
..

IMPALA-5140: improve docs building guidelines

Move docs/generatingImpalaDoc.md to docs/README.md. This will
automatically render the document inline at places like:

https://github.com/apache/incubator-impala/tree/master/docs

under the directory listing.

Fix existing markdown which wasn't always rendering properly. Remove
unneeded HTML and backslashes. Add a mention of make, and add one
troubleshooting tip. Wrap most lines at 90 chars. This does not change
how Github renders the markdown, and it makes reading the source easier
as well.

Explain how to get dita into PATH.

Emphasize make and remove some of the dita advanced usage.

Change-Id: I71ae79ecd346045697fe225140ee9a317c5a337f
---
A docs/README.md
D docs/generatingImpalaDoc.md
2 files changed, 204 insertions(+), 73 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/6512/9
-- 
To view, visit http://gerrit.cloudera.org:8080/6512
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I71ae79ecd346045697fe225140ee9a317c5a337f
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Brown 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Laurel Hale 
Gerrit-Reviewer: Michael Brown 


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-04-05 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 9: Code-Review+1

(5 comments)

http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java:

PS9, Line 376: if (!isHBaseTable) 
Instead of checking this here, it's a bit cleaner to alway call 
analyzeSortByColumns() and move the check there.


http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java
File fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java:

PS9, Line 49: Lists.newArrayList()
The caller is not supposed to modify the return values, correct? You may just 
return Collections.emptyList() which is also immutable.


PS9, Line 77: Lists.newArrayList();
Same comment as above.


PS9, Line 89: Lists.newArrayList();
You can still construct and return an ImmutableList. See Guava's 
ImmutableList.builder()


http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/test/java/org/apache/impala/analysis/ParserTest.java
File fe/src/test/java/org/apache/impala/analysis/ParserTest.java:

PS9, Line 2374: ParserError("CREATE TABLE Foo LIKE Baz STORED AS TEXTFILE 
LOCATION '/a/b' " +
  : "SORT BY(bar)");
You may want to comment why this results in a parsing error.


-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5140: improve docs building guidelines

2017-04-05 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-5140: improve docs building guidelines
..


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6512/8/docs/README.md
File docs/README.md:

PS8, Line 168: 5
not 8?


-- 
To view, visit http://gerrit.cloudera.org:8080/6512
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I71ae79ecd346045697fe225140ee9a317c5a337f
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Brown 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Laurel Hale 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes


[Impala-ASF-CR] save

2017-04-05 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: save
..


Patch Set 1:

> I think this should have been squashed.

Sorry, mistake.

-- 
To view, visit http://gerrit.cloudera.org:8080/6540
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9d33f82ed6661e849b2a61f3b6dd2039813bca7c
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Marcel Kornacker 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-HasComments: No


[Impala-ASF-CR] save

2017-04-05 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has abandoned this change.

Change subject: save
..


Abandoned

-- 
To view, visit http://gerrit.cloudera.org:8080/6540
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: I9d33f82ed6661e849b2a61f3b6dd2039813bca7c
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Marcel Kornacker 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 


[Impala-ASF-CR] IMPALA-2550: Switch to per-query exec rpc

2017-04-05 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-2550: Switch to per-query exec rpc
..


Patch Set 3:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/6535/3/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

PS3, Line 381: the total number
 : // of fragment instances
> what's that?
Done


PS3, Line 386: i32
> Types.TFragmentIdx ?
Done


PS3, Line 412: 6: list destinations
> isn't that common across all instances?
Done


PS3, Line 414: fragment
> instance?
Done


PS3, Line 435: coord_state_idx
> not clear why the backend will care what index it was represented by in a c
Done


Line 441:   4: list fragment_ctxs
> are these in any particular order?
i explained the ordering for fragment_instance_ctxs


Line 453: // ReportExecStatus
> other RPC names have a blank line after them, which makes them easier to fi
Done


PS3, Line 552: overall
> what does "overall" mean here?  this makes it seem like it's the merged sta
changed code to match description


PS3, Line 560: insert_exec_status
> I guess this will hold the insert status for all instances, right?
rephrased


Line 563:   7: optional map error_log;
> and same here. this will be the aggregated error log across all instances?
rephrased


PS3, Line 572: CancelPlanFragment
> CancelQueryFInstances
Done


Line 669: 
> It would be helpful to add the RPC comment to show which RPC the following 
Done


Line 699: 
> // PublishFilter
Done


PS3, Line 734: ReportExecStatus
> Eventually this will report status for all the instances (at least in the p
well, it's overall backend execution status (this could report an error even 
before any fragment instances are started), so that's why i thought 'exec 
status' is still the right scope


Line 739:   TCancelQueryFInstancesResult CancelQueryFInstances(
i didn't want to name this CancelQuery, because that sounds too much like a 
client-related rpc


-- 
To view, visit http://gerrit.cloudera.org:8080/6535
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I20769e420711737b6b385c744cef4851cee3facd
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Marcel Kornacker 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2550: Switch to per-query exec rpc

2017-04-05 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has uploaded a new patch set (#4).

Change subject: IMPALA-2550: Switch to per-query exec rpc
..

IMPALA-2550: Switch to per-query exec rpc

Coordinator:
- FragmentInstanceState -> BackendState, which in turn records
  FragmentInstanceStats

QueryState
- does query-wide setup in a separate thread (which also launches
  the instance exec threads)
- has a query-wide 'prepared' state at which point all static setup
  is done and all FragmentInstanceStates are accessible

Also renamed QueryExecState to ClientRequestState.

Simplified handling of execution status (in FragmentInstanceState):
- status only transmitted via ReportExecStatus rpc
- in particular, it's not returned anymore from the Cancel rpc

FIS: Fixed bugs related to partially-prepared state (in Close() and 
ReleaseThreadToken())

Change-Id: I20769e420711737b6b385c744cef4851cee3facd
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/codegen/codegen-anyval.h
M be/src/common/status.cc
M be/src/common/status.h
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hdfs-avro-table-writer.cc
M be/src/exec/hdfs-avro-table-writer.h
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/hdfs-table-writer.cc
M be/src/exec/hdfs-table-writer.h
M be/src/exprs/expr-test.cc
M be/src/runtime/CMakeLists.txt
M be/src/runtime/buffered-block-mgr-test.cc
A be/src/runtime/coordinator-backend-state.cc
A be/src/runtime/coordinator-backend-state.h
A be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/data-stream-test.cc
A be/src/runtime/debug-options.cc
A be/src/runtime/debug-options.h
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
D be/src/runtime/plan-fragment-executor.cc
D be/src/runtime/plan-fragment-executor.h
M be/src/runtime/query-exec-mgr.cc
M be/src/runtime/query-exec-mgr.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/test-env.cc
M be/src/scheduling/query-schedule.cc
M be/src/scheduling/query-schedule.h
M be/src/service/CMakeLists.txt
M be/src/service/child-query.cc
M be/src/service/child-query.h
R be/src/service/client-request-state.cc
R be/src/service/client-request-state.h
M be/src/service/fe-support.cc
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-hs2-server.cc
M be/src/service/impala-http-handler.cc
M be/src/service/impala-internal-service.cc
M be/src/service/impala-internal-service.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/testutil/desc-tbl-builder.cc
M be/src/testutil/fault-injection-util.h
M be/src/util/error-util-test.cc
M be/src/util/error-util.cc
M be/src/util/error-util.h
M be/src/util/uid-util.h
M common/thrift/ExecStats.thrift
M common/thrift/ImpalaInternalService.thrift
66 files changed, 3,820 insertions(+), 3,615 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/6535/4
-- 
To view, visit http://gerrit.cloudera.org:8080/6535
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I20769e420711737b6b385c744cef4851cee3facd
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Marcel Kornacker 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 


[Impala-ASF-CR] IMPALA-4643: [DOCS] Change URLs / set up keydefs for JIRA reports

2017-04-05 Thread John Russell (Code Review)
Hello Laurel Hale, Jim Apple,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6515

to look at the new patch set (#3).

Change subject: IMPALA-4643: [DOCS] Change URLs / set up keydefs for JIRA 
reports
..

IMPALA-4643: [DOCS] Change URLs / set up keydefs for JIRA reports

In addition to switching URLs for individual JIRAs
to point to issues.apache.org, also construct alternative
URLs for JIRA reports (usually 1 per release, summarizing
the issues fixed in that release).

There is some inconsistency in which releases have associated
JIRA reports. For releases where we never published a link to
a JIRA report, I added a  tag that could be used to
construct a report in future, but didn't fill in any URL.

Some releases do have a link to a JIRA report in the release
notes, however the report is empty. Possibly there was some
strangeness around the release process for those, or there
could be some missing info in the JIRA system. I just
transcribed the links in those cases and didn't try to
reconstruct the history to debug the empty reports.

Change-Id: I007e634f9da57289674683dd5bf64e3e3ca8f525
---
M docs/impala_keydefs.ditamap
M docs/topics/impala_fixed_issues.xml
M docs/topics/impala_known_issues.xml
3 files changed, 232 insertions(+), 73 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/6515/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6515
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I007e634f9da57289674683dd5bf64e3e3ca8f525
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Laurel Hale 
Gerrit-Reviewer: Michael Brown 


[Impala-ASF-CR] IMPALA-5140: improve docs building guidelines

2017-04-05 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change.

Change subject: IMPALA-5140: improve docs building guidelines
..


Patch Set 8:

Laurel and Jim, please see patch set 8 and as always 
https://github.com/mikesbrown/incubator-impala/tree/fix-docs/docs for the 
rendered version.

-- 
To view, visit http://gerrit.cloudera.org:8080/6512
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I71ae79ecd346045697fe225140ee9a317c5a337f
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Brown 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Laurel Hale 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5140: improve docs building guidelines

2017-04-05 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change.

Change subject: IMPALA-5140: improve docs building guidelines
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6512/5/docs/README.md
File docs/README.md:

PS5, Line 74: * **To generate HTML output of the Impala SQL Reference, run 
the following command:**
: 
: ```
: ./bin/dita -input  -format 
html5 \
:   -output  \
:   -filter 
: ```
: 
: * **To generate PDF output of the Impala SQL Reference, run 
the following command:**
: 
: ```
: ./bin/dita -input  -format 
pdf \
:   -output  \
:   -filter 
: ```
> Laurel was the original author. I'll add her to the review.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6512
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I71ae79ecd346045697fe225140ee9a317c5a337f
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Brown 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Laurel Hale 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5140: improve docs building guidelines

2017-04-05 Thread Michael Brown (Code Review)
Hello Laurel Hale,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6512

to look at the new patch set (#8).

Change subject: IMPALA-5140: improve docs building guidelines
..

IMPALA-5140: improve docs building guidelines

Move docs/generatingImpalaDoc.md to docs/README.md. This will
automatically render the document inline at places like:

https://github.com/apache/incubator-impala/tree/master/docs

under the directory listing.

Fix existing markdown which wasn't always rendering properly. Remove
unneeded HTML and backslashes. Add a mention of make, and add one
troubleshooting tip. Wrap most lines at 90 chars. This does not change
how Github renders the markdown, and it makes reading the source easier
as well.

Explain how to get dita into PATH.

Emphasize make and remove some of the dita advanced usage.

Change-Id: I71ae79ecd346045697fe225140ee9a317c5a337f
---
A docs/README.md
D docs/generatingImpalaDoc.md
2 files changed, 201 insertions(+), 73 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/6512/8
-- 
To view, visit http://gerrit.cloudera.org:8080/6512
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I71ae79ecd346045697fe225140ee9a317c5a337f
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Brown 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Laurel Hale 
Gerrit-Reviewer: Michael Brown 


[Impala-ASF-CR] IMPALA-4643: [DOCS] Change URLs / set up keydefs for JIRA reports

2017-04-05 Thread Laurel Hale (Code Review)
Laurel Hale has posted comments on this change.

Change subject: IMPALA-4643: [DOCS] Change URLs / set up keydefs for JIRA 
reports
..


Patch Set 2: Code-Review+1

All of my issues have been addressed. Thank you.

-- 
To view, visit http://gerrit.cloudera.org:8080/6515
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I007e634f9da57289674683dd5bf64e3e3ca8f525
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Laurel Hale 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5140: improve docs building guidelines

2017-04-05 Thread Laurel Hale (Code Review)
Laurel Hale has posted comments on this change.

Change subject: IMPALA-5140: improve docs building guidelines
..


Patch Set 7:

> > Do we want to maintain the
 > > "make" command so users can simple 1) install the DITA-OT; 2)
 > > navigate to the "docs" directory; and 3) execute "make" to build
 > > HTML and PDF? I think this is the best way to go.
 > 
 > Thanks. That means I have another patch set to do: adding dita to
 > PATH. I plan to remove all but 1 dita invocation example.

That sounds right.

-- 
To view, visit http://gerrit.cloudera.org:8080/6512
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I71ae79ecd346045697fe225140ee9a317c5a337f
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Brown 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Laurel Hale 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-04-05 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 8:

(18 comments)

Thank you for the review, please see PS9.

http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

PS7, Line 984: Sor
> There is no KW_SET, right? Update the name?
Done


http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/java/org/apache/impala/analysis/AlterTableSetSortByColumnsStmt.java
File 
fe/src/main/java/org/apache/impala/analysis/AlterTableSetSortByColumnsStmt.java:

PS7, Line 73: 
> t?
Done


http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

Line 2484:* Add a warning that will be displayed to the user. Ignores null 
messages. Once
> Comment that no warning can be issued if warnings have been retrieved?
Done


http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java:

PS7, Line 119: ortByColumns_ !
> I think this can throw a NPE. Looking at the parser I see that CreateTableL
Thanks for catching this. I was able to trigger the NPE with an additional test 
in ToSqlTest.java. Fixed.


http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/java/org/apache/impala/analysis/TableDef.java
File fe/src/main/java/org/apache/impala/analysis/TableDef.java:

PS7, Line 296: 
> inline
Done


http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java
File fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java:

PS7, Line 41: 
:   /**
:* Analyzes the 'sort.by.columns' property in 'tblProperties' 
against the columns of
:* 'table'.
:*/
:   public static List analyzeSortByColumns(Map tblProperties,
:   Table table) throws AnalysisException {
: if (!tblProperties.containsKey(TBL_PROP_SORT_BY_COLUMNS)) {
:   return Lists.newArrayList();
: }
: 
: List sortByCols = Lists.newArrayList(
: Splitter.on(",").trimResults().omitEmptyStrings().split(
: tblProperties.get(TBL_PROP_SORT_BY_COLUMNS)));
: return TablePropertyAnalyzer.analyzeSortByColumns(sortByCols, 
table);
:   }
: 
> These are used in only one place, right? Maybe inline there?
I removed the first one. The second one is used in two places: 
InsertStmt.analyzeSortByColumns() (that's where I inlined the first one), and 
AlterTableSetTblProperties.analyze(). I'd keep the second one unless you lean 
strongly towards inlining it.


Line 76: }
> Precondition check for HBase?
Done


PS7, Line 127: 
 : 
> You may want to consider if it's better to have a getSortByColumn(Table tab
Inlined it instead.


http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

PS7, Line 539: // If the table has a 'sort.by.columns' property and the query 
has a 'noclustered'
 : // hint, we issue a warning during analysis and ignore the 
'noclustered' hint.
> That comment doesn't seem relevant here. Remove?
I put it there to explain the intent of the if statement. Without it one might 
think that the noclusteredhint gets ignored by mistake (see previous review 
comment by Thomas). Should I remove/rephrase it?


PS7, Line 541: nsertStmt.hasClusteredHint() || 
> If insertStmt.hasClusteredHint() is true doesn't that imply that hasNoClust
True, fixed.


http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

PS7, Line 1793: !params.sort_by_columns.isEmpty()
> use isEmpty()
Done


http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

Line 526:   String long_property_key = "";
> Test for HBase and Kudu?
Added tests for Kudu in testAlterKuduTable(). HBase tables do not support 
"ALTER TABLE SET" altogether, which is tested in L683.


Line 1054: 
> Same here (HBase + Kudu). You need to put all Kudu tests in a function and 
Added test to check that HBase is not supported. Added Kudu tests to 
TestAlterKuduTable().


PS7, Line 1904: te t
> "must" typo
Done


PS7, Line 1920: est
> No need for that comment :)
Done


http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/test/java/org/apache/impala/analysis/ParserTest.java
File fe/src/test/java/org/apache/impala/analysis/ParserTest.java:

PS7, Line 

[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-04-05 Thread Lars Volker (Code Review)
Lars Volker has uploaded a new patch set (#9).

Change subject: IMPALA-4166: Add SORT BY sql clause
..

IMPALA-4166: Add SORT BY sql clause

This change adds support for adding SORT BY (...) clauses to CREATE
TABLE and ALTER TABLE statements. Examples are:

CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j);
CREATE TABLE t (i INT, x INT PRIMARY KEY) PARTITION BY HASH(x)
PARTITIONS 8 SORT BY(i) STORED AS KUDU;
CREATE TABLE t SORT BY (int_col,id) LIKE u;
CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip);

ALTER TABLE t SORT BY (int_col,id);
ALTER TABLE t SORT BY ();

SORT BY columns can be specified for all table types, but effectiveness
may vary based on table type, for example TEXT tables will not see
improved compression. The SORT BY clause must not contain clustering
columns for HDFS tables, and must not contain primary key columns for
Kudu tables. The columns in the SORT BY clause are stored in the
'sort.by.columns' table property and will result in an additional SORT
node being added to the plan before the final table sink. Specifying
SORT BY columns also enables clustering during inserts, so the SORT node
will contain all partitioning columns first, followed by the SORT BY
columns. We do this because SORT BY columns add a SORT node to the plan
and adding the clustering columns to the SORT node is cheap.

SORT BY columns supersede the sortby() hint, which we will remove in a
subsequent change (IMPALA-5144).

This change introduces a new TablePropertyAnalyzer class to collect
various methods used to analyze table properties.

Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
---
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
A fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
A fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test
M testdata/workloads/functional-planner/queries/PlannerTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
30 files changed, 1,112 insertions(+), 59 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/9
-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-1726: Treat parquet ENUMs as STRINGs when creating tables.

2017-04-05 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-1726: Treat parquet ENUMs as STRINGs when creating 
tables.
..


Patch Set 1:

(1 comment)

Thank you for working on this. Have you already thought about how we could test 
this? You can have a look at 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test,
 but I'm unsure whether we have a parquet file with an ENUM in our testdata 
already. Let me know if you get stuck there and we can figure out how to add 
one.

http://gerrit.cloudera.org:8080/#/c/6550/1//COMMIT_MSG
Commit Message:

PS1, Line 7: IMPALA-1726
IMPALA-2525?


-- 
To view, visit http://gerrit.cloudera.org:8080/6550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia7a2e20c3ab83eb3fac422c3b33c117856fec475
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jakub Kukul 
Gerrit-Reviewer: Lars Volker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Allow BlockingQueue and ThreadPool to accept rvalue args

2017-04-05 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: Allow BlockingQueue and ThreadPool to accept rvalue args
..


Patch Set 2: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6442/1//COMMIT_MSG
Commit Message:

Line 11: queue. Very often we create a thin wrapper for each work item we submit
> The shared_ptr example is illustrative, but does also have some small effec
Thanks for the explanation!


-- 
To view, visit http://gerrit.cloudera.org:8080/6442
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1791870576cb269e86495034f92555de48f92f10
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Henry Robinson 
Gerrit-HasComments: Yes