[Impala-ASF-CR] IMPALA-5238: transfer reservations between trackers

2017-05-10 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5238: transfer reservations between trackers
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6708/6/be/src/runtime/bufferpool/reservation-tracker.h
File be/src/runtime/bufferpool/reservation-tracker.h:

PS6, Line 232: subtrees
> It's really trying to define rules that produce a total lock order for the 
I don't understand that, given that locks need to be acquired from bottom to 
top if their trackers have that relationship, no? I'll look at the new comment 
to see if it helps.


-- 
To view, visit http://gerrit.cloudera.org:8080/6708
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I21f008abaf1aa4fcd2d854769a603b97589af3b3
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5137: Support Kudu UNIXTIME MICROS as Impala TIMESTAMP

2017-05-10 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5137: Support Kudu UNIXTIME_MICROS as Impala TIMESTAMP
..


Patch Set 9: Code-Review+2

carry +2, rebased

-- 
To view, visit http://gerrit.cloudera.org:8080/6526
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iae6ccfffb79118a9036fb2227dba3a55356c896d
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5137: Support Kudu UNIXTIME MICROS as Impala TIMESTAMP

2017-05-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5137: Support Kudu UNIXTIME_MICROS as Impala TIMESTAMP
..


Patch Set 9:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/567/

-- 
To view, visit http://gerrit.cloudera.org:8080/6526
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iae6ccfffb79118a9036fb2227dba3a55356c896d
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] Bump Kudu version to 7533364

2017-05-10 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has submitted this change and it was merged.

Change subject: Bump Kudu version to 7533364
..


Bump Kudu version to 7533364

Change-Id: I88dc2d425bd3aff70c95d51818d0450709123d27
Reviewed-on: http://gerrit.cloudera.org:8080/6797
Tested-by: Impala Public Jenkins
Reviewed-by: Matthew Jacobs 
---
M bin/impala-config.sh
1 file changed, 2 insertions(+), 2 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Matthew Jacobs: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/6797
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I88dc2d425bd3aff70c95d51818d0450709123d27
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] Bump Kudu version to 7533364

2017-05-10 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: Bump Kudu version to 7533364
..


Patch Set 4: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6797
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I88dc2d425bd3aff70c95d51818d0450709123d27
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] Bump Kudu version to 7533364

2017-05-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: Bump Kudu version to 7533364
..


Patch Set 4: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6797
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I88dc2d425bd3aff70c95d51818d0450709123d27
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5238: transfer reservations between trackers

2017-05-10 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5238: transfer reservations between trackers
..


Patch Set 6:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/6708/6/be/src/runtime/bufferpool/reservation-tracker.cc
File be/src/runtime/bufferpool/reservation-tracker.cc:

PS6, Line 230: common_ancestor
> this is named based on the state it holds after the loop, whereas the path_
Done


Line 242:   //trackers under 'other', down to 'this'.
> in this case, common_ancestor == other, right?
Done


Line 245:   //under 'this', down to 'other'.
> likewise
Done


Line 260: locks.emplace_back(tracker->lock_);
> missing braces
Done


Line 289:   if (common_ancestor == this) {
> isn't this case 2?
Done


PS6, Line 294: Case 2
> case 1?
Done


http://gerrit.cloudera.org:8080/#/c/6708/6/be/src/runtime/bufferpool/reservation-tracker.h
File be/src/runtime/bufferpool/reservation-tracker.h:

PS6, Line 213: bool
> static, or make it a method on subtree1
Done


PS6, Line 232: subtrees
> is this about locking subtrees, or paths?
It's really trying to define rules that produce a total lock order for the 
whole tree. Expanded the explanation.


-- 
To view, visit http://gerrit.cloudera.org:8080/6708
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I21f008abaf1aa4fcd2d854769a603b97589af3b3
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5238: transfer reservations between trackers

2017-05-10 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#7).

Change subject: IMPALA-5238: transfer reservations between trackers
..

IMPALA-5238: transfer reservations between trackers

This is a primitive needed to implement claiming and distribution
of initial reservations. It supports transferring reservation between
any two ReservationTrackers under the same query.

Also remove the public DecreaseReservation() method, which is now
mostly redundant and we have no plans to use.

Testing:
* Added a test that exercises transfer between trackers at different
  levels and with different relationships.

Change-Id: I21f008abaf1aa4fcd2d854769a603b97589af3b3
---
M be/src/runtime/bufferpool/reservation-tracker-test.cc
M be/src/runtime/bufferpool/reservation-tracker.cc
M be/src/runtime/bufferpool/reservation-tracker.h
3 files changed, 293 insertions(+), 69 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/6708/7
-- 
To view, visit http://gerrit.cloudera.org:8080/6708
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I21f008abaf1aa4fcd2d854769a603b97589af3b3
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5220: memory maintenance cleanup

2017-05-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5220: memory maintenance cleanup
..


Patch Set 11: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6626
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0f822b294ab253d6f2828fc52f353aecaaf9b701
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5220: memory maintenance cleanup

2017-05-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5220: memory maintenance cleanup
..


IMPALA-5220: memory maintenance cleanup

Remove logic that tries to release pages from TcMalloc's page heap:
TCMalloc's behaviour changed so that it automatically does this with
"aggressive decommit" and committed spans can't accumulate in the page
heap.

Also increase the memory maintenance interval - 1s is too aggressive and
can free memory that will be imminently reused by a running query, e.g.
particularly buffer pool buffers.

Testing:
Tried to reproduce IMPALA-2800 in a couple of ways:
* Ran a big aggregation locally and cancelled it
* Looked at memz/ of some live clusters (production and stress test).
In all cases "Bytes in page heap freelist" was 0.

This confirms that IMPALA-2800 was already solved, probably
by the gperftools 2.5 upgrade, where aggressive decommit
would mean that memory is released to the system in
free() instead of the ReleaseFreeMemory() callst.

I was able to confirm that the ReleaseFreeMemory() calls were unnecessary
to avoid retaining memory by running a couple of stress tests
locally with this patch and checking that "Bytes in page
heap freelist" was 0 after the change and that memory consumption
was generally sensible.

Change-Id: I0f822b294ab253d6f2828fc52f353aecaaf9b701
Reviewed-on: http://gerrit.cloudera.org:8080/6626
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins
---
M be/src/common/init.cc
M be/src/runtime/exec-env.cc
M be/src/runtime/mem-tracker.cc
M be/src/runtime/mem-tracker.h
M be/src/util/memory-metrics.h
5 files changed, 25 insertions(+), 89 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Tim Armstrong: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/6626
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I0f822b294ab253d6f2828fc52f353aecaaf9b701
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 


[Impala-ASF-CR] IMPALA-5169: Add support for async pins in buffer pool

2017-05-10 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#11).

Change subject: IMPALA-5169: Add support for async pins in buffer pool
..

IMPALA-5169: Add support for async pins in buffer pool

Makes Pin() do async reads behind-the-scenes, instead of
blocking until the read completes. The blocking is done
instead when the client tries to access the buffer via
PageHandle::GetBuffer() or ExtractBuffer().

This is implemented with a new sub-state of "pinned"
where the page has a buffer and consumes reservation
but the buffer does not contain valid data.

Motivation:
This unlocks various opportunities to overlap read I/Os
with other work:
* Reads to different disks can execute in parallel
* I/O and computation can be overlapped.

This initially benefits BufferedTupleStream::PinStream(),
where many pages are pinned at once. With this change the
reads run asynchronously. This can potentially lead
to large speedups when spilling. E.g. if the pages for a Hash
Join's partition are spread across 10 disks, we could get 10x
the read throughput, plus overlap the I/O with hash table build.

In future we can use this to do read-ahead over unpinned
BufferedTupleStreams or for unpinned Runs in Sorter, but
this requires changes to the client code to Pin() pages
in advance.

Testing:
* BufferedTupleStreamV2 already exercises this.
* Various BufferPool tests already exercise this.
* Added a basic test to cover edge cases made possible by the
  new state transitions.
* Extended the randomised test to cover this.

Change-Id: Ibdf074c1ac4405d6f08d623ba438a85f7d39fd79
---
M be/src/runtime/buffered-tuple-stream-v2.cc
M be/src/runtime/buffered-tuple-stream-v2.h
M be/src/runtime/bufferpool/buffer-pool-internal.h
M be/src/runtime/bufferpool/buffer-pool-test.cc
M be/src/runtime/bufferpool/buffer-pool.cc
M be/src/runtime/bufferpool/buffer-pool.h
M be/src/runtime/tmp-file-mgr.cc
M be/src/runtime/tmp-file-mgr.h
8 files changed, 423 insertions(+), 155 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/6612/11
-- 
To view, visit http://gerrit.cloudera.org:8080/6612
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibdf074c1ac4405d6f08d623ba438a85f7d39fd79
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5169: Add support for async pins in buffer pool

2017-05-10 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5169: Add support for async pins in buffer pool
..


Patch Set 10:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/6612/10/be/src/runtime/buffered-tuple-stream-v2.cc
File be/src/runtime/buffered-tuple-stream-v2.cc:

Line 411:   RETURN_IF_ERROR(read_page_->handle.GetBuffer(_buffer));
> don't we have to set the flag here too?
Good point. Moved the flag updating into a function in the Page object so it's 
easier to keep consistent.


http://gerrit.cloudera.org:8080/#/c/6612/10/be/src/runtime/buffered-tuple-stream-v2.h
File be/src/runtime/buffered-tuple-stream-v2.h:

PS10, Line 155: If the stream was previously unpinned, the page's data
  : /// may not yet be in memory - the stream must call 
GetBuffer() (which can block on
  : /// I/O or fail with an I/O error) once the data is needed.
> This seems like more of an implementation detail hidden behind the BufferPo
I think it's relevant that it can block on I/O or fail, but some of the details 
arent.


PS10, Line 373: data_in_mem
> the "in mem" is really a detail hidden behind the BP abstraction.  How abou
Done


http://gerrit.cloudera.org:8080/#/c/6612/10/be/src/runtime/bufferpool/buffer-pool-internal.h
File be/src/runtime/bufferpool/buffer-pool-internal.h:

PS10, Line 41: async_pin_in_flight
> since all pins are async, how about just 'pin_in_flight'?
Done


PS10, Line 107: CancelAsyncPin
> CancelPinRead(), or CancelPinReadIO, or CancelReadIO? i.e. this doesn't rea
I inlined the logic into the callers (since it wouldn't be clear what this did 
aside from call CancelRead() otherwise).


PS10, Line 236: UndoAsyncPin
> How about sticking with the list-oriented names, so something like: UndoMov
Done


PS10, Line 241: async_write_in_flight
> wrong name
Done


PS10, Line 247: write_
> same
Done


http://gerrit.cloudera.org:8080/#/c/6612/10/be/src/runtime/bufferpool/buffer-pool.cc
File be/src/runtime/bufferpool/buffer-pool.cc:

Line 157: Status status = ExtractBuffer(client, handle, );
> if the read had already finished before CancelAsyncPin() and had an error, 
Yes, there's no way the error status could propagate. I think this should be 
more obvious with CancelAsyncPin() inlined.


PS10, Line 384: UndoAsyncPin
> Sticking with the list-oriented names, this is really UndoInFlightMoveToPin
Done


PS10, Line 387: CancelAsyncPin
> page->CancelAsyncPin() isn't really canceling the pin, it's cancelling the 
Yeah more-or-less. Inlined CancelAsyncPin() to avoid having to name the thing.


PS10, Line 393: it doesn't have valid data.
> is that true? the read may have completed, right? I think this routine shou
Added a comment up the top explaining what the intended end-state is. Hopeful 
that's clearer.


http://gerrit.cloudera.org:8080/#/c/6612/10/be/src/runtime/bufferpool/buffer-pool.h
File be/src/runtime/bufferpool/buffer-pool.h:

PS10, Line 413:  
> previously
Done


http://gerrit.cloudera.org:8080/#/c/6612/10/be/src/runtime/tmp-file-mgr.h
File be/src/runtime/tmp-file-mgr.h:

PS10, Line 139: Even if the read fails, it is valid to call ReadAsync() again
  : /// after this returns.
> what does that mean?
Reworded to be a bit less cryptic, hopefully.

I wanted to make it clear that it was valid to retry ReadAsync() even if the 
read failed. In a lot of other places it's implicitly assumed that things 
shouldn't be retried once an error occurs so I wanted to document that this was 
an exception.

We don't depend on this right now since we just cancel reads when we don't care 
about result, but in an earlier iteration of the patch I was waiting on reads 
instead of cancelling them and it was important because we might retry them 
later.


-- 
To view, visit http://gerrit.cloudera.org:8080/6612
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibdf074c1ac4405d6f08d623ba438a85f7d39fd79
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix QueryExecState::lock contention

2017-05-10 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-1972/IMPALA-3882: Fix QueryExecState::lock_ contention
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6707/3/www/query_plan.tmpl
File www/query_plan.tmpl:

PS3, Line 42: {{?plan_metadata_unavailable}}
:  Query plan information is not yet available since the 
planning hasn't finished.
: {{/plan_metadata_unavailable}}
: 
: {{^plan_metadata_unavailable}}
Since the webpage refreshes every few seconds, it would be best to have this 
say something like "Plan not yet available. Page will update when query 
planning completes".


-- 
To view, visit http://gerrit.cloudera.org:8080/6707
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix QueryExecState::lock contention

2017-05-10 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-1972/IMPALA-3882: Fix QueryExecState::lock_ contention
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6707/2//COMMIT_MSG
Commit Message:

Line 22: and run parallel queries.
> Load isn't relevant. The lock dependency is the problem, so as long as you 
And can you first just try to reproduce the problem manually by doing those 
steps:

1) Executes a query and simulates getting stuck in planning (loop forever).
2) see if the debug webpage is stuck. It should be since you're holding a QES 
lock.
3) Execute a second query. Now this query should not even be able to register 
since the webserver is holding the map lock (while waiting for the QES::lock of 
#1).


http://gerrit.cloudera.org:8080/#/c/6707/3/be/src/service/impala-http-handler.cc
File be/src/service/impala-http-handler.cc:

PS3, Line 722: ate->query_status();
not sure if that's what we should put. Let's see what Henry suggests.


-- 
To view, visit http://gerrit.cloudera.org:8080/6707
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix QueryExecState::lock contention

2017-05-10 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-1972/IMPALA-3882: Fix QueryExecState::lock_ contention
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6707/2//COMMIT_MSG
Commit Message:

Line 22: - As an exception, we don't grab QES::lock_ while the query planning is
> Not much success on this front. I'm not able to repro the scenario via test
Load isn't relevant. The lock dependency is the problem, so as long as you hold 
the map lock, you should be able to demonstrate the problem.

Just to be clear, you tried to reproduce this using the test but without the 
fixes, right?


http://gerrit.cloudera.org:8080/#/c/6707/3/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

PS3, Line 603:  // For queries in CREATED state, the profile information 
isn't populated yet.
 :   if (exec_state->query_state() == 
beeswax::QueryState::CREATED) return Status::OK();
> Wanted to double check if this change is required. Reason being, with this 
Is it possible to get to this method for a query that hasn't returned from 
ExecStatement() yet? Where does the query_id come from along the paths that 
call into here?

Also, what info is in the profile up to this point?


-- 
To view, visit http://gerrit.cloudera.org:8080/6707
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5137: Support Kudu UNIXTIME MICROS as Impala TIMESTAMP

2017-05-10 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5137: Support Kudu UNIXTIME_MICROS as Impala TIMESTAMP
..


Patch Set 8: Code-Review+2

(1 comment)

Merged in related refactoring patch which also had a +2:
https://gerrit.cloudera.org/#/c/6510/11

http://gerrit.cloudera.org:8080/#/c/6526/5/be/src/runtime/timestamp-value.h
File be/src/runtime/timestamp-value.h:

Line 193:   /// Nanoseconds are rounded to the nearest microsecond supported by 
Impala. Returns
> i don't know when that's going to happen. at least leave a todo in the impl
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6526
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iae6ccfffb79118a9036fb2227dba3a55356c896d
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5137: Support Kudu UNIXTIME MICROS as Impala TIMESTAMP

2017-05-10 Thread Matthew Jacobs (Code Review)
Hello Marcel Kornacker,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6526

to look at the new patch set (#8).

Change subject: IMPALA-5137: Support Kudu UNIXTIME_MICROS as Impala TIMESTAMP
..

IMPALA-5137: Support Kudu UNIXTIME_MICROS as Impala TIMESTAMP

Adds Impala support for TIMESTAMP types stored in Kudu.

Impala stores TIMESTAMP values in 96-bits and has nanosecond
precision. Kudu's timestamp is a 64-bit microsecond delta
from the Unix epoch (called UNIXTIME_MICROS), so a conversion
is necessary.

When writing to Kudu, TIMESTAMP values in nanoseconds are
averaged to the nearest microsecond.

When reading from Kudu, the KuduScanner returns
UNIXTIME_MICROS with 8bytes of padding so Impala can convert
the value to a TimestampValue in-line and copy the entire
row.

Testing:
Updated the functional_kudu schema to use TIMESTAMPs instead
of converting to STRING, so this provides some decent
coverage. Some BE tests were added, and some EE tests as
well.

TODO: Support pushing down TIMESTAMP predicates
TODO: Support TIMESTAMPs in range partitioning expressions

Change-Id: Iae6ccfffb79118a9036fb2227dba3a55356c896d
---
M be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/benchmarks/parse-timestamp-benchmark.cc
M be/src/exec/data-source-scan-node.cc
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-util.cc
M be/src/exec/kudu-util.h
M be/src/exec/parquet-column-readers.cc
M be/src/exec/text-converter.inline.h
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/expr-value.h
M be/src/exprs/kudu-partition-expr.cc
M be/src/exprs/literal.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
M be/src/runtime/raw-value-test.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
A be/src/runtime/timestamp-value.inline.h
M be/src/service/client-request-state.cc
M be/src/service/impala-http-handler.cc
M be/src/service/impala-server.cc
M be/src/statestore/statestore-subscriber.cc
M be/src/util/bloom-filter.h
M be/src/util/dict-test.cc
M be/src/util/promise.h
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/util/KuduUtil.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M testdata/datasets/functional/functional_schema_template.sql
A 
testdata/workloads/functional-query/queries/QueryTest/kudu-overflow-ts-abort-on-error.test
A testdata/workloads/functional-query/queries/QueryTest/kudu-overflow-ts.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_describe.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test
M tests/query_test/test_kudu.py
M tests/query_test/test_queries.py
42 files changed, 704 insertions(+), 335 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/6526/8
-- 
To view, visit http://gerrit.cloudera.org:8080/6526
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iae6ccfffb79118a9036fb2227dba3a55356c896d
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix QueryExecState::lock contention

2017-05-10 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has uploaded a new patch set (#5).

Change subject: IMPALA-1972/IMPALA-3882: Fix QueryExecState::lock_ contention
..

IMPALA-1972/IMPALA-3882: Fix QueryExecState::lock_ contention

Holding query_exec_state_map_lock_ and QES::lock_ together in certain
paths could potentially block the impalad from registering new queries.
The most common occurrence of this is while loading the webpage of a
query while the query planning is still in progress. Since we hold the
QES::lock_ during planning, it blocks the web page from loading which
inturn blocks incoming queries by holding query_exec_state_map_lock_.

While the actual fix for this is to not hold the QES::lock_ during
metadata load, this patch doesn't do it since it could have some unknown
side effects. Instead,

- We make sure that QES::lock_ doesn't block query_exec_state_map_lock_
  in hot paths.
- As an exception, we don't grab QES::lock_ while the query planning is
  in progress.

Also, this patch removes locking inside the GetQueryExecState() method.
It is confusing that some callers choose to lock it inside the method
and wait on it later. Now, the locking is the responsibility of the
callers.

Testing: Tested it locally by adding a long sleep in
JniFrontend.createExecRequest() and still was able to refresh the web UI
and run parallel queries.

Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026
---
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-hs2-server.cc
M be/src/service/impala-http-handler.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
A tests/custom_cluster/test_query_concurrency.py
M www/query_plan.tmpl
7 files changed, 144 insertions(+), 37 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/6707/5
-- 
To view, visit http://gerrit.cloudera.org:8080/6707
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 


[Impala-ASF-CR] IMPALA-5137: Support TIMESTAMPs in Kudu range predicate DDL

2017-05-10 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6849

Change subject: IMPALA-5137: Support TIMESTAMPs in Kudu range predicate DDL
..

IMPALA-5137: Support TIMESTAMPs in Kudu range predicate DDL

Adds support in DDL for timestamps in Kudu range partition syntax.

Range bounds are converted to Kudu UNIXTIME_MICROS during
analysis.

Testing: Adds FE and EE tests.

Change-Id: Iae409b6106c073b038940f0413ed9d5859daaeff
---
M fe/src/main/java/org/apache/impala/analysis/RangePartition.java
M fe/src/main/java/org/apache/impala/util/KuduUtil.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_create.test
5 files changed, 153 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/6849/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6849
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Iae409b6106c073b038940f0413ed9d5859daaeff
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 


[Impala-ASF-CR] IMPALA-5137: Support pushing TIMESTAMP predicates to Kudu

2017-05-10 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has uploaded a new patch set (#2).

Change subject: IMPALA-5137: Support pushing TIMESTAMP predicates to Kudu
..

IMPALA-5137: Support pushing TIMESTAMP predicates to Kudu

This change builds on the support for reading and writing
TIMESTAMP columns to Kudu tables (see [1]), adding support
for pushing TIMESTAMP predicates to Kudu for scans.

Binary predicates and IN list predicates are supported.

Testing: Added some planner and EE tests to validate the
behavior.

1: https://gerrit.cloudera.org/#/c/6526/

Change-Id: I08b6c8354a408e7beb94c1a135c23722977246ea
---
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.h
M common/function-registry/impala_functions.py
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/util/KuduUtil.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/kudu-selectivity.test
M testdata/workloads/functional-query/queries/QueryTest/kudu-scan-node.test
7 files changed, 157 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/6789/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6789
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I08b6c8354a408e7beb94c1a135c23722977246ea
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 


[Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix QueryExecState::lock contention

2017-05-10 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has uploaded a new patch set (#4).

Change subject: IMPALA-1972/IMPALA-3882: Fix QueryExecState::lock_ contention
..

IMPALA-1972/IMPALA-3882: Fix QueryExecState::lock_ contention

Holding query_exec_state_map_lock_ and QES::lock_ together in certain
paths could potentially block the impalad from registering new queries.
The most common occurrence of this is while loading the webpage of a
query while the query planning is still in progress. Since we hold the
QES::lock_ during planning, it blocks the web page from loading which
inturn blocks incoming queries by holding query_exec_state_map_lock_.

While the actual fix for this is to not hold the QES::lock_ during
metadata load, this patch doesn't do it since it could have some unknown
side effects. Instead,

- We make sure that QES::lock_ doesn't block query_exec_state_map_lock_
  in hot paths.
- As an exception, we don't grab QES::lock_ while the query planning is
  in progress.

Also, this patch removes locking inside the GetQueryExecState() method.
It is confusing that some callers choose to lock it inside the method
and wait on it later. Now, the locking is the responsibility of the
callers.

Testing: Tested it locally by adding a long sleep in
JniFrontend.createExecRequest() and still was able to refresh the web UI
and run parallel queries.

Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026
---
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-hs2-server.cc
M be/src/service/impala-http-handler.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
A tests/custom_cluster/test_query_concurrency.py
M www/query_plan.tmpl
7 files changed, 139 insertions(+), 36 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/6707/4
-- 
To view, visit http://gerrit.cloudera.org:8080/6707
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 


[Impala-ASF-CR] IMPALA-5238: transfer reservations between trackers

2017-05-10 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5238: transfer reservations between trackers
..


Patch Set 6:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/6708/6/be/src/runtime/bufferpool/reservation-tracker.cc
File be/src/runtime/bufferpool/reservation-tracker.cc:

PS6, Line 230: common_ancestor
this is named based on the state it holds after the loop, whereas the path_to_* 
are named based on the state before the loop. I'd raname those to 
path_to_common or something to be consistent.


Line 242:   //trackers under 'other', down to 'this'.
in this case, common_ancestor == other, right?


Line 245:   //under 'this', down to 'other'.
likewise


Line 260: locks.emplace_back(tracker->lock_);
missing braces


Line 289:   if (common_ancestor == this) {
isn't this case 2?


PS6, Line 294: Case 2
case 1?


http://gerrit.cloudera.org:8080/#/c/6708/6/be/src/runtime/bufferpool/reservation-tracker.h
File be/src/runtime/bufferpool/reservation-tracker.h:

PS6, Line 213: bool
static, or make it a method on subtree1


PS6, Line 232: subtrees
is this about locking subtrees, or paths?


-- 
To view, visit http://gerrit.cloudera.org:8080/6708
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I21f008abaf1aa4fcd2d854769a603b97589af3b3
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-10 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 23:

(10 comments)

Thanks for the review, please see PS24.

http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

Line 336: nonterminal ArrayList opt_ident_list, opt_sort_cols;
> opt_sort_by_clause or opt_sort_cols? "sort by columns" sounds odd.
Done


http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java
File 
fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java:

Line 43
> drop "columns", the statement doesn't include a COLUMNS
Done


Line 73
> rephrase "sort by columns" as "sort columns" universally
Done


Line 91
> maybe move this into TableDef as well? it seems kind of arbitrary to put it
Done. This has moved a couple of times now. :)


http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java:

Line 863: // TODO: Remove this when removing the sortby() hint 
(IMPALA-5157).
> is this going to happen soon?
Yes, immediately after this change makes it in. We should remove the hint 
before releasing 2.9, since it hasn't been released yet. We could remove it in 
this change at the risk of the change becoming even larger than it already is.


http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/main/java/org/apache/impala/analysis/TableDef.java
File fe/src/main/java/org/apache/impala/analysis/TableDef.java:

Line 46: 
> update
Done


Line 343: if (options_.location != null) {
> why final?
To express that it's just a shorter name for the constant. Removed it.


http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
File fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java:

Line 75: final String sortByKey = 
AlterTableSortByStmt.TBL_PROP_SORT_COLUMNS;
> it feels like that constant should live somewhere else
I've discussed that with Alex and Dimitris and we no better place has emerged, 
though I've tried some of them. It's related to the sort by clause, similar to 
how TBL_PROP_SKIP_HEADER_LINE_COUNT lives in HdfsTable.java. Do you have a 
preference where it should go?

I also thought of TableDef (where the analysis happens) or 
AlterTableSetTblProperties (where we deal with properties).


http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

Line 456:* the caller must make sure that the value matches any columns 
that were added to the
> or "that were added to the table" to avoid the gender reference :)
Done


http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/test/java/org/apache/impala/planner/PlannerTest.java
File fe/src/test/java/org/apache/impala/planner/PlannerTest.java:

Line 101: addTestTable("create table test_sort_by.t (id int, int_col int, " 
+
> why call this alltypes? it clearly doesn't contain all types.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 23
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix QueryExecState::lock contention

2017-05-10 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-1972/IMPALA-3882: Fix QueryExecState::lock_ contention
..


Patch Set 3:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/6707/2//COMMIT_MSG
Commit Message:

PS2, Line 7: yExecState::lock_ contention
> let's not actually solve that problem right now. it doesn't seem necessary 
Done


Line 11: The most common occurrence of this is while loading the webpage of a 
query
> Yes, but please double check this is safe. From code inspection, I believe 
Ran a stress test and its green (Thanks Mike B.). Also I checked the callers 
manually and it looked ok to me too.


Line 18: side effects. Instead,
> As stated above, let's not do that.
Done


Line 22: - As an exception, we don't grab QES::lock_ while the query planning is
> As we discussed, please see if you can add a test that:
Not much success on this front. I'm not able to repro the scenario via tests. I 
added code to inject sleep simulating table loading in the exec request and 
loops that keep polling the web UI, but I'm not able to get the test failing on 
the asf-master (without this patch). May be I'm not generating enough load for 
lock contention to happen. Thoughts? (I've included the test in PS2).


http://gerrit.cloudera.org:8080/#/c/6707/2/be/src/service/impala-beeswax-server.cc
File be/src/service/impala-beeswax-server.cc:

Line 197:   {
> Let's take the QES::lock_ here and not do the promise thing.
Done.


http://gerrit.cloudera.org:8080/#/c/6707/2/be/src/service/impala-hs2-server.cc
File be/src/service/impala-hs2-server.cc:

Line 741: lock_guard l(*exec_state->lock());
> same
Done


http://gerrit.cloudera.org:8080/#/c/6707/2/be/src/service/impala-http-handler.cc
File be/src/service/impala-http-handler.cc:

Line 715: if (exec_state.get() != nullptr) {
> please fill in those details.
Done


http://gerrit.cloudera.org:8080/#/c/6707/2/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

PS2, Line 814: 
 :   (*exec_state)->query_events()->MarkEvent("Query submitted");
 : 
Changes undone as discussed.


http://gerrit.cloudera.org:8080/#/c/6707/3/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

PS3, Line 603:  // For queries in CREATED state, the profile information 
isn't populated yet.
 :   if (exec_state->query_state() == 
beeswax::QueryState::CREATED) return Status::OK();
Wanted to double check if this change is required. Reason being, with this 
change, the profile doesn't show up on the web UI till the metadata is loaded. 
That might affect some users/clients relying on the web UI profiles to track 
the query state?


http://gerrit.cloudera.org:8080/#/c/6707/2/be/src/service/impala-server.h
File be/src/service/impala-server.h:

PS2, Line 296: 
 :   /// goto http://msdn.microsoft.com/en-us/library/ms714687.aspx
> I think this comment just adds confusion.  It's up to the QueryExecState cl
Done


http://gerrit.cloudera.org:8080/#/c/6707/2/be/src/service/query-exec-state.h
File be/src/service/query-exec-state.h:

PS2, Line 244: 
> we won't need that.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6707
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5137: pt1, Refactor TimestampValue constructors

2017-05-10 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has abandoned this change.

Change subject: IMPALA-5137: pt1, Refactor TimestampValue constructors
..


Abandoned

merging with https://gerrit.cloudera.org/#/c/6526/

-- 
To view, visit http://gerrit.cloudera.org:8080/6510
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: Id25e19f7984e5ebf9073d9c569faf69cec142fa1
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 


[Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix QueryExecState::lock contention

2017-05-10 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has uploaded a new patch set (#3).

Change subject: IMPALA-1972/IMPALA-3882: Fix QueryExecState::lock_ contention
..

IMPALA-1972/IMPALA-3882: Fix QueryExecState::lock_ contention

Holding query_exec_state_map_lock_ and QES::lock_ together in certain
paths could potentially block the impalad from registering new queries.
The most common occurrence of this is while loading the webpage of a query
while the query planning is still in progress. Since we hold the
QES::lock_ during planning, it blocks the web page from loading which inturn
blocks incoming queries by holding query_exec_state_map_lock_.

While the actual fix for this is to not hold the QES::lock_ during
metadata load, this patch doesn't do it since it could have some unknown
side effects. Instead,

- We make sure that QES::lock_ doesn't block query_exec_state_map_lock_
  in hot paths.
- As an exception, we don't grab QES::lock_ while the query planning is
  in progress.

Also, this patch removes locking inside the GetQueryExecState() method.
It is confusing that some callers choose to lock it inside the method and
wait on it later. Now, the locking is the responsibility of the callers.

Testing: Tested it locally by adding a long sleep in
JniFrontend.createExecRequest() and still was able to refresh the web UI
and run parallel queries.

Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026
---
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-hs2-server.cc
M be/src/service/impala-http-handler.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
A tests/custom_cluster/test_query_concurrency.py
M www/query_plan.tmpl
7 files changed, 139 insertions(+), 36 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/6707/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6707
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-10 Thread Lars Volker (Code Review)
Hello Dimitris Tsirogiannis,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6495

to look at the new patch set (#24).

Change subject: IMPALA-4166: Add SORT BY sql clause
..

IMPALA-4166: Add SORT BY sql clause

This change adds support for adding SORT BY (...) clauses to CREATE
TABLE and ALTER TABLE statements. Examples are:

CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j);
CREATE TABLE t SORT BY (int_col,id) LIKE u;
CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip);

ALTER TABLE t SORT BY (int_col,id);
ALTER TABLE t SORT BY ();

Sort columns can only be specified for Hdfs tables and effectiveness may
vary based on storage type; for example TEXT tables will not see
improved compression. The SORT BY clause must not contain clustering
columns. The columns in the SORT BY clause are stored in the
'sort.columns' table property and will result in an additional SORT node
being added to the plan before the final table sink. Specifying sort
columns also enables clustering during inserts, so the SORT node will
contain all partitioning columns first, followed by the sort columns. We
do this because sort columns add a SORT node to the plan and adding the
clustering columns to the SORT node is cheap.

Sort columns supersede the sortby() hint, which we will remove in a
subsequent change (IMPALA-5144). Until then, it is possible to specify
sort columns using both ways at the same time and the column lists
will be concatenated.

Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
---
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M common/thrift/DataSinks.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
A fe/src/main/java/org/apache/impala/analysis/AlterTableSortByStmt.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test
M testdata/workloads/functional-planner/queries/PlannerTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
39 files changed, 1,401 insertions(+), 106 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/24
-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 24
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-5169: Add support for async pins in buffer pool

2017-05-10 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5169: Add support for async pins in buffer pool
..


Patch Set 10:

(14 comments)

This public abstraction looks good now. Mostly some renaming suggestions to 
clarify the internal interfaces.

http://gerrit.cloudera.org:8080/#/c/6612/10/be/src/runtime/buffered-tuple-stream-v2.cc
File be/src/runtime/buffered-tuple-stream-v2.cc:

Line 411:   RETURN_IF_ERROR(read_page_->handle.GetBuffer(_buffer));
don't we have to set the flag here too?


http://gerrit.cloudera.org:8080/#/c/6612/10/be/src/runtime/buffered-tuple-stream-v2.h
File be/src/runtime/buffered-tuple-stream-v2.h:

PS10, Line 155: If the stream was previously unpinned, the page's data
  : /// may not yet be in memory - the stream must call 
GetBuffer() (which can block on
  : /// I/O or fail with an I/O error) once the data is needed.
This seems like more of an implementation detail hidden behind the BufferPool 
abstraction. Does the BTS or its clients really care about that?


PS10, Line 373: data_in_mem
the "in mem" is really a detail hidden behind the BP abstraction.  How about 
"retrieved_buffer", or "referenced_buffer", or "buffer_referenced"?


http://gerrit.cloudera.org:8080/#/c/6612/10/be/src/runtime/bufferpool/buffer-pool-internal.h
File be/src/runtime/bufferpool/buffer-pool-internal.h:

PS10, Line 41: async_pin_in_flight
since all pins are async, how about just 'pin_in_flight'?


PS10, Line 107: CancelAsyncPin
CancelPinRead(), or CancelPinReadIO, or CancelReadIO? i.e. this doesn't really 
do anything about pinning (that's handled by the impl_ level).


PS10, Line 236: UndoAsyncPin
How about sticking with the list-oriented names, so something like: 
UndoMoveEvictedToPinned() or UndoInFlightMoveToPinned()?


PS10, Line 241: async_write_in_flight
wrong name


PS10, Line 247: write_
same


http://gerrit.cloudera.org:8080/#/c/6612/10/be/src/runtime/bufferpool/buffer-pool.cc
File be/src/runtime/bufferpool/buffer-pool.cc:

Line 157: Status status = ExtractBuffer(client, handle, );
if the read had already finished before CancelAsyncPin() and had an error, does 
this still always return ok?


PS10, Line 384: UndoAsyncPin
Sticking with the list-oriented names, this is really 
UndoInFlightMoveToPinned(), right?


PS10, Line 387: CancelAsyncPin
page->CancelAsyncPin() isn't really canceling the pin, it's cancelling the read 
I/O, right?

This routine is really the thing that "cancels" the in-progress pinning.


PS10, Line 393: it doesn't have valid data.
is that true? the read may have completed, right? I think this routine should 
be explained more in terms of the write handle (scratch disk space) is still 
guaranteed to reflect the page contents.  The buffer is in an unknown state 
(may or may not have the page's data) but a reference to it definitely didn't 
escape).


http://gerrit.cloudera.org:8080/#/c/6612/10/be/src/runtime/bufferpool/buffer-pool.h
File be/src/runtime/bufferpool/buffer-pool.h:

PS10, Line 413:  
previously


http://gerrit.cloudera.org:8080/#/c/6612/10/be/src/runtime/tmp-file-mgr.h
File be/src/runtime/tmp-file-mgr.h:

PS10, Line 139: Even if the read fails, it is valid to call ReadAsync() again
  : /// after this returns.
what does that mean?


-- 
To view, visit http://gerrit.cloudera.org:8080/6612
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibdf074c1ac4405d6f08d623ba438a85f7d39fd79
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Bump Kudu version to 7533364

2017-05-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: Bump Kudu version to 7533364
..


Patch Set 4:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/566/

-- 
To view, visit http://gerrit.cloudera.org:8080/6797
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I88dc2d425bd3aff70c95d51818d0450709123d27
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-10 Thread Lars Volker (Code Review)
Hello Dimitris Tsirogiannis,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6495

to look at the new patch set (#23).

Change subject: IMPALA-4166: Add SORT BY sql clause
..

IMPALA-4166: Add SORT BY sql clause

This change adds support for adding SORT BY (...) clauses to CREATE
TABLE and ALTER TABLE statements. Examples are:

CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j);
CREATE TABLE t SORT BY (int_col,id) LIKE u;
CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip);

ALTER TABLE t SORT BY (int_col,id);
ALTER TABLE t SORT BY ();

Sort columns can only be specified for Hdfs tables and effectiveness may
vary based on storage type; for example TEXT tables will not see
improved compression. The SORT BY clause must not contain clustering
columns. The columns in the SORT BY clause are stored in the
'sort.columns' table property and will result in an additional SORT node
being added to the plan before the final table sink. Specifying sort
columns also enables clustering during inserts, so the SORT node will
contain all partitioning columns first, followed by the sort columns. We
do this because sort columns add a SORT node to the plan and adding the
clustering columns to the SORT node is cheap.

Sort columns supersede the sortby() hint, which we will remove in a
subsequent change (IMPALA-5144). Until then, it is possible to specify
sort columns using both ways at the same time and the column lists
will be concatenated.

Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
---
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M common/thrift/DataSinks.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
A fe/src/main/java/org/apache/impala/analysis/AlterTableSortByStmt.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test
M testdata/workloads/functional-planner/queries/PlannerTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
39 files changed, 1,405 insertions(+), 107 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/23
-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 23
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM

2017-05-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM
..


Patch Set 3: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I31f03e7a4d5d237a1183277c988f85a992396a43
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM

2017-05-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM
..


IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM

On jenkins.impala.io gerrit-verify-dryrun jobs,
free-pool-test started running out of memory after updating
Kudu to a newer version which can have slightly higher
memory requirements (Kudu will reject writes when mem is 80%
full rather than 60% full). free-pool-test can allocate up
to 12gb, and the VMs have a CommitLimit of only 16gb.

While larger VMs could be used, or the minicluster could be
tuned further, free-pool-test can also be modified to
reduce the actual RSS usage. Instead of memsetting the
entire allocation in the test, we only scribble the first
byte. This reduces the max RSS from 14gb to 88mb in some
local tests.

Change-Id: I31f03e7a4d5d237a1183277c988f85a992396a43
Reviewed-on: http://gerrit.cloudera.org:8080/6842
Reviewed-by: Dan Hecht 
Tested-by: Impala Public Jenkins
---
M be/src/runtime/free-pool-test.cc
1 file changed, 3 insertions(+), 2 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Dan Hecht: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/6842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I31f03e7a4d5d237a1183277c988f85a992396a43
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 


[Impala-ASF-CR] IMPALA-4499: Table name missing from exec summary

2017-05-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4499: Table name missing from exec summary
..


Patch Set 2: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6832
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If4fd13f893aea4e7df8a2474d7136770660e4324
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4499: Table name missing from exec summary

2017-05-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4499: Table name missing from exec summary
..


IMPALA-4499: Table name missing from exec summary

For scan nodes, previously only HDFS tables showed the name
of the table in the 'Detail' section for the scan node. This
change adds the table name for all scan node types (Kudu,
HBase, and DataSource).

Testing:
- Added an e2e test in test_observability.

Change-Id: If4fd13f893aea4e7df8a2474d7136770660e4324
Reviewed-on: http://gerrit.cloudera.org:8080/6832
Reviewed-by: Thomas Tauber-Marshall 
Reviewed-by: Alex Behm 
Tested-by: Impala Public Jenkins
---
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M tests/query_test/test_observability.py
3 files changed, 38 insertions(+), 14 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Thomas Tauber-Marshall: Looks good to me, but someone else must approve
  Alex Behm: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/6832
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: If4fd13f893aea4e7df8a2474d7136770660e4324
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-5294: Kudu INSERT partitioning fails with constants

2017-05-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5294: Kudu INSERT partitioning fails with constants
..


IMPALA-5294: Kudu INSERT partitioning fails with constants

An INSERT into a Kudu table with a constant value being inserted
into a partition column causes an IllegalStateExcaption. This is
because DistributedPlanner removes constants from the list of
partition exprs before creating the KuduPartitionExpr, but
KuduPartitionExpr expects to get one expr per partition column.

The fix is to pass the full list of partition exprs into the
KuduPartitionExpr, instead of the list that has had constants
removed. This preserves the behavior that if all of the partition
exprs are constant we fall back to UNPARTITIONED.

One complication is that if a partition expr is a NullLiteral, it
must be cast to a specific type to be passed to the BE. The
InsertStmt will cast the partition exprs to the partition column
types, but these casts may be lost from the copies of the partition
exprs stored by the KuduPartitionExpr during reset(). To fix this,
the KuduPartitionExpr can store the types of the partition cols and
recast the partition exprs to those types during analyze().

Change-Id: I12cbb319f9a5c47fdbfee347b47650186b27f8f9
Reviewed-on: http://gerrit.cloudera.org:8080/6828
Reviewed-by: Thomas Tauber-Marshall 
Tested-by: Impala Public Jenkins
---
M fe/src/main/java/org/apache/impala/analysis/KuduPartitionExpr.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test
3 files changed, 47 insertions(+), 3 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Thomas Tauber-Marshall: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/6828
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I12cbb319f9a5c47fdbfee347b47650186b27f8f9
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-5294: Kudu INSERT partitioning fails with constants

2017-05-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5294: Kudu INSERT partitioning fails with constants
..


Patch Set 3: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6828
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I12cbb319f9a5c47fdbfee347b47650186b27f8f9
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5085: large rows in BufferedTupleStreamV2

2017-05-10 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#4).

Change subject: IMPALA-5085: large rows in BufferedTupleStreamV2
..

IMPALA-5085: large rows in BufferedTupleStreamV2

The stream defaults to pages of default_page_len_. If a row doesn't
fit in that page, it will allocate another page up to max_page_len_
bytes and append a single row to that page, then immediately advance
to the next page. This means that when writing a stream, the large
page only needs to be kept in memory temporarily, which helps with
memory requirements.  E.g. consider a hash join that is repartitioning
1 unpinned stream into 16 unpinned streams. We will need
default_page_len_ * 15 + max_page_len_ * 2 bytes of reservation because
when processing a large row we only need one large write buffer at a
time.

The large row cases are not as optimised for memory consumption or
performance - queries processing very large numbers of large rows
are an extreme edge case that is likely to hit other performance
bottlenecks first. Pages with large rows can have up to 50%
internal fragmentation. I tried to avoid adding empty pages to the
stream, but it was tricky to avoid in the case of a read/write
stream where the read iterator was already pointing at the write
page - in that case we can end up with an empty page.

To avoid duplicating more logic between AddRow() and AllocateRow()
I restructured things so that AddRowSlow() is implemented in terms
of AllocateRowSlow(). AllocateRow() now takes a function as an
argument to populate the row.

Testing:
* Added tests for the case where 0 rows are added to the stream
* Extend BigRow to exercise the new code.
* Also test large strings and read/write streams.

Change-Id: I2861c58efa7bc1aeaa5b7e2f043c97cb3985c8f5
---
M be/src/runtime/buffered-tuple-stream-v2-test.cc
M be/src/runtime/buffered-tuple-stream-v2.cc
M be/src/runtime/buffered-tuple-stream-v2.h
M be/src/runtime/buffered-tuple-stream-v2.inline.h
M common/thrift/generate_error_codes.py
5 files changed, 544 insertions(+), 235 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/6638/4
-- 
To view, visit http://gerrit.cloudera.org:8080/6638
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2861c58efa7bc1aeaa5b7e2f043c97cb3985c8f5
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5085: large rows in BufferedTupleStreamV2

2017-05-10 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5085: large rows in BufferedTupleStreamV2
..


Patch Set 3:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/6638/3//COMMIT_MSG
Commit Message:

PS3, Line 16: buffers
> buffer bytes? bytes of reservation?
Done


PS3, Line 16: 14
> why is that not 15 (15 pages for writing to the partitions that aren't the 
Done


http://gerrit.cloudera.org:8080/#/c/6638/3/be/src/runtime/buffered-tuple-stream-v2.h
File be/src/runtime/buffered-tuple-stream-v2.h:

PS3, Line 95: other
> aren't there n-1 other streams?
Done


PS3, Line 212: max_page_len
> how will that be determined?
This is just adding the mechanism, so policy is TBD. We'd discussed adding a 
query option to specify the maximum row size.


PS3, Line 254: stream could not increase the reservation
 :   ///   sufficiently
> does it try this for large rows, or not? above it says the caller must have
I'm not sure where the "above" is. 

This is true regardless for the pinned case, because there's no guarantee or 
requirement that we can extend a stream with a pinned page.

I tried to clarify further on line 262 the behaviour with a large row and an 
unpinned stream.


PS3, Line 280: const WriteRowFn& write_fn
> why is this part exposed outside BTS2? i thought this was just to reuse cod
AllocateRow() is still used by PAGG. The motivation for this choice is 
partially to avoid duplication, but also the AllocateRow() interface needed to 
change regardless because we need to advance to the next page immediately after 
the row is written. The old interface assumes that nothing needs to be done 
after the caller writes the row into the stream.

The alternative is to have the code duplication and expose a pair of methods 
like AllocateRow()/AllocateRowDone(), with PAGG responsible for calling 
AllocateRowDone() after it writes the data.


Line 466:   /// hard limit on the maximum size of row that can be stored in the 
stream.
> motivation for this limit?
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6638
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2861c58efa7bc1aeaa5b7e2f043c97cb3985c8f5
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.

2017-05-10 Thread anujphadke (Code Review)
anujphadke has posted comments on this change.

Change subject: IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6527/2//COMMIT_MSG
Commit Message:

PS2, Line 12: ProcessSpit
nit: typo


http://gerrit.cloudera.org:8080/#/c/6527/2/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

Line 34: #include "util/pretty-printer.h"
unused.


-- 
To view, visit http://gerrit.cloudera.org:8080/6527
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5238: transfer reservations between trackers

2017-05-10 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5238: transfer reservations between trackers
..


Patch Set 5:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/6708/5/be/src/runtime/bufferpool/reservation-tracker.cc
File be/src/runtime/bufferpool/reservation-tracker.cc:

Line 237:   DCHECK_EQ(path_to_root.back(), other_path_to_root.back());
> maybe first dcheck they both aren't empty to support calling back()
Seems like overkill to me since FindPathToRoot() always includes the current 
tracker by definition. I can add it if you feel strongly.


PS5, Line 253:   
> nit: indentation mismatch
Done


Line 274: if (i == other_path_to_root.size() - 1 && curr->parent_ != 
nullptr) {
> this path could use a quick explanation
We actually don't need it - it was only needed to offset an increment in 
DecreaseReservationInternal(). I was able to remove this and the increment.


Line 278:   }
> why is it okay to add the reservation into the other chain before subtracti
There is a race if you do that but the bug would be that the caller was 
misusing the API and creating a semantic race. E.g. if two threads call 
IncreaseReservationToFit() and TransferReservationTo() concurrently, and 
TransferReservationTo() runs second, the post-condition of 
IncreaseReservationToFit() can become false. Or if AllocateFrom(x) and 
TransferReservationTo(y) are called concurrently and the current reservation is 
< x - y, then the one that runs second will DCHECK.

It's safe in that a concurrent thread can't get memory that it wouldn't be 
entitled to, because we did the increase before the decrease. A concurrent 
thread could probably notice an anomaly if it was looking for one, but I can't 
think of any scenario where that is actually a problem in practice. 

It is tricky to reason about - I tried augmenting the lock ordering so that we 
can lock everything at the same time and therefore fully guarantee atomicity. 
Seemed to work out ok so going with that now.


Line 281: DecreaseReservation(bytes, false, path_to_root.back());
> is this always safe while holding 'locks'?  Why can't we get into a deadloc
Yeah that was violating the lock order - I removed the scoped block without 
thinking about that.


http://gerrit.cloudera.org:8080/#/c/6708/5/be/src/runtime/bufferpool/reservation-tracker.h
File be/src/runtime/bufferpool/reservation-tracker.h:

PS5, Line 194: Decrase
> typo
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6708
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I21f008abaf1aa4fcd2d854769a603b97589af3b3
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5238: transfer reservations between trackers

2017-05-10 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#6).

Change subject: IMPALA-5238: transfer reservations between trackers
..

IMPALA-5238: transfer reservations between trackers

This is a primitive needed to implement claiming and distribution
of initial reservations. It supports transferring reservation between
any two ReservationTrackers under the same query.

Also remove the public DecreaseReservation() method, which is now
mostly redundant and we have no plans to use.

Testing:
* Added a test that exercises transfer between trackers at different
  levels and with different relationships.

Change-Id: I21f008abaf1aa4fcd2d854769a603b97589af3b3
---
M be/src/runtime/bufferpool/reservation-tracker-test.cc
M be/src/runtime/bufferpool/reservation-tracker.cc
M be/src/runtime/bufferpool/reservation-tracker.h
3 files changed, 282 insertions(+), 68 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/6708/6
-- 
To view, visit http://gerrit.cloudera.org:8080/6708
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I21f008abaf1aa4fcd2d854769a603b97589af3b3
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5167: Reduce the number of Kudu clients created

2017-05-10 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-5167: Reduce the number of Kudu clients created
..


Patch Set 4: Code-Review+1

(2 comments)

> How about a rebase?

Done

http://gerrit.cloudera.org:8080/#/c/6792/3/be/src/runtime/exec-env.cc
File be/src/runtime/exec-env.cc:

PS3, Line 403:   std::vector 
sorted_master_addresses(master_addresses);
 :   sort(sorted_master_addresses.begin(), 
sorted_master_addresses.end());
 :   std::string master_addr_concat = join(sorted_master_addresses, 
",");
 :   lock_guard l(kudu_client_map_lock_);
> can't you just use the copy constructor?
Done


http://gerrit.cloudera.org:8080/#/c/6792/3/be/src/runtime/exec-env.h
File be/src/runtime/exec-env.h:

PS3, Line 193: across
> spelling
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6792
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6b0c12a256c33e8ef32315b3736cae2dea2ae705
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5167: Reduce the number of Kudu clients created

2017-05-10 Thread Thomas Tauber-Marshall (Code Review)
Hello Matthew Jacobs,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6792

to look at the new patch set (#4).

Change subject: IMPALA-5167: Reduce the number of Kudu clients created
..

IMPALA-5167: Reduce the number of Kudu clients created

Creating Kudu clients is very expensive as each will fetch
metadata from the Kudu master, so we should minimize the
number of Kudu clients that get created.

This patch stores a map from Kudu master addressed to Kudu
clients in the ExecEnv to be used across the BE for all
queries, and a map in KuduUtil to be used acress the FE
and catalog for all queries.

This relies on changes on the Kudu side that clear
non-covered range entries from the client's cache on
table open (fdc022fe6231af20e307012d98c35b16cbfa7b33
and d07ecd6ded01201c912d2e336611a6a941f48d98)

Testing:
- Ran a stress test locally: scan of a Kudu table, 100
  concurrent queries, load on the Kudu master was reduced
  signficantly, from ~25% cpu to ~4% cpu.
  TODO: perf testing on a cluster.
- Ran the Kudu e2e tests.
- Manually ran a test with concurrent INSERTs and
  'ALTER TABLE ADD PARTITION' (which is affected by the
  Kudu side change mentiond above) and verified
  correctness.

Change-Id: I6b0c12a256c33e8ef32315b3736cae2dea2ae705
---
M be/src/exec/kudu-scan-node-base.cc
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-table-sink.h
M be/src/exprs/kudu-partition-expr.cc
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/KuduUtil.java
12 files changed, 114 insertions(+), 91 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/6792/4
-- 
To view, visit http://gerrit.cloudera.org:8080/6792
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6b0c12a256c33e8ef32315b3736cae2dea2ae705
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-05-10 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..


Patch Set 13:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/scalar-expr.h
File be/src/exprs/scalar-expr.h:

Line 156:   friend class AndPredicate;
> The goal is to hide both Init() and Get*Val() from irrelevant classes to en
Removed AggregationNode and PartitionedAggregationNode from friend class list 
in PS13.


-- 
To view, visit http://gerrit.cloudera.org:8080/5483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4623: Enable file handle cache

2017-05-10 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4623: Enable file handle cache
..


Patch Set 6:

(6 comments)

quick comments

http://gerrit.cloudera.org:8080/#/c/6478/5/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

Line 92:   initial_ranges_issued_(false),
you can also initialize that inline in the .h file


http://gerrit.cloudera.org:8080/#/c/6478/5/be/src/runtime/disk-io-mgr-handle-cache.h
File be/src/runtime/disk-io-mgr-handle-cache.h:

Line 105: std::unique_ptr fh_;
no trailing _ for structs


http://gerrit.cloudera.org:8080/#/c/6478/6/be/src/runtime/disk-io-mgr-handle-cache.h
File be/src/runtime/disk-io-mgr-handle-cache.h:

Line 116: MapType cache_;
provide comments for variables, in particular invariants (e.g., it looks like 
the lru list only contains unused handles)


http://gerrit.cloudera.org:8080/#/c/6478/5/be/src/runtime/disk-io-mgr-handle-cache.inline.h
File be/src/runtime/disk-io-mgr-handle-cache.inline.h:

Line 33:   if (hdfs_file_ != nullptr && fs_ != nullptr) {
could these be null?


http://gerrit.cloudera.org:8080/#/c/6478/6/be/src/runtime/disk-io-mgr-handle-cache.inline.h
File be/src/runtime/disk-io-mgr-handle-cache.inline.h:

Line 61:   int index = hash()(key) % NUM_PARTITIONS;
also: use something out of util/hash-util.h


http://gerrit.cloudera.org:8080/#/c/6478/4/be/src/runtime/disk-io-mgr.cc
File be/src/runtime/disk-io-mgr.cc:

Line 1309
> I use this as the key for the map, so switching the hash alone wouldn't eli
probably a good idea, it's unlikely that we'll see a lot of overwritten files.


-- 
To view, visit http://gerrit.cloudera.org:8080/6478
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5220: memory maintenance cleanup

2017-05-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5220: memory maintenance cleanup
..


Patch Set 11:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/565/

-- 
To view, visit http://gerrit.cloudera.org:8080/6626
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0f822b294ab253d6f2828fc52f353aecaaf9b701
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5220: memory maintenance cleanup

2017-05-10 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5220: memory maintenance cleanup
..


Patch Set 11: Code-Review+2

rebase

-- 
To view, visit http://gerrit.cloudera.org:8080/6626
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0f822b294ab253d6f2828fc52f353aecaaf9b701
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5220: memory maintenance cleanup

2017-05-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5220: memory maintenance cleanup
..


Patch Set 10: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6626
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0f822b294ab253d6f2828fc52f353aecaaf9b701
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4623: Enable file handle cache

2017-05-10 Thread Joe McDonnell (Code Review)
Joe McDonnell has uploaded a new patch set (#6).

Change subject: IMPALA-4623: Enable file handle cache
..

IMPALA-4623: Enable file handle cache

Currently, every scan range maintains a file handle, even
when multiple scan ranges are accessing the same file.
Opening the file handles causes load on the NameNode, which
can lead to scaling issues.

There are two parts to this transaction:
1. Enable file handle caching by default
2. Share the file handle between scan ranges from the same
file

The scan range no longer maintains its own Hdfs file
handle. On each read, the io thread will get the Hdfs file
handle from the cache (opening it if necessary) and use
that for the read. This allows multiple scan ranges on the
same file to use the same file handle. Since the file
offsets are no longer consistent for an individual scan
range, all Hdfs reads need to either use hdfsPread or do
a seek before reading. Additionally, since Hdfs read
statistics are maintained on the file handle, the read
statistics must be retrieved and cleared after each read.

Scan ranges that are accessing data cached by Hdfs
will get a file handle from the cache, but the file
handle will be kept on the scan range for the time
that the scan range is in use. This prevents the
cache from closing the file handle while the data
buffer is in use.

To manage contention, the file handle cache is now
partitioned by a hash of the key into independent
caches with independent locks. The allowed capacity
of the file handle cache is split evenly among the
partitions. File handles are evicted independently
for each partition. The file handle cache maintains
ownership of the file handles at all times, but it
will not evict a file handle that is in use.

If max_cached_file_handles is set to 0, file handle
caching is off and the previous behavior applies.

Tests:
test_hdfs_fd_caching.py copies the files from an existing
table into a new directory and uses that to create an
external table. It queries the external table, then
uses the hdfs commandline to manipulate the hdfs file
(delete, move, etc). It queries again to make sure we
don't crash. Then, it runs "invalidate metadata". In
the delete case, we expect zero rows. In the move case,
we expect the same number of rows.

TODO:
1. Determine appropriate defaults.
2. Other tests
  a. File overwrite
  b. Any others?
3. For scan ranages that use Hdfs caching, should there
be some sharing at the scanner level?

Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/runtime/buffered-block-mgr.h
A be/src/runtime/disk-io-mgr-handle-cache.h
A be/src/runtime/disk-io-mgr-handle-cache.inline.h
M be/src/runtime/disk-io-mgr-internal.h
M be/src/runtime/disk-io-mgr-reader-context.cc
M be/src/runtime/disk-io-mgr-scan-range.cc
M be/src/runtime/disk-io-mgr.cc
M be/src/runtime/disk-io-mgr.h
M tests/query_test/test_hdfs_fd_caching.py
11 files changed, 679 insertions(+), 368 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/6478/6
-- 
To view, visit http://gerrit.cloudera.org:8080/6478
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4623: Enable file handle cache

2017-05-10 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change.

Change subject: IMPALA-4623: Enable file handle cache
..


Patch Set 4:

(4 comments)

Fixed the test comments and fixed some errors in the code found by other 
existing tests (custom_cluster/test_hdfs_fd_caching.py tests the metrics).

http://gerrit.cloudera.org:8080/#/c/6478/4/be/src/runtime/disk-io-mgr.cc
File be/src/runtime/disk-io-mgr.cc:

Line 1331:   ImpaladMetrics::IO_MGR_NUM_CACHED_FILE_HANDLES->Increment(-1L);
> I'm splitting hairs about the exact meaning of cached. 
I found an existing test that was checking the metrics, and I've changed this 
code to match what that test was expecting.

IO_MGR_NUM_FILE_HANDLES_OUTSTANDING is the number of file handles in use.
IO_MGR_NUM_CACHED_FILE_HANDLES now is the count of all files handles. It would 
also make sense for this to exclude the oustanding file handles, but right now, 
this counts all file handles.


http://gerrit.cloudera.org:8080/#/c/6478/4/tests/query_test/test_hdfs_fd_caching.py
File tests/query_test/test_hdfs_fd_caching.py:

Line 76: # File deleted: either # rows is the same or # rows = 0
> we don't make any guarantees for this, so might as well not check the resul
Changed to not check the results.


Line 98: new_table_location = 
"{0}/test-warehouse/{1}".format(FILESYSTEM_PREFIX, unique_database)
> long lines
Done


Line 105: # Delete the whole directory (including data file)
> this function is pretty much identical to the one above (and below), except
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6478
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5036: Parquet count star optimization

2017-05-10 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-5036: Parquet count star optimization
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6812/1/testdata/workloads/functional-planner/queries/PlannerTest/parquet-stats-agg.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/parquet-stats-agg.test:

Line 34: |  output: 
sum_zero_if_empty(functional_parquet.alltypes.parquet-stats: num_rows)
> Personally, I prefer to show what is actually being executed in the explain
sum_zero_if_empty sounds pretty obscure, what is that supposed to mean?


-- 
To view, visit http://gerrit.cloudera.org:8080/6812
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I536b85c014821296aed68a0c68faadae96005e62
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-10 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 22:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/6495/22/common/thrift/JniCatalog.thrift
File common/thrift/JniCatalog.thrift:

Line 382:   // Optional list of sort by columns for the new table. If 
specified, these will override
"sort columns" or "sorting columns" (as in: partition columns)


Line 385:   9: optional list sort_by_columns
sort_columns


Line 439:   16: optional list sort_by_columns
same


http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

Line 336: nonterminal ArrayList opt_ident_list, opt_sort_by_cols;
opt_sort_by_clause or opt_sort_cols? "sort by columns" sounds odd.


http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java
File 
fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java:

Line 43: public class AlterTableSortByColumnsStmt extends AlterTableStmt {
drop "columns", the statement doesn't include a COLUMNS


Line 73: // Disallow setting sort by columns on HBase and Kudu tables.
rephrase "sort by columns" as "sort columns" universally


Line 91:   public static void analyzeSortByColumns(List sortByCols, 
Table table)
maybe move this into TableDef as well? it seems kind of arbitrary to put it 
here.


http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java:

Line 863: // TODO: Remove this when removing the sortby() hint 
(IMPALA-5157).
is this going to happen soon?


http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/main/java/org/apache/impala/analysis/TableDef.java
File fe/src/main/java/org/apache/impala/analysis/TableDef.java:

Line 46:  * correspond to the following clauses in a CREATE TABLE statement:
update


Line 343: final String sortByKey = 
AlterTableSortByColumnsStmt.TBL_PROP_SORT_BY_COLUMNS;
why final?


http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
File fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java:

Line 75: final String sortByKey = 
AlterTableSortByColumnsStmt.TBL_PROP_SORT_BY_COLUMNS;
it feels like that constant should live somewhere else


http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

Line 456:* the caller must make sure that the value matches any columns 
he/she added to the
or "that were added to the table" to avoid the gender reference :)


http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/test/java/org/apache/impala/planner/PlannerTest.java
File fe/src/test/java/org/apache/impala/planner/PlannerTest.java:

Line 101: addTestTable("create table test_sort_by.alltypes (id int, int_col 
int, " +
why call this alltypes? it clearly doesn't contain all types.


-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 22
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4499: Table name missing from exec summary

2017-05-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4499: Table name missing from exec summary
..


Patch Set 2:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/564/

-- 
To view, visit http://gerrit.cloudera.org:8080/6832
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If4fd13f893aea4e7df8a2474d7136770660e4324
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5294: Kudu INSERT partitioning fails with constants

2017-05-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5294: Kudu INSERT partitioning fails with constants
..


Patch Set 3:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/563/

-- 
To view, visit http://gerrit.cloudera.org:8080/6828
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I12cbb319f9a5c47fdbfee347b47650186b27f8f9
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM

2017-05-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM
..


Patch Set 3:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/562/

-- 
To view, visit http://gerrit.cloudera.org:8080/6842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I31f03e7a4d5d237a1183277c988f85a992396a43
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM

2017-05-10 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM
..


Patch Set 3:

fwiw the jenkins nodes (both public and private) appear to have vm_overcommit 0

ubuntu@ip-172-31-7-2:~$ cat /proc/sys/vm/overcommit_memory 
0

-- 
To view, visit http://gerrit.cloudera.org:8080/6842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I31f03e7a4d5d237a1183277c988f85a992396a43
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5167: Reduce the number of Kudu clients created

2017-05-10 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-5167: Reduce the number of Kudu clients created
..


Patch Set 3:

How about a rebase?

-- 
To view, visit http://gerrit.cloudera.org:8080/6792
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6b0c12a256c33e8ef32315b3736cae2dea2ae705
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM

2017-05-10 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM
..


Patch Set 3: Code-Review+2

> I measured the max rss w/ this change to be 88mb, down from 14gb is
 > what I observed with the old code.

it might still fail with vm_overcommit = 2 
(https://www.kernel.org/doc/Documentation/vm/overcommit-accounting) but I agree 
this is the best way out.

-- 
To view, visit http://gerrit.cloudera.org:8080/6842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I31f03e7a4d5d237a1183277c988f85a992396a43
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5238: transfer reservations between trackers

2017-05-10 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5238: transfer reservations between trackers
..


Patch Set 5:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/6708/5/be/src/runtime/bufferpool/reservation-tracker.cc
File be/src/runtime/bufferpool/reservation-tracker.cc:

Line 237:   DCHECK_EQ(path_to_root.back(), other_path_to_root.back());
maybe first dcheck they both aren't empty to support calling back()


PS5, Line 253:   
nit: indentation mismatch


Line 274: if (i == other_path_to_root.size() - 1 && curr->parent_ != 
nullptr) {
this path could use a quick explanation


Line 278:   }
why is it okay to add the reservation into the other chain before subtracting 
it from this chain? i guess we are assuming that this function does not execute 
concurrently with anything that might consume reservations against the 
hierarchy, right? is that documented?


Line 281: DecreaseReservation(bytes, false, path_to_root.back());
is this always safe while holding 'locks'?  Why can't we get into a deadlock if 
doing a->Transfer(b) and b->Transfer(a) concurrently?


http://gerrit.cloudera.org:8080/#/c/6708/5/be/src/runtime/bufferpool/reservation-tracker.h
File be/src/runtime/bufferpool/reservation-tracker.h:

PS5, Line 194: Decrase
typo


-- 
To view, visit http://gerrit.cloudera.org:8080/6708
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I21f008abaf1aa4fcd2d854769a603b97589af3b3
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM

2017-05-10 Thread Matthew Jacobs (Code Review)
Hello Michael Ho,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6842

to look at the new patch set (#3).

Change subject: IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM
..

IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM

On jenkins.impala.io gerrit-verify-dryrun jobs,
free-pool-test started running out of memory after updating
Kudu to a newer version which can have slightly higher
memory requirements (Kudu will reject writes when mem is 80%
full rather than 60% full). free-pool-test can allocate up
to 12gb, and the VMs have a CommitLimit of only 16gb.

While larger VMs could be used, or the minicluster could be
tuned further, free-pool-test can also be modified to
reduce the actual RSS usage. Instead of memsetting the
entire allocation in the test, we only scribble the first
byte. This reduces the max RSS from 14gb to 88mb in some
local tests.

Change-Id: I31f03e7a4d5d237a1183277c988f85a992396a43
---
M be/src/runtime/free-pool-test.cc
1 file changed, 3 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/42/6842/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I31f03e7a4d5d237a1183277c988f85a992396a43
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 


[Impala-ASF-CR] IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM

2017-05-10 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM
..


Patch Set 2:

I measured the max rss w/ this change to be 88mb, down from 14gb is what I 
observed with the old code.

-- 
To view, visit http://gerrit.cloudera.org:8080/6842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I31f03e7a4d5d237a1183277c988f85a992396a43
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM

2017-05-10 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM
..


Patch Set 2: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I31f03e7a4d5d237a1183277c988f85a992396a43
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM

2017-05-10 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM
..


Patch Set 2:

Looks good. I will let Dan +2 it.

-- 
To view, visit http://gerrit.cloudera.org:8080/6842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I31f03e7a4d5d237a1183277c988f85a992396a43
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM

2017-05-10 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has uploaded a new patch set (#2).

Change subject: IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM
..

IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM

On jenkins.impala.io gerrit-verify-dryrun jobs,
free-pool-test started running out of memory after updating
Kudu to a newer version which can have slightly higher
memory requirements (Kudu will reject writes when mem is 80%
full rather than 60% full). free-pool-test can allocate up
to 12gb, and the VMs have a CommitLimit of only 16gb.

While larger VMs could be used, or the minicluster could be
tuned further, it doesn't seem like free-pool-test needs to
be testing such large allocations.

Change-Id: I31f03e7a4d5d237a1183277c988f85a992396a43
---
M be/src/runtime/free-pool-test.cc
1 file changed, 3 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/42/6842/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I31f03e7a4d5d237a1183277c988f85a992396a43
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 


[Impala-ASF-CR] IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM

2017-05-10 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6842/1/be/src/runtime/free-pool-test.cc
File be/src/runtime/free-pool-test.cc:

PS1, Line 132: 1LL
> Will 2LL be too much ? I was hoping we could exercise the code for 64-bit a
we can keep this as is


PS1, Line 138: memset(p1, 1, size);
> Should be sufficient to replace it with *p1 = 1;
Good idea, this works well


-- 
To view, visit http://gerrit.cloudera.org:8080/6842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I31f03e7a4d5d237a1183277c988f85a992396a43
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5297: Set Kudu minicluster memory limit

2017-05-10 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5297: Set Kudu minicluster memory limit
..


Patch Set 1:

posted to test gerrit-verify-dryrun

-- 
To view, visit http://gerrit.cloudera.org:8080/6844
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7fd7e1cd9dc781aaa672a2c68c845cb57ec885d5
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5297: Set Kudu minicluster memory limit

2017-05-10 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6844

Change subject: IMPALA-5297: Set Kudu minicluster memory limit
..

IMPALA-5297: Set Kudu minicluster memory limit

By default, Kudu assumes it has 80% of system memory which
is far too high for the minicluster. This sets a mem limit
of 1gb.

Change-Id: I7fd7e1cd9dc781aaa672a2c68c845cb57ec885d5
---
M testdata/cluster/node_templates/cdh5/etc/init.d/kudu-common
1 file changed, 1 insertion(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/6844/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6844
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I7fd7e1cd9dc781aaa672a2c68c845cb57ec885d5
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 


[Impala-ASF-CR] IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM

2017-05-10 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6842/1/be/src/runtime/free-pool-test.cc
File be/src/runtime/free-pool-test.cc:

PS1, Line 138: memset(p1, 1, size);
> That's probably true, depending on vm_overcommit setting. But given that we
Should be sufficient to replace it with *p1 = 1;


-- 
To view, visit http://gerrit.cloudera.org:8080/6842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I31f03e7a4d5d237a1183277c988f85a992396a43
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4669: [KUTIL] Add kudu util library to the build.

2017-05-10 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4669: [KUTIL] Add kudu_util library to the build.
..


Patch Set 10: Code-Review+2

(1 comment)

We should test this on a range of Linux distributions (SLES, Centos, Ubuntu, 
etc) before committing just in case the OpenSSL script change causes any issues.

http://gerrit.cloudera.org:8080/#/c/5715/10/be/src/kudu/util/kudu_export.h
File be/src/kudu/util/kudu_export.h:

Line 1: 
Apache header?


-- 
To view, visit http://gerrit.cloudera.org:8080/5715
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8da798ee55506d6f969416b17c191eb03cb215f5
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM

2017-05-10 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6842/1/be/src/runtime/free-pool-test.cc
File be/src/runtime/free-pool-test.cc:

PS1, Line 138: memset(p1, 1, size);
> I wonder if we will be less prone to the problem if we don't actually memse
That's probably true, depending on vm_overcommit setting. But given that we are 
seeing the OOM killer, yes, this would probably be an alternate workaround.


-- 
To view, visit http://gerrit.cloudera.org:8080/6842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I31f03e7a4d5d237a1183277c988f85a992396a43
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4669: [KUTIL] Add kudu util library to the build.

2017-05-10 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-4669: [KUTIL] Add kudu_util library to the build.
..


Patch Set 10:

Thanks for the review, Tim - do you have any further comments?

-- 
To view, visit http://gerrit.cloudera.org:8080/5715
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8da798ee55506d6f969416b17c191eb03cb215f5
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM

2017-05-10 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6842/1/be/src/runtime/free-pool-test.cc
File be/src/runtime/free-pool-test.cc:

PS1, Line 132: 1LL
Will 2LL be too much ? I was hoping we could exercise the code for 64-bit 
allocation size.


PS1, Line 138: memset(p1, 1, size);
I wonder if we will be less prone to the problem if we don't actually memset() 
the entire chunk of memory.


-- 
To view, visit http://gerrit.cloudera.org:8080/6842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I31f03e7a4d5d237a1183277c988f85a992396a43
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5238: transfer reservations between trackers

2017-05-10 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5238: transfer reservations between trackers
..


Patch Set 3:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/6708/3//COMMIT_MSG
Commit Message:

PS3, Line 7: 5239
> wrong number
Done


http://gerrit.cloudera.org:8080/#/c/6708/3/be/src/runtime/bufferpool/reservation-tracker.cc
File be/src/runtime/bufferpool/reservation-tracker.cc:

PS3, Line 204: void ReservationTracker::DecreaseReservation(int64_t bytes) {
> does this interface still make sense? i.e. do we need both this and Transfe
We still need this internally to implement Close() and TransferReservationTo(). 

This does do something distinct from either - it releases reservation on all 
ancestors without closing it. TransferReservationTo() could only be used to 
release reservation up to the root.

I don't that is really necessary though for now, so I removed the public 
interface and updated tests to use alternative APIs.


PS3, Line 253: * 'other' is a descendent of 'this'. 'path_to_root' is empty 
because 'this' is the
 :   //lowest common ancestor. To transfer, we increase the 
reservation on the trackers
 :   //under 'this', down to 'other'.
> isn't this the only case we need for the distribution of initial reservatio
The third case is the one we need to distribute reservation - the initial 
reservation can't be in the query root ReservationTracker since then there's no 
way to prevent it being used for non-initial reservations. It needs to be in a 
separate "initial reservation" tracker that is a child of the query tracker 
(e.g. the distribution would be the same as "aunt"->"child" in the unit test).

Now that I think about it, I think the underlying problem is the same as the 
scenario you mention - some ExecNodes won't claim their reservation until after 
the query has started executing, which means that in the meantime the ExecNodes 
initial reservation needs to be stored somewhere, then transferred.


http://gerrit.cloudera.org:8080/#/c/6708/3/be/src/runtime/bufferpool/reservation-tracker.h
File be/src/runtime/bufferpool/reservation-tracker.h:

PS3, Line 197: Release memory on linked MemTracker
> not sure what that means. should it say "to linked MemTracker"?
Reworded


PS3, Line 199: decrease
> how about reservation_decrease to be clear this is opposite TryConsumeFromM
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6708
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I21f008abaf1aa4fcd2d854769a603b97589af3b3
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5238: transfer reservations between trackers

2017-05-10 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#5).

Change subject: IMPALA-5238: transfer reservations between trackers
..

IMPALA-5238: transfer reservations between trackers

This is a primitive needed to implement claiming and distribution
of initial reservations. It supports transferring reservation between
any two ReservationTrackers under the same query.

Also remove the public DecreaseReservation() method, which is now
mostly redundant and we have no plans to use.

Testing:
* Added a test that exercises transfer between trackers at different
  levels and with different relationships.

Change-Id: I21f008abaf1aa4fcd2d854769a603b97589af3b3
---
M be/src/runtime/bufferpool/reservation-tracker-test.cc
M be/src/runtime/bufferpool/reservation-tracker.cc
M be/src/runtime/bufferpool/reservation-tracker.h
3 files changed, 259 insertions(+), 66 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/6708/5
-- 
To view, visit http://gerrit.cloudera.org:8080/6708
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I21f008abaf1aa4fcd2d854769a603b97589af3b3
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 


[Impala-ASF-CR] IMPALA-5238: transfer reservations between trackers

2017-05-10 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#4).

Change subject: IMPALA-5238: transfer reservations between trackers
..

IMPALA-5238: transfer reservations between trackers

This is a primitive needed to implement claiming and distribution
of initial reservations. It supports transferring reservation between
any two ReservationTrackers under the same query.

Testing:
* Added a test that exercises transfer between trackers at different
  levels and with different relationships.

Change-Id: I21f008abaf1aa4fcd2d854769a603b97589af3b3
---
M be/src/runtime/bufferpool/reservation-tracker-test.cc
M be/src/runtime/bufferpool/reservation-tracker.cc
M be/src/runtime/bufferpool/reservation-tracker.h
3 files changed, 259 insertions(+), 66 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/6708/4
-- 
To view, visit http://gerrit.cloudera.org:8080/6708
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I21f008abaf1aa4fcd2d854769a603b97589af3b3
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 


[Impala-ASF-CR] IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM

2017-05-10 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM
..


Patch Set 1:

Looking at git history, the large allocations were added here, so let's be sure 
we're not losing coverage we think we need. Please talk to Michael.

commit ed5ec6772fd24ad28901bc68f120c59597439cf2
Author: Michael Ho 
Date:   Thu Jun 30 15:15:27 2016 -0700

IMPALA-1619: Support 64-bit allocations.

-- 
To view, visit http://gerrit.cloudera.org:8080/6842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I31f03e7a4d5d237a1183277c988f85a992396a43
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Hecht 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5167: Reduce the number of Kudu clients created

2017-05-10 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5167: Reduce the number of Kudu clients created
..


Patch Set 3: Code-Review+1

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6792/3/be/src/runtime/exec-env.cc
File be/src/runtime/exec-env.cc:

PS3, Line 403:   std::vector sorted_master_addresses;
 :   sorted_master_addresses.insert(
 :   sorted_master_addresses.begin(), master_addresses.begin(), 
master_addresses.end());
 :   sort(sorted_master_addresses.begin(), 
sorted_master_addresses.end());
can't you just use the copy constructor?


http://gerrit.cloudera.org:8080/#/c/6792/3/be/src/runtime/exec-env.h
File be/src/runtime/exec-env.h:

PS3, Line 193: acress
spelling


-- 
To view, visit http://gerrit.cloudera.org:8080/6792
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6b0c12a256c33e8ef32315b3736cae2dea2ae705
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-05-10 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exec/exec-node.h
File be/src/exec/exec-node.h:

Line 285:   boost::scoped_ptr expr_mem_tracker_;
> if this is per querystate, the mempool would need to be thread-safe, not su
Yes, there should be one MemPool per fragment instance. TODO removed. Keep the 
name the same as it's clear enough already.


-- 
To view, visit http://gerrit.cloudera.org:8080/5483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-05-10 Thread Michael Ho (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5483

to look at the new patch set (#12).

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..

IMPALA-4192: Disentangle Expr and ExprContext

This change separates Expr and ExprContext. This is a preparatory
step for factoring out static data (e.g. Exprs) of plan fragments
to be shared by multiple plan fragment instances.

This change includes the followings:

1. Include aggregate functions (AggFn) as Expr. This separates
   AggFn from its evaluator. AggFn is similar to existing Expr
   as both are represented as a tree of Expr nodes but it doesn't
   really make sense to call Get*Val() on AggFn. This change
   restructures the class hierarchy: much of the existing Expr
   class is now renamed to ScalarExpr. Expr is the parent class
   of both AggFn and ScalarExpr. Expr is defined to be a tree
   with root of either AggFn or ScalarExpr and all descendants
   being ScalarExpr.

2. ExprContext is renamed to ScalarExprEvaluator which is the
   interface for evaluating ScalarExpr; AggFnEvaluator is the
   interface for evaluating AggFn. Multiple evaluators can be
   instantiated per Expr. Expr contains static states of an
   expression while evaluator contains runtime states needed
   for execution (i.e. evaluating the expression).

3. Update all exec nodes to instantiate Expr and their evaluators
   separately. ExecNode::Init() will be responsible for creating
   all the Exprs in an ExecNode while their evaluators are created
   in ExecNode::Prepare(). Certain evaluators are also moved into
   the data structures which actually utilize them. For instance,
   HashTableCtx now owns the build and probe expression evaluators.
   Similarly, TupleRowComparator and Sorter also own the evaluators.
   ExecNode which utilizes these data structures are only responsible
   for creating the expressions used by these data structures.

4. All codegen functions take Exprs instead of evaluators.

5. The assignment of index into the FunctionContext vector is now done
   during the construction of ScalarExpr. Evaluators are only responsible
   for allocating and initializing the FunctionContexts.

6. Open(), Prepare() are now removed from Expr classes. The interface
   for creating any Expr is via either ScalarExpr::Create() or AggFn::Create()
   which will convert a thrift Expr into an initialized Expr object.
   Similarly, Create() interface is used for creating evaluators from an
   intialized Expr object.

This separation allows the future change to introduce PlanNode data structures.
The plan is to move all ExecNode::Init() logic to PlanNode and call them once
per plan fragment.

Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/common/init.cc
M be/src/exec/CMakeLists.txt
M be/src/exec/aggregation-node-ir.cc
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregation-node.h
M be/src/exec/analytic-eval-node.cc
M be/src/exec/analytic-eval-node.h
M be/src/exec/blocking-join-node.cc
M be/src/exec/blocking-join-node.h
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/data-source-scan-node.cc
M be/src/exec/exchange-node.cc
M be/src/exec/exchange-node.h
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hash-join-node-ir.cc
M be/src/exec/hash-join-node.cc
M be/src/exec/hash-join-node.h
M be/src/exec/hash-table-ir.cc
M be/src/exec/hash-table-test.cc
M be/src/exec/hash-table.cc
M be/src/exec/hash-table.h
M be/src/exec/hash-table.inline.h
M be/src/exec/hbase-scan-node.cc
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hbase-table-sink.h
M be/src/exec/hbase-table-writer.cc
M be/src/exec/hbase-table-writer.h
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-avro-table-writer.cc
M be/src/exec/hdfs-avro-table-writer.h
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-parquet-table-writer.h
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-sequence-scanner.h
M be/src/exec/hdfs-sequence-table-writer.cc
M be/src/exec/hdfs-sequence-table-writer.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/hdfs-table-writer.cc
M be/src/exec/hdfs-table-writer.h
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M 

[Impala-ASF-CR] IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM

2017-05-10 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6842

Change subject: IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM
..

IMPALA-5297: Reduce free-pool-test mem requirement to avoid OOM

On jenkins.impala.io gerrit-verify-dryrun jobs,
free-pool-test started running out of memory after updating
Kudu to a newer version which can have slightly higher
memory requirements (Kudu will reject writes when mem is 80%
full rather than 60% full). free-pool-test can allocate up
to 12gb, and the VMs have a CommitLimit of only 16gb.

While larger VMs could be used, or the minicluster could be
tuned further, it doesn't seem like free-pool-test needs to
be testing such large allocations.

Change-Id: I31f03e7a4d5d237a1183277c988f85a992396a43
---
M be/src/runtime/free-pool-test.cc
1 file changed, 1 insertion(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/42/6842/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I31f03e7a4d5d237a1183277c988f85a992396a43
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 


[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-05-10 Thread Michael Ho (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5483

to look at the new patch set (#11).

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..

IMPALA-4192: Disentangle Expr and ExprContext

This change separates Expr and ExprContext. This is a preparatory
step for factoring out static data (e.g. Exprs) of plan fragments
to be shared by multiple plan fragment instances.

This change includes the followings:

1. Include aggregate functions (AggFn) as Expr. This separates
   AggFn from its evaluator. AggFn is similar to existing Expr
   as both are represented as a tree of Expr nodes but it doesn't
   really make sense to call Get*Val() on AggFn. This change
   restructures the class hierarchy: much of the existing Expr
   class is now renamed to ScalarExpr. Expr is the parent class
   of both AggFn and ScalarExpr. Expr is defined to be a tree
   with root of either AggFn or ScalarExpr and all descendants
   being ScalarExpr.

2. ExprContext is renamed to ScalarExprEvaluator which is the
   interface for evaluating ScalarExpr; AggFnEvaluator is the
   interface for evaluating AggFn. Multiple evaluators can be
   instantiated per Expr. Expr contains static states of an
   expression while evaluator contains runtime states needed
   for execution (i.e. evaluating the expression).

3. Update all exec nodes to instantiate Expr and their evaluators
   separately. ExecNode::Init() will be responsible for creating
   all the Exprs in an ExecNode while their evaluators are created
   in ExecNode::Prepare(). Certain evaluators are also moved into
   the data structures which actually utilize them. For instance,
   HashTableCtx now owns the build and probe expression evaluators.
   Similarly, TupleRowComparator and Sorter also own the evaluators.
   ExecNode which utilizes these data structures are only responsible
   for creating the expressions used by these data structures.

4. All codegen functions take Exprs instead of evaluators.

5. The assignment of index into the FunctionContext vector is now done
   during the construction of ScalarExpr. Evaluators are only responsible
   for allocating and initializing the FunctionContexts.

6. Open(), Prepare() are now removed from Expr classes. The interface
   for creating any Expr is via either ScalarExpr::Create() or AggFn::Create()
   which will convert a thrift Expr into an initialized Expr object.
   Similarly, Create() interface is used for creating evaluators from an
   intialized Expr object.

This separation allows the future change to introduce PlanNode data structures.
The plan is to move all ExecNode::Init() logic to PlanNode and call them once
per plan fragment.

Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/common/init.cc
M be/src/exec/CMakeLists.txt
M be/src/exec/aggregation-node-ir.cc
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregation-node.h
M be/src/exec/analytic-eval-node.cc
M be/src/exec/analytic-eval-node.h
M be/src/exec/blocking-join-node.cc
M be/src/exec/blocking-join-node.h
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/data-source-scan-node.cc
M be/src/exec/exchange-node.cc
M be/src/exec/exchange-node.h
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hash-join-node-ir.cc
M be/src/exec/hash-join-node.cc
M be/src/exec/hash-join-node.h
M be/src/exec/hash-table-ir.cc
M be/src/exec/hash-table-test.cc
M be/src/exec/hash-table.cc
M be/src/exec/hash-table.h
M be/src/exec/hash-table.inline.h
M be/src/exec/hbase-scan-node.cc
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hbase-table-sink.h
M be/src/exec/hbase-table-writer.cc
M be/src/exec/hbase-table-writer.h
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-avro-table-writer.cc
M be/src/exec/hdfs-avro-table-writer.h
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-parquet-table-writer.h
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-sequence-scanner.h
M be/src/exec/hdfs-sequence-table-writer.cc
M be/src/exec/hdfs-sequence-table-writer.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/hdfs-table-writer.cc
M be/src/exec/hdfs-table-writer.h
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M 

[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-05-10 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..


Patch Set 10:

(57 comments)

http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/aggregation-node.cc
File be/src/exec/aggregation-node.cc:

Line 66: output_tuple_desc_(descs.GetTupleDescriptor(output_tuple_id_)),
> initialize in .h where possible
While I appreciate the suggestion, I prefer not to add more unrelated clean-up 
to this already gigantic patch.


Line 88: desc->type() == grouping_exprs_[i]->type());
> i would intuitively think that the build row is the input row.
renamed to intermediate_row_desc_.


Line 97: 
> it feels dicey to pass a reference to an automatic variable, Init() might h
Done


Line 232:   int count = 0;
> isn't that the same as const X**?
Not really :-).

const X** is pointer to pointer to constant X.
X* const * is pointer to constant pointer to X.

The latter fits the bill of vector::data().


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/data-sink.h
File be/src/exec/data-sink.h:

Line 122:   /// Output expressions to convert row batches onto output values.
> It's confusing that these aren't used in all subclasses of DataSink (e.g. D
Done


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/hash-table.h
File be/src/exec/hash-table.h:

PS9, Line 485: 
?


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/kudu-scanner.cc
File be/src/exec/kudu-scanner.cc:

PS9, Line 69: 
> Does this mean that multiple scanner threads share a single MemPool? That d
Nice catch. I hit similar bugs in HDFS scanners before. Those scanners also 
have a per-scanner MemPool.


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/sort-node.h
File be/src/exec/sort-node.h:

PS9, Line 68: :vector materialize_tuple_ doesn't seem to be defined. I think we always need these
Done


Line 69: 
> See my comment about the name in sorter.h.
Done


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/subplan-node.h
File be/src/exec/subplan-node.h:

Line 65:   /// tree rooted at 'node' and does any initialization that is 
required as a result of
> * and do any initialization that is required as a result of setting the sub
Done


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/topn-node.h
File be/src/exec/topn-node.h:

PS9, Line 78: output_tuple_exprs_;
> output_tuple_exprs_? Usually when we have exprs materializing tuples it's n
Done


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/agg-fn-evaluator.cc
File be/src/exprs/agg-fn-evaluator.cc:

Line 96: MemPool* mem_pool, AggFnEvaluator** result) {
> move input params to front
Done


Line 161: Status AggFnEvaluator::Open(
> why not const&?
Done


Line 514: 
> single line, here and subsequent functions, where possible
Done


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/agg-fn-evaluator.h
File be/src/exprs/agg-fn-evaluator.h:

Line 80:   /// and caches all constant input arguments.
> regarding the caching: is there already a todo to move that into the expr s
TODO added.


PS9, Line 94: o, it
> Maybe this should be called ShallowClone() to make it more explicit. Withou
Done


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/agg-fn.cc
File be/src/exprs/agg-fn.cc:

Line 68:   }
> scalarexpr does that in create. do the same for this class?
Done


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/expr.cc
File be/src/exprs/expr.cc:

Line 87:   for (int i = 0; i < num_children; ++i) {
> we don't use 'const ..' elsewhere for automatic variables.
Done


Line 88: *child_node_idx += 1;
> formatting
Done


Line 92:   return Status::OK();
> what is this referring to?
Stale comment. Removed.


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/expr.h
File be/src/exprs/expr.h:

Line 82: 
> this is really CreateChildren, not CreateTree.
Moved in/out params to end.

As for the naming, this function actually builds out the entire expression tree 
rooted at 'root'. CreateChildren() doesn't seem to fit. May be I misunderstood 
your comment.


Line 134:   /// Called recursively to create children expr trees for 
sub-expressions.
> misleading: the root of the created tree is not 'parent' (the root is insta
I refactored this function and CreateTree() a bit so it's easier to follow now.


Line 135:   ///
> "for subexpressions"
Done


Line 140:   ///   root: root of the new tree. Created and initialized by the 
caller.
> let's call this root_idx, makes it easier to follow
It's called child_node_idx in the new patch.


Line 146:   ///   !status.ok() if tree is inconsistent or corrupt
> while we're cleaning up, move in/out params to end
Done


http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/scalar-expr-evaluator.cc
File be/src/exprs/scalar-expr-evaluator.cc:

Line 69: 
> initialize vars in .h where practical
Done


Line 82: (*evaluator)->fn_ctxs_ptr_ = (*evaluator)->fn_ctxs_.data();
> why do you need that (for a newly 

[Impala-ASF-CR] IMPALA-5152: Gather all tables with missing metadata in analysis

2017-05-10 Thread Alex Behm (Code Review)
Alex Behm has abandoned this change.

Change subject: IMPALA-5152: Gather all tables with missing metadata in analysis
..


Abandoned

This is temporarily on hold to focus on a cleaner but more invasive solution. 
Abandoning so that nobody spends time on this version.

-- 
To view, visit http://gerrit.cloudera.org:8080/6702
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: Id10a303958557982b26a07c97a9bb30932ab44dd
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-05-10 Thread Michael Ho (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5483

to look at the new patch set (#10).

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..

IMPALA-4192: Disentangle Expr and ExprContext

This change separates Expr and ExprContext. This is a preparatory
step for factoring out static data (e.g. Exprs) of plan fragments
to be shared by multiple plan fragment instances.

This change includes the followings:

1. Include aggregate functions (AggFn) as Expr. This separates
   AggFn from its evaluator. AggFn is similar to existing Expr
   as both are represented as a tree of Expr nodes but it doesn't
   really make sense to call Get*Val() on AggFn. This change
   restructures the class hierarchy: much of the existing Expr
   class is now renamed to ScalarExpr. Expr is the parent class
   of both AggFn and ScalarExpr. Expr is defined to be a tree
   with root of either AggFn or ScalarExpr and all descendants
   being ScalarExpr.

2. ExprContext is renamed to ScalarExprEvaluator which is the
   interface for evaluating ScalarExpr; AggFnEvaluator is the
   interface for evaluating AggFn. Multiple evaluators can be
   instantiated per Expr. Expr contains static states of an
   expression while evaluator contains runtime states needed
   for execution (i.e. evaluating the expression).

3. Update all exec nodes to instantiate Expr and their evaluators
   separately. ExecNode::Init() will be responsible for creating
   all the Exprs in an ExecNode while their evaluators are created
   in ExecNode::Prepare(). Certain evaluators are also moved into
   the data structures which actually utilize them. For instance,
   HashTableCtx now owns the build and probe expression evaluators.
   Similarly, TupleRowComparator and Sorter also own the evaluators.
   ExecNode which utilizes these data structures are only responsible
   for creating the expressions used by these data structures.

4. All codegen functions take Exprs instead of evaluators.

5. The assignment of index into the FunctionContext vector is now done
   during the construction of ScalarExpr. Evaluators are only responsible
   for allocating and initializing the FunctionContexts.

6. Open(), Prepare() are now removed from Expr classes. The interface
   for creating any Expr is via either ScalarExpr::Create() or AggFn::Create()
   which will convert a thrift Expr into an initialized Expr object.
   Similarly, Create() interface is used for creating evaluators from an
   intialized Expr object.

This separation allows the future change to introduce PlanNode data structures.
The plan is to move all ExecNode::Init() logic to PlanNode and call them once
per plan fragment.

Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/common/init.cc
M be/src/exec/CMakeLists.txt
M be/src/exec/aggregation-node-ir.cc
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregation-node.h
M be/src/exec/analytic-eval-node.cc
M be/src/exec/analytic-eval-node.h
M be/src/exec/blocking-join-node.cc
M be/src/exec/blocking-join-node.h
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/data-source-scan-node.cc
M be/src/exec/exchange-node.cc
M be/src/exec/exchange-node.h
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hash-join-node-ir.cc
M be/src/exec/hash-join-node.cc
M be/src/exec/hash-join-node.h
M be/src/exec/hash-table-ir.cc
M be/src/exec/hash-table-test.cc
M be/src/exec/hash-table.cc
M be/src/exec/hash-table.h
M be/src/exec/hash-table.inline.h
M be/src/exec/hbase-scan-node.cc
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hbase-table-sink.h
M be/src/exec/hbase-table-writer.cc
M be/src/exec/hbase-table-writer.h
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-avro-table-writer.cc
M be/src/exec/hdfs-avro-table-writer.h
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-parquet-table-writer.h
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-sequence-scanner.h
M be/src/exec/hdfs-sequence-table-writer.cc
M be/src/exec/hdfs-sequence-table-writer.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/hdfs-table-writer.cc
M be/src/exec/hdfs-table-writer.h
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M 

[Impala-ASF-CR] IMPALA-2373: Extrapolate row counts for HDFS tables.

2017-05-10 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-2373: Extrapolate row counts for HDFS tables.
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6840/1/testdata/workloads/functional-query/queries/QueryTest/compute-stats.test
File testdata/workloads/functional-query/queries/QueryTest/compute-stats.test:

Line 20: '2009','1',310,305,1,'24.56KB','NOT CACHED','NOT 
CACHED','TEXT','false',regex:.*
There are a ton of tests that use SHOW TABLE STATS or SHOW PARTITIONS. I have 
not yet gone through all of them to save time. Let's agree on what the exact 
output should look like and then I'll those changes everywhere.


-- 
To view, visit http://gerrit.cloudera.org:8080/6840
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I972c8a03ed70211734631a7dc9085cb33622ebc4
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2373: Extrapolate row counts for HDFS tables.

2017-05-10 Thread Alex Behm (Code Review)
Alex Behm has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6840

Change subject: IMPALA-2373: Extrapolate row counts for HDFS tables.
..

IMPALA-2373: Extrapolate row counts for HDFS tables.

The main idea of this patch is to use table stats to
extrapolate the row counts for new/modified partitions.

Existing behavior:
- Partitions that lack the row count stat are ignored
  when estimating the cardinality of HDFS scans. Such
  partitions effectively have an estimated row count
  of zero.
- We always use the row count stats for partitions that
  have one. The row count may be innaccurate if data in
  such partitions has changed significantly.

Summary of changes:
- Enhance COMPUTE STATS to also store the total number
  of file bytes in the table.
- Use the table-level row count and file bytes stats
  to estimate the number of rows in a scan.
- A new impalad startup flag is added to enable/disable
  the extrapolation behavior. The feature is enabled by
  default. Note that even with the feature disabled,
  COMPUTE STATS stores the file bytes so you can enable
  the feature without having to COMPUTE STATS again.

Testing:
- Added new FE unit test
- Added new EE test

Change-Id: I972c8a03ed70211734631a7dc9085cb33622ebc4
---
M be/src/common/global-flags.cc
M be/src/exec/catalog-op-executor.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M common/thrift/CatalogObjects.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java
M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/View.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
A fe/src/test/java/org/apache/impala/planner/StatsExtrapolationTest.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/sort-expr-materialization.test
M 
testdata/workloads/functional-query/queries/QueryTest/alter-table-hdfs-caching.test
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
M 
testdata/workloads/functional-query/queries/QueryTest/compute-stats-decimal.test
M 
testdata/workloads/functional-query/queries/QueryTest/compute-stats-incremental.test
M testdata/workloads/functional-query/queries/QueryTest/compute-stats.test
A testdata/workloads/functional-query/queries/QueryTest/stats-extrapolation.test
M tests/metadata/test_compute_stats.py
32 files changed, 1,062 insertions(+), 578 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/6840/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6840
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I972c8a03ed70211734631a7dc9085cb33622ebc4
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 


[Impala-ASF-CR] IMPALA-5169: Add support for async pins in buffer pool

2017-05-10 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#10).

Change subject: IMPALA-5169: Add support for async pins in buffer pool
..

IMPALA-5169: Add support for async pins in buffer pool

Makes Pin() do async reads behind-the-scenes, instead of
blocking until the read completes. The blocking is done
instead when the client tries to access the buffer via
PageHandle::GetBuffer() or ExtractBuffer().

This is implemented with a new sub-state of "pinned"
where the page has a buffer and consumes reservation
but the buffer does not contain valid data.

Motivation:
This unlocks various opportunities to overlap read I/Os
with other work:
* Reads to different disks can execute in parallel
* I/O and computation can be overlapped.

This initially benefits BufferedTupleStream::PinStream(),
where many pages are pinned at once. With this change the
reads run asynchronously. This can potentially lead
to large speedups when spilling. E.g. if the pages for a Hash
Join's partition are spread across 10 disks, we could get 10x
the read throughput, plus overlap the I/O with hash table build.

In future we can use this to do read-ahead over unpinned
BufferedTupleStreams or for unpinned Runs in Sorter, but
this requires changes to the client code to Pin() pages
in advance.

Testing:
* BufferedTupleStreamV2 already exercises this.
* Various BufferPool tests already exercise this.
* Added a basic test to cover edge cases made possible by the
  new state transitions.
* Extended the randomised test to cover this.

Change-Id: Ibdf074c1ac4405d6f08d623ba438a85f7d39fd79
---
M be/src/runtime/buffered-tuple-stream-v2.cc
M be/src/runtime/buffered-tuple-stream-v2.h
M be/src/runtime/bufferpool/buffer-pool-internal.h
M be/src/runtime/bufferpool/buffer-pool-test.cc
M be/src/runtime/bufferpool/buffer-pool.cc
M be/src/runtime/bufferpool/buffer-pool.h
M be/src/runtime/tmp-file-mgr.cc
M be/src/runtime/tmp-file-mgr.h
8 files changed, 416 insertions(+), 155 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/6612/10
-- 
To view, visit http://gerrit.cloudera.org:8080/6612
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibdf074c1ac4405d6f08d623ba438a85f7d39fd79
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5220: memory maintenance cleanup

2017-05-10 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5220: memory maintenance cleanup
..


Patch Set 10: Code-Review+2

Rebase

-- 
To view, visit http://gerrit.cloudera.org:8080/6626
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0f822b294ab253d6f2828fc52f353aecaaf9b701
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5220: memory maintenance cleanup

2017-05-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5220: memory maintenance cleanup
..


Patch Set 10:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/560/

-- 
To view, visit http://gerrit.cloudera.org:8080/6626
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0f822b294ab253d6f2828fc52f353aecaaf9b701
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4499: Table name missing from exec summary

2017-05-10 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4499: Table name missing from exec summary
..


Patch Set 2: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6832
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If4fd13f893aea4e7df8a2474d7136770660e4324
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5294: Kudu INSERT partitioning fails with constants

2017-05-10 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-5294: Kudu INSERT partitioning fails with constants
..


Patch Set 3: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6828/2/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

Line 249:   // IMPALA-5294: Don't use 'partitionExpr' here because the 
constants were removed.
> explain briefly why the constants matter here (they don't for hdfs)
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6828
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I12cbb319f9a5c47fdbfee347b47650186b27f8f9
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5294: Kudu INSERT partitioning fails with constants

2017-05-10 Thread Thomas Tauber-Marshall (Code Review)
Hello Marcel Kornacker,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6828

to look at the new patch set (#3).

Change subject: IMPALA-5294: Kudu INSERT partitioning fails with constants
..

IMPALA-5294: Kudu INSERT partitioning fails with constants

An INSERT into a Kudu table with a constant value being inserted
into a partition column causes an IllegalStateExcaption. This is
because DistributedPlanner removes constants from the list of
partition exprs before creating the KuduPartitionExpr, but
KuduPartitionExpr expects to get one expr per partition column.

The fix is to pass the full list of partition exprs into the
KuduPartitionExpr, instead of the list that has had constants
removed. This preserves the behavior that if all of the partition
exprs are constant we fall back to UNPARTITIONED.

One complication is that if a partition expr is a NullLiteral, it
must be cast to a specific type to be passed to the BE. The
InsertStmt will cast the partition exprs to the partition column
types, but these casts may be lost from the copies of the partition
exprs stored by the KuduPartitionExpr during reset(). To fix this,
the KuduPartitionExpr can store the types of the partition cols and
recast the partition exprs to those types during analyze().

Change-Id: I12cbb319f9a5c47fdbfee347b47650186b27f8f9
---
M fe/src/main/java/org/apache/impala/analysis/KuduPartitionExpr.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test
3 files changed, 47 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/6828/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6828
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I12cbb319f9a5c47fdbfee347b47650186b27f8f9
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-5294: Kudu INSERT partitioning fails with constants

2017-05-10 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-5294: Kudu INSERT partitioning fails with constants
..


Patch Set 2: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6828/2/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

Line 249:   // IMPALA-5294: Don't use 'partitionExpr' here because the 
constants were removed.
explain briefly why the constants matter here (they don't for hdfs)


-- 
To view, visit http://gerrit.cloudera.org:8080/6828
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I12cbb319f9a5c47fdbfee347b47650186b27f8f9
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4499: Table name missing from exec summary

2017-05-10 Thread Thomas Tauber-Marshall (Code Review)
Hello Matthew Jacobs,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6832

to look at the new patch set (#2).

Change subject: IMPALA-4499: Table name missing from exec summary
..

IMPALA-4499: Table name missing from exec summary

For scan nodes, previously only HDFS tables showed the name
of the table in the 'Detail' section for the scan node. This
change adds the table name for all scan node types (Kudu,
HBase, and DataSource).

Testing:
- Added an e2e test in test_observability.

Change-Id: If4fd13f893aea4e7df8a2474d7136770660e4324
---
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M tests/query_test/test_observability.py
3 files changed, 38 insertions(+), 14 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/6832/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6832
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If4fd13f893aea4e7df8a2474d7136770660e4324
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Matthew Jacobs 


[Impala-ASF-CR] IMPALA-4499: Table name missing from exec summary

2017-05-10 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-4499: Table name missing from exec summary
..


Patch Set 2: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6832/1/tests/query_test/test_observability.py
File tests/query_test/test_observability.py:

PS1, Line 61: ]
> nit: extra space here and below
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6832
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If4fd13f893aea4e7df8a2474d7136770660e4324
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5137: pt1, Refactor TimestampValue constructors

2017-05-10 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change.

Change subject: IMPALA-5137: pt1, Refactor TimestampValue constructors
..


Patch Set 11:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6510/11/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

PS11, Line 282:   
nit: indentation is 4 for line continuation (here and below).


http://gerrit.cloudera.org:8080/#/c/6510/11/be/src/exprs/timestamp-functions-ir.cc
File be/src/exprs/timestamp-functions-ir.cc:

PS11, Line 93: TimestampValue t 
You might want to use "const TimestampValue& t =" here (and below), just like 
you did in aggregate-functions-ir.cc


-- 
To view, visit http://gerrit.cloudera.org:8080/6510
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id25e19f7984e5ebf9073d9c569faf69cec142fa1
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

2017-05-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file 
metadata
..


Patch Set 10: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No