[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

2019-08-01 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/13883 )

Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and 
BufferedPRS impl
..

IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

Improves the encapsulation of RowBatchQueue by the doing the following
re-factoring:
* Renames RowBatchQueue to BlockingRowBatchQueue which is more
indicitive of what the queue does
* Re-factors the timers managed by the scan-node into the
BlockingRowBatchQueue implementation
* Favors composition over inheritance by re-factoring
BlockingRowBatchQueue to own a BlockingQueue rather than extending one

The re-factoring lays the groundwork for introducing a generic
RowBatchQueue that all RowBatch queues inherit from.

Adds a new DequeRowBatchQueue which is a simple wrapper around a
std::deque that (1) stores unique_ptr to queued RowBatch-es and (2)
has a maximum capacity.

Implements BufferedPlanRootSink using the new DequeRowBatchQueue.
DequeRowBatchQueue is generic enough that replacing it with a
SpillableQueue (queue backed by a BufferedTupleStream) should be
straightforward. BufferedPlanRootSink is synchronized to protect access
to DequeRowBatchQueue since the queue is not thread safe.

BufferedPlanRootSink FlushFinal blocks until the consumer thread has
processed all RowBatches. This ensures that the coordinator fragment
stays alive until all results are fetched, but allows all other
fragments to be shutdown immediately.

Testing:
* Running core tests
* Updated tests/query_test/test_result_spooling.py

Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
Reviewed-on: http://gerrit.cloudera.org:8080/13883
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M be/src/exec/blocking-plan-root-sink.cc
M be/src/exec/blocking-plan-root-sink.h
M be/src/exec/buffered-plan-root-sink.cc
M be/src/exec/buffered-plan-root-sink.h
M be/src/exec/data-sink.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/kudu-scan-node.cc
M be/src/exec/plan-root-sink.cc
M be/src/exec/plan-root-sink.h
M be/src/exec/scan-node.cc
M be/src/exec/scan-node.h
M be/src/exec/scanner-context.cc
M be/src/runtime/CMakeLists.txt
A be/src/runtime/blocking-row-batch-queue.cc
A be/src/runtime/blocking-row-batch-queue.h
A be/src/runtime/deque-row-batch-queue.cc
A be/src/runtime/deque-row-batch-queue.h
D be/src/runtime/row-batch-queue.cc
D be/src/runtime/row-batch-queue.h
M be/src/util/blocking-queue.h
M tests/query_test/test_result_spooling.py
21 files changed, 559 insertions(+), 205 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/13883
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
Gerrit-Change-Number: 13883
Gerrit-PatchSet: 20
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

2019-08-01 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13883 )

Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and 
BufferedPRS impl
..


Patch Set 19: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/13883
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
Gerrit-Change-Number: 13883
Gerrit-PatchSet: 19
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 01 Aug 2019 13:12:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

2019-08-01 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13883 )

Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and 
BufferedPRS impl
..


Patch Set 19:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/4711/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/13883
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
Gerrit-Change-Number: 13883
Gerrit-PatchSet: 19
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 01 Aug 2019 06:36:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

2019-07-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13883 )

Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and 
BufferedPRS impl
..


Patch Set 19:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/4709/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/13883
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
Gerrit-Change-Number: 13883
Gerrit-PatchSet: 19
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 01 Aug 2019 04:40:13 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

2019-07-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13883 )

Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and 
BufferedPRS impl
..


Patch Set 19: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/13883
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
Gerrit-Change-Number: 13883
Gerrit-PatchSet: 19
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 01 Aug 2019 04:40:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

2019-07-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13883 )

Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and 
BufferedPRS impl
..


Patch Set 18: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/4703/


--
To view, visit http://gerrit.cloudera.org:8080/13883
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
Gerrit-Change-Number: 13883
Gerrit-PatchSet: 18
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 31 Jul 2019 17:53:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

2019-07-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13883 )

Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and 
BufferedPRS impl
..


Patch Set 18: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/13883
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
Gerrit-Change-Number: 13883
Gerrit-PatchSet: 18
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 31 Jul 2019 16:19:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

2019-07-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13883 )

Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and 
BufferedPRS impl
..


Patch Set 18:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/4703/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/13883
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
Gerrit-Change-Number: 13883
Gerrit-PatchSet: 18
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 31 Jul 2019 16:19:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

2019-07-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13883 )

Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and 
BufferedPRS impl
..


Patch Set 17:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/4091/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/13883
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
Gerrit-Change-Number: 13883
Gerrit-PatchSet: 17
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 31 Jul 2019 00:59:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

2019-07-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13883 )

Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and 
BufferedPRS impl
..


Patch Set 16:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/4089/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/13883
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
Gerrit-Change-Number: 13883
Gerrit-PatchSet: 16
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 31 Jul 2019 00:57:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

2019-07-30 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13883 )

Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and 
BufferedPRS impl
..


Patch Set 17: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/13883
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
Gerrit-Change-Number: 13883
Gerrit-PatchSet: 17
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 31 Jul 2019 00:53:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

2019-07-30 Thread Sahil Takiar (Code Review)
Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13883 )

Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and 
BufferedPRS impl
..


Patch Set 16:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/13883/15/be/src/exec/buffered-plan-root-sink.h
File be/src/exec/buffered-plan-root-sink.h:

http://gerrit.cloudera.org:8080/#/c/13883/15/be/src/exec/buffered-plan-root-sink.h@21
PS15, Line 21: #include "util/condition-variable.h"
> I'd prefer a forward declaration of this in the header and moving the inclu
Done


http://gerrit.cloudera.org:8080/#/c/13883/15/be/src/exec/buffered-plan-root-sink.h@33
PS15, Line 33: s unt
> either
Done


http://gerrit.cloudera.org:8080/#/c/13883/15/be/src/exec/buffered-plan-root-sink.h@80
PS15, Line 80:  produce
> I think the polarity of this name doesn't match the other condition variabl
Done


http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc
File be/src/exec/buffered-plan-root-sink.cc:

http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc@123
PS11, Line 123: while (batch_queue_->IsEmpty() && sender_state_ == 
SenderState::ROWS_PENDING
  : && !state->is_cancelled()) {
  :   rows_available_.Wait(l);
> Oh no, I think it's generally a good idea to fail hard, just that specific
Done


http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc@127
PS11, Line 127:
> Added the call to is_full_.NotifyOne(), agree the early return isn't ideal.
Done


http://gerrit.cloudera.org:8080/#/c/13883/15/be/src/exec/buffered-plan-root-sink.cc
File be/src/exec/buffered-plan-root-sink.cc:

http://gerrit.cloudera.org:8080/#/c/13883/15/be/src/exec/buffered-plan-root-sink.cc@38
PS15, Line 38:   // If the batch is empty, we have nothing to do so just return 
Status::OK().
> nit: if statement could be one line
Done


http://gerrit.cloudera.org:8080/#/c/13883/15/be/src/exec/buffered-plan-root-sink.cc@52
PS15, Line 52:
> You could Reset() batch at this point to free up any resources associated.
The FragmentInstanceState calls Reset() on each batch after it is passed to 
::Send - 
https://github.com/apache/impala/blob/master/be/src/runtime/fragment-instance-state.cc#L365


http://gerrit.cloudera.org:8080/#/c/13883/15/be/src/exec/buffered-plan-root-sink.cc@144
PS15, Line 144:   RETURN_IF_ERROR(
> I found the mix of NotifyOne()/NotifyAll() is mildly confusing. I think eit
Added some comments to clarify this. The suggestion to use NotifyAll() was from 
prior review comments to use NotifyAll() in Close and FlushFinal to be extra 
sure that all threads are awoken.


http://gerrit.cloudera.org:8080/#/c/13883/15/be/src/exec/plan-root-sink.h
File be/src/exec/plan-root-sink.h:

http://gerrit.cloudera.org:8080/#/c/13883/15/be/src/exec/plan-root-sink.h@94
PS15, Line 94: Updat
> UpdateAndCheck ?
Done


http://gerrit.cloudera.org:8080/#/c/13883/15/be/src/runtime/blocking-row-batch-queue.h
File be/src/runtime/blocking-row-batch-queue.h:

http://gerrit.cloudera.org:8080/#/c/13883/15/be/src/runtime/blocking-row-batch-queue.h@48
PS15, Line 48:  The queue supports limiting the capacity in terms of bytes 
enqueued and number of
 : /// batches to be enqueued.
> Stale ?
Done



--
To view, visit http://gerrit.cloudera.org:8080/13883
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
Gerrit-Change-Number: 13883
Gerrit-PatchSet: 16
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 31 Jul 2019 00:22:20 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

2019-07-30 Thread Sahil Takiar (Code Review)
Hello Michael Ho, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/13883

to look at the new patch set (#17).

Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and 
BufferedPRS impl
..

IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

Improves the encapsulation of RowBatchQueue by the doing the following
re-factoring:
* Renames RowBatchQueue to BlockingRowBatchQueue which is more
indicitive of what the queue does
* Re-factors the timers managed by the scan-node into the
BlockingRowBatchQueue implementation
* Favors composition over inheritance by re-factoring
BlockingRowBatchQueue to own a BlockingQueue rather than extending one

The re-factoring lays the groundwork for introducing a generic
RowBatchQueue that all RowBatch queues inherit from.

Adds a new DequeRowBatchQueue which is a simple wrapper around a
std::deque that (1) stores unique_ptr to queued RowBatch-es and (2)
has a maximum capacity.

Implements BufferedPlanRootSink using the new DequeRowBatchQueue.
DequeRowBatchQueue is generic enough that replacing it with a
SpillableQueue (queue backed by a BufferedTupleStream) should be
straightforward. BufferedPlanRootSink is synchronized to protect access
to DequeRowBatchQueue since the queue is not thread safe.

BufferedPlanRootSink FlushFinal blocks until the consumer thread has
processed all RowBatches. This ensures that the coordinator fragment
stays alive until all results are fetched, but allows all other
fragments to be shutdown immediately.

Testing:
* Running core tests
* Updated tests/query_test/test_result_spooling.py

Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
---
M be/src/exec/blocking-plan-root-sink.cc
M be/src/exec/blocking-plan-root-sink.h
M be/src/exec/buffered-plan-root-sink.cc
M be/src/exec/buffered-plan-root-sink.h
M be/src/exec/data-sink.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/kudu-scan-node.cc
M be/src/exec/plan-root-sink.cc
M be/src/exec/plan-root-sink.h
M be/src/exec/scan-node.cc
M be/src/exec/scan-node.h
M be/src/exec/scanner-context.cc
M be/src/runtime/CMakeLists.txt
A be/src/runtime/blocking-row-batch-queue.cc
A be/src/runtime/blocking-row-batch-queue.h
A be/src/runtime/deque-row-batch-queue.cc
A be/src/runtime/deque-row-batch-queue.h
D be/src/runtime/row-batch-queue.cc
D be/src/runtime/row-batch-queue.h
M be/src/util/blocking-queue.h
M tests/query_test/test_result_spooling.py
21 files changed, 559 insertions(+), 205 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/13883/17
--
To view, visit http://gerrit.cloudera.org:8080/13883
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
Gerrit-Change-Number: 13883
Gerrit-PatchSet: 17
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

2019-07-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13883 )

Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and 
BufferedPRS impl
..


Patch Set 16:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13883/16/be/src/exec/plan-root-sink.cc
File be/src/exec/plan-root-sink.cc:

http://gerrit.cloudera.org:8080/#/c/13883/16/be/src/exec/plan-root-sink.cc@64
PS16, Line 64: Status 
PlanRootSink::UpdateAndCheckRowsProducedLimit(RuntimeState* state, RowBatch* 
batch) {
line too long (92 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/13883
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
Gerrit-Change-Number: 13883
Gerrit-PatchSet: 16
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 31 Jul 2019 00:16:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

2019-07-30 Thread Sahil Takiar (Code Review)
Hello Michael Ho, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/13883

to look at the new patch set (#16).

Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and 
BufferedPRS impl
..

IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

Improves the encapsulation of RowBatchQueue by the doing the following
re-factoring:
* Renames RowBatchQueue to BlockingRowBatchQueue which is more
indicitive of what the queue does
* Re-factors the timers managed by the scan-node into the
BlockingRowBatchQueue implementation
* Favors composition over inheritance by re-factoring
BlockingRowBatchQueue to own a BlockingQueue rather than extending one

The re-factoring lays the groundwork for introducing a generic
RowBatchQueue that all RowBatch queues inherit from.

Adds a new DequeRowBatchQueue which is a simple wrapper around a
std::deque that (1) stores unique_ptr to queued RowBatch-es and (2)
has a maximum capacity.

Implements BufferedPlanRootSink using the new DequeRowBatchQueue.
DequeRowBatchQueue is generic enough that replacing it with a
SpillableQueue (queue backed by a BufferedTupleStream) should be
straightforward. BufferedPlanRootSink is synchronized to protect access
to DequeRowBatchQueue since the queue is not thread safe.

BufferedPlanRootSink FlushFinal blocks until the consumer thread has
processed all RowBatches. This ensures that the coordinator fragment
stays alive until all results are fetched, but allows all other
fragments to be shutdown immediately.

Testing:
* Running core tests
* Updated tests/query_test/test_result_spooling.py

Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
---
M be/src/exec/blocking-plan-root-sink.cc
M be/src/exec/blocking-plan-root-sink.h
M be/src/exec/buffered-plan-root-sink.cc
M be/src/exec/buffered-plan-root-sink.h
M be/src/exec/data-sink.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/kudu-scan-node.cc
M be/src/exec/plan-root-sink.cc
M be/src/exec/plan-root-sink.h
M be/src/exec/scan-node.cc
M be/src/exec/scan-node.h
M be/src/exec/scanner-context.cc
M be/src/runtime/CMakeLists.txt
A be/src/runtime/blocking-row-batch-queue.cc
A be/src/runtime/blocking-row-batch-queue.h
A be/src/runtime/deque-row-batch-queue.cc
A be/src/runtime/deque-row-batch-queue.h
D be/src/runtime/row-batch-queue.cc
D be/src/runtime/row-batch-queue.h
M be/src/util/blocking-queue.h
M tests/query_test/test_result_spooling.py
21 files changed, 558 insertions(+), 205 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/13883/16
--
To view, visit http://gerrit.cloudera.org:8080/13883
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
Gerrit-Change-Number: 13883
Gerrit-PatchSet: 16
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

2019-07-30 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13883 )

Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and 
BufferedPRS impl
..


Patch Set 15: Code-Review+1

(7 comments)

This mostly looks good, I had a few minor comments but LGTM

http://gerrit.cloudera.org:8080/#/c/13883/15/be/src/exec/buffered-plan-root-sink.h
File be/src/exec/buffered-plan-root-sink.h:

http://gerrit.cloudera.org:8080/#/c/13883/15/be/src/exec/buffered-plan-root-sink.h@21
PS15, Line 21: #include "runtime/deque-row-batch-queue.h"
I'd prefer a forward declaration of this in the header and moving the include 
to the .cc if possible (just trying to reduce the web of include dependencies)


http://gerrit.cloudera.org:8080/#/c/13883/15/be/src/exec/buffered-plan-root-sink.h@33
PS15, Line 33: eiher
either


http://gerrit.cloudera.org:8080/#/c/13883/15/be/src/exec/buffered-plan-root-sink.h@80
PS15, Line 80: is_full_
I think the polarity of this name doesn't match the other condition variables, 
since it's signalled when the queue becomes not full. I'd recommend naming it 
something like batch_queue_has_capacity_.


http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc
File be/src/exec/buffered-plan-root-sink.cc:

http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc@123
PS11, Line 123: // If the query was cancelled while the sink was waiting 
for rows to become available,
  : // or if the query was cancelled before the current call to 
GetNext, set eos and then
  : // return. The queue could be empty if the sink was closed
> Makes sense. CHECK() seems a bit an overkill. Sorry for the bad suggestion.
Oh no, I think it's generally a good idea to fail hard, just that specific 
scenario was a concern.


http://gerrit.cloudera.org:8080/#/c/13883/15/be/src/exec/buffered-plan-root-sink.cc
File be/src/exec/buffered-plan-root-sink.cc:

http://gerrit.cloudera.org:8080/#/c/13883/15/be/src/exec/buffered-plan-root-sink.cc@38
PS15, Line 38:   if (batch->num_rows() == 0) {
nit: if statement could be one line


http://gerrit.cloudera.org:8080/#/c/13883/15/be/src/exec/buffered-plan-root-sink.cc@52
PS15, Line 52:   batch->DeepCopyTo(output_batch.get());
You could Reset() batch at this point to free up any resources associated.


http://gerrit.cloudera.org:8080/#/c/13883/15/be/src/exec/buffered-plan-root-sink.cc@144
PS15, Line 144: if (*eos) consumer_eos_.NotifyOne();
I found the mix of NotifyOne()/NotifyAll() is mildly confusing. I think either 
is actually ok in all contexts, since at most 1 thread can wait on each 
condition variable, which means that NotifyOne() is correct and NotifyAll() 
does not result in unnecessary wakeups. So we could consider just standardising 
on one.

I think I understand the reasoning because which is used where, but initially I 
was wondering if there was something subtle going on.

Ok to leave code as is, but might help to clarify in the comments about when 
NotifyOne/NotifyAll is called.



--
To view, visit http://gerrit.cloudera.org:8080/13883
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
Gerrit-Change-Number: 13883
Gerrit-PatchSet: 15
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 30 Jul 2019 20:30:07 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

2019-07-30 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13883 )

Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and 
BufferedPRS impl
..


Patch Set 15: Code-Review+1

(2 comments)

Not sure if Tim wants to do another pass ?

http://gerrit.cloudera.org:8080/#/c/13883/15/be/src/exec/plan-root-sink.h
File be/src/exec/plan-root-sink.h:

http://gerrit.cloudera.org:8080/#/c/13883/15/be/src/exec/plan-root-sink.h@94
PS15, Line 94: Check
UpdateAndCheck ?


http://gerrit.cloudera.org:8080/#/c/13883/15/be/src/runtime/blocking-row-batch-queue.h
File be/src/runtime/blocking-row-batch-queue.h:

http://gerrit.cloudera.org:8080/#/c/13883/15/be/src/runtime/blocking-row-batch-queue.h@48
PS15, Line 48:  Adds the following timers to the RuntimeProfile: 
'RowBatchQueueGetWaitTime' and
 : /// 'RowBatchQueuePutWaitTime'
Stale ?



--
To view, visit http://gerrit.cloudera.org:8080/13883
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
Gerrit-Change-Number: 13883
Gerrit-PatchSet: 15
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 30 Jul 2019 07:14:14 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

2019-07-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13883 )

Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and 
BufferedPRS impl
..


Patch Set 15:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/4065/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/13883
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
Gerrit-Change-Number: 13883
Gerrit-PatchSet: 15
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 29 Jul 2019 21:03:29 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

2019-07-29 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13883 )

Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and 
BufferedPRS impl
..


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc
File be/src/exec/buffered-plan-root-sink.cc:

http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc@123
PS11, Line 123:   // For now, if num_results < batch->num_rows(), we 
terminate returning results
  :   // early.
  :   if (num_results > 0 && num_results < batch->num_rows()) {
> I think we should avoid CHECK on codepaths that you could get to by togglin
Makes sense. CHECK() seems a bit an overkill. Sorry for the bad suggestion.



--
To view, visit http://gerrit.cloudera.org:8080/13883
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
Gerrit-Change-Number: 13883
Gerrit-PatchSet: 11
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 29 Jul 2019 20:56:24 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

2019-07-29 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13883 )

Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and 
BufferedPRS impl
..


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc
File be/src/exec/buffered-plan-root-sink.cc:

http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc@123
PS11, Line 123:   // For now, if num_results < batch->num_rows(), we 
terminate returning results
  :   // early.
  :   if (num_results > 0 && num_results < batch->num_rows()) {
> Yes, its possible if the result cache is enabled, but I think only if the c
I think we should avoid CHECK on codepaths that you could get to by toggling a 
query option (since there's nothing to stop a malicious user from toggling it). 
It's fine to return an error though.



--
To view, visit http://gerrit.cloudera.org:8080/13883
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
Gerrit-Change-Number: 13883
Gerrit-PatchSet: 11
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 29 Jul 2019 20:29:21 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

2019-07-29 Thread Sahil Takiar (Code Review)
Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13883 )

Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and 
BufferedPRS impl
..


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc
File be/src/exec/buffered-plan-root-sink.cc:

http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc@123
PS11, Line 123:   // For now, if num_results < batch->num_rows(), we 
terminate returning results
  :   // early.
  :   if (num_results > 0 && num_results < batch->num_rows()) {
> Isn't passing different fetch size possible if the result cache is enabled
Yes, its possible if the result cache is enabled, but I think only if the 
client restarts the fetch to the beginning of the result set.

I changed it so it returns "Status::Expected(TErrorCode::NOT_IMPLEMENTED_ERROR, 
...)"



--
To view, visit http://gerrit.cloudera.org:8080/13883
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
Gerrit-Change-Number: 13883
Gerrit-PatchSet: 11
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 29 Jul 2019 20:25:27 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

2019-07-29 Thread Sahil Takiar (Code Review)
Hello Michael Ho, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/13883

to look at the new patch set (#15).

Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and 
BufferedPRS impl
..

IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

Improves the encapsulation of RowBatchQueue by the doing the following
re-factoring:
* Renames RowBatchQueue to BlockingRowBatchQueue which is more
indicitive of what the queue does
* Re-factors the timers managed by the scan-node into the
BlockingRowBatchQueue implementation
* Favors composition over inheritance by re-factoring
BlockingRowBatchQueue to own a BlockingQueue rather than extending one

The re-factoring lays the groundwork for introducing a generic
RowBatchQueue that all RowBatch queues inherit from.

Adds a new DequeRowBatchQueue which is a simple wrapper around a
std::deque that (1) stores unique_ptr to queued RowBatch-es and (2)
has a maximum capacity.

Implements BufferedPlanRootSink using the new DequeRowBatchQueue.
DequeRowBatchQueue is generic enough that replacing it with a
SpillableQueue (queue backed by a BufferedTupleStream) should be
straightforward. BufferedPlanRootSink is synchronized to protect access
to DequeRowBatchQueue since the queue is not thread safe.

BufferedPlanRootSink FlushFinal blocks until the consumer thread has
processed all RowBatches. This ensures that the coordinator fragment
stays alive until all results are fetched, but allows all other
fragments to be shutdown immediately.

Testing:
* Running core tests
* Updated tests/query_test/test_result_spooling.py

Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
---
M be/src/exec/blocking-plan-root-sink.cc
M be/src/exec/blocking-plan-root-sink.h
M be/src/exec/buffered-plan-root-sink.cc
M be/src/exec/buffered-plan-root-sink.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/kudu-scan-node.cc
M be/src/exec/plan-root-sink.cc
M be/src/exec/plan-root-sink.h
M be/src/exec/scan-node.cc
M be/src/exec/scan-node.h
M be/src/exec/scanner-context.cc
M be/src/runtime/CMakeLists.txt
A be/src/runtime/blocking-row-batch-queue.cc
A be/src/runtime/blocking-row-batch-queue.h
A be/src/runtime/deque-row-batch-queue.cc
A be/src/runtime/deque-row-batch-queue.h
D be/src/runtime/row-batch-queue.cc
D be/src/runtime/row-batch-queue.h
M be/src/util/blocking-queue.h
M tests/query_test/test_result_spooling.py
20 files changed, 550 insertions(+), 205 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/13883/15
--
To view, visit http://gerrit.cloudera.org:8080/13883
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
Gerrit-Change-Number: 13883
Gerrit-PatchSet: 15
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

2019-07-29 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13883 )

Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and 
BufferedPRS impl
..


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc
File be/src/exec/buffered-plan-root-sink.cc:

http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc@123
PS11, Line 123:   // For now, if num_results < batch->num_rows(), we 
terminate returning results
  :   // early.
  :   if (num_results > 0 && num_results < batch->num_rows()) {
> > the case in which the total number of result rows is not a multiple of ro
Isn't passing different fetch size possible if the result cache is enabled ? In 
general, this seems to be making very big assumption about what the client will 
pass in. Not sure if this is exactly a good practice.

I suppose my point is that we should fail stop or at least indicate some sort 
of failures to the clients for this known to be wrong case in the code. CHECK() 
is the fail-stop approach, which seems acceptable as this sink type is not 
anticipated to be used. An alternative would be to log this error case and 
return an error status to the clients.



--
To view, visit http://gerrit.cloudera.org:8080/13883
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
Gerrit-Change-Number: 13883
Gerrit-PatchSet: 11
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 29 Jul 2019 19:48:52 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

2019-07-29 Thread Sahil Takiar (Code Review)
Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13883 )

Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and 
BufferedPRS impl
..


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc
File be/src/exec/buffered-plan-root-sink.cc:

http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc@123
PS11, Line 123:   // For now, if num_results < batch->num_rows(), we 
terminate returning results
  :   // early.
  :   if (num_results > 0 && num_results < batch->num_rows()) {
> the case in which the total number of result rows is not a multiple of row 
> batch size.

That case is still handled properly. The value of num_results is the same as 
the fetch size configured by any client. For impala-shell, I believe that is 
1024, for other drivers such as Simba the docs state that the 
"RowsFetchedPerBlock" ("The maximum number of rows that a query returns at a 
time.") is 1.

This condition is only hit if a user configures a fetch size under 1024. 
Otherwise, everything works as expected (no partial results) regardless of the 
number of rows returned by the actual query.

The only way partial results could be returned is if a user sets the fetch size 
to a value above 1024, fetches rows, sets the fetch size to under 1024, and 
then fetches rows. However, thats non-default behavior, and a bit of an odd 
pattern, which is why I decided to defer fixing this to a future patch.

Do we want to a CHECK for this? That would cause a crash that affects all 
Impala users, which is presumably worse than the current behavior?



--
To view, visit http://gerrit.cloudera.org:8080/13883
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
Gerrit-Change-Number: 13883
Gerrit-PatchSet: 11
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 29 Jul 2019 16:25:53 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

2019-07-28 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13883 )

Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and 
BufferedPRS impl
..


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc
File be/src/exec/buffered-plan-root-sink.cc:

http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc@123
PS11, Line 123:   // For now, if num_results < batch->num_rows(), we 
terminate returning results
  :   // early.
  :   if (num_results > 0 && num_results < batch->num_rows()) {
> Yeah, we wouldn't publicly tell users to set SPOOL_QUERY_RESULTS=true until
I am more worried about the case in which the total number of result rows is 
not a multiple of row batch size. The client may call GetNext() multiple times. 
In which case, the client may have incomplete result, which arguably is more 
confusing than returning no result at all.

I am a bit hesitant for checking in like this. At the minimum, we should add a 
CHECK(false) << "Not implemented"; in the if-stmt body.



--
To view, visit http://gerrit.cloudera.org:8080/13883
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
Gerrit-Change-Number: 13883
Gerrit-PatchSet: 11
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 29 Jul 2019 04:58:37 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

2019-07-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13883 )

Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and 
BufferedPRS impl
..


Patch Set 13:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/4031/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/13883
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
Gerrit-Change-Number: 13883
Gerrit-PatchSet: 13
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 26 Jul 2019 19:48:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

2019-07-26 Thread Sahil Takiar (Code Review)
Hello Michael Ho, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/13883

to look at the new patch set (#13).

Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and 
BufferedPRS impl
..

IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

Improves the encapsulation of RowBatchQueue by the doing the following
re-factoring:
* Renames RowBatchQueue to BlockingRowBatchQueue which is more
indicitive of what the queue does
* Re-factors the timers managed by the scan-node into the
BlockingRowBatchQueue implementation
* Favors composition over inheritance by re-factoring
BlockingRowBatchQueue to own a BlockingQueue rather than extending one

The re-factoring lays the groundwork for introducing a generic
RowBatchQueue that all RowBatch queues inherit from.

Adds a new DequeRowBatchQueue which is a simple wrapper around a
std::deque that (1) stores unique_ptr to queued RowBatch-es and (2)
has a maximum capacity.

Implements BufferedPlanRootSink using the new DequeRowBatchQueue.
DequeRowBatchQueue is generic enough that replacing it with a
SpillableQueue (queue backed by a BufferedTupleStream) should be
straightforward. BufferedPlanRootSink is synchronized to protect access
to DequeRowBatchQueue since the queue is not thread safe.

BufferedPlanRootSink FlushFinal blocks until the consumer thread has
processed all RowBatches. This ensures that the coordinator fragment
stays alive until all results are fetched, but allows all other
fragments to be shutdown immediately.

Testing:
* Running core tests
* Updated tests/query_test/test_result_spooling.py

Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
---
M be/src/exec/blocking-plan-root-sink.cc
M be/src/exec/blocking-plan-root-sink.h
M be/src/exec/buffered-plan-root-sink.cc
M be/src/exec/buffered-plan-root-sink.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/kudu-scan-node.cc
M be/src/exec/plan-root-sink.cc
M be/src/exec/plan-root-sink.h
M be/src/exec/scan-node.cc
M be/src/exec/scan-node.h
M be/src/exec/scanner-context.cc
M be/src/runtime/CMakeLists.txt
A be/src/runtime/blocking-row-batch-queue.cc
A be/src/runtime/blocking-row-batch-queue.h
A be/src/runtime/deque-row-batch-queue.cc
A be/src/runtime/deque-row-batch-queue.h
D be/src/runtime/row-batch-queue.cc
D be/src/runtime/row-batch-queue.h
M be/src/util/blocking-queue.h
M tests/query_test/test_result_spooling.py
20 files changed, 549 insertions(+), 205 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/13883/13
--
To view, visit http://gerrit.cloudera.org:8080/13883
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
Gerrit-Change-Number: 13883
Gerrit-PatchSet: 13
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

2019-07-26 Thread Sahil Takiar (Code Review)
Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13883 )

Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and 
BufferedPRS impl
..


Patch Set 11:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc
File be/src/exec/buffered-plan-root-sink.cc:

http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc@103
PS11, Line 103:   rows_available_.NotifyOne();
  :   consumer_eos_.NotifyOne();
  :   is_full_.NotifyOne();
> Yes, that's s the assumption now but that could change in the future. Notif
Done


http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc@123
PS11, Line 123:   // For now, if num_results < batch->num_rows(), we 
terminate returning results
  :   // early.
  :   if (num_results > 0 && num_results < batch->num_rows()) {
> May be I am misunderstanding the details but isn't the TODO part a core fun
Yeah, we wouldn't publicly tell users to set SPOOL_QUERY_RESULTS=true until 
this is fixed, but if 'num_results < batch->num_rows()' then the behavior is 
the same as before this patch, it just sets eos to true and returns (e.g. no 
more results are returned).



--
To view, visit http://gerrit.cloudera.org:8080/13883
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
Gerrit-Change-Number: 13883
Gerrit-PatchSet: 11
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 26 Jul 2019 18:57:28 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

2019-07-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13883 )

Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and 
BufferedPRS impl
..


Patch Set 12:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/4026/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/13883
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
Gerrit-Change-Number: 13883
Gerrit-PatchSet: 12
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 26 Jul 2019 18:46:23 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

2019-07-26 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13883 )

Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and 
BufferedPRS impl
..


Patch Set 12:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc
File be/src/exec/buffered-plan-root-sink.cc:

http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc@103
PS11, Line 103:   DataSink::Close(state);
  : }
  :
> Yeah, for the slow path I guess it doesn't matter much, but shouldn't there
Yes, that's s the assumption now but that could change in the future. 
NotifyAll() here should work in either cases. It won't be fun debugging why 
users of this type of sink can sometimes hang.


http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc@123
PS11, Line 123: // If the query was cancelled while the sink was waiting 
for rows to become available,
  : // or if the query was cancelled before the current call to 
GetNext, set eos and then
  : // return. The queue could be empty if the sink was closed
> You do, I just decided to defer this part to a future patch. Added a TODO t
May be I am misunderstanding the details but isn't the TODO part a core 
functionally ? Leaving this part out seems to mean this sink is broken / 
incomplete.



-- 
To view, visit http://gerrit.cloudera.org:8080/13883
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
Gerrit-Change-Number: 13883
Gerrit-PatchSet: 12
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 26 Jul 2019 18:42:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

2019-07-26 Thread Sahil Takiar (Code Review)
Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13883 )

Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and 
BufferedPRS impl
..


Patch Set 11:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.h
File be/src/exec/buffered-plan-root-sink.h:

http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.h@64
PS11, Line 64: //
> nit: ///
Done


http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc
File be/src/exec/buffered-plan-root-sink.cc:

http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc@27
PS11, Line 27: 10
> This should be a static constant
Done


http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc@54
PS11, Line 54: while (batch_queue_->IsFull()) {
 :   is_full_.Wait(l);
 :   RETURN_IF_CANCELLED(state);
 : }
> May hang for a while if the sink is already cancelled when we get here, it
Done


http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc@103
PS11, Line 103:   rows_available_.NotifyOne();
  :   consumer_eos_.NotifyOne();
  :   is_full_.NotifyOne();
> I think keeping NotifyAll() make sense for the cancellation path. Same for
Yeah, for the slow path I guess it doesn't matter much, but shouldn't there 
only ever be one thread waiting on any of these condition variables? In which 
case NotifyOne() should be sufficient?


http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc@123
PS11, Line 123:   // For now, if num_results < batch->num_rows(), we 
terminate returning results
  :   // early.
  :   if (num_results > 0 && num_results < batch->num_rows()) {
> Not sure I understand this part ? Shouldn't this function still need to ins
You do, I just decided to defer this part to a future patch. Added a TODO to 
make that clear.


http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc@127
PS11, Line 127: consumer_eos_.NotifyOne();
> May need to do is_full_.NotifyOne() here in case the sender was blocked. Se
Added the call to is_full_.NotifyOne(), agree the early return isn't ideal. I 
think that is something we can change when we add support for handling fetch 
requests where 'num_results < batch->num_rows()'


http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/scan-node.cc
File be/src/exec/scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/scan-node.cc@262
PS11, Line 262:   ADD_TIMER(parent->runtime_profile(), 
"RowBatchQueueGetWaitTime"),
  :   ADD_TIMER(parent->runtime_profile(), 
"RowBatchQueuePutWaitTime"))
> The code seems easier to follow if we keep the ADD_TIMER() macro at the old
Done


http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/util/blocking-queue.h
File be/src/util/blocking-queue.h:

http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/util/blocking-queue.h@108
PS11, Line 108: get_wait_timer_)
> get_wait_timer_ != nullptr
Done



--
To view, visit http://gerrit.cloudera.org:8080/13883
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
Gerrit-Change-Number: 13883
Gerrit-PatchSet: 11
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 26 Jul 2019 18:04:30 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

2019-07-26 Thread Sahil Takiar (Code Review)
Hello Michael Ho, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/13883

to look at the new patch set (#12).

Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and 
BufferedPRS impl
..

IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

Improves the encapsulation of RowBatchQueue by the doing the following
re-factoring:
* Renames RowBatchQueue to BlockingRowBatchQueue which is more
indicitive of what the queue does
* Re-factors the timers managed by the scan-node into the
BlockingRowBatchQueue implementation
* Favors composition over inheritance by re-factoring
BlockingRowBatchQueue to own a BlockingQueue rather than extending one

The re-factoring lays the groundwork for introducing a generic
RowBatchQueue that all RowBatch queues inherit from.

Adds a new DequeRowBatchQueue which is a simple wrapper around a
std::deque that (1) stores unique_ptr to queued RowBatch-es and (2)
has a maximum capacity.

Implements BufferedPlanRootSink using the new DequeRowBatchQueue.
DequeRowBatchQueue is generic enough that replacing it with a
SpillableQueue (queue backed by a BufferedTupleStream) should be
straightforward. BufferedPlanRootSink is synchronized to protect access
to DequeRowBatchQueue since the queue is not thread safe.

BufferedPlanRootSink FlushFinal blocks until the consumer thread has
processed all RowBatches. This ensures that the coordinator fragment
stays alive until all results are fetched, but allows all other
fragments to be shutdown immediately.

Testing:
* Running core tests
* Updated tests/query_test/test_result_spooling.py

Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
---
M be/src/exec/blocking-plan-root-sink.cc
M be/src/exec/blocking-plan-root-sink.h
M be/src/exec/buffered-plan-root-sink.cc
M be/src/exec/buffered-plan-root-sink.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/kudu-scan-node.cc
M be/src/exec/plan-root-sink.cc
M be/src/exec/plan-root-sink.h
M be/src/exec/scan-node.cc
M be/src/exec/scan-node.h
M be/src/exec/scanner-context.cc
M be/src/runtime/CMakeLists.txt
A be/src/runtime/blocking-row-batch-queue.cc
A be/src/runtime/blocking-row-batch-queue.h
A be/src/runtime/deque-row-batch-queue.cc
A be/src/runtime/deque-row-batch-queue.h
D be/src/runtime/row-batch-queue.cc
D be/src/runtime/row-batch-queue.h
M be/src/util/blocking-queue.h
M tests/query_test/test_result_spooling.py
20 files changed, 549 insertions(+), 205 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/13883/12
--
To view, visit http://gerrit.cloudera.org:8080/13883
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
Gerrit-Change-Number: 13883
Gerrit-PatchSet: 12
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

2019-07-25 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13883 )

Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and 
BufferedPRS impl
..


Patch Set 11:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.h
File be/src/exec/buffered-plan-root-sink.h:

http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.h@64
PS11, Line 64: //
nit: ///


http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc
File be/src/exec/buffered-plan-root-sink.cc:

http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc@27
PS11, Line 27: 10
This should be a static constant


http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc@54
PS11, Line 54: while (batch_queue_->IsFull()) {
 :   is_full_.Wait(l);
 :   RETURN_IF_CANCELLED(state);
 : }
May hang for a while if the sink is already cancelled when we get here, it may 
be more safer to do:

   while (!state->cancelled() && batch_queue->IsFull()) {
is_full_.Wait(l);
   }
   RETURN_IF_CANCELLED(state);


http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc@103
PS11, Line 103:   rows_available_.NotifyOne();
  :   consumer_eos_.NotifyOne();
  :   is_full_.NotifyOne();
I think keeping NotifyAll() make sense for the cancellation path. Same for 
Close(). NotifyOne() makes assumption about the number of waiters on these 
condition variables, which seems dicey for the cancellation path and it doesn't 
hurt to do a NotifyAll() for the slow path.


http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc@123
PS11, Line 123:   // For now, if num_results < batch->num_rows(), we 
terminate returning results
  :   // early.
  :   if (num_results > 0 && num_results < batch->num_rows()) {
Not sure I understand this part ? Shouldn't this function still need to insert 
up to 'num_results' rows into 'results' in this case ?

In other words, if the row batch at the front of the queue is not completely 
consumed, don't you need to track how many rows have been consumed from it ?


http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc@127
PS11, Line 127: consumer_eos_.NotifyOne();
May need to do is_full_.NotifyOne() here in case the sender was blocked. Seems 
better to refactor this code (if possible) to always return at line 147.


http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/scan-node.cc
File be/src/exec/scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/scan-node.cc@262
PS11, Line 262:   ADD_TIMER(parent->runtime_profile(), 
"RowBatchQueueGetWaitTime"),
  :   ADD_TIMER(parent->runtime_profile(), 
"RowBatchQueuePutWaitTime"))
The code seems easier to follow if we keep the ADD_TIMER() macro at the old 
call sites and simply reference those timers here.


http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/util/blocking-queue.h
File be/src/util/blocking-queue.h:

http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/util/blocking-queue.h@108
PS11, Line 108: get_wait_timer_)
get_wait_timer_ != nullptr



--
To view, visit http://gerrit.cloudera.org:8080/13883
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
Gerrit-Change-Number: 13883
Gerrit-PatchSet: 11
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 26 Jul 2019 02:31:40 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

2019-07-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13883 )

Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and 
BufferedPRS impl
..


Patch Set 10:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/4013/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/13883
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
Gerrit-Change-Number: 13883
Gerrit-PatchSet: 10
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 26 Jul 2019 01:40:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

2019-07-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13883 )

Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and 
BufferedPRS impl
..


Patch Set 11:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/4014/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/13883
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
Gerrit-Change-Number: 13883
Gerrit-PatchSet: 11
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 26 Jul 2019 01:34:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

2019-07-25 Thread Sahil Takiar (Code Review)
Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13883 )

Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and 
BufferedPRS impl
..


Patch Set 9:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/exec/buffered-plan-root-sink.cc
File be/src/exec/buffered-plan-root-sink.cc:

http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/exec/buffered-plan-root-sink.cc@65
PS9, Line 65:   rows_available_.NotifyAll();
> It's better to drop the lock before calling notify, so that the woken threa
Done. Switched to NotifyOne() as well as per the recommendation in IMPALA-6125

I looked through the other uses of Notify and made similar changes, but I 
couldn't change all of them because it would introduce race conditions.


http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/exec/buffered-plan-root-sink.cc@74
PS9, Line 74: way
> wake
Done


http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/exec/buffered-plan-root-sink.cc@104
PS9, Line 104: Status BufferedPlanRootSink::GetNext(
> I saw a null pointer dereference causing a crash in the function. Emailed S
Done


http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/exec/buffered-plan-root-sink.cc@121
PS9, Line 121:   *eos = true;
> Are you missing a consumer_eos_.NotifyAll() here ?
Done


http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/exec/plan-root-sink.h
File be/src/exec/plan-root-sink.h:

http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/exec/plan-root-sink.h@84
PS9, Line 84: approriate
> typo
Yeah, I decided just to split this up into two methods since it makes more 
sense.


http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/exec/plan-root-sink.h@107
PS9, Line 107: Send()
> stale
Done


http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/runtime/blocking-row-batch-queue.cc
File be/src/runtime/blocking-row-batch-queue.cc:

http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/runtime/blocking-row-batch-queue.cc@66
PS9, Line 66:   
row_batches_put_timer_->Set(batch_queue_->total_put_wait_time());
:   
row_batches_get_timer_->Set(batch_queue_->total_get_wait_time());
> I believe the TODO has more to do with not using this pattern of setting th
Done


http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/runtime/deque-row-batch-queue.h
File be/src/runtime/deque-row-batch-queue.h:

http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/runtime/deque-row-batch-queue.h@57
PS9, Line 57: remanining
> typo.
Done


http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/runtime/deque-row-batch-queue.h@68
PS9, Line 68:   int max_batches_;
> const
Done


http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/runtime/deque-row-batch-queue.cc
File be/src/runtime/deque-row-batch-queue.cc:

http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/runtime/deque-row-batch-queue.cc@61
PS9, Line 61: result->Reset();
> I was able to trigger a segfault from a null pointer here by running "selec
Done



--
To view, visit http://gerrit.cloudera.org:8080/13883
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
Gerrit-Change-Number: 13883
Gerrit-PatchSet: 9
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 26 Jul 2019 01:01:54 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

2019-07-25 Thread Sahil Takiar (Code Review)
Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13883 )

Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and 
BufferedPRS impl
..


Patch Set 11:

> (4 comments)
 >
 > I played around with this a little bit and was definitely able to
 > see the spooling behaviour by suspending impala-shell. I ran into a
 > couple of crashes though - I think the problems might be around
 > null row batches being dequeued from the queue

Thanks for doing some testing. I recently switched from using a std::queue to 
using a std:deque and think I got my push / pop functions mixed up. Regardless, 
the issue is fixed now. I'm able to run "select distinct * from tpch.lineitem 
limit 1" successfully and cancelling the query midway doesn't cause any 
crashes. I also looped the tests in https://gerrit.cloudera.org/#/c/13907/ for 
about an hour and everything passed.


--
To view, visit http://gerrit.cloudera.org:8080/13883
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
Gerrit-Change-Number: 13883
Gerrit-PatchSet: 11
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 26 Jul 2019 00:54:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

2019-07-25 Thread Sahil Takiar (Code Review)
Hello Michael Ho, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/13883

to look at the new patch set (#11).

Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and 
BufferedPRS impl
..

IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

Improves the encapsulation of RowBatchQueue by the doing the following
re-factoring:
* Renames RowBatchQueue to BlockingRowBatchQueue which is more
indicitive of what the queue does
* Re-factors the timers managed by the scan-node into the
BlockingRowBatchQueue implementation
* Favors composition over inheritance by re-factoring
BlockingRowBatchQueue to own a BlockingQueue rather than extending one

The re-factoring lays the groundwork for introducing a generic
RowBatchQueue that all RowBatch queues inherit from.

Adds a new DequeRowBatchQueue which is a simple wrapper around a
std::deque that (1) stores unique_ptr to queued RowBatch-es and (2)
has a maximum capacity.

Implements BufferedPlanRootSink using the new DequeRowBatchQueue.
DequeRowBatchQueue is generic enough that replacing it with a
SpillableQueue (queue backed by a BufferedTupleStream) should be
straightforward. BufferedPlanRootSink is synchronized to protect access
to DequeRowBatchQueue since the queue is not thread safe.

BufferedPlanRootSink FlushFinal blocks until the consumer thread has
processed all RowBatches. This ensures that the coordinator fragment
stays alive until all results are fetched, but allows all other
fragments to be shutdown immediately.

Testing:
* Running core tests
* Updated tests/query_test/test_result_spooling.py

Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
---
M be/src/exec/blocking-plan-root-sink.cc
M be/src/exec/blocking-plan-root-sink.h
M be/src/exec/buffered-plan-root-sink.cc
M be/src/exec/buffered-plan-root-sink.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/kudu-scan-node.cc
M be/src/exec/plan-root-sink.cc
M be/src/exec/plan-root-sink.h
M be/src/exec/scan-node.cc
M be/src/exec/scan-node.h
M be/src/exec/scanner-context.cc
M be/src/runtime/CMakeLists.txt
A be/src/runtime/blocking-row-batch-queue.cc
A be/src/runtime/blocking-row-batch-queue.h
A be/src/runtime/deque-row-batch-queue.cc
A be/src/runtime/deque-row-batch-queue.h
D be/src/runtime/row-batch-queue.cc
D be/src/runtime/row-batch-queue.h
M be/src/util/blocking-queue.h
M tests/query_test/test_result_spooling.py
20 files changed, 550 insertions(+), 214 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/13883/11
--
To view, visit http://gerrit.cloudera.org:8080/13883
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
Gerrit-Change-Number: 13883
Gerrit-PatchSet: 11
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

2019-07-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13883 )

Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and 
BufferedPRS impl
..


Patch Set 10:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/13883/10/be/src/exec/buffered-plan-root-sink.cc
File be/src/exec/buffered-plan-root-sink.cc:

http://gerrit.cloudera.org:8080/#/c/13883/10/be/src/exec/buffered-plan-root-sink.cc@61
PS10, Line 61:   // Adding a batch should always be successful because the 
queue should always be open
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/13883/10/be/src/exec/buffered-plan-root-sink.cc@62
PS10, Line 62:   // when Send is called, and the call to is_full_.Wait(l) 
ensures spaces is available.
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/13883/10/be/src/exec/buffered-plan-root-sink.cc@122
PS10, Line 122:   // For now, if num_results < batch->num_rows(), we 
terminate returning results early.
line too long (91 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/13883
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
Gerrit-Change-Number: 13883
Gerrit-PatchSet: 10
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 26 Jul 2019 00:51:02 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

2019-07-25 Thread Sahil Takiar (Code Review)
Hello Michael Ho, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/13883

to look at the new patch set (#10).

Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and 
BufferedPRS impl
..

IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

Improves the encapsulation of RowBatchQueue by the doing the following
re-factoring:
* Renames RowBatchQueue to BlockingRowBatchQueue which is more
indicitive of what the queue does
* Re-factors the timers managed by the scan-node into the
BlockingRowBatchQueue implementation
* Favors composition over inheritance by re-factoring
BlockingRowBatchQueue to own a BlockingQueue rather than extending one

The re-factoring lays the groundwork for introducing a generic
RowBatchQueue that all RowBatch queues inherit from.

Adds a new DequeRowBatchQueue which is a simple wrapper around a
std::deque that (1) stores unique_ptr to queued RowBatch-es and (2)
has a maximum capacity.

Implements BufferedPlanRootSink using the new DequeRowBatchQueue.
DequeRowBatchQueue is generic enough that replacing it with a
SpillableQueue (queue backed by a BufferedTupleStream) should be
straightforward. BufferedPlanRootSink is synchronized to protect access
to DequeRowBatchQueue since the queue is not thread safe.

BufferedPlanRootSink FlushFinal blocks until the consumer thread has
processed all RowBatches. This ensures that the coordinator fragment
stays alive until all results are fetched, but allows all other
fragments to be shutdown immediately.

Testing:
* Running core tests
* Updated tests/query_test/test_result_spooling.py

Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
---
M be/src/exec/blocking-plan-root-sink.cc
M be/src/exec/blocking-plan-root-sink.h
M be/src/exec/buffered-plan-root-sink.cc
M be/src/exec/buffered-plan-root-sink.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/kudu-scan-node.cc
M be/src/exec/plan-root-sink.cc
M be/src/exec/plan-root-sink.h
M be/src/exec/scan-node.cc
M be/src/exec/scan-node.h
M be/src/exec/scanner-context.cc
M be/src/runtime/CMakeLists.txt
A be/src/runtime/blocking-row-batch-queue.cc
A be/src/runtime/blocking-row-batch-queue.h
A be/src/runtime/deque-row-batch-queue.cc
A be/src/runtime/deque-row-batch-queue.h
D be/src/runtime/row-batch-queue.cc
D be/src/runtime/row-batch-queue.h
M be/src/util/blocking-queue.h
M tests/query_test/test_result_spooling.py
20 files changed, 548 insertions(+), 214 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/13883/10
--
To view, visit http://gerrit.cloudera.org:8080/13883
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
Gerrit-Change-Number: 13883
Gerrit-PatchSet: 10
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

2019-07-24 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13883 )

Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and 
BufferedPRS impl
..


Patch Set 9:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/exec/buffered-plan-root-sink.cc
File be/src/exec/buffered-plan-root-sink.cc:

http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/exec/buffered-plan-root-sink.cc@121
PS9, Line 121:   *eos = true;
Are you missing a consumer_eos_.NotifyAll() here ?


http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/exec/plan-root-sink.h
File be/src/exec/plan-root-sink.h:

http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/exec/plan-root-sink.h@84
PS9, Line 84: validation rules a
May be helpful to spell out the details here. This function seems to be a bit 
of a misnomer as it does a couple of things:
- validate the collection slots
- update the number of row batches sent to this sink
- check if the number of rows sent to this sink exceeds limit and if so, return 
error status

Not sure what a good name is. ValidateBatchAndCheckLimit() ??


http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/exec/plan-root-sink.h@84
PS9, Line 84: approriate
typo


http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/exec/plan-root-sink.h@107
PS9, Line 107: Send()
stale


http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/runtime/blocking-row-batch-queue.cc
File be/src/runtime/blocking-row-batch-queue.cc:

http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/runtime/blocking-row-batch-queue.cc@66
PS9, Line 66:   
row_batches_put_timer_->Set(batch_queue_->total_put_wait_time());
:   
row_batches_get_timer_->Set(batch_queue_->total_get_wait_time());
I believe the TODO has more to do with not using this pattern of setting the 
timer at the end.

Instead of waiting till the end to set the timer, it may be better for the 
batch_queue_ to directly use the two timers so we don't have to wait until the 
scan node is closed before we can tell how much time is spent in the queue.


http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/runtime/deque-row-batch-queue.h
File be/src/runtime/deque-row-batch-queue.h:

http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/runtime/deque-row-batch-queue.h@57
PS9, Line 57: remanining
typo.

Also this line doesn't quite parse correctly.


http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/runtime/deque-row-batch-queue.h@68
PS9, Line 68:   int max_batches_;
const



--
To view, visit http://gerrit.cloudera.org:8080/13883
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
Gerrit-Change-Number: 13883
Gerrit-PatchSet: 9
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 25 Jul 2019 01:15:50 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

2019-07-24 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13883 )

Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and 
BufferedPRS impl
..


Patch Set 9:

(4 comments)

I played around with this a little bit and was definitely able to see the 
spooling behaviour by suspending impala-shell. I ran into a couple of crashes 
though - I think the problems might be around null row batches being dequeued 
from the queue

http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/exec/buffered-plan-root-sink.cc
File be/src/exec/buffered-plan-root-sink.cc:

http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/exec/buffered-plan-root-sink.cc@65
PS9, Line 65:   rows_available_.NotifyAll();
It's better to drop the lock before calling notify, so that the woken thread 
can always grab the lock immediately. See 
https://issues.apache.org/jira/browse/IMPALA-6135


http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/exec/buffered-plan-root-sink.cc@74
PS9, Line 74: way
wake


http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/exec/buffered-plan-root-sink.cc@104
PS9, Line 104: Status BufferedPlanRootSink::GetNext(
I saw a null pointer dereference causing a crash in the function. Emailed Sahil 
the logs.


http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/runtime/deque-row-batch-queue.cc
File be/src/runtime/deque-row-batch-queue.cc:

http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/runtime/deque-row-batch-queue.cc@61
PS9, Line 61: result->Reset();
I was able to trigger a segfault from a null pointer here by running "select * 
from tpch.lineitem" from impala-shell and then cancelling the query.

C  [impalad+0xbd0462]  impala::RowBatch::Reset()+0x12
C  [impalad+0x116fd74]  impala::DequeRowBatchQueue::Close()+0x44
C  [impalad+0xf701f5]  
impala::BufferedPlanRootSink::Close(impala::RuntimeState*)+0x1b5
C  [impalad+0xc0d04c]  impala::FragmentInstanceState::Close()+0x2c
C  [impalad+0xc0fde2]  impala::FragmentInstanceState::Exec()+0xc2
C  [impalad+0xbfb4b7]  
impala::QueryState::ExecFInstance(impala::FragmentInstanceState*)+0x287
C  [impalad+0xe50220]  impala::Thread::SuperviseThread(std::string const&, 
std::string const&, boost::function, impala::ThreadDebugInfo const*, 
impala::Promise*)+0x310
C  [impalad+0xe50dca]  boost::detail::thread_data, 
impala::ThreadDebugInfo const*, impala::Promise*), boost::_bi::list5, 
boost::_bi::value, boost::_bi::value >, 
boost::_bi::value, 
boost::_bi::value*> > > 
>::run()+0x7a



--
To view, visit http://gerrit.cloudera.org:8080/13883
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
Gerrit-Change-Number: 13883
Gerrit-PatchSet: 9
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 25 Jul 2019 00:53:26 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

2019-07-24 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13883 )

Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and 
BufferedPRS impl
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13883/7/be/src/exec/buffered-plan-root-sink.cc
File be/src/exec/buffered-plan-root-sink.cc:

http://gerrit.cloudera.org:8080/#/c/13883/7/be/src/exec/buffered-plan-root-sink.cc@38
PS7, Line 38:   output_batch->AcquireState(batch);
> Thanks for the info Tim. Replaced this DeepCopyTo. I think BufferedTupleStr
Yeah BTS always does a full copy of the RowBatch into its own buffers.



--
To view, visit http://gerrit.cloudera.org:8080/13883
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
Gerrit-Change-Number: 13883
Gerrit-PatchSet: 7
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 25 Jul 2019 00:14:15 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

2019-07-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13883 )

Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and 
BufferedPRS impl
..


Patch Set 9:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/3982/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/13883
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
Gerrit-Change-Number: 13883
Gerrit-PatchSet: 9
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 24 Jul 2019 20:33:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

2019-07-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13883 )

Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and 
BufferedPRS impl
..


Patch Set 8:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/3981/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/13883
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
Gerrit-Change-Number: 13883
Gerrit-PatchSet: 8
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 24 Jul 2019 20:32:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

2019-07-24 Thread Sahil Takiar (Code Review)
Hello Michael Ho, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/13883

to look at the new patch set (#9).

Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and 
BufferedPRS impl
..

IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

Improves the encapsulation of RowBatchQueue by the doing the following
re-factoring:
* Renames RowBatchQueue to BlockingRowBatchQueue which is more
indicitive of what the queue does
* Re-factors the timers managed by the scan-node into the
BlockingRowBatchQueue implementation
* Favors composition over inheritance by re-factoring
BlockingRowBatchQueue to own a BlockingQueue rather than extending one

The re-factoring lays the groundwork for introducing a generic
RowBatchQueue that all RowBatch queues inherit from.

Adds a new DequeRowBatchQueue which is a simple wrapper around a
std::deque that (1) stores unique_ptr to queued RowBatch-es and (2)
has a maximum capacity.

Implements BufferedPlanRootSink using the new DequeRowBatchQueue.
DequeRowBatchQueue is generic enough that replacing it with a
SpillableQueue (queue backed by a BufferedTupleStream) should be
straightforward. BufferedPlanRootSink is synchronized to protect access
to DequeRowBatchQueue since the queue is not thread safe.

BufferedPlanRootSink FlushFinal blocks until the consumer thread has
processed all RowBatches. This ensures that the coordinator fragment
stays alive until all results are fetched, but allows all other
fragments to be shutdown immediately.

Testing:
* Running core tests
* Updated tests/query_test/test_result_spooling.py

Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
---
M be/src/exec/blocking-plan-root-sink.cc
M be/src/exec/blocking-plan-root-sink.h
M be/src/exec/buffered-plan-root-sink.cc
M be/src/exec/buffered-plan-root-sink.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/kudu-scan-node.cc
M be/src/exec/plan-root-sink.cc
M be/src/exec/plan-root-sink.h
M be/src/exec/scan-node.cc
M be/src/exec/scan-node.h
M be/src/exec/scanner-context.cc
M be/src/runtime/CMakeLists.txt
A be/src/runtime/blocking-row-batch-queue.cc
A be/src/runtime/blocking-row-batch-queue.h
A be/src/runtime/deque-row-batch-queue.cc
A be/src/runtime/deque-row-batch-queue.h
D be/src/runtime/row-batch-queue.cc
D be/src/runtime/row-batch-queue.h
M tests/query_test/test_result_spooling.py
19 files changed, 517 insertions(+), 194 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/13883/9
--
To view, visit http://gerrit.cloudera.org:8080/13883
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
Gerrit-Change-Number: 13883
Gerrit-PatchSet: 9
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

2019-07-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13883 )

Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and 
BufferedPRS impl
..


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13883/8/be/src/runtime/blocking-row-batch-queue.cc
File be/src/runtime/blocking-row-batch-queue.cc:

http://gerrit.cloudera.org:8080/#/c/13883/8/be/src/runtime/blocking-row-batch-queue.cc@46
PS8, Line 46: bool 
BlockingRowBatchQueue::AddBatchWithTimeout(unique_ptr batch, int64_t 
timeout_micros) {
line too long (101 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/13883
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
Gerrit-Change-Number: 13883
Gerrit-PatchSet: 8
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 24 Jul 2019 19:52:57 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

2019-07-24 Thread Sahil Takiar (Code Review)
Hello Michael Ho, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/13883

to look at the new patch set (#8).

Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and 
BufferedPRS impl
..

IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

Improves the encapsulation of RowBatchQueue by the doing the following
re-factoring:
* Renames RowBatchQueue to BlockingRowBatchQueue which is more
indicitive of what the queue does
* Re-factors the timers managed by the scan-node into the
BlockingRowBatchQueue implementation
* Favors composition over inheritance by re-factoring
BlockingRowBatchQueue to own a BlockingQueue rather than extending one

The re-factoring lays the groundwork for introducing a generic
RowBatchQueue that all RowBatch queues inherit from.

Adds a new DequeRowBatchQueue which is a simple wrapper around a
std::deque that (1) stores unique_ptr to queued RowBatch-es and (2)
has a maximum capacity.

Implements BufferedPlanRootSink using the new DequeRowBatchQueue.
DequeRowBatchQueue is generic enough that replacing it with a
SpillableQueue (queue backed by a BufferedTupleStream) should be
straightforward. BufferedPlanRootSink is synchronized to protect access
to DequeRowBatchQueue since the queue is not thread safe.

BufferedPlanRootSink FlushFinal blocks until the consumer thread has
processed all RowBatches. This ensures that the coordinator fragment
stays alive until all results are fetched, but allows all other
fragments to be shutdown immediately.

Testing:
* Running core tests
* Updated tests/query_test/test_result_spooling.py

Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
---
M be/src/exec/blocking-plan-root-sink.cc
M be/src/exec/blocking-plan-root-sink.h
M be/src/exec/buffered-plan-root-sink.cc
M be/src/exec/buffered-plan-root-sink.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/kudu-scan-node.cc
M be/src/exec/plan-root-sink.cc
M be/src/exec/plan-root-sink.h
M be/src/exec/scan-node.cc
M be/src/exec/scan-node.h
M be/src/exec/scanner-context.cc
M be/src/runtime/CMakeLists.txt
A be/src/runtime/blocking-row-batch-queue.cc
A be/src/runtime/blocking-row-batch-queue.h
A be/src/runtime/deque-row-batch-queue.cc
A be/src/runtime/deque-row-batch-queue.h
D be/src/runtime/row-batch-queue.cc
D be/src/runtime/row-batch-queue.h
M tests/query_test/test_result_spooling.py
19 files changed, 516 insertions(+), 194 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/13883/8
--
To view, visit http://gerrit.cloudera.org:8080/13883
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
Gerrit-Change-Number: 13883
Gerrit-PatchSet: 8
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong