[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/13883 ) Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl .. IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl Improves the encapsulation of RowBatchQueue by the doing the following re-factoring: * Renames RowBatchQueue to BlockingRowBatchQueue which is more indicitive of what the queue does * Re-factors the timers managed by the scan-node into the BlockingRowBatchQueue implementation * Favors composition over inheritance by re-factoring BlockingRowBatchQueue to own a BlockingQueue rather than extending one The re-factoring lays the groundwork for introducing a generic RowBatchQueue that all RowBatch queues inherit from. Adds a new DequeRowBatchQueue which is a simple wrapper around a std::deque that (1) stores unique_ptr to queued RowBatch-es and (2) has a maximum capacity. Implements BufferedPlanRootSink using the new DequeRowBatchQueue. DequeRowBatchQueue is generic enough that replacing it with a SpillableQueue (queue backed by a BufferedTupleStream) should be straightforward. BufferedPlanRootSink is synchronized to protect access to DequeRowBatchQueue since the queue is not thread safe. BufferedPlanRootSink FlushFinal blocks until the consumer thread has processed all RowBatches. This ensures that the coordinator fragment stays alive until all results are fetched, but allows all other fragments to be shutdown immediately. Testing: * Running core tests * Updated tests/query_test/test_result_spooling.py Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be Reviewed-on: http://gerrit.cloudera.org:8080/13883 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M be/src/exec/blocking-plan-root-sink.cc M be/src/exec/blocking-plan-root-sink.h M be/src/exec/buffered-plan-root-sink.cc M be/src/exec/buffered-plan-root-sink.h M be/src/exec/data-sink.cc M be/src/exec/hdfs-scan-node.cc M be/src/exec/kudu-scan-node.cc M be/src/exec/plan-root-sink.cc M be/src/exec/plan-root-sink.h M be/src/exec/scan-node.cc M be/src/exec/scan-node.h M be/src/exec/scanner-context.cc M be/src/runtime/CMakeLists.txt A be/src/runtime/blocking-row-batch-queue.cc A be/src/runtime/blocking-row-batch-queue.h A be/src/runtime/deque-row-batch-queue.cc A be/src/runtime/deque-row-batch-queue.h D be/src/runtime/row-batch-queue.cc D be/src/runtime/row-batch-queue.h M be/src/util/blocking-queue.h M tests/query_test/test_result_spooling.py 21 files changed, 559 insertions(+), 205 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/13883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be Gerrit-Change-Number: 13883 Gerrit-PatchSet: 20 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13883 ) Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl .. Patch Set 19: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/13883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be Gerrit-Change-Number: 13883 Gerrit-PatchSet: 19 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 01 Aug 2019 13:12:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13883 ) Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl .. Patch Set 19: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/4711/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/13883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be Gerrit-Change-Number: 13883 Gerrit-PatchSet: 19 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 01 Aug 2019 06:36:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13883 ) Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl .. Patch Set 19: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/4709/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/13883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be Gerrit-Change-Number: 13883 Gerrit-PatchSet: 19 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 01 Aug 2019 04:40:13 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13883 ) Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl .. Patch Set 19: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/13883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be Gerrit-Change-Number: 13883 Gerrit-PatchSet: 19 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 01 Aug 2019 04:40:12 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13883 ) Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl .. Patch Set 18: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/4703/ -- To view, visit http://gerrit.cloudera.org:8080/13883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be Gerrit-Change-Number: 13883 Gerrit-PatchSet: 18 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 31 Jul 2019 17:53:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13883 ) Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl .. Patch Set 18: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/13883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be Gerrit-Change-Number: 13883 Gerrit-PatchSet: 18 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 31 Jul 2019 16:19:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13883 ) Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl .. Patch Set 18: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/4703/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/13883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be Gerrit-Change-Number: 13883 Gerrit-PatchSet: 18 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 31 Jul 2019 16:19:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13883 ) Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl .. Patch Set 17: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/4091/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/13883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be Gerrit-Change-Number: 13883 Gerrit-PatchSet: 17 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 31 Jul 2019 00:59:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13883 ) Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl .. Patch Set 16: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/4089/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/13883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be Gerrit-Change-Number: 13883 Gerrit-PatchSet: 16 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 31 Jul 2019 00:57:55 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13883 ) Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl .. Patch Set 17: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/13883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be Gerrit-Change-Number: 13883 Gerrit-PatchSet: 17 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 31 Jul 2019 00:53:54 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/13883 ) Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl .. Patch Set 16: (10 comments) http://gerrit.cloudera.org:8080/#/c/13883/15/be/src/exec/buffered-plan-root-sink.h File be/src/exec/buffered-plan-root-sink.h: http://gerrit.cloudera.org:8080/#/c/13883/15/be/src/exec/buffered-plan-root-sink.h@21 PS15, Line 21: #include "util/condition-variable.h" > I'd prefer a forward declaration of this in the header and moving the inclu Done http://gerrit.cloudera.org:8080/#/c/13883/15/be/src/exec/buffered-plan-root-sink.h@33 PS15, Line 33: s unt > either Done http://gerrit.cloudera.org:8080/#/c/13883/15/be/src/exec/buffered-plan-root-sink.h@80 PS15, Line 80: produce > I think the polarity of this name doesn't match the other condition variabl Done http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc File be/src/exec/buffered-plan-root-sink.cc: http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc@123 PS11, Line 123: while (batch_queue_->IsEmpty() && sender_state_ == SenderState::ROWS_PENDING : && !state->is_cancelled()) { : rows_available_.Wait(l); > Oh no, I think it's generally a good idea to fail hard, just that specific Done http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc@127 PS11, Line 127: > Added the call to is_full_.NotifyOne(), agree the early return isn't ideal. Done http://gerrit.cloudera.org:8080/#/c/13883/15/be/src/exec/buffered-plan-root-sink.cc File be/src/exec/buffered-plan-root-sink.cc: http://gerrit.cloudera.org:8080/#/c/13883/15/be/src/exec/buffered-plan-root-sink.cc@38 PS15, Line 38: // If the batch is empty, we have nothing to do so just return Status::OK(). > nit: if statement could be one line Done http://gerrit.cloudera.org:8080/#/c/13883/15/be/src/exec/buffered-plan-root-sink.cc@52 PS15, Line 52: > You could Reset() batch at this point to free up any resources associated. The FragmentInstanceState calls Reset() on each batch after it is passed to ::Send - https://github.com/apache/impala/blob/master/be/src/runtime/fragment-instance-state.cc#L365 http://gerrit.cloudera.org:8080/#/c/13883/15/be/src/exec/buffered-plan-root-sink.cc@144 PS15, Line 144: RETURN_IF_ERROR( > I found the mix of NotifyOne()/NotifyAll() is mildly confusing. I think eit Added some comments to clarify this. The suggestion to use NotifyAll() was from prior review comments to use NotifyAll() in Close and FlushFinal to be extra sure that all threads are awoken. http://gerrit.cloudera.org:8080/#/c/13883/15/be/src/exec/plan-root-sink.h File be/src/exec/plan-root-sink.h: http://gerrit.cloudera.org:8080/#/c/13883/15/be/src/exec/plan-root-sink.h@94 PS15, Line 94: Updat > UpdateAndCheck ? Done http://gerrit.cloudera.org:8080/#/c/13883/15/be/src/runtime/blocking-row-batch-queue.h File be/src/runtime/blocking-row-batch-queue.h: http://gerrit.cloudera.org:8080/#/c/13883/15/be/src/runtime/blocking-row-batch-queue.h@48 PS15, Line 48: The queue supports limiting the capacity in terms of bytes enqueued and number of : /// batches to be enqueued. > Stale ? Done -- To view, visit http://gerrit.cloudera.org:8080/13883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be Gerrit-Change-Number: 13883 Gerrit-PatchSet: 16 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 31 Jul 2019 00:22:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl
Hello Michael Ho, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/13883 to look at the new patch set (#17). Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl .. IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl Improves the encapsulation of RowBatchQueue by the doing the following re-factoring: * Renames RowBatchQueue to BlockingRowBatchQueue which is more indicitive of what the queue does * Re-factors the timers managed by the scan-node into the BlockingRowBatchQueue implementation * Favors composition over inheritance by re-factoring BlockingRowBatchQueue to own a BlockingQueue rather than extending one The re-factoring lays the groundwork for introducing a generic RowBatchQueue that all RowBatch queues inherit from. Adds a new DequeRowBatchQueue which is a simple wrapper around a std::deque that (1) stores unique_ptr to queued RowBatch-es and (2) has a maximum capacity. Implements BufferedPlanRootSink using the new DequeRowBatchQueue. DequeRowBatchQueue is generic enough that replacing it with a SpillableQueue (queue backed by a BufferedTupleStream) should be straightforward. BufferedPlanRootSink is synchronized to protect access to DequeRowBatchQueue since the queue is not thread safe. BufferedPlanRootSink FlushFinal blocks until the consumer thread has processed all RowBatches. This ensures that the coordinator fragment stays alive until all results are fetched, but allows all other fragments to be shutdown immediately. Testing: * Running core tests * Updated tests/query_test/test_result_spooling.py Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be --- M be/src/exec/blocking-plan-root-sink.cc M be/src/exec/blocking-plan-root-sink.h M be/src/exec/buffered-plan-root-sink.cc M be/src/exec/buffered-plan-root-sink.h M be/src/exec/data-sink.cc M be/src/exec/hdfs-scan-node.cc M be/src/exec/kudu-scan-node.cc M be/src/exec/plan-root-sink.cc M be/src/exec/plan-root-sink.h M be/src/exec/scan-node.cc M be/src/exec/scan-node.h M be/src/exec/scanner-context.cc M be/src/runtime/CMakeLists.txt A be/src/runtime/blocking-row-batch-queue.cc A be/src/runtime/blocking-row-batch-queue.h A be/src/runtime/deque-row-batch-queue.cc A be/src/runtime/deque-row-batch-queue.h D be/src/runtime/row-batch-queue.cc D be/src/runtime/row-batch-queue.h M be/src/util/blocking-queue.h M tests/query_test/test_result_spooling.py 21 files changed, 559 insertions(+), 205 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/13883/17 -- To view, visit http://gerrit.cloudera.org:8080/13883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be Gerrit-Change-Number: 13883 Gerrit-PatchSet: 17 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13883 ) Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl .. Patch Set 16: (1 comment) http://gerrit.cloudera.org:8080/#/c/13883/16/be/src/exec/plan-root-sink.cc File be/src/exec/plan-root-sink.cc: http://gerrit.cloudera.org:8080/#/c/13883/16/be/src/exec/plan-root-sink.cc@64 PS16, Line 64: Status PlanRootSink::UpdateAndCheckRowsProducedLimit(RuntimeState* state, RowBatch* batch) { line too long (92 > 90) -- To view, visit http://gerrit.cloudera.org:8080/13883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be Gerrit-Change-Number: 13883 Gerrit-PatchSet: 16 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 31 Jul 2019 00:16:48 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl
Hello Michael Ho, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/13883 to look at the new patch set (#16). Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl .. IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl Improves the encapsulation of RowBatchQueue by the doing the following re-factoring: * Renames RowBatchQueue to BlockingRowBatchQueue which is more indicitive of what the queue does * Re-factors the timers managed by the scan-node into the BlockingRowBatchQueue implementation * Favors composition over inheritance by re-factoring BlockingRowBatchQueue to own a BlockingQueue rather than extending one The re-factoring lays the groundwork for introducing a generic RowBatchQueue that all RowBatch queues inherit from. Adds a new DequeRowBatchQueue which is a simple wrapper around a std::deque that (1) stores unique_ptr to queued RowBatch-es and (2) has a maximum capacity. Implements BufferedPlanRootSink using the new DequeRowBatchQueue. DequeRowBatchQueue is generic enough that replacing it with a SpillableQueue (queue backed by a BufferedTupleStream) should be straightforward. BufferedPlanRootSink is synchronized to protect access to DequeRowBatchQueue since the queue is not thread safe. BufferedPlanRootSink FlushFinal blocks until the consumer thread has processed all RowBatches. This ensures that the coordinator fragment stays alive until all results are fetched, but allows all other fragments to be shutdown immediately. Testing: * Running core tests * Updated tests/query_test/test_result_spooling.py Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be --- M be/src/exec/blocking-plan-root-sink.cc M be/src/exec/blocking-plan-root-sink.h M be/src/exec/buffered-plan-root-sink.cc M be/src/exec/buffered-plan-root-sink.h M be/src/exec/data-sink.cc M be/src/exec/hdfs-scan-node.cc M be/src/exec/kudu-scan-node.cc M be/src/exec/plan-root-sink.cc M be/src/exec/plan-root-sink.h M be/src/exec/scan-node.cc M be/src/exec/scan-node.h M be/src/exec/scanner-context.cc M be/src/runtime/CMakeLists.txt A be/src/runtime/blocking-row-batch-queue.cc A be/src/runtime/blocking-row-batch-queue.h A be/src/runtime/deque-row-batch-queue.cc A be/src/runtime/deque-row-batch-queue.h D be/src/runtime/row-batch-queue.cc D be/src/runtime/row-batch-queue.h M be/src/util/blocking-queue.h M tests/query_test/test_result_spooling.py 21 files changed, 558 insertions(+), 205 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/13883/16 -- To view, visit http://gerrit.cloudera.org:8080/13883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be Gerrit-Change-Number: 13883 Gerrit-PatchSet: 16 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/13883 ) Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl .. Patch Set 15: Code-Review+1 (7 comments) This mostly looks good, I had a few minor comments but LGTM http://gerrit.cloudera.org:8080/#/c/13883/15/be/src/exec/buffered-plan-root-sink.h File be/src/exec/buffered-plan-root-sink.h: http://gerrit.cloudera.org:8080/#/c/13883/15/be/src/exec/buffered-plan-root-sink.h@21 PS15, Line 21: #include "runtime/deque-row-batch-queue.h" I'd prefer a forward declaration of this in the header and moving the include to the .cc if possible (just trying to reduce the web of include dependencies) http://gerrit.cloudera.org:8080/#/c/13883/15/be/src/exec/buffered-plan-root-sink.h@33 PS15, Line 33: eiher either http://gerrit.cloudera.org:8080/#/c/13883/15/be/src/exec/buffered-plan-root-sink.h@80 PS15, Line 80: is_full_ I think the polarity of this name doesn't match the other condition variables, since it's signalled when the queue becomes not full. I'd recommend naming it something like batch_queue_has_capacity_. http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc File be/src/exec/buffered-plan-root-sink.cc: http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc@123 PS11, Line 123: // If the query was cancelled while the sink was waiting for rows to become available, : // or if the query was cancelled before the current call to GetNext, set eos and then : // return. The queue could be empty if the sink was closed > Makes sense. CHECK() seems a bit an overkill. Sorry for the bad suggestion. Oh no, I think it's generally a good idea to fail hard, just that specific scenario was a concern. http://gerrit.cloudera.org:8080/#/c/13883/15/be/src/exec/buffered-plan-root-sink.cc File be/src/exec/buffered-plan-root-sink.cc: http://gerrit.cloudera.org:8080/#/c/13883/15/be/src/exec/buffered-plan-root-sink.cc@38 PS15, Line 38: if (batch->num_rows() == 0) { nit: if statement could be one line http://gerrit.cloudera.org:8080/#/c/13883/15/be/src/exec/buffered-plan-root-sink.cc@52 PS15, Line 52: batch->DeepCopyTo(output_batch.get()); You could Reset() batch at this point to free up any resources associated. http://gerrit.cloudera.org:8080/#/c/13883/15/be/src/exec/buffered-plan-root-sink.cc@144 PS15, Line 144: if (*eos) consumer_eos_.NotifyOne(); I found the mix of NotifyOne()/NotifyAll() is mildly confusing. I think either is actually ok in all contexts, since at most 1 thread can wait on each condition variable, which means that NotifyOne() is correct and NotifyAll() does not result in unnecessary wakeups. So we could consider just standardising on one. I think I understand the reasoning because which is used where, but initially I was wondering if there was something subtle going on. Ok to leave code as is, but might help to clarify in the comments about when NotifyOne/NotifyAll is called. -- To view, visit http://gerrit.cloudera.org:8080/13883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be Gerrit-Change-Number: 13883 Gerrit-PatchSet: 15 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 30 Jul 2019 20:30:07 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13883 ) Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl .. Patch Set 15: Code-Review+1 (2 comments) Not sure if Tim wants to do another pass ? http://gerrit.cloudera.org:8080/#/c/13883/15/be/src/exec/plan-root-sink.h File be/src/exec/plan-root-sink.h: http://gerrit.cloudera.org:8080/#/c/13883/15/be/src/exec/plan-root-sink.h@94 PS15, Line 94: Check UpdateAndCheck ? http://gerrit.cloudera.org:8080/#/c/13883/15/be/src/runtime/blocking-row-batch-queue.h File be/src/runtime/blocking-row-batch-queue.h: http://gerrit.cloudera.org:8080/#/c/13883/15/be/src/runtime/blocking-row-batch-queue.h@48 PS15, Line 48: Adds the following timers to the RuntimeProfile: 'RowBatchQueueGetWaitTime' and : /// 'RowBatchQueuePutWaitTime' Stale ? -- To view, visit http://gerrit.cloudera.org:8080/13883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be Gerrit-Change-Number: 13883 Gerrit-PatchSet: 15 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 30 Jul 2019 07:14:14 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13883 ) Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl .. Patch Set 15: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/4065/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/13883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be Gerrit-Change-Number: 13883 Gerrit-PatchSet: 15 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 29 Jul 2019 21:03:29 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13883 ) Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl .. Patch Set 11: (1 comment) http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc File be/src/exec/buffered-plan-root-sink.cc: http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc@123 PS11, Line 123: // For now, if num_results < batch->num_rows(), we terminate returning results : // early. : if (num_results > 0 && num_results < batch->num_rows()) { > I think we should avoid CHECK on codepaths that you could get to by togglin Makes sense. CHECK() seems a bit an overkill. Sorry for the bad suggestion. -- To view, visit http://gerrit.cloudera.org:8080/13883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be Gerrit-Change-Number: 13883 Gerrit-PatchSet: 11 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 29 Jul 2019 20:56:24 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/13883 ) Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl .. Patch Set 11: (1 comment) http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc File be/src/exec/buffered-plan-root-sink.cc: http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc@123 PS11, Line 123: // For now, if num_results < batch->num_rows(), we terminate returning results : // early. : if (num_results > 0 && num_results < batch->num_rows()) { > Yes, its possible if the result cache is enabled, but I think only if the c I think we should avoid CHECK on codepaths that you could get to by toggling a query option (since there's nothing to stop a malicious user from toggling it). It's fine to return an error though. -- To view, visit http://gerrit.cloudera.org:8080/13883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be Gerrit-Change-Number: 13883 Gerrit-PatchSet: 11 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 29 Jul 2019 20:29:21 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/13883 ) Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl .. Patch Set 11: (1 comment) http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc File be/src/exec/buffered-plan-root-sink.cc: http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc@123 PS11, Line 123: // For now, if num_results < batch->num_rows(), we terminate returning results : // early. : if (num_results > 0 && num_results < batch->num_rows()) { > Isn't passing different fetch size possible if the result cache is enabled Yes, its possible if the result cache is enabled, but I think only if the client restarts the fetch to the beginning of the result set. I changed it so it returns "Status::Expected(TErrorCode::NOT_IMPLEMENTED_ERROR, ...)" -- To view, visit http://gerrit.cloudera.org:8080/13883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be Gerrit-Change-Number: 13883 Gerrit-PatchSet: 11 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 29 Jul 2019 20:25:27 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl
Hello Michael Ho, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/13883 to look at the new patch set (#15). Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl .. IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl Improves the encapsulation of RowBatchQueue by the doing the following re-factoring: * Renames RowBatchQueue to BlockingRowBatchQueue which is more indicitive of what the queue does * Re-factors the timers managed by the scan-node into the BlockingRowBatchQueue implementation * Favors composition over inheritance by re-factoring BlockingRowBatchQueue to own a BlockingQueue rather than extending one The re-factoring lays the groundwork for introducing a generic RowBatchQueue that all RowBatch queues inherit from. Adds a new DequeRowBatchQueue which is a simple wrapper around a std::deque that (1) stores unique_ptr to queued RowBatch-es and (2) has a maximum capacity. Implements BufferedPlanRootSink using the new DequeRowBatchQueue. DequeRowBatchQueue is generic enough that replacing it with a SpillableQueue (queue backed by a BufferedTupleStream) should be straightforward. BufferedPlanRootSink is synchronized to protect access to DequeRowBatchQueue since the queue is not thread safe. BufferedPlanRootSink FlushFinal blocks until the consumer thread has processed all RowBatches. This ensures that the coordinator fragment stays alive until all results are fetched, but allows all other fragments to be shutdown immediately. Testing: * Running core tests * Updated tests/query_test/test_result_spooling.py Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be --- M be/src/exec/blocking-plan-root-sink.cc M be/src/exec/blocking-plan-root-sink.h M be/src/exec/buffered-plan-root-sink.cc M be/src/exec/buffered-plan-root-sink.h M be/src/exec/hdfs-scan-node.cc M be/src/exec/kudu-scan-node.cc M be/src/exec/plan-root-sink.cc M be/src/exec/plan-root-sink.h M be/src/exec/scan-node.cc M be/src/exec/scan-node.h M be/src/exec/scanner-context.cc M be/src/runtime/CMakeLists.txt A be/src/runtime/blocking-row-batch-queue.cc A be/src/runtime/blocking-row-batch-queue.h A be/src/runtime/deque-row-batch-queue.cc A be/src/runtime/deque-row-batch-queue.h D be/src/runtime/row-batch-queue.cc D be/src/runtime/row-batch-queue.h M be/src/util/blocking-queue.h M tests/query_test/test_result_spooling.py 20 files changed, 550 insertions(+), 205 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/13883/15 -- To view, visit http://gerrit.cloudera.org:8080/13883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be Gerrit-Change-Number: 13883 Gerrit-PatchSet: 15 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13883 ) Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl .. Patch Set 11: (1 comment) http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc File be/src/exec/buffered-plan-root-sink.cc: http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc@123 PS11, Line 123: // For now, if num_results < batch->num_rows(), we terminate returning results : // early. : if (num_results > 0 && num_results < batch->num_rows()) { > > the case in which the total number of result rows is not a multiple of ro Isn't passing different fetch size possible if the result cache is enabled ? In general, this seems to be making very big assumption about what the client will pass in. Not sure if this is exactly a good practice. I suppose my point is that we should fail stop or at least indicate some sort of failures to the clients for this known to be wrong case in the code. CHECK() is the fail-stop approach, which seems acceptable as this sink type is not anticipated to be used. An alternative would be to log this error case and return an error status to the clients. -- To view, visit http://gerrit.cloudera.org:8080/13883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be Gerrit-Change-Number: 13883 Gerrit-PatchSet: 11 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 29 Jul 2019 19:48:52 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/13883 ) Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl .. Patch Set 11: (1 comment) http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc File be/src/exec/buffered-plan-root-sink.cc: http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc@123 PS11, Line 123: // For now, if num_results < batch->num_rows(), we terminate returning results : // early. : if (num_results > 0 && num_results < batch->num_rows()) { > the case in which the total number of result rows is not a multiple of row > batch size. That case is still handled properly. The value of num_results is the same as the fetch size configured by any client. For impala-shell, I believe that is 1024, for other drivers such as Simba the docs state that the "RowsFetchedPerBlock" ("The maximum number of rows that a query returns at a time.") is 1. This condition is only hit if a user configures a fetch size under 1024. Otherwise, everything works as expected (no partial results) regardless of the number of rows returned by the actual query. The only way partial results could be returned is if a user sets the fetch size to a value above 1024, fetches rows, sets the fetch size to under 1024, and then fetches rows. However, thats non-default behavior, and a bit of an odd pattern, which is why I decided to defer fixing this to a future patch. Do we want to a CHECK for this? That would cause a crash that affects all Impala users, which is presumably worse than the current behavior? -- To view, visit http://gerrit.cloudera.org:8080/13883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be Gerrit-Change-Number: 13883 Gerrit-PatchSet: 11 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 29 Jul 2019 16:25:53 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13883 ) Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl .. Patch Set 11: (1 comment) http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc File be/src/exec/buffered-plan-root-sink.cc: http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc@123 PS11, Line 123: // For now, if num_results < batch->num_rows(), we terminate returning results : // early. : if (num_results > 0 && num_results < batch->num_rows()) { > Yeah, we wouldn't publicly tell users to set SPOOL_QUERY_RESULTS=true until I am more worried about the case in which the total number of result rows is not a multiple of row batch size. The client may call GetNext() multiple times. In which case, the client may have incomplete result, which arguably is more confusing than returning no result at all. I am a bit hesitant for checking in like this. At the minimum, we should add a CHECK(false) << "Not implemented"; in the if-stmt body. -- To view, visit http://gerrit.cloudera.org:8080/13883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be Gerrit-Change-Number: 13883 Gerrit-PatchSet: 11 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 29 Jul 2019 04:58:37 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13883 ) Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl .. Patch Set 13: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/4031/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/13883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be Gerrit-Change-Number: 13883 Gerrit-PatchSet: 13 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 26 Jul 2019 19:48:03 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl
Hello Michael Ho, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/13883 to look at the new patch set (#13). Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl .. IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl Improves the encapsulation of RowBatchQueue by the doing the following re-factoring: * Renames RowBatchQueue to BlockingRowBatchQueue which is more indicitive of what the queue does * Re-factors the timers managed by the scan-node into the BlockingRowBatchQueue implementation * Favors composition over inheritance by re-factoring BlockingRowBatchQueue to own a BlockingQueue rather than extending one The re-factoring lays the groundwork for introducing a generic RowBatchQueue that all RowBatch queues inherit from. Adds a new DequeRowBatchQueue which is a simple wrapper around a std::deque that (1) stores unique_ptr to queued RowBatch-es and (2) has a maximum capacity. Implements BufferedPlanRootSink using the new DequeRowBatchQueue. DequeRowBatchQueue is generic enough that replacing it with a SpillableQueue (queue backed by a BufferedTupleStream) should be straightforward. BufferedPlanRootSink is synchronized to protect access to DequeRowBatchQueue since the queue is not thread safe. BufferedPlanRootSink FlushFinal blocks until the consumer thread has processed all RowBatches. This ensures that the coordinator fragment stays alive until all results are fetched, but allows all other fragments to be shutdown immediately. Testing: * Running core tests * Updated tests/query_test/test_result_spooling.py Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be --- M be/src/exec/blocking-plan-root-sink.cc M be/src/exec/blocking-plan-root-sink.h M be/src/exec/buffered-plan-root-sink.cc M be/src/exec/buffered-plan-root-sink.h M be/src/exec/hdfs-scan-node.cc M be/src/exec/kudu-scan-node.cc M be/src/exec/plan-root-sink.cc M be/src/exec/plan-root-sink.h M be/src/exec/scan-node.cc M be/src/exec/scan-node.h M be/src/exec/scanner-context.cc M be/src/runtime/CMakeLists.txt A be/src/runtime/blocking-row-batch-queue.cc A be/src/runtime/blocking-row-batch-queue.h A be/src/runtime/deque-row-batch-queue.cc A be/src/runtime/deque-row-batch-queue.h D be/src/runtime/row-batch-queue.cc D be/src/runtime/row-batch-queue.h M be/src/util/blocking-queue.h M tests/query_test/test_result_spooling.py 20 files changed, 549 insertions(+), 205 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/13883/13 -- To view, visit http://gerrit.cloudera.org:8080/13883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be Gerrit-Change-Number: 13883 Gerrit-PatchSet: 13 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/13883 ) Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl .. Patch Set 11: (2 comments) http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc File be/src/exec/buffered-plan-root-sink.cc: http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc@103 PS11, Line 103: rows_available_.NotifyOne(); : consumer_eos_.NotifyOne(); : is_full_.NotifyOne(); > Yes, that's s the assumption now but that could change in the future. Notif Done http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc@123 PS11, Line 123: // For now, if num_results < batch->num_rows(), we terminate returning results : // early. : if (num_results > 0 && num_results < batch->num_rows()) { > May be I am misunderstanding the details but isn't the TODO part a core fun Yeah, we wouldn't publicly tell users to set SPOOL_QUERY_RESULTS=true until this is fixed, but if 'num_results < batch->num_rows()' then the behavior is the same as before this patch, it just sets eos to true and returns (e.g. no more results are returned). -- To view, visit http://gerrit.cloudera.org:8080/13883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be Gerrit-Change-Number: 13883 Gerrit-PatchSet: 11 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 26 Jul 2019 18:57:28 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13883 ) Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl .. Patch Set 12: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/4026/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/13883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be Gerrit-Change-Number: 13883 Gerrit-PatchSet: 12 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 26 Jul 2019 18:46:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13883 ) Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl .. Patch Set 12: (2 comments) http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc File be/src/exec/buffered-plan-root-sink.cc: http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc@103 PS11, Line 103: DataSink::Close(state); : } : > Yeah, for the slow path I guess it doesn't matter much, but shouldn't there Yes, that's s the assumption now but that could change in the future. NotifyAll() here should work in either cases. It won't be fun debugging why users of this type of sink can sometimes hang. http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc@123 PS11, Line 123: // If the query was cancelled while the sink was waiting for rows to become available, : // or if the query was cancelled before the current call to GetNext, set eos and then : // return. The queue could be empty if the sink was closed > You do, I just decided to defer this part to a future patch. Added a TODO t May be I am misunderstanding the details but isn't the TODO part a core functionally ? Leaving this part out seems to mean this sink is broken / incomplete. -- To view, visit http://gerrit.cloudera.org:8080/13883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be Gerrit-Change-Number: 13883 Gerrit-PatchSet: 12 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 26 Jul 2019 18:42:45 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/13883 ) Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl .. Patch Set 11: (8 comments) http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.h File be/src/exec/buffered-plan-root-sink.h: http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.h@64 PS11, Line 64: // > nit: /// Done http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc File be/src/exec/buffered-plan-root-sink.cc: http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc@27 PS11, Line 27: 10 > This should be a static constant Done http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc@54 PS11, Line 54: while (batch_queue_->IsFull()) { : is_full_.Wait(l); : RETURN_IF_CANCELLED(state); : } > May hang for a while if the sink is already cancelled when we get here, it Done http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc@103 PS11, Line 103: rows_available_.NotifyOne(); : consumer_eos_.NotifyOne(); : is_full_.NotifyOne(); > I think keeping NotifyAll() make sense for the cancellation path. Same for Yeah, for the slow path I guess it doesn't matter much, but shouldn't there only ever be one thread waiting on any of these condition variables? In which case NotifyOne() should be sufficient? http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc@123 PS11, Line 123: // For now, if num_results < batch->num_rows(), we terminate returning results : // early. : if (num_results > 0 && num_results < batch->num_rows()) { > Not sure I understand this part ? Shouldn't this function still need to ins You do, I just decided to defer this part to a future patch. Added a TODO to make that clear. http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc@127 PS11, Line 127: consumer_eos_.NotifyOne(); > May need to do is_full_.NotifyOne() here in case the sender was blocked. Se Added the call to is_full_.NotifyOne(), agree the early return isn't ideal. I think that is something we can change when we add support for handling fetch requests where 'num_results < batch->num_rows()' http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/scan-node.cc File be/src/exec/scan-node.cc: http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/scan-node.cc@262 PS11, Line 262: ADD_TIMER(parent->runtime_profile(), "RowBatchQueueGetWaitTime"), : ADD_TIMER(parent->runtime_profile(), "RowBatchQueuePutWaitTime")) > The code seems easier to follow if we keep the ADD_TIMER() macro at the old Done http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/util/blocking-queue.h File be/src/util/blocking-queue.h: http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/util/blocking-queue.h@108 PS11, Line 108: get_wait_timer_) > get_wait_timer_ != nullptr Done -- To view, visit http://gerrit.cloudera.org:8080/13883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be Gerrit-Change-Number: 13883 Gerrit-PatchSet: 11 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 26 Jul 2019 18:04:30 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl
Hello Michael Ho, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/13883 to look at the new patch set (#12). Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl .. IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl Improves the encapsulation of RowBatchQueue by the doing the following re-factoring: * Renames RowBatchQueue to BlockingRowBatchQueue which is more indicitive of what the queue does * Re-factors the timers managed by the scan-node into the BlockingRowBatchQueue implementation * Favors composition over inheritance by re-factoring BlockingRowBatchQueue to own a BlockingQueue rather than extending one The re-factoring lays the groundwork for introducing a generic RowBatchQueue that all RowBatch queues inherit from. Adds a new DequeRowBatchQueue which is a simple wrapper around a std::deque that (1) stores unique_ptr to queued RowBatch-es and (2) has a maximum capacity. Implements BufferedPlanRootSink using the new DequeRowBatchQueue. DequeRowBatchQueue is generic enough that replacing it with a SpillableQueue (queue backed by a BufferedTupleStream) should be straightforward. BufferedPlanRootSink is synchronized to protect access to DequeRowBatchQueue since the queue is not thread safe. BufferedPlanRootSink FlushFinal blocks until the consumer thread has processed all RowBatches. This ensures that the coordinator fragment stays alive until all results are fetched, but allows all other fragments to be shutdown immediately. Testing: * Running core tests * Updated tests/query_test/test_result_spooling.py Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be --- M be/src/exec/blocking-plan-root-sink.cc M be/src/exec/blocking-plan-root-sink.h M be/src/exec/buffered-plan-root-sink.cc M be/src/exec/buffered-plan-root-sink.h M be/src/exec/hdfs-scan-node.cc M be/src/exec/kudu-scan-node.cc M be/src/exec/plan-root-sink.cc M be/src/exec/plan-root-sink.h M be/src/exec/scan-node.cc M be/src/exec/scan-node.h M be/src/exec/scanner-context.cc M be/src/runtime/CMakeLists.txt A be/src/runtime/blocking-row-batch-queue.cc A be/src/runtime/blocking-row-batch-queue.h A be/src/runtime/deque-row-batch-queue.cc A be/src/runtime/deque-row-batch-queue.h D be/src/runtime/row-batch-queue.cc D be/src/runtime/row-batch-queue.h M be/src/util/blocking-queue.h M tests/query_test/test_result_spooling.py 20 files changed, 549 insertions(+), 205 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/13883/12 -- To view, visit http://gerrit.cloudera.org:8080/13883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be Gerrit-Change-Number: 13883 Gerrit-PatchSet: 12 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13883 ) Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl .. Patch Set 11: (8 comments) http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.h File be/src/exec/buffered-plan-root-sink.h: http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.h@64 PS11, Line 64: // nit: /// http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc File be/src/exec/buffered-plan-root-sink.cc: http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc@27 PS11, Line 27: 10 This should be a static constant http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc@54 PS11, Line 54: while (batch_queue_->IsFull()) { : is_full_.Wait(l); : RETURN_IF_CANCELLED(state); : } May hang for a while if the sink is already cancelled when we get here, it may be more safer to do: while (!state->cancelled() && batch_queue->IsFull()) { is_full_.Wait(l); } RETURN_IF_CANCELLED(state); http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc@103 PS11, Line 103: rows_available_.NotifyOne(); : consumer_eos_.NotifyOne(); : is_full_.NotifyOne(); I think keeping NotifyAll() make sense for the cancellation path. Same for Close(). NotifyOne() makes assumption about the number of waiters on these condition variables, which seems dicey for the cancellation path and it doesn't hurt to do a NotifyAll() for the slow path. http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc@123 PS11, Line 123: // For now, if num_results < batch->num_rows(), we terminate returning results : // early. : if (num_results > 0 && num_results < batch->num_rows()) { Not sure I understand this part ? Shouldn't this function still need to insert up to 'num_results' rows into 'results' in this case ? In other words, if the row batch at the front of the queue is not completely consumed, don't you need to track how many rows have been consumed from it ? http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc@127 PS11, Line 127: consumer_eos_.NotifyOne(); May need to do is_full_.NotifyOne() here in case the sender was blocked. Seems better to refactor this code (if possible) to always return at line 147. http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/scan-node.cc File be/src/exec/scan-node.cc: http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/scan-node.cc@262 PS11, Line 262: ADD_TIMER(parent->runtime_profile(), "RowBatchQueueGetWaitTime"), : ADD_TIMER(parent->runtime_profile(), "RowBatchQueuePutWaitTime")) The code seems easier to follow if we keep the ADD_TIMER() macro at the old call sites and simply reference those timers here. http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/util/blocking-queue.h File be/src/util/blocking-queue.h: http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/util/blocking-queue.h@108 PS11, Line 108: get_wait_timer_) get_wait_timer_ != nullptr -- To view, visit http://gerrit.cloudera.org:8080/13883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be Gerrit-Change-Number: 13883 Gerrit-PatchSet: 11 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 26 Jul 2019 02:31:40 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13883 ) Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl .. Patch Set 10: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/4013/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/13883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be Gerrit-Change-Number: 13883 Gerrit-PatchSet: 10 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 26 Jul 2019 01:40:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13883 ) Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl .. Patch Set 11: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/4014/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/13883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be Gerrit-Change-Number: 13883 Gerrit-PatchSet: 11 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 26 Jul 2019 01:34:22 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/13883 ) Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl .. Patch Set 9: (10 comments) http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/exec/buffered-plan-root-sink.cc File be/src/exec/buffered-plan-root-sink.cc: http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/exec/buffered-plan-root-sink.cc@65 PS9, Line 65: rows_available_.NotifyAll(); > It's better to drop the lock before calling notify, so that the woken threa Done. Switched to NotifyOne() as well as per the recommendation in IMPALA-6125 I looked through the other uses of Notify and made similar changes, but I couldn't change all of them because it would introduce race conditions. http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/exec/buffered-plan-root-sink.cc@74 PS9, Line 74: way > wake Done http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/exec/buffered-plan-root-sink.cc@104 PS9, Line 104: Status BufferedPlanRootSink::GetNext( > I saw a null pointer dereference causing a crash in the function. Emailed S Done http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/exec/buffered-plan-root-sink.cc@121 PS9, Line 121: *eos = true; > Are you missing a consumer_eos_.NotifyAll() here ? Done http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/exec/plan-root-sink.h File be/src/exec/plan-root-sink.h: http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/exec/plan-root-sink.h@84 PS9, Line 84: approriate > typo Yeah, I decided just to split this up into two methods since it makes more sense. http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/exec/plan-root-sink.h@107 PS9, Line 107: Send() > stale Done http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/runtime/blocking-row-batch-queue.cc File be/src/runtime/blocking-row-batch-queue.cc: http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/runtime/blocking-row-batch-queue.cc@66 PS9, Line 66: row_batches_put_timer_->Set(batch_queue_->total_put_wait_time()); : row_batches_get_timer_->Set(batch_queue_->total_get_wait_time()); > I believe the TODO has more to do with not using this pattern of setting th Done http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/runtime/deque-row-batch-queue.h File be/src/runtime/deque-row-batch-queue.h: http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/runtime/deque-row-batch-queue.h@57 PS9, Line 57: remanining > typo. Done http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/runtime/deque-row-batch-queue.h@68 PS9, Line 68: int max_batches_; > const Done http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/runtime/deque-row-batch-queue.cc File be/src/runtime/deque-row-batch-queue.cc: http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/runtime/deque-row-batch-queue.cc@61 PS9, Line 61: result->Reset(); > I was able to trigger a segfault from a null pointer here by running "selec Done -- To view, visit http://gerrit.cloudera.org:8080/13883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be Gerrit-Change-Number: 13883 Gerrit-PatchSet: 9 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 26 Jul 2019 01:01:54 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/13883 ) Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl .. Patch Set 11: > (4 comments) > > I played around with this a little bit and was definitely able to > see the spooling behaviour by suspending impala-shell. I ran into a > couple of crashes though - I think the problems might be around > null row batches being dequeued from the queue Thanks for doing some testing. I recently switched from using a std::queue to using a std:deque and think I got my push / pop functions mixed up. Regardless, the issue is fixed now. I'm able to run "select distinct * from tpch.lineitem limit 1" successfully and cancelling the query midway doesn't cause any crashes. I also looped the tests in https://gerrit.cloudera.org/#/c/13907/ for about an hour and everything passed. -- To view, visit http://gerrit.cloudera.org:8080/13883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be Gerrit-Change-Number: 13883 Gerrit-PatchSet: 11 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 26 Jul 2019 00:54:55 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl
Hello Michael Ho, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/13883 to look at the new patch set (#11). Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl .. IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl Improves the encapsulation of RowBatchQueue by the doing the following re-factoring: * Renames RowBatchQueue to BlockingRowBatchQueue which is more indicitive of what the queue does * Re-factors the timers managed by the scan-node into the BlockingRowBatchQueue implementation * Favors composition over inheritance by re-factoring BlockingRowBatchQueue to own a BlockingQueue rather than extending one The re-factoring lays the groundwork for introducing a generic RowBatchQueue that all RowBatch queues inherit from. Adds a new DequeRowBatchQueue which is a simple wrapper around a std::deque that (1) stores unique_ptr to queued RowBatch-es and (2) has a maximum capacity. Implements BufferedPlanRootSink using the new DequeRowBatchQueue. DequeRowBatchQueue is generic enough that replacing it with a SpillableQueue (queue backed by a BufferedTupleStream) should be straightforward. BufferedPlanRootSink is synchronized to protect access to DequeRowBatchQueue since the queue is not thread safe. BufferedPlanRootSink FlushFinal blocks until the consumer thread has processed all RowBatches. This ensures that the coordinator fragment stays alive until all results are fetched, but allows all other fragments to be shutdown immediately. Testing: * Running core tests * Updated tests/query_test/test_result_spooling.py Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be --- M be/src/exec/blocking-plan-root-sink.cc M be/src/exec/blocking-plan-root-sink.h M be/src/exec/buffered-plan-root-sink.cc M be/src/exec/buffered-plan-root-sink.h M be/src/exec/hdfs-scan-node.cc M be/src/exec/kudu-scan-node.cc M be/src/exec/plan-root-sink.cc M be/src/exec/plan-root-sink.h M be/src/exec/scan-node.cc M be/src/exec/scan-node.h M be/src/exec/scanner-context.cc M be/src/runtime/CMakeLists.txt A be/src/runtime/blocking-row-batch-queue.cc A be/src/runtime/blocking-row-batch-queue.h A be/src/runtime/deque-row-batch-queue.cc A be/src/runtime/deque-row-batch-queue.h D be/src/runtime/row-batch-queue.cc D be/src/runtime/row-batch-queue.h M be/src/util/blocking-queue.h M tests/query_test/test_result_spooling.py 20 files changed, 550 insertions(+), 214 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/13883/11 -- To view, visit http://gerrit.cloudera.org:8080/13883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be Gerrit-Change-Number: 13883 Gerrit-PatchSet: 11 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13883 ) Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl .. Patch Set 10: (3 comments) http://gerrit.cloudera.org:8080/#/c/13883/10/be/src/exec/buffered-plan-root-sink.cc File be/src/exec/buffered-plan-root-sink.cc: http://gerrit.cloudera.org:8080/#/c/13883/10/be/src/exec/buffered-plan-root-sink.cc@61 PS10, Line 61: // Adding a batch should always be successful because the queue should always be open line too long (91 > 90) http://gerrit.cloudera.org:8080/#/c/13883/10/be/src/exec/buffered-plan-root-sink.cc@62 PS10, Line 62: // when Send is called, and the call to is_full_.Wait(l) ensures spaces is available. line too long (91 > 90) http://gerrit.cloudera.org:8080/#/c/13883/10/be/src/exec/buffered-plan-root-sink.cc@122 PS10, Line 122: // For now, if num_results < batch->num_rows(), we terminate returning results early. line too long (91 > 90) -- To view, visit http://gerrit.cloudera.org:8080/13883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be Gerrit-Change-Number: 13883 Gerrit-PatchSet: 10 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 26 Jul 2019 00:51:02 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl
Hello Michael Ho, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/13883 to look at the new patch set (#10). Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl .. IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl Improves the encapsulation of RowBatchQueue by the doing the following re-factoring: * Renames RowBatchQueue to BlockingRowBatchQueue which is more indicitive of what the queue does * Re-factors the timers managed by the scan-node into the BlockingRowBatchQueue implementation * Favors composition over inheritance by re-factoring BlockingRowBatchQueue to own a BlockingQueue rather than extending one The re-factoring lays the groundwork for introducing a generic RowBatchQueue that all RowBatch queues inherit from. Adds a new DequeRowBatchQueue which is a simple wrapper around a std::deque that (1) stores unique_ptr to queued RowBatch-es and (2) has a maximum capacity. Implements BufferedPlanRootSink using the new DequeRowBatchQueue. DequeRowBatchQueue is generic enough that replacing it with a SpillableQueue (queue backed by a BufferedTupleStream) should be straightforward. BufferedPlanRootSink is synchronized to protect access to DequeRowBatchQueue since the queue is not thread safe. BufferedPlanRootSink FlushFinal blocks until the consumer thread has processed all RowBatches. This ensures that the coordinator fragment stays alive until all results are fetched, but allows all other fragments to be shutdown immediately. Testing: * Running core tests * Updated tests/query_test/test_result_spooling.py Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be --- M be/src/exec/blocking-plan-root-sink.cc M be/src/exec/blocking-plan-root-sink.h M be/src/exec/buffered-plan-root-sink.cc M be/src/exec/buffered-plan-root-sink.h M be/src/exec/hdfs-scan-node.cc M be/src/exec/kudu-scan-node.cc M be/src/exec/plan-root-sink.cc M be/src/exec/plan-root-sink.h M be/src/exec/scan-node.cc M be/src/exec/scan-node.h M be/src/exec/scanner-context.cc M be/src/runtime/CMakeLists.txt A be/src/runtime/blocking-row-batch-queue.cc A be/src/runtime/blocking-row-batch-queue.h A be/src/runtime/deque-row-batch-queue.cc A be/src/runtime/deque-row-batch-queue.h D be/src/runtime/row-batch-queue.cc D be/src/runtime/row-batch-queue.h M be/src/util/blocking-queue.h M tests/query_test/test_result_spooling.py 20 files changed, 548 insertions(+), 214 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/13883/10 -- To view, visit http://gerrit.cloudera.org:8080/13883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be Gerrit-Change-Number: 13883 Gerrit-PatchSet: 10 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13883 ) Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl .. Patch Set 9: (7 comments) http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/exec/buffered-plan-root-sink.cc File be/src/exec/buffered-plan-root-sink.cc: http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/exec/buffered-plan-root-sink.cc@121 PS9, Line 121: *eos = true; Are you missing a consumer_eos_.NotifyAll() here ? http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/exec/plan-root-sink.h File be/src/exec/plan-root-sink.h: http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/exec/plan-root-sink.h@84 PS9, Line 84: validation rules a May be helpful to spell out the details here. This function seems to be a bit of a misnomer as it does a couple of things: - validate the collection slots - update the number of row batches sent to this sink - check if the number of rows sent to this sink exceeds limit and if so, return error status Not sure what a good name is. ValidateBatchAndCheckLimit() ?? http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/exec/plan-root-sink.h@84 PS9, Line 84: approriate typo http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/exec/plan-root-sink.h@107 PS9, Line 107: Send() stale http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/runtime/blocking-row-batch-queue.cc File be/src/runtime/blocking-row-batch-queue.cc: http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/runtime/blocking-row-batch-queue.cc@66 PS9, Line 66: row_batches_put_timer_->Set(batch_queue_->total_put_wait_time()); : row_batches_get_timer_->Set(batch_queue_->total_get_wait_time()); I believe the TODO has more to do with not using this pattern of setting the timer at the end. Instead of waiting till the end to set the timer, it may be better for the batch_queue_ to directly use the two timers so we don't have to wait until the scan node is closed before we can tell how much time is spent in the queue. http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/runtime/deque-row-batch-queue.h File be/src/runtime/deque-row-batch-queue.h: http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/runtime/deque-row-batch-queue.h@57 PS9, Line 57: remanining typo. Also this line doesn't quite parse correctly. http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/runtime/deque-row-batch-queue.h@68 PS9, Line 68: int max_batches_; const -- To view, visit http://gerrit.cloudera.org:8080/13883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be Gerrit-Change-Number: 13883 Gerrit-PatchSet: 9 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 25 Jul 2019 01:15:50 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/13883 ) Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl .. Patch Set 9: (4 comments) I played around with this a little bit and was definitely able to see the spooling behaviour by suspending impala-shell. I ran into a couple of crashes though - I think the problems might be around null row batches being dequeued from the queue http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/exec/buffered-plan-root-sink.cc File be/src/exec/buffered-plan-root-sink.cc: http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/exec/buffered-plan-root-sink.cc@65 PS9, Line 65: rows_available_.NotifyAll(); It's better to drop the lock before calling notify, so that the woken thread can always grab the lock immediately. See https://issues.apache.org/jira/browse/IMPALA-6135 http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/exec/buffered-plan-root-sink.cc@74 PS9, Line 74: way wake http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/exec/buffered-plan-root-sink.cc@104 PS9, Line 104: Status BufferedPlanRootSink::GetNext( I saw a null pointer dereference causing a crash in the function. Emailed Sahil the logs. http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/runtime/deque-row-batch-queue.cc File be/src/runtime/deque-row-batch-queue.cc: http://gerrit.cloudera.org:8080/#/c/13883/9/be/src/runtime/deque-row-batch-queue.cc@61 PS9, Line 61: result->Reset(); I was able to trigger a segfault from a null pointer here by running "select * from tpch.lineitem" from impala-shell and then cancelling the query. C [impalad+0xbd0462] impala::RowBatch::Reset()+0x12 C [impalad+0x116fd74] impala::DequeRowBatchQueue::Close()+0x44 C [impalad+0xf701f5] impala::BufferedPlanRootSink::Close(impala::RuntimeState*)+0x1b5 C [impalad+0xc0d04c] impala::FragmentInstanceState::Close()+0x2c C [impalad+0xc0fde2] impala::FragmentInstanceState::Exec()+0xc2 C [impalad+0xbfb4b7] impala::QueryState::ExecFInstance(impala::FragmentInstanceState*)+0x287 C [impalad+0xe50220] impala::Thread::SuperviseThread(std::string const&, std::string const&, boost::function, impala::ThreadDebugInfo const*, impala::Promise*)+0x310 C [impalad+0xe50dca] boost::detail::thread_data, impala::ThreadDebugInfo const*, impala::Promise*), boost::_bi::list5, boost::_bi::value, boost::_bi::value >, boost::_bi::value, boost::_bi::value*> > > >::run()+0x7a -- To view, visit http://gerrit.cloudera.org:8080/13883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be Gerrit-Change-Number: 13883 Gerrit-PatchSet: 9 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 25 Jul 2019 00:53:26 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/13883 ) Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/13883/7/be/src/exec/buffered-plan-root-sink.cc File be/src/exec/buffered-plan-root-sink.cc: http://gerrit.cloudera.org:8080/#/c/13883/7/be/src/exec/buffered-plan-root-sink.cc@38 PS7, Line 38: output_batch->AcquireState(batch); > Thanks for the info Tim. Replaced this DeepCopyTo. I think BufferedTupleStr Yeah BTS always does a full copy of the RowBatch into its own buffers. -- To view, visit http://gerrit.cloudera.org:8080/13883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be Gerrit-Change-Number: 13883 Gerrit-PatchSet: 7 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 25 Jul 2019 00:14:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13883 ) Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl .. Patch Set 9: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/3982/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/13883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be Gerrit-Change-Number: 13883 Gerrit-PatchSet: 9 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 24 Jul 2019 20:33:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13883 ) Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl .. Patch Set 8: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/3981/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/13883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be Gerrit-Change-Number: 13883 Gerrit-PatchSet: 8 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 24 Jul 2019 20:32:46 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl
Hello Michael Ho, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/13883 to look at the new patch set (#9). Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl .. IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl Improves the encapsulation of RowBatchQueue by the doing the following re-factoring: * Renames RowBatchQueue to BlockingRowBatchQueue which is more indicitive of what the queue does * Re-factors the timers managed by the scan-node into the BlockingRowBatchQueue implementation * Favors composition over inheritance by re-factoring BlockingRowBatchQueue to own a BlockingQueue rather than extending one The re-factoring lays the groundwork for introducing a generic RowBatchQueue that all RowBatch queues inherit from. Adds a new DequeRowBatchQueue which is a simple wrapper around a std::deque that (1) stores unique_ptr to queued RowBatch-es and (2) has a maximum capacity. Implements BufferedPlanRootSink using the new DequeRowBatchQueue. DequeRowBatchQueue is generic enough that replacing it with a SpillableQueue (queue backed by a BufferedTupleStream) should be straightforward. BufferedPlanRootSink is synchronized to protect access to DequeRowBatchQueue since the queue is not thread safe. BufferedPlanRootSink FlushFinal blocks until the consumer thread has processed all RowBatches. This ensures that the coordinator fragment stays alive until all results are fetched, but allows all other fragments to be shutdown immediately. Testing: * Running core tests * Updated tests/query_test/test_result_spooling.py Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be --- M be/src/exec/blocking-plan-root-sink.cc M be/src/exec/blocking-plan-root-sink.h M be/src/exec/buffered-plan-root-sink.cc M be/src/exec/buffered-plan-root-sink.h M be/src/exec/hdfs-scan-node.cc M be/src/exec/kudu-scan-node.cc M be/src/exec/plan-root-sink.cc M be/src/exec/plan-root-sink.h M be/src/exec/scan-node.cc M be/src/exec/scan-node.h M be/src/exec/scanner-context.cc M be/src/runtime/CMakeLists.txt A be/src/runtime/blocking-row-batch-queue.cc A be/src/runtime/blocking-row-batch-queue.h A be/src/runtime/deque-row-batch-queue.cc A be/src/runtime/deque-row-batch-queue.h D be/src/runtime/row-batch-queue.cc D be/src/runtime/row-batch-queue.h M tests/query_test/test_result_spooling.py 19 files changed, 517 insertions(+), 194 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/13883/9 -- To view, visit http://gerrit.cloudera.org:8080/13883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be Gerrit-Change-Number: 13883 Gerrit-PatchSet: 9 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13883 ) Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/13883/8/be/src/runtime/blocking-row-batch-queue.cc File be/src/runtime/blocking-row-batch-queue.cc: http://gerrit.cloudera.org:8080/#/c/13883/8/be/src/runtime/blocking-row-batch-queue.cc@46 PS8, Line 46: bool BlockingRowBatchQueue::AddBatchWithTimeout(unique_ptr batch, int64_t timeout_micros) { line too long (101 > 90) -- To view, visit http://gerrit.cloudera.org:8080/13883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be Gerrit-Change-Number: 13883 Gerrit-PatchSet: 8 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 24 Jul 2019 19:52:57 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl
Hello Michael Ho, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/13883 to look at the new patch set (#8). Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl .. IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl Improves the encapsulation of RowBatchQueue by the doing the following re-factoring: * Renames RowBatchQueue to BlockingRowBatchQueue which is more indicitive of what the queue does * Re-factors the timers managed by the scan-node into the BlockingRowBatchQueue implementation * Favors composition over inheritance by re-factoring BlockingRowBatchQueue to own a BlockingQueue rather than extending one The re-factoring lays the groundwork for introducing a generic RowBatchQueue that all RowBatch queues inherit from. Adds a new DequeRowBatchQueue which is a simple wrapper around a std::deque that (1) stores unique_ptr to queued RowBatch-es and (2) has a maximum capacity. Implements BufferedPlanRootSink using the new DequeRowBatchQueue. DequeRowBatchQueue is generic enough that replacing it with a SpillableQueue (queue backed by a BufferedTupleStream) should be straightforward. BufferedPlanRootSink is synchronized to protect access to DequeRowBatchQueue since the queue is not thread safe. BufferedPlanRootSink FlushFinal blocks until the consumer thread has processed all RowBatches. This ensures that the coordinator fragment stays alive until all results are fetched, but allows all other fragments to be shutdown immediately. Testing: * Running core tests * Updated tests/query_test/test_result_spooling.py Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be --- M be/src/exec/blocking-plan-root-sink.cc M be/src/exec/blocking-plan-root-sink.h M be/src/exec/buffered-plan-root-sink.cc M be/src/exec/buffered-plan-root-sink.h M be/src/exec/hdfs-scan-node.cc M be/src/exec/kudu-scan-node.cc M be/src/exec/plan-root-sink.cc M be/src/exec/plan-root-sink.h M be/src/exec/scan-node.cc M be/src/exec/scan-node.h M be/src/exec/scanner-context.cc M be/src/runtime/CMakeLists.txt A be/src/runtime/blocking-row-batch-queue.cc A be/src/runtime/blocking-row-batch-queue.h A be/src/runtime/deque-row-batch-queue.cc A be/src/runtime/deque-row-batch-queue.h D be/src/runtime/row-batch-queue.cc D be/src/runtime/row-batch-queue.h M tests/query_test/test_result_spooling.py 19 files changed, 516 insertions(+), 194 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/13883/8 -- To view, visit http://gerrit.cloudera.org:8080/13883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be Gerrit-Change-Number: 13883 Gerrit-PatchSet: 8 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong