Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13883 )
Change subject: IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl
......................................................................

Patch Set 11:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.h
File be/src/exec/buffered-plan-root-sink.h:

http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.h@64
PS11, Line 64: //
nit: ///

http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc
File be/src/exec/buffered-plan-root-sink.cc:

http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc@27
PS11, Line 27: 10
This should be a static constant.

http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc@54
PS11, Line 54: while (batch_queue_->IsFull()) {
             :   is_full_.Wait(l);
             :   RETURN_IF_CANCELLED(state);
             : }
This may hang for a while if the sink is already cancelled when we get here. It may be safer to do:

  while (!state->cancelled() && batch_queue_->IsFull()) {
    is_full_.Wait(l);
  }
  RETURN_IF_CANCELLED(state);

http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc@103
PS11, Line 103: rows_available_.NotifyOne();
              : consumer_eos_.NotifyOne();
              : is_full_.NotifyOne();
I think keeping NotifyAll() makes sense for the cancellation path. Same for Close(). NotifyOne() makes an assumption about the number of waiters on these condition variables, which seems dicey for the cancellation path, and it doesn't hurt to do a NotifyAll() on the slow path.

http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc@123
PS11, Line 123: // For now, if num_results < batch->num_rows(), we terminate returning results
              : // early.
              : if (num_results > 0 && num_results < batch->num_rows()) {
Not sure I understand this part. Shouldn't this function still insert up to 'num_results' rows into 'results' in this case?
In other words, if the row batch at the front of the queue is not completely consumed, don't you need to track how many rows have been consumed from it?

http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/buffered-plan-root-sink.cc@127
PS11, Line 127: consumer_eos_.NotifyOne();
May need to do is_full_.NotifyOne() here in case the sender was blocked. It seems better to refactor this code (if possible) to always return at line 147.

http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/scan-node.cc
File be/src/exec/scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/exec/scan-node.cc@262
PS11, Line 262: ADD_TIMER(parent->runtime_profile(), "RowBatchQueueGetWaitTime"),
              : ADD_TIMER(parent->runtime_profile(), "RowBatchQueuePutWaitTime"))
The code seems easier to follow if we keep the ADD_TIMER() macros at the old call sites and simply reference those timers here.

http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/util/blocking-queue.h
File be/src/util/blocking-queue.h:

http://gerrit.cloudera.org:8080/#/c/13883/11/be/src/util/blocking-queue.h@108
PS11, Line 108: get_wait_timer_)
get_wait_timer_ != nullptr

--
To view, visit http://gerrit.cloudera.org:8080/13883
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
Gerrit-Change-Number: 13883
Gerrit-PatchSet: 11
Gerrit-Owner: Sahil Takiar <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Sahil Takiar <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Fri, 26 Jul 2019 02:31:40 +0000
Gerrit-HasComments: Yes
