Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17195 )
Change subject: IMPALA-10584: Defer advancing read page when reservation is tight. ...................................................................... Patch Set 5: (8 comments) http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream-test.cc File be/src/runtime/buffered-tuple-stream-test.cc: http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream-test.cc@1271 PS5, Line 1271: ASSERT_OK(stream.UnpinStream(BufferedTupleStream::UNPIN_ALL_EXCEPT_CURRENT)); Not related to this patch, but I'm confused how do we verify we have advanced the read page here. I think this just verify that UnpinStream won't hit any DCHECKs. Do you have any ideas on this? http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream-test.cc@1300 PS5, Line 1300: TEST_F(SimpleTupleStreamTest, DeferAdvancingReadPage) { Could you add some comments above this? e.g. Test that UnpinStream defer advancing the read page when all rows from the read page are attached to a returned RowBatch but got not enough reservation. http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream-test.cc@1333 PS5, Line 1333: Could you add a comment here that we are adding rows without releasing the read_batch (i.e. read_batch.Reset())? So we will hit not enough reservation and need unpinning the stream to continue. http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream-test.cc@1334 PS5, Line 1334: for (int j = 0; j < 2; ++j) { Could you leave a comment about why do we run this twice? I guess we want to test it when 'stream' is pinned and unpinned. BTW, could you add ASSERT_TRUE(stream.is_pinned()) before the loop? http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream-test.cc@1354 PS5, Line 1354: } Could you add ASSERT_FALSE(stream.is_pinned()) at the end? http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream-test.cc@1356 PS5, Line 1356: } Do we need read_batch.Reset() as well? http://gerrit.cloudera.org:8080/#/c/17195/3/be/src/runtime/buffered-tuple-stream.cc File be/src/runtime/buffered-tuple-stream.cc: http://gerrit.cloudera.org:8080/#/c/17195/3/be/src/runtime/buffered-tuple-stream.cc@723 PS3, Line 723: > That is a good point, thanks! Ah yeah, you are right. The page handle is destroyed in AttachBufferToBatch(). Thanks for catching this! http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream.cc File be/src/runtime/buffered-tuple-stream.cc: http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream.cc@728 PS5, Line 728: if (buffer_pool_client_->GetUnusedReservation() < last_attached_page_len_) { : // Since attached_to_output_batch is true and GetUnusedReservation() is less : // than last_attached_page_len_, the reader is most likely has not clean up the : // RowBatch where the read page is attached to. We defer advancing the read page : // until the next GetNext() call by the reader (see IMPALA-10584). : defer_advancing_read_page = true; : } Maybe I'm misunderstanding something. This still seems tricky for me. I agree that this can avoid hitting the DCHECK in this JIRA, because we now won't advance to get a read/write page so won't check consistency (i.e. won't find that unused reservation become negative). But in the case when the unused reservation >= last_attached_page_len_, we always advance the read page regardless whether the reader has freed the row batch buffers. The accounting seems wrong in the period after we advance the read page and before the reader frees the attached buffer: If we advance to get a read/write page in NextReadPage() and save default_page_len_ of reservation for a later write page, it seems like "stealing" default_page_len_ worth of space from the unused reservation, and then after the reader frees the attached buffer, we returns the stealed space back to the unused reservation. Although the unused reservation won't be negative so won't hit any DCHECKs in this case, it might be possible that we run out of unused reservation earlier than it should (if somehow the reader doesn't free the buffer for a long time). I wonder if we should detect whether the reader has freed the attached buffer and make the decision by that. Please correct me if I'm misunderstanding anything. -- To view, visit http://gerrit.cloudera.org:8080/17195 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a Gerrit-Change-Number: 17195 Gerrit-PatchSet: 5 Gerrit-Owner: Riza Suminto <[email protected]> Gerrit-Reviewer: Bikramjeet Vig <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Qifan Chen <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Comment-Date: Wed, 14 Apr 2021 14:03:59 +0000 Gerrit-HasComments: Yes
