Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17195 )

Change subject: IMPALA-10584: Defer advancing read page when reservation is 
tight.
......................................................................


Patch Set 5:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream-test.cc
File be/src/runtime/buffered-tuple-stream-test.cc:

http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream-test.cc@1271
PS5, Line 1271:     
ASSERT_OK(stream.UnpinStream(BufferedTupleStream::UNPIN_ALL_EXCEPT_CURRENT));
Not related to this patch, but I'm confused how do we verify we have advanced 
the read page here. I think this just verify that UnpinStream won't hit any 
DCHECKs. Do you have any ideas on this?


http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream-test.cc@1300
PS5, Line 1300: TEST_F(SimpleTupleStreamTest, DeferAdvancingReadPage) {
Could you add some comments above this? e.g.
Test that UnpinStream defer advancing the read page when all rows from the read 
page are
attached to a returned RowBatch but got not enough reservation.


http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream-test.cc@1333
PS5, Line 1333:
Could you add a comment here that we are adding rows without releasing the 
read_batch (i.e. read_batch.Reset())? So we will hit not enough reservation and 
need unpinning the stream to continue.


http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream-test.cc@1334
PS5, Line 1334: for (int j = 0; j < 2; ++j) {
Could you leave a comment about why do we run this twice? I guess we want to 
test it when 'stream' is pinned and unpinned.

BTW, could you add ASSERT_TRUE(stream.is_pinned()) before the loop?


http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream-test.cc@1354
PS5, Line 1354:     }
Could you add ASSERT_FALSE(stream.is_pinned()) at the end?


http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream-test.cc@1356
PS5, Line 1356:   }
Do we need read_batch.Reset() as well?


http://gerrit.cloudera.org:8080/#/c/17195/3/be/src/runtime/buffered-tuple-stream.cc
File be/src/runtime/buffered-tuple-stream.cc:

http://gerrit.cloudera.org:8080/#/c/17195/3/be/src/runtime/buffered-tuple-stream.cc@723
PS3, Line 723:
> That is a good point, thanks!
Ah yeah, you are right. The page handle is destroyed in AttachBufferToBatch(). 
Thanks for catching this!


http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream.cc
File be/src/runtime/buffered-tuple-stream.cc:

http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream.cc@728
PS5, Line 728:         if (buffer_pool_client_->GetUnusedReservation() < 
last_attached_page_len_) {
             :           // Since attached_to_output_batch is true and 
GetUnusedReservation() is less
             :           // than last_attached_page_len_, the reader is most 
likely has not clean up the
             :           // RowBatch where the read page is attached to. We 
defer advancing the read page
             :           // until the next GetNext() call by the reader (see 
IMPALA-10584).
             :           defer_advancing_read_page = true;
             :         }
Maybe I'm misunderstanding something. This still seems tricky for me. I agree 
that this can avoid hitting the DCHECK in this JIRA, because we now won't 
advance to get a read/write page so won't check consistency (i.e. won't find 
that unused reservation become negative).

But in the case when the unused reservation >= last_attached_page_len_, we 
always advance the read page regardless whether the reader has freed the row 
batch buffers. The accounting seems wrong in the period after we advance the 
read page and before the reader frees the attached buffer:

If we advance to get a read/write page in NextReadPage() and save 
default_page_len_ of reservation for a later write page, it seems like 
"stealing" default_page_len_ worth of space from the unused reservation, and 
then after the reader frees the attached buffer, we returns the stealed space 
back to the unused reservation. Although the unused reservation won't be 
negative so won't hit any DCHECKs in this case, it might be possible that we 
run out of unused reservation earlier than it should (if somehow the reader 
doesn't free the buffer for a long time).

I wonder if we should detect whether the reader has freed the attached buffer 
and make the decision by that.

Please correct me if I'm misunderstanding anything.



--
To view, visit http://gerrit.cloudera.org:8080/17195
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a
Gerrit-Change-Number: 17195
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto <[email protected]>
Gerrit-Reviewer: Bikramjeet Vig <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Comment-Date: Wed, 14 Apr 2021 14:03:59 +0000
Gerrit-HasComments: Yes

Reply via email to