Dan Hecht has posted comments on this change.

Change subject: IMPALA-3202: implement spill-to-disk in new buffer pool
......................................................................


Patch Set 9:

(33 comments)

Still have more to go (mostly in buffer-pool.cc) but here's another set of 
comments.

http://gerrit.cloudera.org:8080/#/c/5584/9/be/src/runtime/bufferpool/buffer-pool-internal.h
File be/src/runtime/bufferpool/buffer-pool-internal.h:

PS9, Line 31: earlier
earlier than what? not sure what this means exactly.


PS9, Line 48: evicted
rather than define circularly, how about: after the clean page's buffer has 
been reclaimed.


PS9, Line 88: Does not update any
            :   /// reservations or the page's pin count.
given the new naming, I think we can do without this sentence.


PS9, Line 93:   /// Restore an unpinned page to the pinned state. Does not 
update any reservations or
            :   /// the page's pin count.
I think this could use some rewording to be just in terms of client state 
management rather than what it doesn't do.


PS9, Line 99:  without pinned bytes plus dirty unpinned bytes exceeding the
            :   /// client's reservatio
is the reasoning for this described anywhere? it would be good to explain this 
implementation choice briefly.


PS9, Line 114: WriteDirtyPages
how about WriteDirtyPagesAsync()?


Line 116:   /// Wait for the in-flight write for 'page'.
to complete


Line 132:   void CheckConsistency() {
Maybe DCheckConsistency() to make to clear this is debug-only.


PS9, Line 148: //
///


PS9, Line 150: RepinEvictedPage
maybe rename this one too since it's only doing part of the pinning work. maybe 
RestoreEvictedPage()? Or ReadEvictedPage() which would help highlight out that 
the synchronous disk read happens here.


PS9, Line 153: //
///


PS9, Line 177: the write
what write? does this mean "the writes" or "a write"?


Line 187:   InternalList<Page> pinned_pages_;
why do we need this list? when a page is pinned, the buffer mapping can't be 
changed by definition so the client can't really do anything. is it just for 
debugging?


Line 195:   InternalList<Page> in_flight_write_pages_;
this list also doesn't seem necessary. is it used for anything or just 
debugging? hmm i guess we might also need to keep track of all pages for 
cancellation?


Line 246:   /// destruction of a page that was accessed via a data structure 
that is not PageHandle.
this sentences seems to contradict the one earlier in this file about locks for 
Page * references.


PS9, Line 249: Always open if pinned. Closed only if page
             :   /// is evicted.
Should it now just say "Closed iff page is evicted. Open otherwise."?


PS9, Line 255: pending_destruction
maybe: destroy_after_write (or destroy_after_write)
for consistent naming and to make it clearer that this is to synchronize with 
the async write callback.


http://gerrit.cloudera.org:8080/#/c/5584/9/be/src/runtime/bufferpool/buffer-pool.cc
File be/src/runtime/bufferpool/buffer-pool.cc:

Line 44:   Reset();
okay, but why was this needed?


Line 155:   RETURN_IF_ERROR(AllocateBufferInternal(client, len, &buffer));
why don't we just use AllocateBuffer() here for lines 154,155,162?


PS9, Line 176: ///
//


PS9, Line 228: AllocateBufferInternal
is there a better name for that that would better explain why 
CleanPagesBeforeAllocation() happens here rather than in Internal?


PS9, Line 239: decrease
maybe 'delta'?


Line 242:     int64_t to_evict = decrease - len;
shouldn't this be len - decrease? is it missing test coverage?


Line 331:     FreeBufferInternal(&buffer);
it looks like FreeBufferInternal() adjusts buffer_bytes_remaining_... is that 
what we want given what this method is trying to do?


Line 333:   if (total_len < bytes_to_evict) {
in what case can this happen?


Line 469:     // 'repin_after_write'.
close parenthese.

But why don't we just do the list add here after WaitForWrite() returns? that 
needs to be atomic with respect to what?


Line 496:   // return on error.
why this comment? i don't see any place in this method where we don't just 
return on error...


Line 503:   BufferHandle buffer;
why don't we need to reserve here?


Line 533:   RETURN_IF_ERROR(write_status_); // Propagate error if we can't 
evict pages.
why do we need this check? when would write_status_ have been set?


http://gerrit.cloudera.org:8080/#/c/5584/9/be/src/runtime/bufferpool/buffer-pool.h
File be/src/runtime/bufferpool/buffer-pool.h:

PS9, Line 257: between lists is atomic
maybe clarify by saying: from a client list to the global clean page list is 
atomic

(I think the bit that's missing for this sentence to make sense to a reader 
that it's expected the caller will be moving the page from a client list).


PS9, Line 263: list is atomic and there is not a
             :   /// window so that moving the page between lists is atomic
garbled.


PS9, Line 271: if not enough bytes could be evicted
when can this happen? It should be guaranteed to succeed by the reservation, 
right?


PS9, Line 306: Client
I wonder if we should rename Client to ClientHandle and ClientImpl to Client, 
to have the same naming scheme of Buffer and Page.  What do you think?


-- 
To view, visit http://gerrit.cloudera.org:8080/5584
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8c6119a4557da853fefa02e17c49d8be0e9fbe01
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes

Reply via email to