Dan Hecht has posted comments on this change.

Change subject: IMPALA-3203: Part 2: per-core free lists in buffer pool
......................................................................


Patch Set 15:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/6414/14/be/src/runtime/bufferpool/buffer-allocator.cc
File be/src/runtime/bufferpool/buffer-allocator.cc:

Line 347:     lock_guard<SpinLock> pl(page->buffer_lock);
> 'client_lock' would guard against the page being moved to a different arena
Yup, I realized that after writing this comment. Oops.


http://gerrit.cloudera.org:8080/#/c/6414/15/be/src/runtime/bufferpool/buffer-allocator.cc
File be/src/runtime/bufferpool/buffer-allocator.cc:

PS15, Line 59: .
should be more explicit that it might return less memory than 'target_bytes'. 
Maybe say: "Free as much memory as possible, up to 'target_bytes', from this 
arena back to the system allocator."


PS15, Line 64: returns the total amount freed 
hmm, it's kind of confusing that the return value has this subtle difference in 
meaning depending on 'claim_memory'. 

It seems the most intuitive return value would be: "Returns the number of bytes 
of system memory claimed for the caller. Memory returned to the system in 
excess of that amount is added into 'system_bytes_remaining_'."

Can the claim_memory=false path accommodate something like that? If we can't 
reconcile it to have a single return value, then it seems we really have two 
different abstractions bunched into one call.


PS15, Line 70: FreeMemory
do you think it makes sense to call this FreeSystemMemory() to make it clearer 
that this is about pushing memory back to the system?


PS15, Line 254: ffinally
typo


Line 261:       if (delta == len) break;
since we can't get rid of the other DecreaseSystemBytesRemaining(), it's 
probably better to move this back to where it was.


PS15, Line 263: lock_arenas
I thought this was going to become 'final_attempt'.


Line 304:   vector<std::unique_lock<SpinLock>> arena_locks;
let's add a comment explaining all of this. Something like:

There are two strategies for scavenging buffers:
1) Fast, opportunistic: each arena is searched in succession. Although 
reservations guarantee that the memory we need is available somewhere, this 
pass can fail because we can race with another thread that takes buffers from 
an arena we haven't yet searched and returns them to an arena we've already 
searched.

2) Slow, guaranteed to succeed: in order to ensure that we can find the memory 
in a single pass, we hold on to the locks for arenas we've already examined. 
That way, another thread can't take the memory we need from an arena we 
haven't yet examined (or from 'system_bytes_remaining_'), because in order to 
do so it would have had to return the equivalent amount of memory to an 
earlier arena or add it back into 'system_bytes_remaining_'. The former can't 
happen since we're still holding those locks, and the latter is handled by 
trying DecreaseSystemBytesRemaining() at the end.


PS15, Line 317:   // It's possible that 'system_bytes_remaining_' was 
incremented since we last checked
              :   // it.
maybe this sentence is now covered by the proposed comment above.


PS15, Line 319: another thread between when that thread released memory to the 
system and
              :   // incremented 'system_bytes_remaining_'.
I think it'd be good to explicitly say:
... since that happens while holding one of the arena locks.
or similar. Though there is an exception: AllocateInternal() adds it back in 
without holding a lock. That isn't a correctness problem, since that memory 
couldn't be needed by another client, but it makes the statement a little 
confusing.


PS15, Line 322: false
why false instead of true? I guess it doesn't matter since it should succeed 
in getting all of it, but 'true' seems more intuitive to me.


http://gerrit.cloudera.org:8080/#/c/6414/15/be/src/runtime/bufferpool/buffer-allocator.h
File be/src/runtime/bufferpool/buffer-allocator.h:

Line 47: /// required memory from the OS).
I think explaining a couple of more things would help:

1) The fact that BufferAllocator relies on reservations (which are managed by 
the BufferPool and its clients) and on the BufferPool's eviction policy for 
correctness: together they ensure that a client with unused reservation can 
call Allocate() and BufferAllocator will always be able to find memory 
somewhere to back the buffer.

2) Some implementation notes explaining that available memory is kept in three 
forms: buffers in the free lists, buffers attached to clean pages, and memory 
that can be claimed from the system. This information is sort of in the above 
paragraphs, but I think spelling it out that way will help readers understand 
the implementation. Paragraph two above can probably be moved into this 
section, since the class abstraction itself doesn't expose arenas, right?


PS15, Line 91: void
does the caller care if it couldn't release 'bytes_to_free' bytes?


PS15, Line 133: buffers
memory? since some of it might come from 'system_bytes_remaining_'.


PS15, Line 135: lock_arenas
one thing that was confusing is that the name makes it sound like arenas are 
not locked when lock_arenas=false. But that's not true -- it's just that they 
aren't all held locked at the same time.

As mentioned in the cc file, I don't think the caller actually cares how the 
arena locking works, right? What this abstraction is really trying to expose is 
a "fast opportunistic path that might not succeed", and a "slow but guaranteed 
to succeed path". So, how about abstracting it and explaining it that way?


-- 
To view, visit http://gerrit.cloudera.org:8080/6414
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I612bd1cd0f0e87f7d8186e5bedd53a22f2d80832
Gerrit-PatchSet: 15
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Taras Bobrovytsky <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes
