Tim Armstrong has posted comments on this change. Change subject: IMPALA-3200: Implement suballocator for splitting buffers ......................................................................
Patch Set 4: (34 comments) http://gerrit.cloudera.org:8080/#/c/4715/4//COMMIT_MSG Commit Message: PS4, Line 13: buddy allocation > Buddy allocators tend to have high internal fragmentation.Did you consider The main use case is to allocate power-of-two buffers, so currently this isn't a problem. Maybe there are some other use cases down the line but for now it seems better to go with a simple algorithm. http://gerrit.cloudera.org:8080/#/c/4715/4/be/src/bufferpool/suballocator-test.cc File be/src/bufferpool/suballocator-test.cc: Line 28: > why the gap? Done Line 44: virtual void SetUp() { > override, here and below Done PS4, Line 63: != NULL > elide I think this is a matter of taste, but we've generally standardised on explicit null checks. PS4, Line 68: Random > 'Randomness' or 'PRNG'; the seed itself is not random Wikipedia knows what I mean :) https://en.wikipedia.org/wiki/Random_seed Line 75: void VerifyMemoryValid(const vector<unique_ptr<Suballocation>>& allocs); > static Done Line 78: void FreeAllocations( > static Done PS4, Line 80: unique_ptr<Suballocation> > auto Done PS4, Line 88: default_random_engine > It might be good to pick one for repeatability reasons Done http://gerrit.cloudera.org:8080/#/c/4715/4/be/src/bufferpool/suballocator.cc File be/src/bufferpool/suballocator.cc: Line 40: free_lists_[i] = NULL; > This is the default. If you elide this loop, this should happen anyway. Done Line 59: const int target_list_idx = Bits::Log2CeilingNonZero64(bytes); > How do you know bytes is non-zero? This is now handled with the minimum allocation size logic. Line 60: for (int i = target_list_idx; i < ALLOCATION_SIZES; ++i) { > This can be O(1) with ctz and a bitmap for the freelist emptiness It's O(1) now just with a higher constant factor assuming an upper bound on allocation sizes. I'd say we should defer this unless we actually see it become a bottleneck. Line 88: int64_t buffer_len = max(min_buffer_len_, BitUtil::RoundUpToPowerOfTwo(bytes)); > const Done Line 127: if (!status.ok()) { > if (!Suballocation::Create(&nodes[i]).ok()) {... Done Line 144: int64_t child_len = free_node->len_ / 2; > const Done PS4, Line 233: != NULL > elide We wouldn't normally elide it. http://gerrit.cloudera.org:8080/#/c/4715/4/be/src/bufferpool/suballocator.h File be/src/bufferpool/suballocator.h: Line 31: enum class ExpandReservations { > ExpansionStrategy, maybe? Done Line 37: }; > This can live inside Suballocator, right? It can but we end up with Suballocator::ExpansionStrategy everywhere, which is pretty verbose. PS4, Line 39: buddy > There are a few different variants of buddy systems. It might be good to sp Elaborated slightly in this comment. PS4, Line 43: relatively large allocations > Yet free_lists_ goes all the way down to one byte? Why not enforce this at I think my main concern there would be that the memory overhead of the allocations isn't tracked, so could overwhelm the actual tracked memory. I added logic to enforce a minimum allocation size and round up allocations to that amount (DCHECKing or similar seems a little awkward). Line 50: ExpandReservations expand_reservations); > Can you annotate these params in a comment Done Line 54: /// Allocate bytes from BufferPool. The allocation is NULL if unsuccessful because > nullptr, here and elsewhere Done Line 71: /// Allocate a buffer of size 'bytes' from the buffer pool and initialize 'result' > bytes must be less than MAX_ALLOCATION_BYTES Done PS4, Line 79: Can fail if malloc() fails > if new returns nullptr Reworded to somethign more generic about memory allocation failing. PS4, Line 109: by > 'but' Done PS4, Line 127: must > or else leaked memory or some kind of segfault or nasal demons? Done PS4, Line 132: Destructor > Elide Done Line 138: private: > also disallow_copy_and_assign? Done Line 152: /// The buffer backing the suballocation, if the suballocation is backed by an entire > capitalization, here and elsewhere: Suballocation and Allocation Not sure why we'd capitalise allocation since it's not a class. Line 153: /// buffer. Otherwise uninitialized. 'buffer_' is open iff buddy is NULL. > buddy_ Done Line 156: /// If this is a left child, the parent of this and its buddy. The parent's allocation > It would be helpful to explain the tree structure first. Added some explanation to the class comment. Line 159: std::unique_ptr<Suballocation> parent_; > Can the parent be a buddy, thus being pointed to by its buddy and one of it Yes. I think the explanation in the class comment should cover it. Line 171: Suballocation* prev_free_; > so next_free_ is not really unique? I'm troubled by this design pattern. Di unique_ptr to me means unique ownership, which doesn't preclude a raw pointer existing, so long as the raw pointer doesn't outlive the unique_ptr. weak_ptr/shared_ptr only seems appropriate when there are multiple owners and the lifetime of those owning pointers may be shorter that weak_ptr. http://gerrit.cloudera.org:8080/#/c/4715/4/be/src/common/names.h File be/src/common/names.h: Line 124: using std::move; > This might be overkill. I'd like 'move' to be available as a variable name It seems consistent with the other cases here (e.g. I can imagine 'left' being a legitimate variable name). I feel like within .cc files in the codebase we should mostly avoid using variable/function names that clash with commonly used standard library function names. I can remove if you feel strongly. -- To view, visit http://gerrit.cloudera.org:8080/4715 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8bfe0e429f67ad273f7c7d0816703a9e6c3da788 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong <[email protected]> Gerrit-Reviewer: Jim Apple <[email protected]> Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
