Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3203: Part 1: Free list implementation
......................................................................
Patch Set 6: (27 comments)

http://gerrit.cloudera.org:8080/#/c/6410/6/be/src/benchmarks/free-lists-benchmark.cc
File be/src/benchmarks/free-lists-benchmark.cc:

PS6, Line 48: per thread
> It might be easier to read if it showed total throughput
Did this. Agree it's probably a bit easier to interpret. Had to tweak the benchmark setup a bit as a consequence so that it actually ran enough iterations of the benchmark (sometimes it ran 0). Also found a couple of optimisation opportunities to speed up the benchmark.

Line 58: //
> what conclusion should we draw from this benchmark if we end up memory boun
I wrote up a short summary of the results and conclusions here - it's hard to know what to make of them otherwise.

PS6, Line 59: i7-4790
> Do you have Hyperthreading on?
Yes.

PS6, Line 61: 0 iters
> It might help to explain what this is before these benchmarks
It's mentioned on line 57, but I made it clearer.

Line 257:
> elide this empty line
Done

PS6, Line 288: MAX_LIST_ENTRIES
> How was this chosen?
Added a comment.

PS6, Line 303: *
> &
Done

Line 308: lock_guard<SpinLock> l(free_list->first);
> In the case where there is one list per thread, why bother with the lock?
I wanted to measure throughput including the uncontended lock acquisition - we'd still need the lock in practice, since other threads will be able to access the free lists of other threads (either because they're scheduled on the same core or because they're trying to steal a buffer from a different core).

Line 350: int64_t op = uniform_int_distribution<int64_t>(0, 1)(rng);
> const
Done

PS6, Line 373: new pair<SpinLock, FreeList>
I realised these could end up on the same cache line - I think this may have affected some results.

Line 405: for (int num_threads : {1, 2, 4, CpuInfo::num_cores()}) {
> Is it worth trying 2*num_cores?
It's probably not very interesting, since num_cores already includes the hyperthreads.
http://gerrit.cloudera.org:8080/#/c/6410/6/be/src/runtime/bufferpool/buffer-allocator.cc
File be/src/runtime/bufferpool/buffer-allocator.cc:

PS6, Line 37: BufferPool::BufferHandle buffer
> how does the caller know it must do a move() here? i.e. what prevents the c
BufferHandle isn't copyable, so it's impossible to pass in a value without move()ing it. I'm not that familiar with the best practices and trade-offs of passing rvalue references versus values, but it seems to work either way and the rvalue ref makes it more explicit, so I changed it.

http://gerrit.cloudera.org:8080/#/c/6410/6/be/src/runtime/bufferpool/free-list-test.cc
File be/src/runtime/bufferpool/free-list-test.cc:

PS6, Line 18: c
> also use <cstdint> in free-list.h or use <stdlib.h> here
Done

PS6, Line 50: vector<BufferHandle>* buffers
> You can just return this out parameter and the [N]RVO will make it efficien
Unfortunately the ASSERT macros only work in functions returning void.

PS6, Line 58: void*
> const void*
Done

PS6, Line 103: various numbers
> Only LIST_SIZE; no need for a loop
I missed changing the loop bounds to test various sizes. Fixed that.

Line 112: vector<void*> addrs = GetSortedAddrs(buffers);
> const vector...
Done

PS6, Line 118: small_list.Size() - LIST_SIZE
> always 0?
I intended to run for different values of num_buffers but didn't do it - fixed.

PS6, Line 120: two
> LIST_SIZE
Done

http://gerrit.cloudera.org:8080/#/c/6410/6/be/src/runtime/bufferpool/free-list.h
File be/src/runtime/bufferpool/free-list.h:

PS6, Line 50: This helps to consolidate buffers within the address space
> Sorry, I don't think I understand this - how does it help consolidate buffe
I elaborated a bit on the problem - agree this was cryptic.

PS6, Line 60: bool
> Can you just return nullptr on failure instead?
We'd be returning a BufferHandle, not a BufferHandle*. It could return a closed BufferHandle to indicate failure, but I feel like that makes it harder to structure the logic in the caller. E.g.
currently you can do this:

  BufferHandle handle;
  if (list1.GetFreeBuffer(&handle)) {
  } else if (list2.GetFreeBuffer(&handle)) {
  }

But if we changed it, it would look more like:

  BufferHandle handle = list1.GetFreeBuffer();
  if (handle.is_open()) {
  } else {
    handle = list2.GetFreeBuffer();
    if (handle.is_open()) {
    }
  }

or:

  BufferHandle handle;
  if ((handle = list1.GetFreeBuffer()).is_open()) {
  } else if ((handle = list2.GetFreeBuffer()).is_open()) {
  }

Both of which seem clunkier.

PS6, Line 60: Get
> Getters are often const methods. Maybe 'PopFreeBuffer'?
Works for me.

Line 69: void AddFreeBuffer(BufferHandle buffer) {
> If this will always be a moved argument, you can give it type BufferHandle&
Done

PS6, Line 81: int
> free_list_.size() is likely stored as a 64-bit integer type and is below re
In practice storing over 2 billion entries would be crazy, but we might as well be consistent with the types.

PS6, Line 84: heap
> minheap
Done

PS6, Line 115: Limited to 'max_capacity_' capacity
> Obsolete.
Done

Line 117: std::vector<BufferHandle> free_list_;
> Since you need a double-ended priority queue, did you consider using a std:
The constant overheads of the balanced binary tree are pretty bad - each node has a lot of memory overhead, the memory isn't contiguous, and it calls malloc() or free() for most operations. On a previous project I replaced an rbtree priority queue with a binary heap and saw some significant improvements.

--
To view, visit http://gerrit.cloudera.org:8080/6410
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia89acfa4efdecb96d3678443b4748932b4133b9b
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Jim Apple <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes
