Please see my answers below. As far as I can tell you do not have any major objections, so I will make the changes and send a revised patch. I will probably also need to adjust some of my comments.
On Sunday, December 9, 2018 at 7:53:38 AM UTC-5, Nadav Har'El wrote:
>
> On Thu, Nov 22, 2018 at 8:28 AM Waldemar Kozaczuk <[email protected]> wrote:
>
>> This patch implements a naive but efficient scheme of allocating
>> memory before SMP is enabled and regular malloc pools are set up.
>> Before this patch every single pre-SMP allocation would
>> be backed by a full 4K page regardless of whether the requested size
>> was 8 or 2000 bytes.
>>
>> The new pre-SMP allocation scheme is based on the fact that
>> relatively few objects are allocated (~2,000), they are rarely
>> freed (< 20%) and most are very small (< 50 bytes).
>> The algorithm is as simple as finding the next big enough aligned
>> space in a 4K page of memory, marking its size and resetting the
>> offset to the next free area of the page, otherwise allocating a
>> new 4K page. For details read the comments in the code.
>>
>> Even though this scheme is very simplistic, it is
>> roughly 35 times more space efficient and allows
>> us to save around 6M of memory. It was measured that the net sum of
>> sizes (as is) of all requested allocations is typically around 125K,
>> and the new scheme uses ~44 pages (176K) of memory to satisfy all requests.
>>
>> Fixes #270
>>
>> Signed-off-by: Waldemar Kozaczuk <[email protected]>
>> ---
>>  core/mempool.cc        | 133 ++++++++++++++++++++++++++++++++++++++++-
>>  include/osv/mempool.hh |   5 ++
>>  2 files changed, 136 insertions(+), 2 deletions(-)
>>
>> diff --git a/core/mempool.cc b/core/mempool.cc
>> index 23dc67da..8c23e916 100644
>> --- a/core/mempool.cc
>> +++ b/core/mempool.cc
>> @@ -284,6 +284,9 @@ void pool::free_same_cpu(free_object* obj, unsigned cpu_id)
>>      trace_pool_free_same_cpu(this, object);
>>
>>      page_header* header = to_header(obj);
>> +    if (!header->owner) {
>> +        return;
>
> Would be nice to have a comment saying what this case is, i.e., that this is
> an early allocation whose free() we simply ignore.
> However, I don't understand why this is the case - it seems that earlier
> in the code, if the owner (pool::from_object()) is null,
> you don't call this function, you call something else. So why bother with
> this test here at all? Shouldn't we just crash if this
> bug happens?

You are right, we should never get here. This is me being obsessively
defensive - just in case we somehow do get here. Are you saying I should
remove this for clarity?

>> +    }
>>      if (!--header->nalloc && have_full_pages()) {
>>          if (header->local_free) {
>>              _free->erase(_free->iterator_to(*header));
>> @@ -321,6 +324,9 @@ void pool::free(void* object)
>>
>>      free_object* obj = static_cast<free_object*>(object);
>>      page_header* header = to_header(obj);
>> +    if (!header->owner) {
>> +        return;
>
> Ditto.

Same explanation as above.

>> +    }
>>      unsigned obj_cpu = header->cpu_id;
>>      unsigned cur_cpu = mempool_cpuid();
>>
>> @@ -1432,6 +1438,113 @@ static void early_free_page(void* v)
>>      auto pr = new (v) page_range(page_size);
>>      free_page_range(pr);
>>  }
>> +//
>> +// The following variables and functions implement a simple
>> +// early (pre-SMP) memory allocation scheme.
>> +mutex early_alloc_lock;
>> +// early_object_page holds a pointer to the beginning of the current page
>> +// intended to be used for the next early object allocation
>> +static void* early_object_page = nullptr;
>> +// early_alloc_next_offset points to the 0-relative address of free
>> +// memory within the page pointed to by early_object_page
>> +static size_t early_alloc_next_offset = 0;
>> +
>> +static early_page_header* to_early_page_header(void* object)
>> +{
>> +    return reinterpret_cast<early_page_header*>(
>> +        reinterpret_cast<std::uintptr_t>(object) & ~(page_size - 1));
>> +}

> Looking at your code, it seems you are assuming that early_page_header and
> page_header start with the same "owner" field (with early allocations having
> 0 there, normal allocations point to the pool), but then have different
> content. Is that right? Maybe you should have a comment about this next to
> early_page_header, not just in setup_early_alloc_page()?

Right. Ideally we could have some sort of "data interface" construct
specifying that both page_header and early_page_header start with the same
field. Maybe we can introduce a 1-field struct that both structs share, or is
a comment enough?
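For example, the cheapest variant I can think of (just a sketch, not part of
the patch - the asserts could live wherever both types are visible) would be
to keep both structs as they are and merely document and enforce the shared
leading field:

    #include <cstddef>   // offsetof

    // Both headers must start with 'owner' so that code which only looks at
    // the first word of a page can tell early pages (owner == nullptr) apart
    // from regular pool pages (owner points to the owning pool).
    static_assert(offsetof(early_page_header, owner) == 0,
                  "early_page_header must start with the 'owner' field");
    static_assert(offsetof(page_header, owner) == 0,
                  "page_header must start with the 'owner' field");

A shared one-field base struct would express the same invariant in the type
system, but it touches more code, so maybe the asserts plus a comment next to
early_page_header are enough.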
>> +
>> +static void setup_early_alloc_page() {
>> +    early_object_page = early_alloc_page();
>> +    early_page_header *page_header = to_early_page_header(early_object_page);
>> +    // Set the owner field to null so that functions that free objects
>> +    // or compute object size can differentiate between post-SMP malloc pool
>> +    // and early (pre-SMP) allocation
>> +    page_header->owner = nullptr;
>> +    page_header->allocations_count = 0;
>> +    early_alloc_next_offset = sizeof(early_page_header);
>> +}
>> +
>> +//
>> +// This function implements a simple but effective scheme
>> +// of allocating objects before SMP is set up. It does so by
>> +// remembering where within a page free memory starts. Then it
>> +// calculates the next closest offset matching the specified alignment
>> +// and verifies there is enough space until the end of the current
>> +// page to allocate from. If not, it allocates a new full page
>> +// to find enough requested space.
>> +static void* early_alloc_object(size_t size, size_t alignment)
>> +{
>> +    WITH_LOCK(early_alloc_lock) {
>> +        if (!early_object_page) {
>> +            setup_early_alloc_page();
>> +        }
>> +
>> +        size_t offset = (early_alloc_next_offset / alignment + 1) * alignment;
>> +        if (offset - early_alloc_next_offset < sizeof(unsigned short)) { // Make sure there is enough place for the size
>> +            offset += alignment; // Assume alignment >= 2
>> +        }

> I think you can replace the above 4 lines of calculations with one clearer
> line:
>
>     offset = align_up(early_alloc_next_offset + sizeof(short), alignment);
>
> (but please check that I didn't misunderstand the calculations)

I am not 100% sure, as my calculation makes sure that there are 2 bytes
(sizeof(unsigned short)) left between the end of the previous object and the
beginning of the next one, to store the size of the object. Effectively each
malloc'ed object is preceded by 2 bytes where its size is stored. Obviously
the alignment requirement has to be satisfied as well.
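Having said that, a quick brute-force comparison (a throwaway sketch, not part
of the patch; align_up below is a local stand-in for OSv's align_up()) seems
to show that the two calculations produce identical offsets for every
power-of-two alignment >= 2, so I am happy to switch to your one-liner in v2:

    #include <cassert>
    #include <cstddef>
    #include <cstdio>

    // local stand-in for align_up(): round v up to a multiple of a
    static size_t align_up(size_t v, size_t a)
    {
        return ((v + a - 1) / a) * a;
    }

    int main()
    {
        for (size_t a = 2; a <= 4096; a *= 2) {             // alignment >= 2
            for (size_t next = 0; next <= 4096; next++) {   // early_alloc_next_offset
                // calculation currently in the patch
                size_t offset = (next / a + 1) * a;
                if (offset - next < sizeof(unsigned short)) {
                    offset += a;
                }
                // suggested one-liner
                assert(offset == align_up(next + sizeof(unsigned short), a));
            }
        }
        printf("the two calculations agree\n");
        return 0;
    }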
>> +
>> +        if (offset + size > page_size) {
>> +            setup_early_alloc_page();

> Looking below, it seems you don't support large early allocations at all
> (more on this below). If you did, then a possible optimization here could be:
> if size is large (e.g., bigger than a page), there is no reason to throw away
> the half-allocated page. We could just make the new allocation in full new
> pages and keep using the old half-full page.

>> +            offset = (early_alloc_next_offset / alignment + 1) * alignment;
>> +        }
>> +
>> +        assert(offset - early_alloc_next_offset >= sizeof(unsigned short));
>> +        assert(offset + size <= page_size); // Verify we have enough space to satisfy this allocation

> But if one tries to malloc size>4096, this will fail here, no? Why is this
> not a problem? Or is it just that we assume that early on we only have small
> allocations, and this assertion verifies it? (If this is the case, please add
> a comment about this assertion saying why we make this assumption.)

It is OK, because this code is not meant to handle > 4096 allocations. If you
look at the if statements in std_malloc(), the >= 4096 early allocations
should still be handled by malloc_large().

>> +
>> +        auto ret = early_object_page + offset;
>> +        early_alloc_next_offset = offset + size;
>> +
>> +        // Save the size of the allocated object 2 bytes before its address
>> +        *reinterpret_cast<unsigned short *>(ret - sizeof(unsigned short)) =
>> +            static_cast<unsigned short>(size);
>> +        to_early_page_header(early_object_page)->allocations_count++;
>> +        return ret;
>> +    }
>> +}
>> +
>> +//
>> +// This function fairly rarely actually frees previously
>> +// allocated memory. It does so only if all objects
>> +// in a page have been freed, based on allocations_count, or
>> +// if the object being freed is the last one that was allocated.

> You said we free only a minority of early memory allocated (20%) -
> do we actually reach these specific cases?

Based on my debug statements, it does. If I remember correctly, it helps us
save 8-10 pages total, which is not that much considering we use less than
200K anyway.

>> +static void early_free_object(void *object)
>> +{
>> +    WITH_LOCK(early_alloc_lock) {
>> +        early_page_header *page_header = to_early_page_header(object);
>> +        assert(!page_header->owner);
>> +        unsigned short *size_addr = reinterpret_cast<unsigned short*>(object - sizeof(unsigned short));
>> +        unsigned short size = *size_addr;
>> +        if (!size) {
>> +            return;
>> +        }
>> +
>> +        *size_addr = 0; // Reset size to 0 so that we know this object was freed and prevent it from being freed again

> This is nice protection against double-free, but I don't think we have
> such protection even for normal allocations, so it's definitely not important
> for early allocation. Nobody expects it to be fine to free() an already freed
> object, and free() will just silently do nothing.
> Then again, if we anyway have this "size" thing (and it's sad we have
> to... I'm not sure we *really* need to implement object_size() correctly for
> early allocations...), I guess we can use it to get double-free protection
> for free...

If you think it makes the code unnecessarily complicated and possibly prone to
bugs, I can remove it.
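Just to make the size/offset bookkeeping easier to picture, this is roughly
what an early page looks like after a couple of allocations. A sketch only, as
if it lived next to the code in mempool.cc; inspect_early_object() is a
made-up helper for illustration, not part of the patch:

    // An early page, from low to high addresses:
    //
    //   [ early_page_header: owner == nullptr, allocations_count ]
    //   [ gap >= 2 bytes, last 2 hold object #1's size ][ object #1, aligned ]
    //   [ gap >= 2 bytes, last 2 hold object #2's size ][ object #2, aligned ]
    //   [ free space up to page_size ]
    //     ^ early_alloc_next_offset points here, right after the last object

    void inspect_early_object(void* p)
    {
        // mask the pointer down to the enclosing page to reach the header
        early_page_header* hdr = to_early_page_header(p);
        assert(!hdr->owner);   // early pages are marked with a null owner
        // the object's size sits in the 2 bytes immediately before it
        unsigned short size = *reinterpret_cast<unsigned short*>(
            static_cast<char*>(p) - sizeof(unsigned short));
        (void)size;
    }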
>> +        page_header->allocations_count--;
>> +        if (page_header->allocations_count <= 0) { // Any early page
>> +            early_free_page(page_header);
>> +            if (early_object_page == page_header) {
>> +                early_object_page = nullptr;
>> +            }
>> +        }
>> +        else if (early_object_page == page_header) { // Current early page
>> +            void *last_obj = static_cast<void *>(page_header) + (early_alloc_next_offset - size);
>> +            // Check if we are freeing the last allocated object (malloc immediately
>> +            // followed by free) and deallocate it if so by moving early_alloc_next_offset
>> +            // back to the previous position
>> +            if (last_obj == object) {
>> +                early_alloc_next_offset -= size;

> Is it really -= size, or do you also need to undo the increase for
> alignment, two-byte for size, etc.?

I think I am fine here, because I am only detecting the case of free() being
called on the last malloc()-ed object, which means that early_alloc_next_offset
has not yet been adjusted by a following malloc(), right? And given that in
early_alloc_object() I have this logic:

    auto ret = early_object_page + offset;
    early_alloc_next_offset = offset + size;

subtracting size from the current early_alloc_next_offset should give me the
offset of the last malloc()-ed object in the current page. Is this a wrong
assumption?
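To make it concrete, here is a worked example with made-up numbers:

    before the last malloc:  early_alloc_next_offset = 100,
                             request: size = 20, alignment = 8
    offset = (100 / 8 + 1) * 8 = 104   (gap of 4 bytes >= sizeof(unsigned short))
    object returned at page + 104, its size (20) stored at page + 102
    early_alloc_next_offset = 104 + 20 = 124

    free() of that same object:  124 - 20 = 104 == the object's offset,
    so the last_obj == object test matches and the offset rolls back to 104.

The few padding bytes in front of the object (100..103 in this example) are
not reclaimed, but that should be harmless - the next allocation recomputes
the alignment and the 2-byte size slot from the new early_alloc_next_offset
anyway.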
>> +            }
>> +        }
>> +    }
>> +}
>> +
>> +static size_t early_object_size(void* v)
>> +{
>> +    return *reinterpret_cast<unsigned short*>(v - sizeof(unsigned short));
>> +}
>>
>>  static void* untracked_alloc_page()
>>  {
>> @@ -1559,6 +1672,10 @@ static inline void* std_malloc(size_t size, size_t alignment)
>>          ret = translate_mem_area(mmu::mem_area::main, mmu::mem_area::mempool,
>>                                   ret);
>>          trace_memory_malloc_mempool(ret, size, 1 << n, alignment);
>> +    } else if (size < mmu::page_size && alignment < mmu::page_size && !smp_allocator) {
>> +        ret = memory::early_alloc_object(size, alignment);
>> +        ret = translate_mem_area(mmu::mem_area::main, mmu::mem_area::mempool,
>> +                                 ret);
>>      } else if (minimum_size <= mmu::page_size && alignment <= mmu::page_size) {
>>          ret = mmu::translate_mem_area(mmu::mem_area::main, mmu::mem_area::page,
>>                                        memory::alloc_page());
>> @@ -1592,7 +1709,13 @@ static size_t object_size(void *object)
>>      case mmu::mem_area::mempool:
>>          object = mmu::translate_mem_area(mmu::mem_area::mempool,
>>                                           mmu::mem_area::main, object);
>> -        return memory::pool::from_object(object)->get_size();
>> +        {
>> +            auto pool = memory::pool::from_object(object);
>> +            if (pool)
>> +                return pool->get_size();
>> +            else
>> +                return memory::early_object_size(object);
>> +        }
>>      case mmu::mem_area::page:
>>          return mmu::page_size;
>>      case mmu::mem_area::debug:
>> @@ -1639,7 +1762,13 @@ void free(void* object)
>>      case mmu::mem_area::mempool:
>>          object = mmu::translate_mem_area(mmu::mem_area::mempool,
>>                                           mmu::mem_area::main, object);
>> -        return memory::pool::from_object(object)->free(object);
>> +        {
>> +            auto pool = memory::pool::from_object(object);
>> +            if (pool)
>> +                return pool->free(object);
>> +            else
>> +                return memory::early_free_object(object);
>> +        }
>>      case mmu::mem_area::debug:
>>          return dbg::free(object);
>>      default:
>> diff --git a/include/osv/mempool.hh b/include/osv/mempool.hh
>> index 5dae7646..01bf9de2 100644
>> --- a/include/osv/mempool.hh
>> +++ b/include/osv/mempool.hh
>> @@ -36,6 +36,11 @@ void setup_free_memory(void* start, size_t bytes);
>>
>>  namespace bi = boost::intrusive;
>>
>> +struct early_page_header {
>> +    void* owner;
>> +    unsigned short allocations_count;
>> +};
>> +
>>  struct free_object {
>>      free_object* next;
>>  };
>> --
>> 2.19.1
