Any feedback, please? (To make the scheme easier to evaluate, I also appended a
small standalone sketch of the allocation arithmetic at the very end of this
message.)

On Thursday, November 22, 2018 at 1:07:16 PM UTC-5, Waldek Kozaczuk wrote:
>
> As I noted in the commit message, this patch may not be very elegant, but it
> seems to be pretty effective. Please suggest any other ideas for how to
> implement it better. Please see my self-review comments below.
>
> On Thursday, November 22, 2018 at 1:28:32 AM UTC-5, Waldek Kozaczuk wrote:
>>
>> This patch implements a naive but efficient scheme for allocating
>> memory before SMP is enabled and the regular malloc pools are set up.
>> Before this patch, every single pre-SMP allocation was
>> backed by a full 4K page, regardless of whether the requested size
>> was 8 or 2000 bytes.
>>
>> The new pre-SMP allocation scheme is based on the fact that
>> relatively few objects are allocated (~2,000), they are rarely
>> freed (< 20%), and most are very small (< 50 bytes).
>> The algorithm is as simple as finding the next big enough aligned
>> space in a 4K page of memory, recording the object's size, and advancing
>> the offset to the next free area of the page, otherwise allocating a
>> new 4K page. For details, read the comments in the code.
>>
>> Even though this scheme is very simplistic, it is
>> roughly 35 times more space efficient and allows
>> us to save around 6M of memory. It was measured that the net sum of the
>> sizes (as requested) of all allocations is typically around 125K,
>> and the new scheme uses ~44 pages (176K) of memory to satisfy all requests.
>>
>> Fixes #270
>>
>> Signed-off-by: Waldemar Kozaczuk <[email protected]>
>> ---
>>  core/mempool.cc        | 133 ++++++++++++++++++++++++++++++++++++++++-
>>  include/osv/mempool.hh |   5 ++
>>  2 files changed, 136 insertions(+), 2 deletions(-)
>>
>> diff --git a/core/mempool.cc b/core/mempool.cc
>> index 23dc67da..8c23e916 100644
>> --- a/core/mempool.cc
>> +++ b/core/mempool.cc
>> @@ -284,6 +284,9 @@ void pool::free_same_cpu(free_object* obj, unsigned cpu_id)
>>      trace_pool_free_same_cpu(this, object);
>>
>>      page_header* header = to_header(obj);
>> +    if (!header->owner) {
>> +        return;
>> +    }
>>
> Not sure if this check (and the one in the other place) is necessary, but in
> case we get here and it is an early-alloc page, this way the code below will
> not execute, as it does not apply.
>
>>      if (!--header->nalloc && have_full_pages()) {
>>          if (header->local_free) {
>>              _free->erase(_free->iterator_to(*header));
>> @@ -321,6 +324,9 @@ void pool::free(void* object)
>>
>>      free_object* obj = static_cast<free_object*>(object);
>>      page_header* header = to_header(obj);
>> +    if (!header->owner) {
>> +        return;
>> +    }
>>      unsigned obj_cpu = header->cpu_id;
>>      unsigned cur_cpu = mempool_cpuid();
>>
>> @@ -1432,6 +1438,113 @@ static void early_free_page(void* v)
>>      auto pr = new (v) page_range(page_size);
>>      free_page_range(pr);
>>  }
>> +//
>> +// The following variables and functions implement a simple
>> +// early (pre-SMP) memory allocation scheme.
>> +mutex early_alloc_lock;
>> +// early_object_page holds a pointer to the beginning of the current page
>> +// intended to be used for the next early object allocation
>> +static void* early_object_page = nullptr;
>> +// early_alloc_next_offset points to the 0-relative address of free
>> +// memory within the page pointed to by early_object_page
>> +static size_t early_alloc_next_offset = 0;
>> +
>> +static early_page_header* to_early_page_header(void* object)
>> +{
>> +    return reinterpret_cast<early_page_header*>(
>> +            reinterpret_cast<std::uintptr_t>(object) & ~(page_size - 1));
>> +}
>> +
>> +static void setup_early_alloc_page() {
>> +    early_object_page = early_alloc_page();
>> +    early_page_header *page_header = to_early_page_header(early_object_page);
>> +    // Set the owner field to null so that functions that free objects
>> +    // or compute object size can differentiate between post-SMP malloc pool
>> +    // and early (pre-SMP) allocation
>> +    page_header->owner = nullptr;
>> +    page_header->allocations_count = 0;
>> +    early_alloc_next_offset = sizeof(early_page_header);
>> +}
>> +
>> +//
>> +// This function implements a simple but effective scheme
>> +// for allocating objects before SMP is set up. It does so by
>> +// remembering where within a page free memory starts. Then it
>> +// calculates the next closest offset matching the specified alignment
>> +// and verifies there is enough space until the end of the current
>> +// page to allocate from. If not, it allocates the next full page
>> +// to find enough of the requested space.
>> +static void* early_alloc_object(size_t size, size_t alignment)
>> +{
>>
> Not sure if the mutex is really necessary, as pre-SMP there is only a
> single thread (am I wrong?), but better safe than sorry.
>
>> +    WITH_LOCK(early_alloc_lock) {
>> +        if (!early_object_page) {
>> +            setup_early_alloc_page();
>> +        }
>> +
>> +        size_t offset = (early_alloc_next_offset / alignment + 1) * alignment;
>> +        if (offset - early_alloc_next_offset < sizeof(unsigned short)) { // Make sure there is enough room for the size field
>> +            offset += alignment; // Assume alignment >= 2
>> +        }
>> +
>> +        if (offset + size > page_size) {
>> +            setup_early_alloc_page();
>> +            offset = (early_alloc_next_offset / alignment + 1) * alignment;
>> +        }
>> +
>> +        assert(offset - early_alloc_next_offset >= sizeof(unsigned short));
>> +        assert(offset + size <= page_size); // Verify we have enough space to satisfy this allocation
>> +
>> +        auto ret = early_object_page + offset;
>> +        early_alloc_next_offset = offset + size;
>> +
>> +        // Save the size of the allocated object 2 bytes before its address
>> +        *reinterpret_cast<unsigned short *>(ret - sizeof(unsigned short)) =
>> +                static_cast<unsigned short>(size);
>>
> The address of this 2-byte field is effectively not 16-byte aligned, so I
> wonder if we should actually make this field be "ret - 16 bytes".
>
>> +        to_early_page_header(early_object_page)->allocations_count++;
>> +        return ret;
>> +    }
>> +}
>> +
>> +//
>> +// This function fairly rarely actually frees previously
>> +// allocated memory. It does so only if all objects in a page
>> +// have been freed, based on allocations_count, or
>> +// if the object being freed is the last one that was allocated.
>> +static void early_free_object(void *object)
>> +{
>> +    WITH_LOCK(early_alloc_lock) {
>> +        early_page_header *page_header = to_early_page_header(object);
>> +        assert(!page_header->owner);
>> +        unsigned short *size_addr = reinterpret_cast<unsigned short*>(object - sizeof(unsigned short));
>> +        unsigned short size = *size_addr;
>> +        if (!size) {
>> +            return;
>> +        }
>> +
>> +        *size_addr = 0; // Reset the size to 0 so that we know this object was freed and prevent it from being freed again
>>
> Not sure if checking that the object was already freed is really necessary.
>
>> +        page_header->allocations_count--;
>> +        if (page_header->allocations_count <= 0) { // Any early page
>> +            early_free_page(page_header);
>> +            if (early_object_page == page_header) {
>> +                early_object_page = nullptr;
>> +            }
>> +        }
>> +        else if (early_object_page == page_header) { // Current early page
>> +            void *last_obj = static_cast<void *>(page_header) + (early_alloc_next_offset - size);
>> +            // Check if we are freeing the last allocated object (a malloc
>> +            // immediately followed by a free) and if so deallocate it by
>> +            // moving early_alloc_next_offset back to the previous position
>> +            if (last_obj == object) {
>> +                early_alloc_next_offset -= size;
>> +            }
>> +        }
>> +    }
>> +}
>> +
>> +static size_t early_object_size(void* v)
>> +{
>> +    return *reinterpret_cast<unsigned short*>(v - sizeof(unsigned short));
>> +}
>>
>>  static void* untracked_alloc_page()
>>  {
>> @@ -1559,6 +1672,10 @@ static inline void* std_malloc(size_t size, size_t alignment)
>>          ret = translate_mem_area(mmu::mem_area::main, mmu::mem_area::mempool,
>>                                   ret);
>>          trace_memory_malloc_mempool(ret, size, 1 << n, alignment);
>
> With this patch and the other one we have 2 more if branches. I wonder if,
> performance wise, we should re-order the conditions in those ifs and check
> smp_allocator first.
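
Just to make the re-ordering suggestion above concrete, this is roughly what I
have in mind for the new branch (a sketch only; the body stays exactly as in
the patch below, only the order of the checks changes):

    } else if (!smp_allocator && size < mmu::page_size && alignment < mmu::page_size) {
        // The boolean flag is tested first, so once smp_allocator is true the
        // common post-SMP path rejects this branch after a single comparison.
        ret = memory::early_alloc_object(size, alignment);
        ret = translate_mem_area(mmu::mem_area::main, mmu::mem_area::mempool,
                                 ret);
    }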
>
>>
>> +    } else if (size < mmu::page_size && alignment < mmu::page_size && !smp_allocator) {
>> +        ret = memory::early_alloc_object(size, alignment);
>> +        ret = translate_mem_area(mmu::mem_area::main, mmu::mem_area::mempool,
>> +                                 ret);
>>      } else if (minimum_size <= mmu::page_size && alignment <= mmu::page_size) {
>>          ret = mmu::translate_mem_area(mmu::mem_area::main, mmu::mem_area::page,
>>                                        memory::alloc_page());
>> @@ -1592,7 +1709,13 @@ static size_t object_size(void *object)
>>      case mmu::mem_area::mempool:
>>          object = mmu::translate_mem_area(mmu::mem_area::mempool,
>>                                           mmu::mem_area::main, object);
>> -        return memory::pool::from_object(object)->get_size();
>> +        {
>> +            auto pool = memory::pool::from_object(object);
>> +            if (pool)
>> +                return pool->get_size();
>> +            else
>> +                return memory::early_object_size(object);
>> +        }
>>      case mmu::mem_area::page:
>>          return mmu::page_size;
>>      case mmu::mem_area::debug:
>> @@ -1639,7 +1762,13 @@ void free(void* object)
>>      case mmu::mem_area::mempool:
>>          object = mmu::translate_mem_area(mmu::mem_area::mempool,
>>                                           mmu::mem_area::main, object);
>> -        return memory::pool::from_object(object)->free(object);
>> +        {
>> +            auto pool = memory::pool::from_object(object);
>> +            if (pool)
>> +                return pool->free(object);
>> +            else
>> +                return memory::early_free_object(object);
>> +        }
>>      case mmu::mem_area::debug:
>>          return dbg::free(object);
>>      default:
>> diff --git a/include/osv/mempool.hh b/include/osv/mempool.hh
>> index 5dae7646..01bf9de2 100644
>> --- a/include/osv/mempool.hh
>> +++ b/include/osv/mempool.hh
>> @@ -36,6 +36,11 @@ void setup_free_memory(void* start, size_t bytes);
>>
>>  namespace bi = boost::intrusive;
>>
>> +struct early_page_header {
>> +    void* owner;
>> +    unsigned short allocations_count;
>> +};
>> +
>>  struct free_object {
>>      free_object* next;
>>  };
>> --
>> 2.19.1
>>
>>
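
P.S. To make the offset arithmetic in early_alloc_object easier to follow, here
is a small standalone toy model of the scheme. It is only a sketch, not OSv
code: it can be built with any C++17 compiler, and the names, the 16-byte
header stand-in, and the use of std::aligned_alloc are mine, for illustration
only.

// toy_early_alloc.cc -- illustration only, not OSv code
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <cstring>

constexpr size_t page_size = 4096;
constexpr size_t header_size = 16;   // stands in for sizeof(early_page_header)

static char* current_page = nullptr; // mirrors early_object_page
static size_t next_offset = 0;       // mirrors early_alloc_next_offset

static void new_page()
{
    current_page = static_cast<char*>(std::aligned_alloc(page_size, page_size));
    assert(current_page);
    next_offset = header_size;
}

// Same arithmetic as early_alloc_object in the patch: round the running
// offset up to the next multiple of 'alignment', keep at least 2 bytes
// below the returned pointer for the size field, and start a fresh page
// when the object would cross the page boundary.
static void* toy_early_alloc(size_t size, size_t alignment)
{
    assert(alignment >= 2 && size + alignment + header_size < page_size);
    if (!current_page) {
        new_page();
    }
    size_t offset = (next_offset / alignment + 1) * alignment;
    if (offset - next_offset < sizeof(unsigned short)) {
        offset += alignment;          // make room for the 2-byte size field
    }
    if (offset + size > page_size) {  // does not fit: move on to a new page
        new_page();
        offset = (next_offset / alignment + 1) * alignment;
    }
    char* ret = current_page + offset;
    next_offset = offset + size;
    unsigned short s = static_cast<unsigned short>(size);
    std::memcpy(ret - sizeof s, &s, sizeof s);  // size lives 2 bytes below ret
    return ret;
}

static size_t toy_object_size(void* p)          // mirrors early_object_size
{
    unsigned short s;
    std::memcpy(&s, static_cast<char*>(p) - sizeof s, sizeof s);
    return s;
}

int main()
{
    void* a = toy_early_alloc(40, 16);
    void* b = toy_early_alloc(2000, 8);
    void* c = toy_early_alloc(8, 2);    // a third, small object
    std::printf("a=%p (%zu) b=%p (%zu) c=%p (%zu)\n",
                a, toy_object_size(a), b, toy_object_size(b),
                c, toy_object_size(c));
}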
