Any feedback, please?

On Thursday, November 22, 2018 at 1:07:16 PM UTC-5, Waldek Kozaczuk wrote:
>
> As I noted in the commit message, this patch may not be very elegant, but
> it seems to be quite effective. Please suggest any other ideas for how to
> implement it better. Please see my self-review comments below.
>
> On Thursday, November 22, 2018 at 1:28:32 AM UTC-5, Waldek Kozaczuk wrote:
>>
>> This patch implements a naive but efficient scheme for allocating
>> memory before SMP is enabled and the regular malloc pools are set up.
>> Before this patch, every single pre-SMP allocation was
>> backed by a full 4K page, regardless of whether the requested size
>> was 8 or 2000 bytes.
>>
>> The new pre-SMP allocation scheme is based on the observation that
>> relatively few objects are allocated (~2,000), they are rarely
>> freed (< 20%), and most are very small (< 50 bytes).
>> The algorithm simply finds the next big enough aligned space
>> in the current 4K page, records the object's size, and advances the
>> offset to the next free area of the page; otherwise it allocates a
>> new 4K page. For details, see the comments in the code.
>>
>> Even though this scheme is very simplistic, it is
>> roughly 35 times more space efficient and saves
>> around 6M of memory. Measurements show that the net sum of the
>> sizes (as requested) of all allocations is typically around 125K,
>> and the new scheme uses ~44 pages (176K) of memory to satisfy all requests.
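
A rough sanity check of those numbers: with ~2,000 allocations and fewer than
20% of them freed, at least ~1,600 stay live, which under the old
page-per-object scheme meant roughly 1,600 x 4K = 6.4M of pages; the new
scheme's ~44 pages are ~176K, i.e. about 35 times less and ~6M saved.
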
>>
>> Fixes #270 
>>
>> Signed-off-by: Waldemar Kozaczuk <[email protected]> 
>> --- 
>>  core/mempool.cc        | 133 ++++++++++++++++++++++++++++++++++++++++- 
>>  include/osv/mempool.hh |   5 ++ 
>>  2 files changed, 136 insertions(+), 2 deletions(-) 
>>
>> diff --git a/core/mempool.cc b/core/mempool.cc 
>> index 23dc67da..8c23e916 100644 
>> --- a/core/mempool.cc 
>> +++ b/core/mempool.cc 
>> @@ -284,6 +284,9 @@ void pool::free_same_cpu(free_object* obj, unsigned cpu_id)
>>      trace_pool_free_same_cpu(this, object); 
>>   
>>      page_header* header = to_header(obj); 
>> +    if (!header->owner) { 
>> +        return; 
>> +    } 
>>
> I am not sure this check (and the similar one in the other place) is
> necessary, but if we do get here with an early-alloc page, it keeps the
> code below from executing, since that logic does not apply to such pages.
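
To spell out the assumption behind this test: it relies on both the pool's
page_header and the new early_page_header keeping the owner pointer at the
very start of the page, so reading a null owner through either type
identifies a pre-SMP page. A purely illustrative sketch (is_early_page is a
hypothetical helper, not part of the patch; page_size as in mempool.cc):

    // Illustrative only -- assumes the owner pointer is the first field of
    // both header types, i.e. it sits at offset 0 of the page.
    static bool is_early_page(void* obj)
    {
        auto page = reinterpret_cast<std::uintptr_t>(obj) & ~(page_size - 1);
        return *reinterpret_cast<void**>(page) == nullptr; // owner not set?
    }
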
>
>>      if (!--header->nalloc && have_full_pages()) { 
>>          if (header->local_free) { 
>>              _free->erase(_free->iterator_to(*header)); 
>> @@ -321,6 +324,9 @@ void pool::free(void* object) 
>>   
>>          free_object* obj = static_cast<free_object*>(object); 
>>          page_header* header = to_header(obj); 
>> +        if (!header->owner) { 
>> +            return; 
>> +        } 
>>          unsigned obj_cpu = header->cpu_id; 
>>          unsigned cur_cpu = mempool_cpuid(); 
>>   
>> @@ -1432,6 +1438,113 @@ static void early_free_page(void* v) 
>>      auto pr = new (v) page_range(page_size); 
>>      free_page_range(pr); 
>>  } 
>> +//
>> +// The following variables and functions implement a simple
>> +// early (pre-SMP) memory allocation scheme.
>> +mutex early_alloc_lock;
>> +// early_object_page holds a pointer to the beginning of the current page
>> +// intended to be used for the next early object allocation
>> +static void* early_object_page = nullptr;
>> +// early_alloc_next_offset holds the page-relative offset of the free
>> +// memory within the page pointed to by early_object_page
>> +static size_t early_alloc_next_offset = 0;
>> + 
>> +static early_page_header* to_early_page_header(void* object)
>> +{
>> +    return reinterpret_cast<early_page_header*>(
>> +            reinterpret_cast<std::uintptr_t>(object) & ~(page_size - 1));
>> +}
>> + 
>> +static void setup_early_alloc_page() {
>> +    early_object_page = early_alloc_page();
>> +    early_page_header *page_header = to_early_page_header(early_object_page);
>> +    // Set the owner field to null so that the functions that free objects
>> +    // or compute object size can differentiate between post-SMP malloc pool
>> +    // pages and early (pre-SMP) allocation pages
>> +    page_header->owner = nullptr;
>> +    page_header->allocations_count = 0;
>> +    early_alloc_next_offset = sizeof(early_page_header);
>> +}
>> + 
>> +//
>> +// This function implements a simple but effective scheme
>> +// for allocating objects before SMP is set up. It does so by
>> +// remembering where within the current page free memory starts. It then
>> +// calculates the next closest offset matching the specified alignment
>> +// and verifies there is enough space left before the end of the current
>> +// page to allocate from. If not, it allocates a new full page
>> +// to find enough space for the request.
>> +static void* early_alloc_object(size_t size, size_t alignment) 
>> +{ 
>>
> I am not sure the mutex is really necessary, as pre-SMP there is only a
> single thread (am I wrong?), but better safe than sorry.
>
>> +    WITH_LOCK(early_alloc_lock) { 
>> +        if (!early_object_page) { 
>> +            setup_early_alloc_page(); 
>> +        } 
>> + 
>> +        size_t offset = (early_alloc_next_offset / alignment + 1) * alignment;
>> +        if (offset - early_alloc_next_offset < sizeof(unsigned short)) { // Make sure there is enough space for the size field
>> +            offset += alignment; // Assume alignment >= 2
>> +        }
>> + 
>> +        if (offset + size > page_size) { 
>> +            setup_early_alloc_page(); 
>> +            offset = (early_alloc_next_offset / alignment + 1) * alignment;
>> +        } 
>> + 
>> +        assert(offset - early_alloc_next_offset >= sizeof(unsigned short));
>> +        assert(offset + size <= page_size); // Verify we have enough space to satisfy this allocation
>> + 
>> +        auto ret = early_object_page + offset; 
>> +        early_alloc_next_offset = offset + size; 
>> + 
>> +        // Save the size of the allocated object 2 bytes before its address
>> +        *reinterpret_cast<unsigned short *>(ret - sizeof(unsigned short)) =
>> +                static_cast<unsigned short>(size);
>>
> The address of this 2-byte field is effectively not 16-byte aligned, so I
> wonder if we should actually place this field at "ret - 16 bytes" instead.
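
For a concrete picture of where that field lands (purely illustrative
numbers), suppose early_alloc_next_offset == 40, alignment == 16 and
size == 24:

    offset = (40 / 16 + 1) * 16 = 48           // next aligned slot
    offset - 40 = 8 >= sizeof(unsigned short)  // no extra alignment bump
    size (24) is stored at page bytes 46..47   // ret - 2, not 16-byte aligned
    the object occupies bytes 48..71 and early_alloc_next_offset becomes 72
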
>
>> +        to_early_page_header(early_object_page)->allocations_count++; 
>> +        return ret; 
>> +    } 
>> +} 
>> + 
>> +//
>> +// This function only rarely actually frees previously
>> +// allocated memory. It does so only if all objects
>> +// in a page have been freed (based on allocations_count) or
>> +// if the object being freed is the last one that was allocated.
>> +static void early_free_object(void *object) 
>> +{ 
>> +    WITH_LOCK(early_alloc_lock) { 
>> +        early_page_header *page_header = to_early_page_header(object); 
>> +        assert(!page_header->owner); 
>> +        unsigned short *size_addr = reinterpret_cast<unsigned short*>(object - sizeof(unsigned short));
>> +        unsigned short size = *size_addr; 
>> +        if (!size) { 
>> +            return; 
>> +        } 
>> + 
>> +        *size_addr = 0; // Reset size to 0 so that we know this object was freed and prevent freeing it again
>>
> I am not sure whether checking that the object was already freed is really necessary.
>
>> +        page_header->allocations_count--; 
>> +        if (page_header->allocations_count <= 0) { // Any early page 
>> +            early_free_page(page_header); 
>> +            if (early_object_page == page_header) { 
>> +                early_object_page = nullptr; 
>> +            } 
>> +        } 
>> +        else if (early_object_page == page_header) { // Current early page
>> +            void *last_obj = static_cast<void *>(page_header) + (early_alloc_next_offset - size);
>> +            // Check if we are freeing the last allocated object (a free immediately
>> +            // following its malloc) and if so deallocate it by moving
>> +            // early_alloc_next_offset back to the previous position
>> +            if (last_obj == object) { 
>> +                early_alloc_next_offset -= size; 
>> +            } 
>> +        } 
>> +    } 
>> +} 
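
Continuing the illustrative numbers from above: if the last allocation placed
a 24-byte object at offset 48 (so early_alloc_next_offset == 72), freeing that
same object rolls early_alloc_next_offset back to 48 and the space can be
reused by the next allocation; freeing any older object only decrements
allocations_count and its bytes stay occupied until the whole page drains.
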
>> + 
>> +static size_t early_object_size(void* v)
>> +{
>> +    return *reinterpret_cast<unsigned short*>(v - sizeof(unsigned short));
>> +}
>>   
>>  static void* untracked_alloc_page() 
>>  { 
>> @@ -1559,6 +1672,10 @@ static inline void* std_malloc(size_t size, size_t alignment)
>>          ret = translate_mem_area(mmu::mem_area::main, mmu::mem_area::mempool,
>>                                   ret);
>>          trace_memory_malloc_mempool(ret, size, 1 << n, alignment);
>
> With this patch and the other one we have two more if branches. I wonder
> whether, performance-wise, we should re-order the conditions in those ifs
> and check smp_allocator first.
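
If we did reorder, the early-alloc branch added below would presumably become
something like this (untested, just to illustrate the idea); once
smp_allocator is true the branch is rejected after a single boolean test and
the two size/alignment comparisons are skipped on the hot path:

    } else if (!smp_allocator && size < mmu::page_size && alignment < mmu::page_size) {
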
>
>>
>> +    } else if (size < mmu::page_size && alignment < mmu::page_size && !smp_allocator) {
>> +        ret = memory::early_alloc_object(size, alignment);
>> +        ret = translate_mem_area(mmu::mem_area::main, mmu::mem_area::mempool,
>> +                                 ret);
>>      } else if (minimum_size <= mmu::page_size && alignment <= mmu::page_size) {
>>          ret = mmu::translate_mem_area(mmu::mem_area::main, mmu::mem_area::page,
>>                                        memory::alloc_page());
>> @@ -1592,7 +1709,13 @@ static size_t object_size(void *object) 
>>      case mmu::mem_area::mempool: 
>>          object = mmu::translate_mem_area(mmu::mem_area::mempool, 
>>                                           mmu::mem_area::main, object); 
>> -        return memory::pool::from_object(object)->get_size(); 
>> +        { 
>> +            auto pool = memory::pool::from_object(object); 
>> +            if (pool) 
>> +                return pool->get_size(); 
>> +            else 
>> +                return memory::early_object_size(object); 
>> +        } 
>>      case mmu::mem_area::page: 
>>          return mmu::page_size; 
>>      case mmu::mem_area::debug: 
>> @@ -1639,7 +1762,13 @@ void free(void* object) 
>>      case mmu::mem_area::mempool: 
>>          object = mmu::translate_mem_area(mmu::mem_area::mempool, 
>>                                           mmu::mem_area::main, object); 
>> -        return memory::pool::from_object(object)->free(object); 
>> +        { 
>> +            auto pool = memory::pool::from_object(object); 
>> +            if (pool) 
>> +                return pool->free(object); 
>> +            else 
>> +                return memory::early_free_object(object); 
>> +        } 
>>      case mmu::mem_area::debug: 
>>          return dbg::free(object); 
>>      default: 
>> diff --git a/include/osv/mempool.hh b/include/osv/mempool.hh 
>> index 5dae7646..01bf9de2 100644 
>> --- a/include/osv/mempool.hh 
>> +++ b/include/osv/mempool.hh 
>> @@ -36,6 +36,11 @@ void setup_free_memory(void* start, size_t bytes); 
>>   
>>  namespace bi = boost::intrusive; 
>>   
>> +struct early_page_header { 
>> +    void* owner; 
>> +    unsigned short allocations_count; 
>> +}; 
>> + 
>>  struct free_object { 
>>      free_object* next; 
>>  }; 
>> -- 
>> 2.19.1 
>>
>>
