Please see my answers below. As far as I can tell you do not have any major 
objections, so I will make the changes and send a revised patch. I probably 
also need to adjust some of my comments.

On Sunday, December 9, 2018 at 7:53:38 AM UTC-5, Nadav Har'El wrote:
>
> On Thu, Nov 22, 2018 at 8:28 AM Waldemar Kozaczuk <[email protected]> wrote:
>
>> This patch implements a naive but efficient scheme of allocating
>> memory before SMP is enabled and regular malloc pools are set up.
>> Before this patch every single pre-SMP allocation would
>> be backed by a full 4K page regardless of whether the requested size
>> was 8 or 2000 bytes.
>>
>> The new pre-SMP allocation scheme is based on the fact that
>> relatively few objects are allocated (~2,000), they are rarely
>> freed (< 20%), and most are very small (< 50 bytes).
>> The algorithm is as simple as finding the next big enough aligned
>> space in a 4K page of memory, marking its size and advancing the
>> offset to the next free area of the page, or otherwise allocating a
>> new 4K page. For details read the comments in the code.
>>
>> Even though this scheme is very simplistic, it is
>> roughly 35 times more space efficient and allows
>> us to save around 6M of memory. It was measured that the net sum of
>> the sizes (as requested) of all allocations is typically around 125K,
>> and the new scheme uses ~44 pages (176K) of memory to satisfy all requests.
>>
>> Fixes #270
>>
>> Signed-off-by: Waldemar Kozaczuk <[email protected]>
>> ---
>>  core/mempool.cc        | 133 ++++++++++++++++++++++++++++++++++++++++-
>>  include/osv/mempool.hh |   5 ++
>>  2 files changed, 136 insertions(+), 2 deletions(-)
>>
>> diff --git a/core/mempool.cc b/core/mempool.cc
>> index 23dc67da..8c23e916 100644
>> --- a/core/mempool.cc
>> +++ b/core/mempool.cc
>> @@ -284,6 +284,9 @@ void pool::free_same_cpu(free_object* obj, unsigned 
>> cpu_id)
>>      trace_pool_free_same_cpu(this, object);
>>
>>      page_header* header = to_header(obj);
>> +    if (!header->owner) {
>> +        return;
>>
>
> Would be nice to have a comment saying what is this case, i.e., this is an 
> early allocation that we simply ignore its free().
> However, I don't understand why this is the case - it seems that earlier 
> in the code, if the owner (pool::from_object()) is null,
> you don't call this function, you call something else. So why bother with 
> this test here at all? Shouldn't we just crash if this
> bug happens?
>
You are right, we should never get here. This is me being obsessively 
defensive - just in case we somehow do get here. Are you saying I should 
remove this for clarity?

> +    }
>>      if (!--header->nalloc && have_full_pages()) {
>>          if (header->local_free) {
>>              _free->erase(_free->iterator_to(*header));
>> @@ -321,6 +324,9 @@ void pool::free(void* object)
>>
>>          free_object* obj = static_cast<free_object*>(object);
>>          page_header* header = to_header(obj);
>> +        if (!header->owner) {
>> +            return;
>>
>
> Ditto.
>
Same explanation as above. 

>
> +        }
>>          unsigned obj_cpu = header->cpu_id;
>>          unsigned cur_cpu = mempool_cpuid();
>>
>> @@ -1432,6 +1438,113 @@ static void early_free_page(void* v)
>>      auto pr = new (v) page_range(page_size);
>>      free_page_range(pr);
>>  }
>> +//
>> +// Following variables and functions implement simple
>> +// early (pre-SMP) memory allocation scheme.
>> +mutex early_alloc_lock;
>> +// early_object_pages holds a pointer to the beginning of the current 
>> page
>> +// intended to be used for next early object allocation
>> +static void* early_object_page = nullptr;
>> +// early_alloc_next_offset points to the 0-relative address of free
>> +// memory within a page pointed by early_object_page
>> +static size_t early_alloc_next_offset = 0;
>> +
>> +static early_page_header* to_early_page_header(void* object)
>> +{
>> +    return reinterpret_cast<early_page_header*>(
>> +            reinterpret_cast<std::uintptr_t>(object) & ~(page_size - 1));
>>
>
> Looking at your code, it seems you are assuming that early_page_header and 
> page_header start with
> the same "owner" field (with early allocations having 0 there, normal 
> allocations point to the pool), but
> then have different content. Is that right? Maybe you should have a 
> comment about this next to
> early_page_header, not just in setup_early_alloc_page()?
>
Right. Ideally we would have some sort of "data interface" construct 
specifying that both page_header and early_page_header start with the same 
field. Maybe we can introduce a 1-field struct that both other structs 
share, or is a comment enough?
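
Something like this maybe (rough sketch only - apart from "owner" and
allocations_count the fields are placeholders, and early_page_header
currently declares owner as void*, so the type would have to be unified):

    namespace memory {

    class pool;

    // Shared first field: both header types start with it, so free() and
    // object_size() can check header->owner without knowing which kind
    // of page they are looking at.
    struct common_page_header {
        pool* owner;   // nullptr for early (pre-SMP) pages, the owning pool otherwise
    };

    struct early_page_header : common_page_header {
        unsigned short allocations_count;
    };

    struct page_header : common_page_header {
        // cpu_id, nalloc, free list, ... as today
    };

    }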

>
> +}
>> +
>> +static void setup_early_alloc_page() {
>> +    early_object_page = early_alloc_page();
>> +    early_page_header *page_header = 
>> to_early_page_header(early_object_page);
>> +    // Set the owner field to null so that functions that free objects
>> +    // or compute object size can differentiate between post-SMP malloc 
>> pool
>> +    // and early (pre-SMP) allocation
>> +    page_header->owner = nullptr;
>> +    page_header->allocations_count = 0;
>> +    early_alloc_next_offset = sizeof(early_page_header);
>> +}
>> +
>> +//
>> +// This functions implements simple but effective scheme
>> +// of allocating objects before SMP is setup. It does so by
>> +// remembering where within a page free memory starts. Then it
>> +// calculates next closest offset matching specified alignment
>> +// and verifies there is enough space until end of the current
>> +// page to allocate from. If not it allocates next full page
>> +// to find enough requested space.
>>
> +static void* early_alloc_object(size_t size, size_t alignment)
>> +{
>> +    WITH_LOCK(early_alloc_lock) {
>> +        if (!early_object_page) {
>> +            setup_early_alloc_page();
>> +        }
>> +
>> +        size_t offset = (early_alloc_next_offset / alignment + 1) * 
>> alignment;
>>
> +        if (offset - early_alloc_next_offset < sizeof(unsigned short)) { 
>> // Make sure enough place for size
>>
> +            offset += alignment; //Assume alignment >= 2
>> +        }
>>
>
> I think you can replace the above 4 lines of calculations, with one 
> clearer line:
>
>        offset = align_up(early_alloc_next_offset + sizeof(short), 
> alignment);
>
> (but please check that I didn't misunderstand the calculations)
>
I am not 100% sure, as my calculations make sure that there are 2 bytes 
(sizeof(unsigned short)) left between the end of the previous object and the 
beginning of the next one to store the size of the object. Effectively each 
malloc'ed object is preceded by 2 bytes where its size is stored. Obviously 
the alignment requirement has to be satisfied as well.
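
To make sure we are comparing the same thing, the current logic is (sketch):

    // take the next aligned slot strictly after early_alloc_next_offset...
    size_t offset = (early_alloc_next_offset / alignment + 1) * alignment;
    // ...and if the gap cannot hold the 2-byte size field, skip one more slot
    if (offset - early_alloc_next_offset < sizeof(unsigned short)) {
        offset += alignment;   // assumes alignment >= 2
    }

versus the suggested

    size_t offset = align_up(early_alloc_next_offset + sizeof(unsigned short), alignment);

If I read align_up() correctly, that also always leaves at least 2 bytes in
front of the object, so it may indeed be equivalent.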

>  
>
>> +
>> +        if (offset + size > page_size) {
>> +            setup_early_alloc_page();
>>
>
> Looking below, it seems you don't support large early allocations at all 
> (more on this below).
> If you did, then a possible optimization here could be, if size is large 
> (e.g., bigger than a page),
> there is no reason to throw away the half-allocated page. We can just make 
> the new allocation
> in full new pages and keep using the old half-full page.
>  
>
>> +            offset = (early_alloc_next_offset / alignment + 1) * 
>> alignment;
>> +        }
>> +
>> +        assert(offset - early_alloc_next_offset >= sizeof(unsigned 
>> short));
>>
> +        assert(offset + size <= page_size); // Verify we have enough 
>> space to satisfy this allocation
>>
>
> But if one tries to malloc size>4096, this will fail here, no? Why is this 
> not a problem? 
> Or is it just that we assume that early on we only have small allocations, 
> and this assertion
> verifies it (if this is the case, please add a comment about this 
> assertion saying why we
> make this assumption).
>
It is OK because this code is not meant to handle allocations of a page or 
more. If you look at the if statements in std_malloc(), early allocations of 
>= 4096 bytes still fall through to the alloc_page()/malloc_large() branches.
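
In other words, after this patch the if/else chain in std_malloc() roughly
looks like this (heavily simplified sketch, not the exact conditions):

    if (size <= memory::pool::max_object_size && smp_allocator /* ... */) {
        // post-SMP: per-cpu malloc pools
    } else if (size < mmu::page_size && alignment < mmu::page_size && !smp_allocator) {
        // new pre-SMP path - only sizes below a page ever get here
        ret = memory::early_alloc_object(size, alignment);
    } else if (minimum_size <= mmu::page_size && alignment <= mmu::page_size) {
        // a whole 4K page (where every small pre-SMP allocation went before this patch)
    } else {
        // everything larger falls through to malloc_large()
    }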

>
> +
>> +        auto ret = early_object_page + offset;
>> +        early_alloc_next_offset = offset + size;
>> +
>> +        // Save size of the allocated object 2 bytes before it address
>>
> +        *reinterpret_cast<unsigned short *>(ret - sizeof(unsigned short)) 
>> =
>> +                static_cast<unsigned short>(size);
>> +        to_early_page_header(early_object_page)->allocations_count++;
>> +        return ret;
>> +    }
>> +}
>> +
>> +//
>> +// This function fairly rarely actually frees previously
>> +// allocated memory. It does so only if all objects
>> +// have been freed in a page based on allocations_count or
>> +// if the object being freed is the last one that was allocated.
>>
>
> You said we free only a minority of early memory allocated (20%) -
> do we actually reach these specific cases?
>

Based on my debug statements it does. If I remember correctly, it helps us 
save 8-10 pages total, which is not that much considering we use less than 
200K anyway. 

>
> +static void early_free_object(void *object)
>> +{
>> +    WITH_LOCK(early_alloc_lock) {
>> +        early_page_header *page_header = to_early_page_header(object);
>> +        assert(!page_header->owner);
>> +        unsigned short *size_addr = reinterpret_cast<unsigned 
>> short*>(object - sizeof(unsigned short));
>> +        unsigned short size = *size_addr;
>> +        if (!size) {
>> +            return;
>> +        }
>> +
>> +        *size_addr = 0; // Reset size to 0 so that we know this object 
>> was freed and prevent from freeing again
>>
>
> This is nice protection against double-free, but I don't think we have 
> such protection even for normal
> allocations, so it's definitely not important for early allocation. Nobody 
> expects it to be fine to free()
> an already freed object, and free() will just silently do nothing.
> Then again, if we anyway have this "size" thing (and it's sad we have 
> to... I'm not sure we *really*
> need to implement object_size() correctly for early allocations...), I 
> guess we can use it to get
> double-free protection for free...
>
If you think it makes things unnecessarily complicated and possibly prone to 
bugs, I can remove it. 

>
> +        page_header->allocations_count--;
>> +        if (page_header->allocations_count <= 0) { // Any early page
>> +            early_free_page(page_header);
>> +            if (early_object_page == page_header) {
>> +                early_object_page = nullptr;
>> +            }
>> +        }
>> +        else if(early_object_page == page_header) { // Current early page
>> +            void *last_obj = static_cast<void *>(page_header) + 
>> (early_alloc_next_offset - size);
>> +            // Check if we are freeing last allocated object (free 
>> followed by malloc)
>> +            // and deallocate if so by moving the 
>> early_alloc_next_offset to the previous
>> +            // position
>> +            if (last_obj == object) {
>> +                early_alloc_next_offset -= size;
>>
>
> Is it really -= size, or do you also need to undo the increase for 
> alignment, the two bytes for the size, etc.?
>  
>
I think I am fine, because here I am only detecting the case of free() being 
called on the last malloc()-ed object, which means that 
early_alloc_next_offset has not yet been adjusted for the next malloc(), 
right? And given that in early_alloc_object() I have this logic:

    auto ret = early_object_page + offset;
    early_alloc_next_offset = offset + size;

subtracting size from the current early_alloc_next_offset should give me the 
offset of the last malloc-ed object in the current page. Is this assumption 
wrong?
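
A small worked example, if it helps (alignment 8, a 30-byte object):

    early_alloc_next_offset = 40
    malloc(30): offset = 48 (the size is stored in the 2 bytes at 46-47),
                ret = page + 48, early_alloc_next_offset = 48 + 30 = 78
    free(ret):  page + (78 - 30) = page + 48 == ret, so it is the last
                allocated object and early_alloc_next_offset goes back to 48

So the rollback reclaims only the object itself, not the alignment padding or
the 2-byte size slot in front of it, but as far as I can tell nothing gets
corrupted.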

> +            }
>> +        }
>> +    }
>> +}
>> +
>> +static size_t early_object_size(void* v)
>> +{
>> +    return *reinterpret_cast<unsigned short*>(v - sizeof(unsigned 
>> short));
>> +}
>>
>>  static void* untracked_alloc_page()
>>  {
>> @@ -1559,6 +1672,10 @@ static inline void* std_malloc(size_t size, size_t 
>> alignment)
>>          ret = translate_mem_area(mmu::mem_area::main, 
>> mmu::mem_area::mempool,
>>                                   ret);
>>          trace_memory_malloc_mempool(ret, size, 1 << n, alignment);
>> +    } else if (size < mmu::page_size && alignment < mmu::page_size && 
>> !smp_allocator) {
>> +        ret = memory::early_alloc_object(size, alignment);
>> +        ret = translate_mem_area(mmu::mem_area::main, 
>> mmu::mem_area::mempool,
>> +                                 ret);
>>      } else if (minimum_size <= mmu::page_size && alignment <= 
>> mmu::page_size) {
>>          ret = mmu::translate_mem_area(mmu::mem_area::main, 
>> mmu::mem_area::page,
>>                                         memory::alloc_page());
>> @@ -1592,7 +1709,13 @@ static size_t object_size(void *object)
>>      case mmu::mem_area::mempool:
>>          object = mmu::translate_mem_area(mmu::mem_area::mempool,
>>                                           mmu::mem_area::main, object);
>> -        return memory::pool::from_object(object)->get_size();
>> +        {
>> +            auto pool = memory::pool::from_object(object);
>> +            if (pool)
>> +                return pool->get_size();
>> +            else
>> +                return memory::early_object_size(object);
>>
> +        }
>>      case mmu::mem_area::page:
>>          return mmu::page_size;
>>      case mmu::mem_area::debug:
>> @@ -1639,7 +1762,13 @@ void free(void* object)
>>      case mmu::mem_area::mempool:
>>          object = mmu::translate_mem_area(mmu::mem_area::mempool,
>>                                           mmu::mem_area::main, object);
>> -        return memory::pool::from_object(object)->free(object);
>> +        {
>> +            auto pool = memory::pool::from_object(object);
>> +            if (pool)
>> +                return pool->free(object);
>> +            else
>> +                return memory::early_free_object(object);
>>
> +        }
>>      case mmu::mem_area::debug:
>>          return dbg::free(object);
>>      default:
>> diff --git a/include/osv/mempool.hh b/include/osv/mempool.hh
>> index 5dae7646..01bf9de2 100644
>> --- a/include/osv/mempool.hh
>> +++ b/include/osv/mempool.hh
>> @@ -36,6 +36,11 @@ void setup_free_memory(void* start, size_t bytes);
>>
>>  namespace bi = boost::intrusive;
>>
>> +struct early_page_header {
>> +    void* owner;
>> +    unsigned short allocations_count;
>> +};
>> +
>>  struct free_object {
>>      free_object* next;
>>  };
>> -- 
>> 2.19.1
>>
>
