Ciao Waldek/Nadav/Friends,

Nadav should give the OK,
but it seems great in my small, just-for-fun tests! :)


Kind Regards,

Geraldo Netto
Sapere Aude => Non dvcor, dvco
http://exdev.sf.net/

On Tue, 4 Dec 2018 at 13:37, Waldek Kozaczuk <[email protected]> wrote:
>
> Ping.
>
> On Tuesday, November 27, 2018 at 7:36:30 AM UTC-5, Waldek Kozaczuk wrote:
>>
>> Any feedback, please?
>>
>> On Thursday, November 22, 2018 at 1:07:16 PM UTC-5, Waldek Kozaczuk wrote:
>>>
>>> As I noted in the commit message, this patch may not be very elegant, but it 
>>> seems to be pretty effective. Please suggest any other ideas on how to 
>>> implement it better. Please see my self-review comments below.
>>>
>>> On Thursday, November 22, 2018 at 1:28:32 AM UTC-5, Waldek Kozaczuk wrote:
>>>>
>>>> This patch implements a naive but efficient scheme of allocating
>>>> memory before SMP is enabled and regular malloc pools are set up.
>>>> Before this patch, every single pre-SMP allocation would
>>>> be backed by a full 4K page, regardless of whether the requested size
>>>> was 8 or 2000 bytes.
>>>>
>>>> The new pre-SMP allocation scheme is based on the fact that
>>>> relatively few objects are allocated (~2,000), they are rarely
>>>> freed (< 20%), and most are very small (< 50 bytes).
>>>> The algorithm is as simple as finding the next big-enough aligned
>>>> space in a 4K page of memory, marking its size, and advancing the
>>>> offset to the next free area of the page, otherwise allocating a
>>>> new 4K page. For details, read the comments in the code.
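>>>>
>>>> For illustration (numbers chosen for this example only): assuming
>>>> early_alloc_next_offset == 24 and the caller requests alignment == 8,
>>>> the allocator computes
>>>>
>>>>     size_t offset = (24 / 8 + 1) * 8;  // == 32
>>>>     // 32 - 24 == 8 >= sizeof(unsigned short), so the 2-byte size
>>>>     // field fits just below the object, which starts at offset 32
>>>>
>>>> and then bumps early_alloc_next_offset to 32 + size.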
>>>>
>>>> Even though this scheme is very simplistic, it is
>>>> roughly 35 times more space-efficient and allows
>>>> us to save around 6M of memory. It was measured that the net sum of the
>>>> sizes (as requested) of all allocations is typically around 125K, while
>>>> the new scheme uses ~44 pages (176K) of memory to satisfy all requests.
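>>>>
>>>> (A back-of-the-envelope check of those numbers: 44 pages x 4K = 176K,
>>>> and 35 x 176K is roughly 6.2M, which matches the quoted ~6M saving;
>>>> packing ~125K of requested bytes into 176K of pages is about 71%
>>>> utilization.)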
>>>>
>>>> Fixes #270
>>>>
>>>> Signed-off-by: Waldemar Kozaczuk <[email protected]>
>>>> ---
>>>>  core/mempool.cc        | 133 ++++++++++++++++++++++++++++++++++++++++-
>>>>  include/osv/mempool.hh |   5 ++
>>>>  2 files changed, 136 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/core/mempool.cc b/core/mempool.cc
>>>> index 23dc67da..8c23e916 100644
>>>> --- a/core/mempool.cc
>>>> +++ b/core/mempool.cc
>>>> @@ -284,6 +284,9 @@ void pool::free_same_cpu(free_object* obj, unsigned cpu_id)
>>>>      trace_pool_free_same_cpu(this, object);
>>>>
>>>>      page_header* header = to_header(obj);
>>>> +    if (!header->owner) {
>>>> +        return;
>>>> +    }
>>>
>>> Not sure if this check (and the one in the other place) is necessary, but in 
>>> case we get here and it is an early-alloc page, the code below will not 
>>> execute anyway, as it does not apply.
>>>>
>>>>      if (!--header->nalloc && have_full_pages()) {
>>>>          if (header->local_free) {
>>>>              _free->erase(_free->iterator_to(*header));
>>>> @@ -321,6 +324,9 @@ void pool::free(void* object)
>>>>
>>>>          free_object* obj = static_cast<free_object*>(object);
>>>>          page_header* header = to_header(obj);
>>>> +        if (!header->owner) {
>>>> +            return;
>>>> +        }
>>>>          unsigned obj_cpu = header->cpu_id;
>>>>          unsigned cur_cpu = mempool_cpuid();
>>>>
>>>> @@ -1432,6 +1438,113 @@ static void early_free_page(void* v)
>>>>      auto pr = new (v) page_range(page_size);
>>>>      free_page_range(pr);
>>>>  }
>>>> +//
>>>> +// The following variables and functions implement a simple
>>>> +// early (pre-SMP) memory allocation scheme.
>>>> +mutex early_alloc_lock;
>>>> +// early_object_page holds a pointer to the beginning of the current page
>>>> +// intended to be used for the next early object allocation
>>>> +static void* early_object_page = nullptr;
>>>> +// early_alloc_next_offset points to the 0-relative address of free
>>>> +// memory within the page pointed to by early_object_page
>>>> +static size_t early_alloc_next_offset = 0;
>>>> +
>>>> +static early_page_header* to_early_page_header(void* object)
>>>> +{
>>>> +    return reinterpret_cast<early_page_header*>(
>>>> +            reinterpret_cast<std::uintptr_t>(object) & ~(page_size - 1));
>>>> +}
>>>> +
>>>> +static void setup_early_alloc_page() {
>>>> +    early_object_page = early_alloc_page();
>>>> +    early_page_header *page_header = to_early_page_header(early_object_page);
>>>> +    // Set the owner field to null so that functions that free objects
>>>> +    // or compute object size can differentiate between post-SMP malloc pool
>>>> +    // and early (pre-SMP) allocation
>>>> +    page_header->owner = nullptr;
>>>> +    page_header->allocations_count = 0;
>>>> +    early_alloc_next_offset = sizeof(early_page_header);
>>>> +}
>>>> +
>>>> +//
>>>> +// This function implements a simple but effective scheme
>>>> +// of allocating objects before SMP is set up. It does so by
>>>> +// remembering where within a page free memory starts. Then it
>>>> +// calculates the next closest offset matching the specified alignment
>>>> +// and verifies there is enough space until the end of the current
>>>> +// page to allocate from. If not, it allocates a new full page
>>>> +// to satisfy the request.
>>>> +static void* early_alloc_object(size_t size, size_t alignment)
>>>> +{
>>>
>>> Not sure if the mutex is really necessary, as pre-SMP there is only a 
>>> single thread (am I wrong?), but better safe than sorry.
>>>>
>>>> +    WITH_LOCK(early_alloc_lock) {
>>>> +        if (!early_object_page) {
>>>> +            setup_early_alloc_page();
>>>> +        }
>>>> +
>>>> +        size_t offset = (early_alloc_next_offset / alignment + 1) * alignment;
>>>> +        if (offset - early_alloc_next_offset < sizeof(unsigned short)) { // Make sure there is enough room for the size field
>>>> +            offset += alignment; // Assume alignment >= 2
>>>> +        }
>>>> +
>>>> +        if (offset + size > page_size) {
>>>> +            setup_early_alloc_page();
>>>> +            offset = (early_alloc_next_offset / alignment + 1) * alignment;
>>>> +        }
>>>> +
>>>> +        assert(offset - early_alloc_next_offset >= sizeof(unsigned short));
>>>> +        assert(offset + size <= page_size); // Verify we have enough space to satisfy this allocation
>>>> +
>>>> +        auto ret = early_object_page + offset;
>>>> +        early_alloc_next_offset = offset + size;
>>>> +
>>>> +        // Save the size of the allocated object 2 bytes before its address
>>>> +        *reinterpret_cast<unsigned short *>(ret - sizeof(unsigned short)) =
>>>> +                static_cast<unsigned short>(size);
>>>
>>> The address of this 2-byte field is effectively not 16-byte aligned, so I 
>>> wonder if we should actually place this field at "ret - 16 bytes".
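>>>
>>> Something along these lines, perhaps (untested sketch, not part of this patch):
>>>
>>>     // require at least a 16-byte gap in front of the object instead of
>>>     // just sizeof(unsigned short), and store the size 16 bytes before
>>>     // the object so the field stays 16-byte aligned whenever the object is
>>>     while (offset - early_alloc_next_offset < 16) {
>>>         offset += alignment;
>>>     }
>>>     ...
>>>     *reinterpret_cast<unsigned short*>(ret - 16) =
>>>             static_cast<unsigned short>(size);
>>>
>>> early_object_size() and early_free_object() would then read the size from
>>> "object - 16" instead of "object - sizeof(unsigned short)".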
>>>>
>>>> +        to_early_page_header(early_object_page)->allocations_count++;
>>>> +        return ret;
>>>> +    }
>>>> +}
>>>> +
>>>> +//
>>>> +// This function actually frees previously allocated
>>>> +// memory fairly rarely. It does so only if all objects in a page
>>>> +// have been freed (based on allocations_count) or
>>>> +// if the object being freed is the last one that was allocated.
>>>> +static void early_free_object(void *object)
>>>> +{
>>>> +    WITH_LOCK(early_alloc_lock) {
>>>> +        early_page_header *page_header = to_early_page_header(object);
>>>> +        assert(!page_header->owner);
>>>> +        unsigned short *size_addr = reinterpret_cast<unsigned short*>(object - sizeof(unsigned short));
>>>> +        unsigned short size = *size_addr;
>>>> +        if (!size) {
>>>> +            return;
>>>> +        }
>>>> +
>>>> +        *size_addr = 0; // Reset size to 0 so that we know this object was freed and prevent freeing it again
>>>
>>> Not sure if checking that the object was already freed is really necessary.
>>>>
>>>> +        page_header->allocations_count--;
>>>> +        if (page_header->allocations_count <= 0) { // Any early page
>>>> +            early_free_page(page_header);
>>>> +            if (early_object_page == page_header) {
>>>> +                early_object_page = nullptr;
>>>> +            }
>>>> +        }
>>>> +        else if (early_object_page == page_header) { // Current early page
>>>> +            void *last_obj = static_cast<void *>(page_header) + (early_alloc_next_offset - size);
>>>> +            // Check if we are freeing the last allocated object (free
>>>> +            // followed by malloc) and deallocate it if so by moving
>>>> +            // early_alloc_next_offset back to the previous position
>>>> +            if (last_obj == object) {
>>>> +                early_alloc_next_offset -= size;
>>>> +            }
>>>> +        }
>>>> +    }
>>>> +}
>>>> +
>>>> +static size_t early_object_size(void* v)
>>>> +{
>>>> +    return *reinterpret_cast<unsigned short*>(v - sizeof(unsigned short));
>>>> +}
>>>>
>>>>  static void* untracked_alloc_page()
>>>>  {
>>>> @@ -1559,6 +1672,10 @@ static inline void* std_malloc(size_t size, size_t alignment)
>>>>          ret = translate_mem_area(mmu::mem_area::main, mmu::mem_area::mempool,
>>>>                                   ret);
>>>>          trace_memory_malloc_mempool(ret, size, 1 << n, alignment);
>>>
>>> With this patch and the other one we have two more if branches. I wonder 
>>> whether, performance-wise, we should re-order the conditions in those ifs 
>>> and check smp_allocator first.
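>>>
>>> For example (sketch only, not part of this patch), the new branch could read:
>>>
>>>     } else if (!smp_allocator && size < mmu::page_size &&
>>>                alignment < mmu::page_size) {
>>>
>>> so that on the post-SMP fast path the cheap boolean test short-circuits
>>> the two size/alignment comparisons.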
>>>>
>>>>
>>>> +    } else if (size < mmu::page_size && alignment < mmu::page_size && !smp_allocator) {
>>>> +        ret = memory::early_alloc_object(size, alignment);
>>>> +        ret = translate_mem_area(mmu::mem_area::main, mmu::mem_area::mempool,
>>>> +                                 ret);
>>>>      } else if (minimum_size <= mmu::page_size && alignment <= mmu::page_size) {
>>>>          ret = mmu::translate_mem_area(mmu::mem_area::main, mmu::mem_area::page,
>>>>                                         memory::alloc_page());
>>>> @@ -1592,7 +1709,13 @@ static size_t object_size(void *object)
>>>>      case mmu::mem_area::mempool:
>>>>          object = mmu::translate_mem_area(mmu::mem_area::mempool,
>>>>                                           mmu::mem_area::main, object);
>>>> -        return memory::pool::from_object(object)->get_size();
>>>> +        {
>>>> +            auto pool = memory::pool::from_object(object);
>>>> +            if (pool)
>>>> +                return pool->get_size();
>>>> +            else
>>>> +                return memory::early_object_size(object);
>>>> +        }
>>>>      case mmu::mem_area::page:
>>>>          return mmu::page_size;
>>>>      case mmu::mem_area::debug:
>>>> @@ -1639,7 +1762,13 @@ void free(void* object)
>>>>      case mmu::mem_area::mempool:
>>>>          object = mmu::translate_mem_area(mmu::mem_area::mempool,
>>>>                                           mmu::mem_area::main, object);
>>>> -        return memory::pool::from_object(object)->free(object);
>>>> +        {
>>>> +            auto pool = memory::pool::from_object(object);
>>>> +            if (pool)
>>>> +                return pool->free(object);
>>>> +            else
>>>> +                return memory::early_free_object(object);
>>>> +        }
>>>>      case mmu::mem_area::debug:
>>>>          return dbg::free(object);
>>>>      default:
>>>> diff --git a/include/osv/mempool.hh b/include/osv/mempool.hh
>>>> index 5dae7646..01bf9de2 100644
>>>> --- a/include/osv/mempool.hh
>>>> +++ b/include/osv/mempool.hh
>>>> @@ -36,6 +36,11 @@ void setup_free_memory(void* start, size_t bytes);
>>>>
>>>>  namespace bi = boost::intrusive;
>>>>
>>>> +struct early_page_header {
>>>> +    void* owner;
>>>> +    unsigned short allocations_count;
>>>> +};
>>>> +
>>>>  struct free_object {
>>>>      free_object* next;
>>>>  };
>>>> --
>>>> 2.19.1
>>>>
