On Sun, Dec 9, 2018 at 10:42 PM Waldek Kozaczuk <[email protected]>
wrote:

> Please see my answers below. As far as I can tell you do not have any major
> objections, so I will make the changes and send a revised patch. I probably
> need to adjust some of my comments.
>
> On Sunday, December 9, 2018 at 7:53:38 AM UTC-5, Nadav Har'El wrote:
>>
>> On Thu, Nov 22, 2018 at 8:28 AM Waldemar Kozaczuk <[email protected]>
>> wrote:
>>
>>> This patch implements a naive but efficient scheme for allocating
>>> memory before SMP is enabled and the regular malloc pools are set up.
>>> Before this patch, every single pre-SMP allocation would
>>> be backed by a full 4K page, regardless of whether the requested size
>>> was 8 or 2000 bytes.
>>>
>>> The new pre-SMP allocation scheme is based on the fact that
>>> relatively few objects are allocated (~ 2,000), they are rarely
>>> freed (< 20%), and most are very small (< 50 bytes).
>>> The algorithm is as simple as finding the next big-enough aligned
>>> space in a 4K page of memory, recording the object's size, and
>>> advancing the offset to the next free area of the page, or, if there
>>> is no room left, allocating a new 4K page. For details read the
>>> comments in the code.
>>>
>>> Even though this scheme is very simplistic, it is
>>> roughly 35 times more space efficient and allows
>>> us to save around 6M of memory. It was measured that the net sum of
>>> the sizes (as requested) of all allocations is typically around 125K,
>>> and the new scheme uses ~44 pages (176K) of memory to satisfy all requests.
>>>
>>> Fixes #270
>>>
>>> Signed-off-by: Waldemar Kozaczuk <[email protected]>
>>> ---
>>>  core/mempool.cc        | 133 ++++++++++++++++++++++++++++++++++++++++-
>>>  include/osv/mempool.hh |   5 ++
>>>  2 files changed, 136 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/core/mempool.cc b/core/mempool.cc
>>> index 23dc67da..8c23e916 100644
>>> --- a/core/mempool.cc
>>> +++ b/core/mempool.cc
>>> @@ -284,6 +284,9 @@ void pool::free_same_cpu(free_object* obj, unsigned
>>> cpu_id)
>>>      trace_pool_free_same_cpu(this, object);
>>>
>>>      page_header* header = to_header(obj);
>>> +    if (!header->owner) {
>>> +        return;
>>>
>>
>> It would be nice to have a comment saying what this case is, i.e., that
>> this is an early allocation whose free() we simply ignore.
>> However, I don't understand why this is the case - it seems that earlier
>> in the code, if the owner (pool::from_object()) is null,
>> you don't call this function, you call something else. So why bother with
>> this test here at all? Shouldn't we just crash if this
>> bug happens?
>>
> You are right, we should never get here. This is me being obsessively
> defensive - just in case we somehow do. Are you saying I should remove
> this for clarity?
>

So perhaps it should be an assert(header->owner), not a silent return?
But even that is not really necessary, as if header->owner is null, we'll
just crash below when it's dereferenced, no?
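
For concreteness, the assert variant would be something like this (just a
sketch of the idea against the patched free_same_cpu()):

    page_header* header = to_header(obj);
    // An early (pre-SMP) allocation has no owning pool; reaching this
    // function for one would be a bug, so fail loudly instead of
    // silently returning.
    assert(header->owner);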


>
>> +static early_page_header* to_early_page_header(void* object)
>>> +{
>>> +    return reinterpret_cast<early_page_header*>(
>>> +            reinterpret_cast<std::uintptr_t>(object) & ~(page_size -
>>> 1));
>>>
>>
>> Looking at your code, it seems you are assuming that early_page_header
>> and page_header start with
>> the same "owner" field (with early allocations having 0 there, normal
>> allocations point to the pool), but
>> then have different content. Is that right? Maybe you should have a
>> comment about this next to
>> early_page_header, not just in setup_early_alloc_page()?
>>
> Right. Ideally we could have some sort of "data interface" construct
> specifying that both page_header and early_page_header start with the same
> field. Maybe we can introduce this as a 1-field struct that both other
> structs share, or is a comment enough?
>

I think a comment is enough.
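
That said, the 1-field-struct variant could look something like this
(hypothetical names, just a sketch - a comment saying the same thing in
words is fine too):

    // Hypothetical shared prefix: both header types start with "owner",
    // so a null owner reliably identifies an early (pre-SMP) page.
    struct common_page_header {
        pool* owner;   // nullptr for early pages, the owning pool otherwise
    };

    struct page_header : common_page_header {
        // ... regular per-pool bookkeeping ...
    };

    struct early_page_header : common_page_header {
        unsigned short allocations_count;
        // ... early-allocation bookkeeping ...
    };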


>> +static void* early_alloc_object(size_t size, size_t alignment)
>>> +{
>>> +    WITH_LOCK(early_alloc_lock) {
>>> +        if (!early_object_page) {
>>> +            setup_early_alloc_page();
>>> +        }
>>> +
>>> +        size_t offset = (early_alloc_next_offset / alignment + 1) *
>>> alignment;
>>>
>> +        if (offset - early_alloc_next_offset < sizeof(unsigned short)) {
>>> // Make sure enough place for size
>>>
>> +            offset += alignment; //Assume alignment >= 2
>>> +        }
>>>
>>
>> I think you can replace the above 4 lines of calculations with one
>> clearer line:
>>
>>        offset = align_up(early_alloc_next_offset + sizeof(short),
>> alignment);
>>
>> (but please check that I didn't misunderstand the calculations)
>>
> I am not 100% sure, as my calculations make sure that there are 2 bytes
> left (sizeof(unsigned short)) between the end of the previous object and
> the beginning of the next one, to store the size of the object. Effectively
> each malloc'ed object is preceded by 2 bytes where the size is stored.
> Obviously the alignment requirement has to be satisfied as well.
>

I think that's exactly what the align_up() I suggested does, no? It asks
for an address after early_alloc_next_offset + sizeof(short) - so at least
two extra empty bytes - *and* aligns it to alignment. I think it's so much
easier to understand than your four lines of calculations. It took me a
while to understand your calculation because it does something strange: the
"+ 1" in the calculation sometimes adds an extra alignment, but sometimes
less, and in the worst case can add just one byte, so you need the next if()
to make sure you added at least two. But the align_up() formula just does
this correctly at the first go, and doesn't need an additional if().
Unless I'm misunderstanding something.
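
Putting the two versions side by side (a self-contained sketch; align_up()
here is written out explicitly and assumes a power-of-two alignment):

    #include <cstddef>

    // The two-byte size prefix that precedes each early object.
    static constexpr size_t prefix = sizeof(unsigned short);

    // Round v up to a multiple of a (a must be a power of two).
    static inline size_t align_up(size_t v, size_t a) {
        return (v + a - 1) & ~(a - 1);
    }

    // Original four-line calculation:
    size_t next_offset_v1(size_t next, size_t alignment) {
        size_t offset = (next / alignment + 1) * alignment;
        if (offset - next < prefix) { // not enough room for the size bytes
            offset += alignment;      // assumes alignment >= 2
        }
        return offset;
    }

    // Suggested one-liner: reserve the prefix first, then align.
    // The gap is guaranteed to be >= sizeof(unsigned short).
    size_t next_offset_v2(size_t next, size_t alignment) {
        return align_up(next + prefix, alignment);
    }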


>
>>
>>> +
>>> +        if (offset + size > page_size) {
>>> +            setup_early_alloc_page();
>>>
>>
>> Looking below, it seems you don't support large early allocations at all
>> (more on this below).
>> If you did, then a possible optimization here could be, if size is large
>> (e.g., bigger than a page),
>> there is no reason to throw away the half-allocated page. We can just
>> make the new allocation
>> in full new pages and keep using the old half-full page.
>>
>>
>>> +            offset = (early_alloc_next_offset / alignment + 1) *
>>> alignment;
>>>
>>
By the way, another instance of this calculation. Here the "+1" is always
enough (when alignment >=2), you don't need the extra if, but wouldn't it
be clearer to use the same align_up(next_offset + sizeof(short)) as above?


>>> +        }
>>> +
>>> +        assert(offset - early_alloc_next_offset >= sizeof(unsigned
>>> short));
>>>
>> +        assert(offset + size <= page_size); // Verify we have enough
>>> space to satisfy this allocation
>>>
>>
>> But if one tries to malloc size>4096, this will fail here, no? Why is
>> this not a problem?
>> Or is it just that we assume that early on we only have small
>> allocations, and this assertion
>> verifies it (if this is the case, please add a comment about this
>> assertion saying why we
>> make this assumption).
>>
> It is OK because this code is not meant to handle > 4096 allocations. If
> you look at the if statements in std_malloc(), the >= 4096 early
> allocations should still be handled by malloc_large().
>

I didn't notice this. However, it seems to me this code will fail for
numbers lower than 4096. What will malloc(4095) do? Maybe over some size it
should fall back to allocating an entire page (as before)?
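
Something along these lines at the start of early_alloc_object() could
provide that fallback (a sketch - the threshold and the helper name are
made up for illustration):

    // Objects that cannot fit in a page together with the page header and
    // the 2-byte size prefix get a dedicated page, as the pre-patch code
    // did for every early allocation.
    size_t max_early_object = page_size - sizeof(early_page_header)
                              - sizeof(unsigned short);
    if (size > max_early_object) {
        return early_alloc_whole_pages(size); // hypothetical fallback path
    }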


> +
>>> +        *size_addr = 0; // Reset size to 0 so that we know this object
>>> was freed and prevent from freeing again
>>>
>>
>> This is nice protection against double-free, but I don't think we have
>> such protection even for normal allocations, so it's definitely not
>> important for early allocations. Nobody expects it to be fine to free()
>> an already freed object, and free() will just silently do nothing.
>> Then again, if we anyway have this "size" thing (and it's sad we have
>> to... I'm not sure we *really* need to implement object_size() correctly
>> for early allocations...), I guess we can use it to get double-free
>> protection for free...
>>
> If you think it makes things unnecessarily complicated and possibly prone
> to bugs, I can remove it.
>

I don't really care. Initially I thought that you only needed "size" for
this protection, but then I understood you need it for other things as well
(although I'm not sure it *really* needed to be supported - I think the old
code didn't support it either...), so this protection can stay as well...


>
>> +        page_header->allocations_count--;
>>> +        if (page_header->allocations_count <= 0) { // Any early page
>>> +            early_free_page(page_header);
>>> +            if (early_object_page == page_header) {
>>> +                early_object_page = nullptr;
>>> +            }
>>> +        }
>>> +        else if(early_object_page == page_header) { // Current early
>>> page
>>> +            void *last_obj = static_cast<void *>(page_header) +
>>> (early_alloc_next_offset - size);
>>> +            // Check if we are freeing last allocated object (free
>>> followed by malloc)
>>> +            // and deallocate if so by moving the
>>> early_alloc_next_offset to the previous
>>> +            // position
>>> +            if (last_obj == object) {
>>> +                early_alloc_next_offset -= size;
>>>
>>
>> Is it really -= size, or do you also need to undo the increase for
>> alignment, the two bytes for the size, etc.?
>>
>>
> I think I am fine, because here I am only detecting the case of free()
> being called on the last malloc()-ed object, which means that
> early_alloc_next_offset has not yet been adjusted for the next malloc(),
> right? And given that in early_alloc_object() I have this logic:
> auto ret = early_object_page + offset;
> early_alloc_next_offset = offset + size;
> subtracting size from the current early_alloc_next_offset should give me
> the offset of the last malloc-ed object in the current page. Is this a
> wrong assumption?
>

I don't understand why this is right, consider we ask for 16-byte
alignment, and we start with offset=32:
1. allocating 16 bytes with 16-byte alignment will first move offset to 48
(for the two-byte size...) and return the object at offset 48. The new
offset will become 64.
2. freeing the 16-byte object at 48 will subtract only 16 from 64. The new
offset will be 48. It did *not* return to 32 as it was before the
allocation.

What am I missing?
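
In other words, "-= size" only undoes the size, not the alignment padding
or the size bytes. An exact rollback would need the pre-allocation offset
to be recorded somewhere, for example like this (a hypothetical sketch, not
the patch - it widens the 2-byte prefix to 4 bytes and reuses align_up()
from the earlier sketch):

    // Hypothetical 4-byte prefix stored just before each early object:
    // prefix[0] = offset before this allocation, prefix[1] = object size.
    char* early_object_page = nullptr;    // current early page (assumed)
    size_t early_alloc_next_offset = 0;   // next free offset in that page

    void* early_alloc(size_t size, size_t alignment) {
        size_t offset = align_up(early_alloc_next_offset
                                 + 2 * sizeof(unsigned short), alignment);
        auto* prefix = reinterpret_cast<unsigned short*>(
                early_object_page + offset) - 2;
        prefix[0] = static_cast<unsigned short>(early_alloc_next_offset);
        prefix[1] = static_cast<unsigned short>(size);
        early_alloc_next_offset = offset + size;
        return early_object_page + offset;
    }

    void early_free_last(void* object) {
        auto* prefix = static_cast<unsigned short*>(object) - 2;
        // Rewind past the object, the prefix bytes *and* any alignment
        // padding - unlike "-= size", which leaks the padding.
        early_alloc_next_offset = prefix[0];
    }

That costs two extra bytes per object, but makes the free-after-malloc
rewind exact in your example above (64 would go back to 32, not 48).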
