On Fri, 2007-06-08 at 00:47 +0200, Jan Kiszka wrote:
> Philippe Gerum wrote:
> > On Thu, 2007-06-07 at 21:27 +0200, Jan Kiszka wrote:
> >> NZG wrote:
> >>>>> Write does stop throwing errors with a pool of 16384.
> >>>> Just repeating for confirmation: rt_pipe_create with smaller, but
> >>>> non-zero pool sizes doesn't report some error?
> >>> Incorrect, if it's created with smaller it does report an error. If it's
> >>> created with 0 however, it seems ok. (at least it doesn't throw an error)
> >> You mean the write fails, but rt_pipe_create is fine. At least here, and
> >> Philippe noticed the same. Looks like the margin for minimal-sized heaps
> >> is broken. 2*PAGE_SIZE should be 3*PAGE_SIZE net space, Philippe?
> > Actually, rt_pipe_create() already rounds this value to 3*PAGE_SIZE,
> > right before calling the sysalloc service.
> > Here is the sequence of events that leads to the situation Nathan is
> > seeing:
> > - Passing poolsize = 1024 to rt_pipe_create() creates a local heap of
> > 12288 bytes (3 * PAGE_SZ), with 32 bytes of overhead taken from one of
> > these pages to hold the meta-data.
> > - 4112 bytes (4096 + sizeof(message header)) are then requested from
> > this local pool to hold the internal streaming buffer
> > (XENO_OPT_NATIVE_PIPE_BUFSZ), which ends up consuming two pages, i.e.
> > 8192 bytes, from this pool. The reason is the way the McKusick
> > allocation scheme we use works: requested block sizes are always
> > rounded up to the next power of two, and the allocator groups blocks
> > of the same size in pages; block sizes greater than the page size are
> > always rounded to a multiple of the page size.
> > This is not pretty wrt internal fragmentation, but quite efficient
> > CPU-wise, when the page size is properly chosen wrt the most common
> > allocation pattern, that is. Most importantly, blocks greater than a
> > page will never lie on pages partially consumed by other blocks. So,
> > in our case, we started with three free pages, one already holds some
> > meta-data, and we need two pages to fulfil the current allocation
> > request. Therefore, after this request has succeeded, we have no page
> > left in the pool.
> > - When the write() call is issued, the pipe driver requests a 32-byte
> > block to hold the data moving from user-space to kernel space. Too bad,
> > we have no page left to dedicate to blocks of that size, so
> > no block is available, which in turn causes write() to return -ENOMEM.
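As a rough illustration of the sequence above, here is a minimal sketch
that only counts free pages. It is not the xnheap code: the sketch_*
names are hypothetical, the 4096-byte page size is taken from the thread,
and the model assumes that a small block always needs a fresh page (it
ignores reuse of a partially filled page of the same block size).

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u /* page size discussed in the thread */

/* Hypothetical, simplified pool model: it only tracks free pages. */
struct sketch_pool {
    unsigned free_pages;
};

/* Pages a request consumes in this model. */
static unsigned sketch_pages_needed(size_t size)
{
    /* Blocks larger than a page take whole pages of their own. */
    if (size > SKETCH_PAGE_SIZE)
        return (unsigned)((size + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE);
    /* Assumption: a small block needs one page dedicated to its size. */
    return 1;
}

/* Returns 0 on success, -1 where the real driver would hit -ENOMEM. */
static int sketch_alloc(struct sketch_pool *pool, size_t size)
{
    unsigned need = sketch_pages_needed(size);

    if (need > pool->free_pages)
        return -1;
    pool->free_pages -= need;
    return 0;
}
```

Starting from two free pages (three pages minus the one holding the
meta-data), sketch_alloc(&pool, 4112) succeeds and leaves no free page,
so a following sketch_alloc(&pool, 32) fails, which mirrors the write()
error described above.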
> Ah, I see.
> > This situation illustrates the conflict which is raised when small heaps
> > (1k) are mapped on large page sizes (4k). Even when rounding them to a
> > (small) multiple of the page size, the pool might rapidly run short of
> > free pages, depending on the allocation pattern.
> Reminds me of the TLSF allocator claiming to perform smartly also on
> smaller hunks. But that code is still 32-bit-focused, and would still
> need someone to define and run comparative tests on representative
> Xenomai setups. :-/
Yep, I see this patch series pending in my patch fridge right now.
> > IOW, when configuring a heap, it is better to know which kind of block
> > sizes are going to be requested from it, and reserve the appropriate
> > number of pages for each different size when evaluating the total size
> > of the heap.
> > Yes, it's not that simple. No, I'm not that sorry.
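The sizing advice above can be sketched as a small helper that reserves
pages per block size. This is only an illustration under assumptions:
sketch_demand and sketch_heap_size are hypothetical names, one page is
assumed to be set aside for the meta-data (as in the 4k case described
earlier), and the rounding follows the description in this thread rather
than the actual allocator source.

```c
#include <stddef.h>

/* Hypothetical description of one kind of request made to the heap. */
struct sketch_demand {
    size_t block_size; /* requested size in bytes */
    unsigned count;    /* how many such blocks must fit at once */
};

/* Round up to the next power of two (16 bytes minimum, an assumption). */
static size_t sketch_round_pow2(size_t size)
{
    size_t r = 16;

    while (r < size)
        r <<= 1;
    return r;
}

/* Estimate a heap size that reserves pages for every demand entry. */
static size_t sketch_heap_size(const struct sketch_demand *d, size_t n,
                               size_t page_size)
{
    size_t pages = 1; /* assumption: one page holds the meta-data */
    size_t i;

    for (i = 0; i < n; i++) {
        if (d[i].block_size > page_size) {
            /* Large blocks: whole pages, never shared. */
            size_t per_block =
                (d[i].block_size + page_size - 1) / page_size;
            pages += per_block * d[i].count;
        } else {
            /* Small blocks: same-size blocks share dedicated pages. */
            size_t per_page = page_size / sketch_round_pow2(d[i].block_size);
            pages += (d[i].count + per_page - 1) / per_page;
        }
    }
    return pages * page_size;
}
```

With one 4112-byte streaming buffer and one 32-byte block on 4096-byte
pages, this estimate comes to four pages, i.e. the 16384 bytes for which
Nathan reported that write() stops failing.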
> For now, can we take XENO_OPT_NATIVE_PIPE_BUFSZ into account when
> picking a reasonable minimum size in rt_pipe_create on behalf of a
> close-fisted user?
Yes, the size of the streaming buffer has to be taken into account; it's
clearly missing right now. I'm also going to use another page size than
PAGE_SIZE, and this change is likely applicable to almost all
callers of xnheap_init() actually. The initial intent was to pick a
value which would be consistent with the natural block size for the
Linux VM, but thinking about it once more, it's mostly irrelevant, since
the way the heap manager works is orthogonal to how kmalloc/vmalloc
organize the core memory anyway (i.e. heap memory is obtained from a
single sysalloc call, so the worst-case incurred here is only having
some external fragmentation due to the last VM page not being fully
occupied by heap memory).
512 would likely be a better page size for most usage patterns; this
would increase the overhead brought by the meta-data, but we are talking
about 1 byte per addressable page within the heap, so this should not be
that penalizing anyway.
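To put a number on that overhead, here is a minimal sketch that assumes
exactly one byte of meta-data per addressable page, as stated above, and
nothing else (any fixed header is ignored). sketch_page_map_bytes is a
hypothetical name, not an xnheap function.

```c
#include <stddef.h>

/* Bytes of per-page meta-data, assuming one byte per addressable page. */
static size_t sketch_page_map_bytes(size_t heap_size, size_t page_size)
{
    /* Number of pages in the heap, counting a partial last page. */
    return (heap_size + page_size - 1) / page_size;
}
```

For the 12288-byte heap discussed earlier, this gives 3 bytes with
4096-byte pages and 24 bytes with 512-byte pages.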
Xenomai-core mailing list