Hi Martin,
thank you for your answers! I'll try to answer both of your emails here.
Sure, but we still need to arrive at some equation for determining a
sensible default stack size, while allowing for both small values of
"--mem" (e.g. 8MB, or even less?) and large values.
I agree. Ideally the (uni)kernel developer is aware of the absence of an
unbounded stack and codes accordingly. I would assume that the required
stack size scales logarithmically with the heap size. Given such a
relationship, we would quickly be on the safe side. And if we have a
separate stack region, the user will also be notified clearly of a
stack overflow.
Practically, I wonder where very long call chains can occur. For
example, in OCaml it seems List.map is not tail-recursive, to avoid
reversing the list. What are they using as a limit?
For reference, ulimit -s seems to be 8M on Linux x86_64. Furthermore,
the default pthread stack size seems to be 2M. For sufficiently large
instances (> 16M), we could just fix 2M. For smaller machines (down to
a minimum of 512K), we could somehow scale down to 64K?
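To make that concrete, here is a minimal sketch of such a default. The
fixed points (2M above 16M total memory, 64K at the 512K minimum) are
taken from the numbers above; the linear interpolation in between is
purely my assumption, nothing Solo5 defines:

```c
#include <assert.h>
#include <stdint.h>

#define KB(x) ((uint64_t)(x) * 1024)
#define MB(x) (KB(x) * 1024)

/* Hypothetical default stack size: fixed 2M for instances of >= 16M,
 * 64K at the 512K minimum, linear interpolation in between. */
static uint64_t default_stack_size(uint64_t mem_size)
{
    if (mem_size >= MB(16))
        return MB(2);
    if (mem_size <= KB(512))
        return KB(64);
    /* Linear interpolation between the two fixed points. */
    return KB(64) +
        (mem_size - KB(512)) * (MB(2) - KB(64)) / (MB(16) - KB(512));
}
```

A logarithmic curve as speculated above would also fit these two fixed
points; the sketch just shows where the knobs are.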
In my mind the way to do this consistently for all targets is to make
the
"desired stack size" a property attached to the *unikernel binary* and
initially set by the unikernel developer, with the libOS build system
determining the default. The *operator* should be able to override
this.
By "property of the unikernel binary" I mean something that is
declarative(!) and forms part of a "manifest" that is embedded into the
binary as an ELF note.
This sounds like a good solution. Maybe fix a default desired size of 2M
and allow the developer to override it in the manifest. The operator can
also adjust it by tuning some option.
The application should then refuse to run if the desired size cannot be
provided,
or if the heap/stack ratio becomes too small.
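For illustration, embedding such a declarative property as an ELF note
could look roughly like the sketch below. The section name, note type
and record layout here are invented for the example; the real manifest
format would be whatever the libOS build system defines:

```c
#include <stdint.h>

/* Hypothetical manifest note record. An ELF note is a (namesz, descsz,
 * type) header followed by a 4-byte-padded name and the descriptor. */
struct stack_note {
    uint32_t namesz;
    uint32_t descsz;
    uint32_t type;
    char name[8];               /* "Solo5" + NUL, padded to 4 bytes */
    uint64_t desired_stack_size;
};

__attribute__((section(".note.solo5.manifest"), aligned(4), used))
static const struct stack_note manifest_note = {
    .namesz = 6,                         /* strlen("Solo5") + 1 */
    .descsz = sizeof(uint64_t),
    .type   = 0x53,                      /* invented type id */
    .name   = "Solo5",
    .desired_stack_size = 2 * 1024 * 1024,   /* 2M default */
};
```

The tender (or `readelf -n`) could then read the desired stack size
from the binary before setting up memory, and the operator's override
would simply take precedence over this value.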
I am not sure about the "dynamic part". I would either set the stack
from
outside (as discussed above), or allow some kind of memconfig call,
which
can be executed only once after application startup.
Alternatively you could provide munmap and munmap_done. Dynamic people
would
just never call munmap_done...
I think we can forget about multiple stacks for now. Having thought
about
it, even having a single separate memory region for the stack is tricky
enough:
I wonder what the difference is between multiple stacks and multiple
separate memory regions?
If multiple separate memory regions were available, the application
could use them as stacks and perform switching between them. Whether
that makes sense is a question for the unikernel developers.
I thought a bit more about this idea of adding a munmap call and I
don't
really like it.
Maybe the goal should be extended to allow multiple different memory
areas,
randomly distributed in the address space to leverage ASLR. One of
those
ASLR is a separate topic in itself, see here for a rough plan of what
needs
to be done:
https://github.com/Solo5/solo5/issues/304
In that issue you are talking about the randomization of the memory
location of the binary (code & data)?
I agree that this would be nice to have. But is that not a separate
issue from how the heap memory regions are organized? Ideally all of it
should be randomized, code, data and heap.
No, this has already been discussed before. Dynamic memory allocation
is
not on the cards.
Most recently
https://github.com/Solo5/solo5/issues/335#issuecomment-472499246 and
earlier https://github.com/Solo5/solo5/issues/223.
I looked quickly at 223 and 335, maybe I missed some things.
In 223 you threw out the malloc implementation from solo5, which makes
perfect sense, since malloc should be handled at the application level.
Fine-grained memory allocation should not be provided by solo5.
In 335 you refuse to add mmap. But there seems to be agreement that the
application/unikernel has no control over the virtual memory layout;
this has to be managed by solo5. Does your refusal to add mmap also
apply, in a very restricted sense, to the configuration phase?
Providing just a "munmap()" and nothing else might be a simpler way to
get guard pages, or it might not. Anyway, one thing at a time. As I
mentioned in my other email, let's ignore multiple stacks for now and
just concentrate on how a separate stack region could be done.
I agree; for me stack/heap separation is also the more important issue
for avoiding corruption. Multiple memory regions or guarded regions are
only nice to have. But instead of adding munmap calls to configure
guard pages, I think it is better to add two calls, solo5_mem_alloc()
and solo5_mem_lock(), which also allows randomization of the addresses
returned by solo5_mem_alloc().
This is what I proposed in the github PR. For now I just exposed the
already existing bump allocation scheme, but this could be randomized
at some point. I argue that this is NOT dynamic memory allocation, but
rather configuration of the memory layout until solo5_mem_lock() is
called.
Compare option I:
1. app is informed of heap_start, heap_size, stack_start, stack_size
2. app calls solo5_munmap on parts of (heap_start, heap_start+heap_size)
for guarding
3. app calls solo5_munmap_done to finish initialization
versus option II:
1. app is informed of stack_start, stack_size, mem_avail
2. app calls solo5_mem_alloc to allocate mem_avail until it is exhausted
3. app calls solo5_mem_lock to finish initialization
Basically both options are equivalent, but in option II more
responsibility is pushed from the application to solo5.
On primitive targets without the ability to manipulate the page table,
there won't be any difference and solo5_munmap would be a nop.
However on targets which support modifying the page table, solo5 could
*automatically* guard the regions by adding gaps after each
solo5_mem_alloc block.
And even better, the memory block locations could also be randomized.
This
is not possible in option I.
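To make option II concrete, here is a minimal sketch of the bump
allocation with automatic guard gaps. All names besides the proposed
solo5_mem_alloc/solo5_mem_lock are assumptions, and a real target would
unmap the gap in the page tables rather than merely skip over it:

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096

/* Hypothetical allocator state: a bump cursor over [mem_next, mem_end). */
static uint64_t mem_next, mem_end;
static bool mem_locked;

static void mem_init(uint64_t start, uint64_t size)
{
    mem_next = start;
    mem_end = start + size;
    mem_locked = false;
}

/* Option II, step 2: hand out a page-aligned block, then skip one
 * guard page so an overrun faults instead of corrupting the next
 * block. Returns 0 on failure or after the layout is locked. */
static uint64_t solo5_mem_alloc(uint64_t size)
{
    size = (size + PAGE_SIZE - 1) & ~(uint64_t)(PAGE_SIZE - 1);
    if (mem_locked || size == 0 || mem_next > mem_end ||
        mem_end - mem_next < size)
        return 0;
    uint64_t block = mem_next;
    mem_next += size + PAGE_SIZE;   /* guard gap after each block */
    return block;
}

/* Option II, step 3: freeze the layout; no further allocation. */
static void solo5_mem_lock(void)
{
    mem_locked = true;
}
```

Randomization would amount to picking `block` from the remaining range
instead of bumping linearly, which is exactly the part that is invisible
to the application in option II.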
You mentioned in 335 that clone+munmap allowed privilege escalation.
But in both schemes, options I and II, you need mmap/munmap. However,
the memory configuration should be finished by calling solo5_mem_lock
after initialization.
You could even enforce that the API is used correctly, by unlocking the
network and block devices only after solo5_mem_lock has been called ;)
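As a sketch of that enforcement idea: the hypercall layer could simply
refuse I/O until the layout is locked. solo5_result_t, SOLO5_R_OK and
SOLO5_R_EINVAL mirror the existing solo5.h API; the gating itself and
solo5_mem_lock are of course hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { SOLO5_R_OK, SOLO5_R_EINVAL } solo5_result_t;

static bool mem_locked = false;

/* Proposed call: freeze the memory layout and unlock the devices. */
static void solo5_mem_lock(void)
{
    mem_locked = true;
}

/* Hypothetical gated variant of the net hypercall: I/O is rejected
 * until the memory configuration phase has been finished. */
static solo5_result_t solo5_net_write(const uint8_t *buf, size_t size)
{
    if (!mem_locked)
        return SOLO5_R_EINVAL;  /* network only works after mem_lock */
    (void)buf;
    (void)size;
    return SOLO5_R_OK;          /* ...actual transmit would go here */
}
```

That way an application that never calls solo5_mem_lock cannot touch
the network at all, so misuse of the configuration phase is caught
immediately.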
Daniel