On 25/01/15 15:39, Norman Feske wrote:
>> The problem there is that file system implementations are very
>> intertwined with the vfs and block/page cache interface semantics.  For
>> example, blocks must be written in correct order to retain file system
>> consistency, and some blocks must not be written before others are.
>> Furthermore, there's no global right way to do it, and e.g. ext2 and
>> journalled FFS have slightly different kinks.
>
> that is very interesting. I can understand that the order of write
> operations must be maintained when writing back the cache. But I would
> love to learn about further considerations. It just occurred to me that
> it might be a good idea to incorporate such consistency information into
> Genode's block-session interface. Can you recommend specific learning
> material on this consistency issue?

Admittedly, it's been a number of years (5+) since I last thought 
critically about file systems, but:

For the block layer it should be enough to just write the blocks and 
notify the upper layer when they have been committed.

For the FFS (and ext2) model, the best paper is probably the one on soft 
updates, which describes the original consistency problem, and how you 
can render most writes asynchronous if you track the dependencies of 
writes in memory:
https://www.usenix.org/legacy/publications/library/proceedings/usenix99/full_papers/mckusick/mckusick.pdf

For block level journalling, the canonical reference is probably the 
BeOS file system book:
http://www.nobius.org/~dbg/practical-file-system-design.pdf

Non-overwriting file systems are not inherently as sensitive to write 
ordering, but I do think they too have some sort of commit points where 
everything up to point X must be committed (my off-the-top-of-my-head 
memory is a bit weak on the details).

> [memory use, ballooning, etc.]

I understand your issue.  We have a similar problem when we use rump 
kernels as unikernels, since the application memory is currently not 
managed by the rump kernel, and we just pick some arbitrary division 
between the two.  We've been kicking around some possible solutions, but 
the need to solve the problem hasn't yet percolated to the top.

For us, it's not just limited to file systems either.  For example, the 
number of network buffers depends on the amount of memory, and should be 
controlled in a sensible rather than arbitrary manner.

Now, I think ballooning would be quite easy.  In theory it's as simple 
as adjusting rump_physmemlimit, recalculating the derived values 
(including telling the buffer cache about the new limit), and kicking 
the pagedaemon.  The only problem I can see at the moment is that you 
cannot set the value too low, and there's no real way of knowing what 
"too low" is, so that will take some experimentation.  Plus, there is of 
course the usual weeding out of cases where theory does not match reality.

>> There's no mandate to create a host thread per se.  You need to create a
>> separately schedulable entity with a stack and thread-local storage, but
>> if you choose to implement that multiplexed on top of a single host
>> thread, that's fine.
>
> Does that mean that Rump kernels do not rely on preemptive threading? If
> yes, user-level thread scheduling (e.g., based on setjmp, longjmp) would
> do? What keeps you back from doing this by default? This would make Rump
> kernel behave deterministically across all host platforms and possibly
> simplify the hypercall interface. Wouldn't that be desirable?

Right, there's no reliance on preemptive threading.  That was actually 
quite a surprising result when it was discovered some years ago.  In 
fact, the hardest part of figuring out rump kernels was finding a 
scheduling model that would work in the original use case: running on 
top of userspace threads, which are preemptive.

Yes, non-preemptive scheduling is very desirable; I think deterministic 
scheduling wins over preemptive scheduling in almost every case, except 
when you cannot assume that what you are running will yield.  For 
example, we have optimizations planned for the networking in the case 
where the rump kernel scheduling "accidentally" matches the host scheduling:
http://wiki.rumpkernel.org/Performance%3A-optimizing-networking-performance
(just need to get around to doing that some day)

Now, the policy of the thread implementation is entirely up to the 
hypercall implementation, i.e., the platform that you are running rump 
kernels on.  Currently, I don't see a strong argument for moving the 
threading policy into the rump kernel, and actually plenty of arguments 
against it -- I don't want to dictate policy.

Plus, what Justin already wrote (both in terms of email and code).

> In general, we try to avoid using multiple threads these days except for
> two cases: Where work load is to be distributed over multiple cores, or
> where different code paths should be schedulable independently (think of
> low-latency IRQ handlers). Both cases certainly do not apply to file
> systems.

Sort of agree, but note that if you say multithreading does not apply to 
file systems, you are also saying that multithreading does not apply to 
anything using the file system.  Maybe I'm splitting hairs there (the 
file system is unlikely to internally require n threads), but anyway ;)

> In all other cases where threads had traditionally been used, we prefer
> to model components as state machines that respond to asynchronous
> events and incoming RPC requests (similar to a select loop). I agree
> that this can produce deadlocks. But on the other hand, the use of
> threads is even worse because, when not properly synchronized, they
> become prone to race conditions, which may eventually result in silent
> memory corruptions. When debugging, I vastly prefer a deterministic
> deadlock (where I can look at the backtrace to spot the problem) over a
> sporadic memory corruption issue (where I can spot a symptom but rarely
> the cause).

I have two remarks here:

First, I modelled a userspace fs framework (puffs) after an event loop. 
In retrospect, I think it was a bad idea: once I added any sort of 
asynchronous handling, it ran into the same issues as threads do, except 
with homegrown interfaces to accomplish more or less the same thing. 
(Side remark: that also made debugging difficult, since NetBSD's gdb 
didn't know about the homegrown concurrency.  Well, back in the short 
period of time I'm referring to, running gdb on a pthread program caused 
the NetBSD kernel to panic, but ignore that minor advantage of the 
homegrown event loop ;)

Second, with non-preemptive threads, the thread scheduler is the event 
loop.  I don't really see much difference between a thread programming 
interface and an event loop interface, except that the thread 
programming interface carries the promise that you can crank up the 
number of cores processing the work.  (I'm purposefully ignoring the 
fact that a thread programming interface is easier to misuse if you 
don't know what you're doing, because systems programmers can be 
expected to know what they're doing.)

Determinism works only as long as all of your I/O is deterministic :/

> We are successfully applying this single-threaded approach to all new
> components and are in the process of reworking all existing components
> to get rid of threads. Even for the Wifi stack and the Linux TCP/IP, we
> execute all Linux kernel threads by a single Genode thread.

I don't see any reason why you could not run a rump kernel on top of a 
single Genode thread.  Perhaps we are misunderstanding each other?

   - antti

_______________________________________________
rumpkernel-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rumpkernel-users
