On 09/19/2016 11:49 AM, Nadav Har'El wrote:

On Mon, Sep 19, 2016 at 10:52 AM, Gleb Natapov <g...@scylladb.com> wrote:

    > 2. I'm afraid the scheduler thing might only be the tip of the iceberg
    > of problems caused by this lazy TLB thing. Could we have, for example
    > (and this is just a hypothetical example), one thread doing a write()
    > to disk of some data from an mmap'ed area, and this data is supposed
    > to be read by a ZFS thread which runs on a different CPU - and because
    > it is labeled a "system thread", it won't do a TLB flush before
    > reading the mmap'ed area?
    write() copies data into ZFS ARC.

    > Why are we confident that "system threads" never need to read the
    > user's mmap'ed data?
    If they do, it is a bug, as you discovered. They may access mmap'ed
    memory, but they should do so through their own mappings.

So basically, "system threads" (and the scheduler) are not allowed to read any user memory, because any user memory may be mmap'ed. Whatever user memory is needed in a system thread must first be *copied* by the original user thread...

Or, the mapping must be pinned (and paged in) for the duration of the access.

In most cases we already do this (the network API copies, and so does the filesystem), but I'm worried whether we have really checked all the cases, and, moreover, that future developers will not be aware of this restriction.

Forcing a copy of user data was very natural in OSs which have a userspace/kernel separation, but in OSv it's not really natural. For example, imagine that in the future we implement zero-copy AIO - won't it read/write directly into user memory, which may be mmap'ed? Couldn't that read/write happen in a different CPU?

Zero-copy happens without copying. You pin the physical page (usually pinning the mapping too in the process), and give the physical address to the device to perform DMA.

    This is the design, not something that has to be this way.

The design of what? Of this specific optimization?

    > 3. This commit 7e38453 starts flush_tlb_all() with setting the
    > lazy_flush_tlb flag to true, but resets it back to false when it
    > decides to send an IPI. If the other CPU is right now in the
    > scheduler, we can have the code leave the flag at false (if the
    > out-going thread was an app) and send an IPI which will be delayed -
    > so the scheduler has no way of knowing it needs to do a TLB flush
    > before accessing the
    > Couldn't we leave the flag at true *in addition* to the IPI? The IPI
    > handler could then zero it (if not already zero)?
    If an IPI can be delayed, why can't the same bug happen without the
    lazy_flush_tlb optimization at all? Thread A mmaps its stack, sends a
    flush IPI which is delayed, allocates B's thread struct on the stack,
    and CPU 1 tries to access it -> boom.

This is a good point.

Which brings me back again to the point that if we want to correctly support on-stack thread objects (that's a separate question - I already have patches forbidding on-stack thread objects, if we want them), we probably need to use this *flag* variable - not the IPI - as the signal to the scheduler that it needs to flush the TLB. So we need to clear the flag only once the target CPU has actually flushed the TLB - not, as the code currently does, as soon as the source CPU sent it an IPI. So I'm wondering whether there was an important reason why the current code needs to zero the flag as soon as we decide to send an IPI?

You received this message because you are subscribed to the Google Groups "OSv Development" group. To unsubscribe from this group and stop receiving emails from it, send an email to osv-dev+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
