On Mon, Sep 19, 2016 at 11:49:41AM +0300, Nadav Har'El wrote:
> On Mon, Sep 19, 2016 at 10:52 AM, Gleb Natapov <g...@scylladb.com> wrote:
> 
> >
> > > 2. I'm afraid the scheduler thing might only be the tip of the iceberg of
> > > problems caused by this lazy TLB thing. Could we have, for example (and
> > > this is just a hypothetical example), one thread doing a write() to disk
> > > of some data from an mmap'ed area, and this data is supposed to be read
> > > by a ZFS thread which runs on a different CPU - and because it is labeled
> > > a "system thread", it won't do a TLB flush before reading the mmap'ed area?
> > write() copies data into the ZFS ARC.
> >
> 
> > > Why are we confident that "system threads" never need to read the user's
> > > mmap'ed data?
> > >
> > If they do, this is a bug, as you discovered. They may access mmap'ed
> > memory, but they should do so through their own mappings.
> 
> 
> So basically, "system threads" (and the scheduler) are not allowed to read
> any user memory, because any user memory may be mmap'ed.
> Whatever user memory is needed in a system thread must first be *copied*
> by the original user thread...
> 
No, it may be accessed through the global kernel mapping.

> In most cases we already need to do this (for example, the network API
> copies, and so does the filesystem), but I'm worried whether we really
> checked all the cases, and moreover worried that future developers will not
> be aware of this restriction.
This is the same restriction that exists in the Linux kernel.

> 
> Forcing a copy of user data was very natural in OSs which have a
> userspace/kernel separation, but in OSv it's not really natural. For
> example, imagine that in the future we implement zero-copy AIO - won't it
> read/write directly into user memory, which may be mmap'ed? Couldn't that
> read/write happen on a different CPU?
> 
OSs with userspace/kernel separation implement zero-copy AIO without
much trouble. All of the memory is accessible there through the kernel mapping.

> 
> 
> > This is the design, not something that has to be this way.
> >
> 
> The design of what? Of this specific optimization?
> 
The design of the kernel that allows this optimization to exist.

> 
> >
> > > 3. This commit, 7e38453, starts flush_tlb_all() by setting the
> > > lazy_flush_tlb flag to true, but resets it back to false when it decides
> > > to send an IPI. If the other CPU is currently in the scheduler, the code
> > > can leave the flag at false (if the outgoing thread was an app thread)
> > > and send an IPI which will be delayed - so the scheduler has no way of
> > > knowing it needs to do a TLB flush before accessing the sched::thread.
> > > Couldn't we leave the flag at true *in addition* to the IPI? The IPI
> > > handler could then zero it (if not already zero)?
> > >
> > If the IPI can be delayed, why can't the same bug happen without the
> > lazy_flush_tlb optimization at all? Thread A mmaps its stack, sends a
> > flush IPI which is delayed, allocates B's thread struct on the stack,
> > and CPU 1 tries to access it -> boom.
> >
> 
> This is a good point.
I am not sure this can happen, though. How would CPU 1 know that B exists
before getting the IPI?

> 
> Which brings me back to the point that if we want to correctly
> support on-stack threads (that's a separate question - I already have
> patches forbidding on-stack thread objects if we want to), we probably need
> to use this *flag* variable - not the IPI - as a signal to the scheduler
> that it needs to flush the TLB.
> So we need to clear this flag only once the target CPU has actually flushed
> the TLB - not, as the code currently does, as soon as the source CPU sends
> it an IPI. So I'm wondering if there was an important reason why the
> current code needs to zero the flag as soon as we decide to send an IPI?
> 
No, this was to avoid a redundant local TLB flush on the next reschedule,
but I do not think it is a good idea to reuse this flag to avoid an IPI race
which I am not sure can happen at all.
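
To make the interleaving from point 3 concrete, here is a minimal,
single-threaded C++ model of it. It is only a sketch under stated
assumptions, not OSv code: apart from the names flush_tlb_all and
lazy_flush_tlb, which come from this thread, every type, field and function
below (cpu_model, scheduler_access, ipi_handler, replay_delayed_ipi, the
keep_flag switch) is hypothetical. keep_flag=false models the current
behaviour (the flag is reset as soon as an IPI is chosen); keep_flag=true
models the proposal (the flag stays set and only the target clears it, after
it has really flushed). The model simply assumes the disputed case, i.e. that
the scheduler runs before the delayed IPI is handled.

```cpp
#include <cassert>

// Hypothetical model of one target CPU. Only lazy_flush_tlb is a name from
// the thread; everything else is invented for this sketch.
struct cpu_model {
    bool lazy_flush_tlb = false;     // "flush before touching memory" hint
    bool ipi_pending = false;        // flush IPI sent but not yet handled
    bool tlb_stale = false;          // TLB may still map the old page
    bool running_app_thread = true;  // source picks IPI vs. lazy flag by this
    int flushes = 0;                 // local TLB flushes performed so far
};

// Target side: the actual local flush. The flag is cleared only here,
// i.e. only after a flush has really happened.
void local_flush(cpu_model& cpu) {
    cpu.tlb_stale = false;
    cpu.lazy_flush_tlb = false;
    ++cpu.flushes;
}

// Source side. keep_flag == false: flag reset as soon as an IPI is chosen.
// keep_flag == true: flag stays set in addition to the IPI.
void flush_tlb_all(cpu_model& cpu, bool keep_flag) {
    cpu.tlb_stale = true;  // page tables changed; the remote TLB is now stale
    cpu.lazy_flush_tlb = true;
    if (cpu.running_app_thread) {
        if (!keep_flag) {
            cpu.lazy_flush_tlb = false;
        }
        cpu.ipi_pending = true;
    }
}

// Target side: the scheduler is about to dereference a sched::thread whose
// mapping may just have changed. Returns true if the access is safe.
bool scheduler_access(cpu_model& cpu) {
    if (cpu.lazy_flush_tlb) {
        local_flush(cpu);
    }
    return !cpu.tlb_stale;
}

// Target side: the (delayed) flush IPI finally arrives.
void ipi_handler(cpu_model& cpu) {
    cpu.ipi_pending = false;
    local_flush(cpu);
}

struct replay_result {
    bool safe;    // was the scheduler's access safe?
    int flushes;  // total local flushes performed
};

// Replays the disputed interleaving: the scheduler runs while the IPI is
// still pending, and the IPI is handled only afterwards.
replay_result replay_delayed_ipi(bool keep_flag) {
    cpu_model cpu;
    flush_tlb_all(cpu, keep_flag);
    bool safe = scheduler_access(cpu);  // IPI still pending here
    ipi_handler(cpu);                   // delayed IPI handled only now
    assert(!cpu.ipi_pending && !cpu.tlb_stale);
    return {safe, cpu.flushes};
}
```

In this model, replay_delayed_ipi(false) reports an unsafe access with one
flush, while replay_delayed_ipi(true) reports a safe access with two flushes:
the scheduler flushes first and the late IPI flushes again. As long as the IPI
handler also clears the flag, that extra flush is paid only when the IPI loses
the race; if the IPI arrives first, the flag is already clear by the next
reschedule.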

--
                        Gleb.

-- 
You received this message because you are subscribed to the Google Groups "OSv Development" group.