Hi Gleb, I have a couple of questions (CCed to the OSv mailing list) about
your OSv commit 7e38453. Maybe you remember something (or will be reminded
of something when you look at the commit).
This commit apparently causes
https://github.com/cloudius-systems/osv/issues/790, so now we're trying to
figure out the proper way to fix it. The problem appears to be that the
scheduler on one CPU is handed sched::thread objects from another CPU.
These thread objects might live in mmap()ed areas, but we may have delayed
the required TLB flush on the target CPU.
So here are the questions I find myself asking; perhaps you remember:
1. Do you remember whether this commit gave an important performance
advantage for some workload, or was it just an optimistic fix?
2. I'm afraid the scheduler problem might only be the tip of the iceberg of
problems caused by this lazy TLB flushing. Could we have, for example (and
this is just a hypothetical example), one thread doing a write() to disk of
some data from an mmap()ed area, where this data is supposed to be read by
a ZFS thread running on a different CPU, and because that thread is labeled
a "system thread", it won't do a TLB flush before reading the mmap()ed area?
Why are we confident that "system threads" never need to read the user's
mmap()ed memory?
3. Commit 7e38453 starts flush_tlb_all() by setting the lazy_flush_tlb flag
to true, but resets it to false when it decides to send an IPI. If the
other CPU is currently in the scheduler, the code can leave the flag at
false (if the outgoing thread was an app thread) and send an IPI whose
delivery is delayed, so the scheduler has no way of knowing that it needs
to do a TLB flush before accessing the sched::thread.
Couldn't we leave the flag at true *in addition* to sending the IPI? The
IPI handler could then zero it (if not already zero).