On Tuesday, November 28, 2023 at 1:20:06 AM UTC-5 Waldek Kozaczuk wrote:

Hi,

It is great to hear from you. Please see my answers below. 

I hope you do not mind that I am replying to the group, so others may add 
something extra or refine/correct my answers, as I am not an original 
developer/designer of OSv.

On Fri, Nov 24, 2023 at 8:50 AM Yueyang Pan <yueya...@epfl.ch> wrote:

Dear Waldemar Kozaczuk,
    I am Yueyang Pan from EPFL. Currently I am working on a project about 
remote memory and trying to develop a prototype based on OSv. I am also the 
person who raised the questions on the Google group several days ago. For 
that question, I made a workaround by adding my own stats class which 
records the sum and the count, because what I need is the average. Now I 
have some further questions. They are probably quite basic for you, but I 
would be very grateful if you could spend a little bit of time to give me 
some suggestions.


The tracepoints use ring buffers of fixed size, so eventually all old 
tracepoints get overwritten by new ones. I think you can either increase 
the buffer size or use the approach taken by the script *freq.py* (you 
need to add the module *httpserver-monitoring-api*). There is also a newly 
added (though still experimental) strace-like functionality (see 
https://github.com/cloudius-systems/osv/commit/7d7b6d0f1261b87b678c572068e39d482e2103e4).

Finally, you may find the comments on this issue relevant - 
https://github.com/cloudius-systems/osv/issues/1261#issuecomment-1722549524. 
I am also sure you have come across this wiki page - 
https://github.com/cloudius-systems/osv/wiki/Trace-analysis-using-trace.py.
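
In case it helps, here is a minimal sketch of what defining and hitting a 
custom tracepoint looks like (the TRACEPOINT macro lives in 
include/osv/trace.hh; trace_my_op/my_op are made-up names just for 
illustration):

    #include <osv/trace.hh>

    // Declares a tracepoint taking two arguments. Each hit is written into
    // the fixed-size trace ring buffer mentioned above, so old samples get
    // overwritten unless the buffer is enlarged.
    TRACEPOINT(trace_my_op, "addr=%p len=%d", void*, size_t);

    void my_op(void* addr, size_t len)
    {
        trace_my_op(addr, len); // records a sample when tracing is enabled
        // ... the actual work ...
    }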

    Now, after my profiling, I found the global tlb_flush_mutex to be hot 
in my benchmark, so I am trying to remove it, but that turns out to be a 
bit hard without understanding the threading model of OSv. So I would like 
to ask whether there is any high-level doc that describes what the 
scheduling policy of OSv is, how the priorities of threads are decided, 
whether we can disable preemption or not (the functionality of 
preempt_lock), and the design of the synchronisation primitives (for 
example, why it is not allowed to have preemption disabled inside 
lockfree::mutex). I am trying to understand by reading the code directly, 
but it would be really helpful if there is some material which describes 
the design.


If your "hot" spot is indeed around tlb_flush_mutex (used by 
flush_tlb_all()), then I am guessing your program does a lot of mmap/munmap 
(see the *unpopulate* class in core/memory.cc that uses *tlb_gather*). I am 
not familiar with the details of what tlb_gather exactly does, but it 
probably forces the TLB (Translation Lookaside Buffer) to flush stale 
virtual-to-physical mapping entries after unmapping. The 
mmu::*flush_tlb_all*() is actually used in more places.
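
In other words, a pattern like the following (churn is just a made-up 
illustration), repeated across many threads, would end up in 
flush_tlb_all() on every unmap and therefore serialize on that one global 
mutex:

    #include <sys/mman.h>

    void churn(size_t len)
    {
        void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        // ... use p ...
        munmap(p, len); // unpopulate/tlb_gather -> TLB shootdown on all CPUs
    }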

My wild suggestion would be to try to convert the tlb_flush_mutex to a 
spinlock (see include/osv/spinlock.h and core/spinlock.cc). It is a bit of 
a controversial idea, as OSv prides itself on lock-less structures and 
almost no spinlocks are used (the console initialization is the only place 
left). But in some places (see 
https://github.com/cloudius-systems/osv/issues/853#issuecomment-279215964, 
https://github.com/cloudius-systems/osv/commit/f8866c0dfd7ca1fcb4b2d9a280946878313a75d3 
and https://groups.google.com/g/osv-dev/c/4wMAHCs7_dk/m/1LHdvmoeBwAJ) we 
may benefit from one.

Please note that the lock-less *sched::thread::wait_until* at the end of 
flush_tlb_all() would need to be replaced with a "busy" wait/sleep, since a 
thread must not sleep while holding a spinlock.
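
Concretely, the experiment could look something like this (just a sketch - 
send_ipi_to_all_other_cpus/all_cpus_acked are made-up placeholders for what 
the real flush path does, and I am assuming WITH_LOCK works with any 
lockable type, which I believe it does):

    #include <osv/mutex.h>    // for WITH_LOCK
    #include <osv/spinlock.h>

    spinlock tlb_flush_mutex; // was: mutex tlb_flush_mutex;

    void flush_tlb_all()
    {
        WITH_LOCK(tlb_flush_mutex) {      // one TLB shootdown at a time
            send_ipi_to_all_other_cpus(); // placeholder for the real IPI code
            while (!all_cpus_acked()) {   // busy wait replacing the lock-less
                asm volatile("" ::: "memory"); // sched::thread::wait_until,
            }                             // since we cannot sleep while
        }                                 // holding the spinlock
    }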

Or, instead of a spinlock, you could use Nadav's "mutex with spinning" 
- https://groups.google.com/g/osv-dev/c/4wMAHCs7_dk/m/1LHdvmoeBwAJ - it may 
be a good fit here.
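
If I remember the idea correctly, the gist is to spin briefly in the hope 
that the owner releases the lock soon, and only fall back to the sleeping 
path of the mutex otherwise. A rough sketch (the spin count is an arbitrary 
tuning knob, not a value from that thread):

    #include <lockfree/mutex.hh>

    void lock_with_spinning(lockfree::mutex& m)
    {
        // Cheap when critical sections are short: avoids the cost of
        // blocking and waking a thread if the lock frees up quickly.
        for (int i = 0; i < 100; i++) {
            if (m.try_lock()) {
                return;
            }
            asm volatile("" ::: "memory"); // a pause/relax would also help
        }
        m.lock(); // give up spinning and sleep until the mutex is released
    }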


As far as information on mutexes and scheduling goes, the best source is 
the original OSv paper - 
https://www.usenix.org/conference/atc14/technical-sessions/presentation/kivity. 
See also https://github.com/cloudius-systems/osv/wiki/Components-of-OSv and 
many other wiki pages.

As for your preemption question - the lock-free mutex needs to have 
preemption on. Imagine we have a single CPU and the mutex ends up in the 
wait state while trying to acquire the lock: the waiting thread would 
eventually need to be switched out in favor of the thread that will release 
the lock. But if preemption is off, the scheduler will keep coming back to 
the same waiting thread on each timer event, so the lock holder never runs 
and our original thread would never acquire the lock.
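
To illustrate with a hypothetical single-CPU scenario (a sketch, not real 
OSv code; assume another runnable thread currently holds m):

    #include <osv/sched.hh>      // sched::preempt_lock
    #include <lockfree/mutex.hh>

    lockfree::mutex m;

    void deadlock_on_a_single_cpu()
    {
        WITH_LOCK(sched::preempt_lock) { // preemption now disabled
            m.lock(); // we must block so the owner can run and release m,
                      // but with preemption off we are never switched away,
                      // so the owner never runs and m is never released
        }
    }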

I hope all this helps.

Waldek

    Thanks in advance for any advice you can provide. The questions may be 
a bit basic, so pardon me if I am disturbing you.
    Best Wishes
    Pan
