On Tuesday, November 28, 2023 at 1:20:06 AM UTC-5 Waldek Kozaczuk wrote:
Hi,

It is great to hear from you. Please see my answers below. I hope you do not mind that I am replying to the group, so that others may add something or refine/correct my answers, as I am not an original developer/designer of OSv.

On Fri, Nov 24, 2023 at 8:50 AM Yueyang Pan <yueya...@epfl.ch> wrote:

> Dear Waldemar Kozaczuk,
>
> I am Yueyang Pan from EPFL. I am currently working on a project about remote memory and trying to develop a prototype based on OSv. I am also the person who raised the questions on the Google group several days ago. For that question, I made a workaround by adding my own stats class which records the sum and the count, because what I need is the average. Now I have some further questions. They are probably a bit dumb for you, but I would be very grateful if you could spend a little time giving me some suggestions.

The tracepoints use ring buffers of fixed size, so eventually all old tracepoints are overwritten by new ones. I think you can either increase the size or use the approach used by the script *freq.py* (you need to add the module *httpserver-monitoring-api*). There is also a newly added (though experimental) strace-like functionality (see https://github.com/cloudius-systems/osv/commit/7d7b6d0f1261b87b678c572068e39d482e2103e4). Finally, you may find the comments on this issue relevant - https://github.com/cloudius-systems/osv/issues/1261#issuecomment-1722549524. I am sure you have also come across this wiki page - https://github.com/cloudius-systems/osv/wiki/Trace-analysis-using-trace.py.

> Now, after my profiling, I found the global tlb_flush_mutex to be hot in my benchmark, so I am trying to remove it, but this turns out to be a bit hard without understanding the thread model of OSv.
> So I would like to ask whether there is any high-level doc that describes what the scheduling policy of OSv is, how the priorities of the threads are decided, whether we can disable preemption or not (the functionality of preempt_lock), and the design of the synchronisation primitives (for example, why it is not allowed to have preemption disabled inside lockfree::mutex). I am trying to understand this by reading the code directly, but it would be really helpful if there were some material describing the design.

If your "hot" spot is indeed around tlb_flush_mutex (used by flush_tlb_all()), then I am guessing your program does a lot of mmap/munmap (see the *unpopulate* class in core/memory.cc that uses *tlb_gather*). I am not familiar with the details of what tlb_gather does exactly; it probably forces the TLB (Translation Lookaside Buffer) to flush old virtual/physical memory mapping entries after unmapping. The mmu::*flush_tlb_all*() is actually used in more places.

My wild suggestion would be to try converting tlb_flush_mutex to a spinlock (see include/osv/spinlock.h and core/spinlock.cc). It is a somewhat controversial idea, as OSv prides itself on lock-less structures and uses almost no spinlocks (the console initialization is the only place left). But in some places (see https://github.com/cloudius-systems/osv/issues/853#issuecomment-279215964, https://github.com/cloudius-systems/osv/commit/f8866c0dfd7ca1fcb4b2d9a280946878313a75d3 and https://groups.google.com/g/osv-dev/c/4wMAHCs7_dk/m/1LHdvmoeBwAJ) we may benefit from those. Please note that the lock-less *sched::thread::wait_until* at the end of flush_tlb_all would need to be replaced with a "busy" wait/sleep. Or, instead of a spinlock, you can use Nadav's "mutex with spinning" - https://groups.google.com/g/osv-dev/c/4wMAHCs7_dk/m/1LHdvmoeBwAJ - it may be a good fit here.
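To make the "spin briefly, then block" idea concrete, here is a rough sketch in portable C++. It is only an illustration under stated assumptions: it wraps std::mutex rather than OSv's lockfree::mutex or spinlock, it is not the code from the linked thread, and the names spin_then_block_mutex, max_spins and run_counter_demo are made up for this example.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical class: tries to take the lock a bounded number of times
// before falling back to a blocking acquire. Not an OSv primitive.
class spin_then_block_mutex {
public:
    explicit spin_then_block_mutex(unsigned max_spins = 1000)
        : _max_spins(max_spins) {}

    void lock() {
        // Spinning phase: cheap if the holder releases the lock quickly.
        for (unsigned i = 0; i < _max_spins; ++i) {
            if (_inner.try_lock()) {
                return;
            }
            // A kernel spinlock would issue a CPU "pause" hint here;
            // yield() is the portable stand-in used in this sketch.
            std::this_thread::yield();
        }
        // Blocking phase: give up spinning and sleep until the lock is free.
        _inner.lock();
    }

    void unlock() { _inner.unlock(); }

private:
    std::mutex _inner;
    unsigned _max_spins;
};

// Several threads increment a shared counter under the lock and the
// final value is returned, so mutual exclusion can be checked.
long run_counter_demo(int threads, int iterations) {
    assert(threads > 0 && iterations >= 0);
    spin_then_block_mutex m;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&m, &counter, iterations] {
            for (int i = 0; i < iterations; ++i) {
                std::lock_guard<spin_then_block_mutex> guard(m);
                ++counter;
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter;
}
```

Whether spinning helps depends on how long the lock is typically held. If the critical section around the TLB flush is short, a small spin bound may avoid most context switches, but that would have to be measured on the actual benchmark.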
As for information on mutexes and scheduling, the best source is the original OSv paper - https://www.usenix.org/conference/atc14/technical-sessions/presentation/kivity. See also https://github.com/cloudius-systems/osv/wiki/Components-of-OSv and many other wiki pages.

On your preemption question: the lock-free mutex needs to have preemption on. Imagine we have a single CPU and a thread ends up in the wait state trying to acquire the mutex; the scheduler would eventually need to switch to another thread, the one that would release the lock. But if preemption is off, the scheduler will keep switching back to the same waiting thread on each timer event, and our original thread would never acquire the lock.

I hope all this helps.

Waldek

> Thanks in advance for any advice you could provide. The questions may be a bit dumb, so pardon me if I disturb you.
>
> Best Wishes
> Pan