> On 28 Nov 2023, at 21:17, jwkozac...@gmail.com wrote:
> 
> On Tue, Nov 28, 2023 at 3:04 PM Yueyang Pan <yueyang....@epfl.ch> wrote:
>> Hi Nadav and Waldek,
>>      Thanks a lot for very detailed answers from both of you. I have some 
>> updates on this.
>> For the first question, I ended up implementing my own adhoc stat class 
>> where I can measure the total time (Or total count) of a function and 
>> calculate the average. I am still struggling to make the perf work. I got 
>> this error when using perf kvm as shown here 
>> https://github.com/cloudius-systems/osv/wiki/Debugging-OSv#profiling 
>>      Couldn't record guest kernel [0]'s reference relocation symbol.
>> From perf. Have you ever encountered this problem when you were developing?
> 
> I have never seen it but I will try to dig a bit deeper once I have time.

Thanks a lot in advance!
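
In case it is useful context, the ad-hoc stat class I mentioned is roughly 
along these lines (a simplified sketch, not the exact code; the names are 
mine):

    #include <atomic>
    #include <cstdint>

    // Accumulates a running total (time in ns, or a plain count) and the
    // number of samples, so the average can be derived on demand.
    class adhoc_stat {
    public:
        void add(uint64_t value) {
            _total.fetch_add(value, std::memory_order_relaxed);
            _samples.fetch_add(1, std::memory_order_relaxed);
        }
        double average() const {
            uint64_t n = _samples.load(std::memory_order_relaxed);
            return n ? double(_total.load(std::memory_order_relaxed)) / n
                     : 0.0;
        }
    private:
        std::atomic<uint64_t> _total{0};
        std::atomic<uint64_t> _samples{0};
    };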

>> 
>> For the second question, I ended up removing the global tlb_flush_mutex 
>> and introducing a Linux-like design where you have a percpu 
>> call_function_data which contains a percpu array of call_single_data. Each 
>> CPU has its own call_single_queue where the call_single_data is enqueued 
>> and dequeued. If you don’t mind, I can tidy the code up a bit and send a 
>> patch for you to review. I am not sure how the development process works 
>> for OSv, so I would very much appreciate some guidance.
> Feel free to create a PR on GitHub.
> 
> Do you see significant improvement with your change to use percpu 
> call_function_data? OSv has its own percpu structures concept (see 
> include/osv/percpu.hh) so I wonder if you can leverage it.

Yeah, I am looking right now at how to initialise per-cpu variables. My 
previous implementation used a std::array sized for the maximum number of 
CPUs. It is a bit messy, so I am taking some time to massage the code. I will 
create a PR once it is done properly.
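
If I read include/osv/percpu.hh correctly, the per-CPU queue could be 
declared roughly as below. This is an untested sketch: flush_request and 
percpu_flush_queue are my own names, and I am assuming the PERCPU macro and 
its dereference-to-current-CPU semantics work the way they appear to in the 
header.

    #include <osv/percpu.hh>  // assumed to provide PERCPU() and percpu<T>
    #include <deque>
    #include <mutex>

    // Hypothetical per-CPU flush request and queue (my names, not OSv's).
    struct flush_request { /* range to invalidate, completion flag, ... */ };

    struct flush_queue {
        std::mutex lock;                   // stand-in for a lock-free queue
        std::deque<flush_request*> pending;
    };

    // One queue per CPU, instead of a std::array sized for the maximum
    // possible number of CPUs.
    PERCPU(flush_queue, percpu_flush_queue);

    void enqueue_on_this_cpu(flush_request* req)
    {
        // operator*() should resolve to the current CPU's instance; real
        // code would also need to keep the thread pinned (or preemption
        // disabled) while touching it.
        std::lock_guard<std::mutex> guard((*percpu_flush_queue).lock);
        (*percpu_flush_queue).pending.push_back(req);
    }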


> I wonder how this Linux-like solution helps given that the point of 
> mmu::flush_tlb_all() (where tlb_flush_mutex is used) is to coordinate the 
> flushing of the TLB and make sure all CPUs do it, so the virtual/physical 
> mapping stays in sync across all CPUs. How do you achieve that in your 
> solution? Is the potential speed improvement gained from avoiding IPIs, 
> which are known to be slow?

I have seen some performance improvement in my own benchmark, which is based 
on an academic prototype that added swap to OSv. I will find some 
multithreaded mmap/munmap benchmarks and share the numbers once done. I also 
expect a performance improvement because multiple cores no longer need to be 
serialised for the whole of mmu::flush_tlb_all() once the batch size has been 
reached. They can send IPIs at the same time and wait. The receiver side only 
needs to perform one TLB flush and can pop all the remaining requests from 
the software queue.

For example, suppose both A and B want to do mmu::flush_tlb_all(). With the 
global mutex, A has to go first and B has to wait. With the Linux-like 
approach, A and B can both send IPIs (or even skip sending one if the target 
core has already received an IPI but not yet processed it), and the receiving 
side only needs to do one local TLB flush before popping both A's and B's 
requests from the queue.

I can see benefits in several places: the number of IPIs sent is reduced, the 
waiting time on the mutex is eliminated, and the total time spent in 
interrupt handling on the receiver side is reduced.
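
To make the above concrete, here is a minimal sketch of the coalescing idea 
in plain C++. The names (flush_request, cpu_queue) are mine, send_ipi_to() 
and tlb_flush_local() are stubs standing in for OSv's real IPI and TLB 
machinery, and a mutex-protected deque stands in for a proper lock-free 
queue:

    #include <atomic>
    #include <deque>
    #include <mutex>

    struct flush_request {
        std::atomic<bool> done{false};
    };

    struct cpu_queue {
        std::mutex lock;                   // stand-in for a lock-free queue
        std::deque<flush_request*> pending;
        std::atomic<bool> ipi_pending{false};
    };

    // Stubs for OSv's real primitives.
    void send_ipi_to(cpu_queue&) { /* deliver the IPI to the target CPU */ }
    void tlb_flush_local()       { /* invalidate this CPU's TLB */ }

    // Caller side: corresponds to A or B in the example above.
    void request_remote_flush(cpu_queue& q, flush_request& req)
    {
        {
            std::lock_guard<std::mutex> guard(q.lock);
            q.pending.push_back(&req);
        }
        // Skip the IPI if the target CPU was already poked and has not
        // processed the previous IPI yet.
        if (!q.ipi_pending.exchange(true)) {
            send_ipi_to(q);
        }
        while (!req.done.load()) { /* wait; OSv would block/yield here */ }
    }

    // Runs on the target CPU in response to the IPI.
    void handle_flush_ipi(cpu_queue& q)
    {
        // Clear the flag and snapshot the queue *before* flushing, so a
        // request enqueued after the snapshot either triggers a fresh IPI
        // or is covered by a later flush - never marked done without one.
        q.ipi_pending.store(false);
        std::deque<flush_request*> batch;
        {
            std::lock_guard<std::mutex> guard(q.lock);
            batch.swap(q.pending);
        }
        tlb_flush_local();              // one flush covers the whole batch
        for (auto* r : batch) {
            r->done.store(true);        // release waiters A, B, ...
        }
    }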


>> 
>> For the scheduling part, I am reading the paper and the doc now. Thanks 
>> for the resources. I need some time to digest them, because I found that 
>> preempt_lock matters a lot for the performance of my code.
>>     
>>     Best Wishes
>>     Pan
>> 
>> 
>>> On 28 Nov 2023, at 08:29, Nadav Har'El <n...@scylladb.com> wrote:
>>> 
>>> On Tue, Nov 28, 2023 at 8:20 AM Waldek Kozaczuk <jwkozac...@gmail.com> wrote:
>>>> Hi,
>>>> 
>>>> It is great to hear from you. Please see my answers below. 
>>>> 
>>>> I hope you also do not mind that I am replying to the group, so others 
>>>> may add something extra or refine/correct my answers, as I am not an 
>>>> original developer/designer of OSv.
>>>> 
>>>> On Fri, Nov 24, 2023 at 8:50 AM Yueyang Pan <yueyang....@epfl.ch> wrote:
>>>>> Dear Waldemar Kozaczuk,
>>>>>     I am Yueyang Pan from EPFL. Currently I am working on a project 
>>>>> about remote memory and trying to develop a prototype based on OSv. I 
>>>>> am also the person who raised the questions on the Google group several 
>>>>> days ago. For that question, I made a workaround by adding my own stats 
>>>>> class which records the sum and the count, because what I need is the 
>>>>> average. Now I have some further questions. They are probably a bit 
>>>>> basic, but I would be very grateful if you could spend a little bit of 
>>>>> time giving me some suggestions.
>>>> 
>>>> The tracepoints use fixed-size ring buffers, so eventually all old 
>>>> tracepoints are overwritten by new ones. I think you can either increase 
>>>> the size or use the approach taken by the script freq.py
>>> 
>>> Exactly. OSv's tracepoints have two modes. One is indeed to save them in 
>>> a ring buffer - so you'll see the last N traced events when you read that 
>>> buffer - but the other is a mode that just counts the events. What 
>>> freq.py does is retrieve the count at one second, then retrieve the count 
>>> the next second - and the difference is the average number of this event 
>>> per second.
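
For my own understanding, the counting mode boils down to this kind of 
subtraction. A minimal sketch of the arithmetic in plain C++ (my names; 
event_count is my stand-in for the tracepoint's counter):

    #include <atomic>
    #include <chrono>
    #include <cstdint>
    #include <thread>

    std::atomic<uint64_t> event_count{0};  // bumped by the traced code path

    // Sample the counter twice, one second apart; the difference is the
    // average number of events per second over that interval.
    uint64_t events_per_second()
    {
        uint64_t before = event_count.load();
        std::this_thread::sleep_for(std::chrono::seconds(1));
        return event_count.load() - before;
    }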
>>> 
>>> If, instead of counting the events, you want to have a sum of, say, 
>>> integers that come from the event (e.g., a sum of packet lengths), we 
>>> don't have support for this at the moment - we only increment the count 
>>> by 1. It could be added as a feature, I guess. But you can always do 
>>> something ad hoc, like maintaining a global variable that you add to.
>>>  
>>>> (you need to add the module httpserver-monitoring-api). There is also 
>>>> newly added (though still experimental) strace-like functionality (see 
>>>> https://github.com/cloudius-systems/osv/commit/7d7b6d0f1261b87b678c572068e39d482e2103e4).
>>>>  Finally, you may find the comments on this issue relevant - 
>>>> https://github.com/cloudius-systems/osv/issues/1261#issuecomment-1722549524.
>>>>  I am also sure you have come across this wiki page - 
>>>> https://github.com/cloudius-systems/osv/wiki/Trace-analysis-using-trace.py.
>>>> 
>>>>>     Now, after my profiling, I found the global tlb_flush_mutex to be 
>>>>> hot in my benchmark, so I am trying to remove it, but that turns out to 
>>>>> be a bit hard without understanding the threading model of OSv. So I 
>>>>> would like to ask whether there is any high-level doc that describes 
>>>>> what the scheduling policy of OSv is, how thread priorities are 
>>>>> decided, whether preemption can be disabled or not (the functionality 
>>>>> of preempt_lock), and the design of the synchronisation primitives (for 
>>>>> example, why it is not allowed to have preemption disabled inside 
>>>>> lockfree::mutex). I am trying to understand by reading the code 
>>>>> directly, but it would be really helpful if there were some material 
>>>>> describing the design.
>>> 
>>> There are a lot of questions here, and I'm not even sure answering them 
>>> will explain specifically why tlb_flush_mutex is highly contended in your 
>>> workload.
>>> 
>>> Waldek suggested that you read the OSv paper from USENIX, which is a good 
>>> start for understanding the overall OSv architecture.
>>> The scheduling policy and priority (how to decide which thread should run 
>>> next) is described in more detail in this document: 
>>> https://docs.google.com/document/d/1W7KCxOxP-1Fy5EyF2lbJGE2WuKmu5v0suYqoHas1jRM/edit
>>> 
>>> If you have specific questions, post them here and I'll try to answer. But 
>>> only a few at a time :-) You had a lot of questions above and I can't 
>>> answer them all in one mail :-)
>> 
