Re: [osv-dev] aarch64: waiter.hh - should t variable be an atomic?

2021-04-06 Thread Waldek Kozaczuk
I think you are right about wait_record and memory barriers (unless there 
is still some hole in our thinking :-)).

So let us get back to one of the observations I made and the related 
question I posed here: 
https://github.com/cloudius-systems/osv/issues/1123#issuecomment-803710337

The thread::wake_impl() method deliberately uses an "if" check to validate 
that the current (old) status of the thread is one of the initial states 
allowed by the *allowed_initial_states_mask* argument. If it is NOT, 
wake_impl() simply returns without actually waking the thread (that is, 
without setting need_reschedule to true or calling send_wakeup_ipi()). It 
is interesting that neither wake_impl() nor wake() reports the outcome of 
the "waking process" - the return type is void. I wonder what the rationale 
behind that was. Maybe in most cases it does not matter whether we really 
wake the thread or not. But maybe in some it does, and we should be able to 
know that the thread was not really woken.

Normally, the thread that wake_impl() is called on (per the st argument) 
would most likely be in the *waiting* state. But clearly that does not 
always have to be the case - the same thread could, for example, have just 
been woken by another thread on another CPU.

I have added these debug printouts to see what states a thread might be in 
when wake_impl() is called:

    void thread::wake_impl(detached_state* st,
                           unsigned allowed_initial_states_mask)
    {
        status old_status = status::waiting;
        trace_sched_wake(st->t);
        while (!st->st.compare_exchange_weak(old_status, status::waking)) {
            if (!((1 << unsigned(old_status)) & allowed_initial_states_mask)) {
                // --- added debug printout: begin ---
                // Log only when the mask is exactly (waiting | sending_lock).
                if (allowed_initial_states_mask ==
                        ((1 << unsigned(status::waiting)) |
                         (1 << unsigned(status::sending_lock)))) {
                    if (old_status != status::waiting &&
                        old_status != status::sending_lock) {
                        debug_early_u64("! wake_impl: ", (unsigned)old_status);
                    }
                }
                // --- added debug printout: end ---
                return;
            }
        }
        // ... (rest of wake_impl() not shown)
    }

Please note that I am specifically logging only the cases where wake_impl() 
is called from thread::wake_with_from_mutex(Action action). With one CPU, I 
occasionally see "queued" logged. With 2 CPUs, besides "queued", I also 
occasionally see "waking" and "running", which is interesting. Most of the 
time, even when I see these "waking" and "running" printouts, OSv does NOT 
hang. But sometimes it hangs right after.

Now, if we focus on the lock-free mutex (core/lfmutex.cc) and its two key 
methods - lock() and unlock() - I think we can imagine the following 
scenario:

1. Thread T1 on CPU0 calls lock() on some lock-free mutex M, cannot acquire 
the lock, and so creates a wait_record "waiter" on its stack, adds "waiter" 
to M's waitqueue, and finally calls waiter.wait(), which ends up calling 
sched::do_wait_until(). Eventually T1 gets scheduled out and becomes 
"waiting".
2. Later, thread T2 on CPU0 calls thread::wake(), or some other code path 
unrelated to the lock-free mutex that calls wake_impl(), to wake thread T1 
(I am not sure what such a scenario might be).
3a. Thread T1 becomes running again on CPU0, but it checks the waiter's 
condition, which obviously is NOT true (!t), and it goes back to "waiting".
3b. Almost at the same time as T1 is running per 3a on CPU0, thread T3 on 
CPU1 calls unlock() on the same mutex M, pops the waiter from step 1 as 
part of the "Responsibility Hand-Off" protocol, and calls wake() (assuming 
it will simply wake the thread). But in the end, the "if" in wake_impl() 
determines that T1 is running, so it never actually wakes it, and T1 stays 
stuck like this (no?) unless some other thread later happens to wake it 
again (t was set to null in the waiter). But what if no such thread ever 
comes along to wake it?

Am I misunderstanding something or describing a scenario that should never 
happen? Or maybe it is a symptom of another problem?

If my thinking is correct, then:
1. Shouldn't the lock-free mutex code check whether wake_impl() actually 
worked, and push the waiter back onto its waitqueue if it did not? Or is it 
even more complicated, and we should iterate to find a waiter that can 
actually be woken?
2. Shouldn't the implementation of thread::do_wake_with(Action action, 
unsigned allowed_initial_states_mask) change so that it calls action ONLY 
if wake_impl() actually woke the thread (meaning the while loop succeeded)? 
We would need another wake_impl() that takes action as an argument.
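
As a rough sketch of what point 2 might look like (again not OSv code: 
the reduced status enum, mask_of() and wake_impl_with() are hypothetical 
names, and the rescheduling/IPI part is omitted), the action would run only 
after the compare-exchange has succeeded. Whether it is safe to run the 
action after the state change rather than before it is exactly the kind of 
ordering question the real code would have to settle; this sketch does not 
answer it.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the scheduler's thread status.
enum class status : unsigned { waiting, sending_lock, running, queued, waking };

constexpr unsigned mask_of(status s) { return 1u << static_cast<unsigned>(s); }

// Variant of the wake loop that takes the action as an argument and runs
// it only if this call really moved the thread to `waking`.
// Returns true if the thread was woken (and the action ran), else false.
template <typename Action>
bool wake_impl_with(std::atomic<status>& st,
                    unsigned allowed_initial_states_mask,
                    Action action)
{
    assert(allowed_initial_states_mask != 0 && "allow at least one state");
    status old_status = status::waiting;
    while (!st.compare_exchange_weak(old_status, status::waking)) {
        if (!(mask_of(old_status) & allowed_initial_states_mask)) {
            return false;   // not woken: the action is skipped
        }
    }
    action();               // reached only after a successful transition
    return true;
}
```

A caller such as the mutex unlock() path could then react to a false 
return value, for example by putting the waiter back on the waitqueue as 
suggested in point 1.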
 
If that is indeed a real problem, why does it not happen on Intel?

On Sunday, April 4, 2021 at 12:32:04 PM UTC-4 Nadav Har'El wrote:

> On Sun, Apr 4, 2021 at 6:00 PM Waldek Kozaczuk wrote:
>
>> On Thursday, April 1, 2021 at 12:36:19 PM UTC-4 Nadav Har'El wrote:
>>
>>> On Fri, Mar 26, 2021 at 6:10 AM Waldek Kozaczuk wrote:
>>>
>>>> As I have been researching a bit the SMP issue described here - 
>>>> https://github.com/cloudius-systems/osv/issues/1123 - I have noticed 
>>>> that the 't' variable in 

Re: [osv-dev] Replacing files in built filesystem image

2021-04-06 Thread Fotis Xenakis
Hello David,

Virtio-fs sounds like a natural fit for your situation. Please see this wiki 
page for details: https://github.com/cloudius-systems/osv/wiki/virtio-fs

In short, virtio-fs lets you mount a host directory on the guest. The OSv 
implementation is read-only but that shouldn't be a problem since you are using 
rofs currently.

Feel free to check it out and of course come back if you run into any issues or 
have any feedback!

Fotis

From: osv-dev@googlegroups.com on behalf of David Smith
Sent: Tuesday, April 6, 2021 4:08:31 PM
To: OSv Development 
Subject: [osv-dev] Replacing files in built filesystem image

I'm reviewing the OSv documentation and scripts to see if I can find an 
existing solution which would allow the addition/replacement/deletion of files 
in an already built filesystem image, without needing the entire OSv build 
environment and performing a rebuild. In my case, the filesystem type used to 
build the .raw image file is ROFS (selected primarily to reduce startup time).

I can see the script being used to generate this filesystem image 
(gen-rofs-img.py), but I haven't yet seen any way in which I can unpack/mount 
an existing filesystem image, make changes to the content and re-pack/unmount 
the image.

Is this a facility that is already available in some form?
If not, any suggestions on the best approach would be welcome.

--
You received this message because you are subscribed to the Google Groups "OSv 
Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to 
osv-dev+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/osv-dev/05782ae7-44f0-44d2-a249-aeafdaeef2acn%40googlegroups.com.



[osv-dev] Replacing files in built filesystem image

2021-04-06 Thread David Smith
I'm reviewing the OSv documentation and scripts to see if I can find an 
existing solution which would allow the addition/replacement/deletion of 
files in an already built filesystem image, without needing the entire OSv 
build environment and performing a rebuild. In my case, the filesystem type 
used to build the .raw image file is ROFS (selected primarily to reduce 
startup time).

I can see the script being used to generate this filesystem image 
(gen-rofs-img.py), but I haven't yet seen any way in which I can 
unpack/mount an existing filesystem image, make changes to the content and 
re-pack/unmount the image.

Is this a facility that is already available in some form?
If not, any suggestions on the best approach would be welcome.
