On 10/27/2017 12:08 AM, Ondřej Hlavatý wrote:
> I started to experience deadlock while booting HelenOS. It does not
> happen every time, and when I add some debug prints, the deadlock
> disappears completely.
> 
> The issue starts at ps2mouse driver, which adds mouse function from its
> device_add operation. This remote call goes all the way to fun_online,
> in which it is holding the writelock (blocking other drivers) and,
> because the function is exposed, probably waiting inside
> loc_register_tree_function, respectively in loc_service_register.
> 
> Looking at this function, it seems to be very similar to what Jakub
> Jermar describes at:
> http://jakubsuniversalblog.blogspot.cz/2011/09/debugging-file-system-hang-using.html?q=deadlock
> 
> As far as I understand the issue, this shall not be the case - this is
> the sender, not the receiver, and there is no cycle of messages waiting
> for themselves. But after swapping the order of exch release and waiting
> for answer, the deadlock no longer occurs.
> 
> Can someone please confirm, that the order there is correct?

I don't think that changing the mutual ordering of loc_exchange_end()
and async_wait_for() will fix this on its own. See #700 for details.

Btw, when you were testing your fix, did you change the order only in
loc_service_register() or also in other places? I would actually be very
surprised if this changed anything because in all the deadlocks for
which we have some data in #700 (i.e. your .svg and my log files), the
LOC_SERVICE_REGISTER was the second request, not the first. The first
must have been LOC_SERVICE_ADD_TO_CAT. locsrv did not even start
processing the second one.

Jakub



_______________________________________________
HelenOS-devel mailing list
HelenOS-devel@lists.modry.cz
http://lists.modry.cz/listinfo/helenos-devel
