Manuel Bouyer <bou...@antioche.eu.org> wrote:

> You're the only one seeing this problem AFAIK

Indeed, but I reproduced it on multiple machines. It appears 100% of the time on a busy server, and on a test machine I have a script that reliably triggers it within a few minutes.

Anyway, I am not sure we should disregard bugs on the basis of how often they appear. After all, you raised very valid concerns about my changes, but we have no case of those problems actually occurring yet. I have been running three servers with the patch for a week and nothing wrong occurred. Perhaps this is just because I did not try shutting down a lot.

> Returning EAGAIN without reason is harmful.

There is a reason: not doing so causes a deadlock. I agree the timeout value may be too short, but there should be an upper limit on how long we block any process trying to enter fstrans_start(). Blocking any filesystem activity for seconds does not seem acceptable to me on a multiuser machine.

Perhaps we could time out, unblock the processes that are waiting for us and that we are waiting for, and retry? Something like this:

	error = 0;
	while (!state_change_done(mp)) {
		/* Wait for the state change, but for at most hz / 4 ticks. */
		error = cv_timedwait_sig(&fstrans_count_cv, &fstrans_lock,
		    hz / 4);
		if (error == EWOULDBLOCK) {
			/* Timed out: wake the waiters on fstrans_state_cv, retry. */
			cv_broadcast(&fstrans_state_cv);
			error = 0;
		}
		if (error) {
			/* Any other error: go back to FSTRANS_NORMAL and give up. */
			new_state = fmi->fmi_state = FSTRANS_NORMAL;
			break;
		}
	}
	cv_broadcast(&fstrans_state_cv);
	mutex_exit(&fstrans_lock);

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org