16.04.2021 11:09, Vladimir Sementsov-Ogievskiy wrote:
OK, that's a big rewrite of the logic.

Pre-patch we have an always-running coroutine, connection_co. It receives replies and handles reconnecting. This leads to a lot of difficult and unobvious code around drained sections and context switches. We also abuse the bs->in_flight counter, which is increased for connection_co and temporarily decreased at points where we want to allow a drained section to begin. One of these places is in another file: in nbd_read_eof() in nbd/client.c.

We also cancel reconnect and requests waiting for reconnect on drained begin, which is not correct.

Let's finally drop this always-running coroutine and go another way:

1. reconnect_attempt() moves to nbd_co_send_request() and is called under send_mutex.

2. We receive headers in the request coroutine. But we should also dispatch replies for other pending requests. So nbd_connection_entry() is turned into nbd_receive_replies(), which dispatches replies as long as it receives headers of other requests, and returns when it receives the requested header.

3. All the old stuff around drained sections and context switches is dropped.

Signed-off-by: Vladimir Sementsov-Ogievskiy <[email protected]>
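
To make point 2 easier to discuss, here is a rough, single-threaded sketch of the dispatch idea. It is not the patch code: ClientState, PendingRequest, read_reply_header() and receive_replies() are hypothetical names, the "wire" is just an array of reply handles, and coroutines, locking and reply payloads are all left out. It only shows the loop shape: hand headers that belong to other requests over to them, and return on our own header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_REQUESTS 16

/* Hypothetical, simplified stand-ins; these are not the QEMU structures. */
typedef struct {
    bool in_use;       /* a request with this handle is pending */
    bool reply_ready;  /* its reply header was received by someone */
} PendingRequest;

typedef struct {
    PendingRequest requests[MAX_REQUESTS];
    const int *wire;   /* simulated stream of reply handles */
    size_t wire_len;
    size_t wire_pos;
} ClientState;

/* Read the next reply header; returns its handle, or -1 at end of stream. */
static int read_reply_header(ClientState *s)
{
    if (s->wire_pos >= s->wire_len) {
        return -1;
    }
    return s->wire[s->wire_pos++];
}

/*
 * Called from the request that owns my_handle. Dispatches headers that
 * belong to other pending requests and returns 0 once our own header has
 * arrived, or -1 on end of stream or an unexpected handle.
 */
static int receive_replies(ClientState *s, int my_handle)
{
    assert(my_handle >= 0 && my_handle < MAX_REQUESTS);
    assert(s->requests[my_handle].in_use);

    for (;;) {
        if (s->requests[my_handle].reply_ready) {
            /* Another request already received our header for us. */
            return 0;
        }

        int handle = read_reply_header(s);
        if (handle < 0) {
            return -1;  /* connection closed or read error */
        }
        if (handle >= MAX_REQUESTS || !s->requests[handle].in_use) {
            return -1;  /* reply for a request we never sent */
        }
        if (handle == my_handle) {
            return 0;   /* this is the header we were waiting for */
        }

        /* Header for another request: mark it (real code would wake it). */
        s->requests[handle].reply_ready = true;
    }
}
```

With a wire of {2, 1} and requests 1 and 2 pending, receive_replies(&s, 1) first marks request 2 as ready and then returns on its own header; a later receive_replies(&s, 2) returns without reading anything more.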

Please consider this last patch as an RFC for now:

1. It is complicated and doesn't have good documentation. Please look through it and ask about anything that is not obvious; I'll explain. Don't waste your time trying to understand what is not clear.

2. I also failed to imagine how to split the patch into smaller, simpler patches. Ideas are welcome.

3. It actually reverts what was done in

   commit 8c517de24a8a1dcbeb54e7e12b5b0fda42a90ace
   Author: Vladimir Sementsov-Ogievskiy <[email protected]>
   Date:   Thu Sep 3 22:02:58 2020 +0300

       block/nbd: fix drain dead-lock because of nbd reconnect-delay

   and I haven't checked yet whether this dead-lock is still there. Even if it is, I believe the nbd driver is the wrong place to work around this bug, but I should at least check it first.

--
Best regards,
Vladimir
