Re: [Qemu-devel] drive-mirroring to nbd is failing with multiple parallel jobs (qemu 2.9 -> 2.11)
> On 15 Feb 2018, at 04:42, Wouter Verhelst wrote:
>
> We already set the SO_KEEPALIVE socket option (at least nbd-server does;
> don't know about qemu) to make the kernel send out TCP-level keepalive
> probes. This happens only after two hours (by default), but it's
> something you can configure on your system if you need it to be lower.

+1 for just using SO_KEEPALIVE. I think I even submitted some (untested and thus unmerged) patches for this.

--
Alex Bligh
Re: [Qemu-devel] drive-mirroring to nbd is failing with multiple parallel jobs (qemu 2.9 -> 2.11)
Hi Eric,

On Wed, Feb 14, 2018 at 09:11:02AM -0600, Eric Blake wrote:
[NBD and keepalive]
> This is more food for thought on whether it even makes sense for NBD to
> worry about assisting in keepalive matters, or whether it would just be
> bloating the protocol.

I'm currently leaning towards the latter. I don't think it makes (much) sense to run NBD over an unreliable transport. It uses TCP specifically to not have to worry about that, under the expectation that it won't break except in unusual circumstances; if you break that expectation, I think it's not unfair to say "well, then you get to keep both pieces".

We already set the SO_KEEPALIVE socket option (at least nbd-server does; don't know about qemu) to make the kernel send out TCP-level keepalive probes. This happens only after two hours (by default), but it's something you can configure on your system if you need it to be lower.

Having said that, I can always be convinced otherwise by good arguments :-)

--
Could you people please use IRC like normal people?!?
  -- Amaya Rodrigo Sastre, trying to quiet down the buzz in the DebConf 2008 Hacklab
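As a rough illustration of the SO_KEEPALIVE approach described above, here is a minimal sketch of enabling keepalive on a single socket and lowering the timers for that socket only. The function name `enable_keepalive` and the values 20/5/3 are assumptions chosen for the example, not anything nbd-server or qemu is known to use. The `TCP_KEEP*` options are platform-specific (present on Linux), so the sketch skips them where they are missing; on Linux the system-wide default idle time is the `net.ipv4.tcp_keepalive_time` sysctl.

```python
import socket

def enable_keepalive(sock, idle=20, interval=5, count=3):
    """Turn on TCP keepalive for one socket (hypothetical helper).

    idle:     seconds of inactivity before the first probe
    interval: seconds between unanswered probes
    count:    unanswered probes before the kernel drops the connection
    """
    # Portable part: ask the kernel to send keepalive probes at all.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket timer overrides; only set the ones this platform exposes.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", count)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock

if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    print("keepalive enabled:",
          s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    s.close()
```

Keepalive probes carry no application data, so they help with middleboxes that track TCP activity, but a userspace relay that times out on idle payload may well not notice them.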
Re: [Qemu-devel] drive-mirroring to nbd is failing with multiple parallel jobs (qemu 2.9 -> 2.11)
[adding nbd list]

On 02/14/2018 03:45 AM, Alexandre DERUMIER wrote:
> Sorry, I just found that the problem is in our proxmox implementation, as
> we use a socat tunnel for the nbd mirroring, with a timeout of 30s in case
> of inactivity.
>
> So, not a qemu bug.

Good to hear.

Still, it makes me wonder if the NBD protocol itself should have some sort of a keepalive mechanism, maybe a new NBD_CMD_PING that can be used as a no-op command to keep the line alive if there is no other command to send for a while? A client can always use a throwaway NBD_CMD_READ to keep the line alive, but that has more overhead; conversely, an extension is only useful if both client and server can negotiate to use it, which means that clients still have to be prepared for alternative fallbacks if they want to keep the line alive.

And we still don't have support for the server ever sending unsolicited messages (other than perhaps a structured reply where the server sends periodic reply chunks but never sends a final chunk - but even that is a sequence of server replies that the guest has to initiate), so while the guest can keep the line to the server up, having the server keep the line open to the guest is a bit harder.

This is more food for thought on whether it even makes sense for NBD to worry about assisting in keepalive matters, or whether it would just be bloating the protocol.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
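To make the "throwaway NBD_CMD_READ" idea concrete, here is a minimal sketch that only builds the 28-byte request header a client would send during the transmission phase, following the classic (non-extended) request layout from the NBD protocol document. The helper name `build_read_request` and the 512-byte default length are assumptions for illustration; a real client would pick a length the server's advertised block size allows, and would still have to read and discard the reply, which is the extra overhead mentioned above.

```python
import struct

NBD_REQUEST_MAGIC = 0x25609513  # magic that starts every transmission-phase request
NBD_CMD_READ = 0                # command type for a read

# Big-endian layout: magic (u32), command flags (u16), type (u16),
# handle/cookie (u64), offset (u64), length (u32) = 28 bytes.
_REQUEST = struct.Struct(">IHHQQI")

def build_read_request(handle, offset=0, length=512):
    """Pack a throwaway read request (hypothetical helper).

    The server answers with a reply header followed by `length` bytes
    of data, all of which the client reads and throws away.
    """
    return _REQUEST.pack(NBD_REQUEST_MAGIC, 0, NBD_CMD_READ,
                         handle, offset, length)

if __name__ == "__main__":
    req = build_read_request(handle=1)
    print(len(req), req.hex())
```

A dedicated no-op command would skip the data payload entirely, which is the saving the NBD_CMD_PING suggestion is after.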
Re: [Qemu-devel] drive-mirroring to nbd is failing with multiple parallel jobs (qemu 2.9 -> 2.11)
Sorry, I just found that the problem is in our proxmox implementation, as we use a socat tunnel for the nbd mirroring, with a timeout of 30s in case of inactivity.

So, not a qemu bug.

Regards,

Alexandre

----- Original message -----
From: "aderumier"
To: "qemu-devel"
Sent: Wednesday, 14 February 2018 09:54:21
Subject: [Qemu-devel] drive-mirroring to nbd is failing with multiple parallel jobs (qemu 2.9 -> 2.11)

Hi,

I currently have failing mirroring jobs to nbd when multiple jobs are running in parallel.

Steps to reproduce, with 2 disks:

1) Launch a mirroring job of the first disk to a remote nbd target (served by the running target qemu).
2) Wait until it reaches ready = 1; do not complete it.
3) Launch a mirroring job of the second disk to a remote nbd target (served by the same running target qemu).

-> The mirroring job of the second disk is now running (ready=0); the first disk is still at ready=1 and still mirroring incoming writes.

Then, after some time (around 30-40s), mainly if no new writes are coming to the first disk, the first job fails with an input/output error.

Note that I don't have a network problem or a disk problem; I'm able to mirror both disks individually.

Another similar bug report on proxmox bugzilla:
https://bugzilla.proxmox.com/show_bug.cgi?id=1664

Maybe related to this:
https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg03086.html ?

I don't remember having the problem with qemu 2.7, but I'm able to reproduce it with qemu 2.9 and qemu 2.11.

Best Regards,

Alexandre
[Qemu-devel] drive-mirroring to nbd is failing with multiple parallel jobs (qemu 2.9 -> 2.11)
Hi,

I currently have failing mirroring jobs to nbd when multiple jobs are running in parallel.

Steps to reproduce, with 2 disks:

1) Launch a mirroring job of the first disk to a remote nbd target (served by the running target qemu).
2) Wait until it reaches ready = 1; do not complete it.
3) Launch a mirroring job of the second disk to a remote nbd target (served by the same running target qemu).

-> The mirroring job of the second disk is now running (ready=0); the first disk is still at ready=1 and still mirroring incoming writes.

Then, after some time (around 30-40s), mainly if no new writes are coming to the first disk, the first job fails with an input/output error.

Note that I don't have a network problem or a disk problem; I'm able to mirror both disks individually.

Another similar bug report on proxmox bugzilla:
https://bugzilla.proxmox.com/show_bug.cgi?id=1664

Maybe related to this:
https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg03086.html ?

I don't remember having the problem with qemu 2.7, but I'm able to reproduce it with qemu 2.9 and qemu 2.11.

Best Regards,

Alexandre
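For readers who want to retrace the steps above over QMP, here is a rough sketch that only builds the JSON commands involved; it does not talk to a monitor socket. The device names, host, port and export names are all made up for the example, and `sync="full"` with `mode="existing"` is an assumption about how such a job might be started, not a statement of what proxmox actually sends.

```python
import json

def drive_mirror_cmd(device, target, sync="full", mode="existing"):
    """QMP command that starts a mirror job for one disk (hypothetical helper)."""
    return {"execute": "drive-mirror",
            "arguments": {"device": device, "target": target,
                          "sync": sync, "mode": mode}}

def query_block_jobs_cmd():
    """QMP command that lists running block jobs and their 'ready' state."""
    return {"execute": "query-block-jobs"}

def is_ready(reply, device):
    """True if a query-block-jobs reply shows `device` with ready == true."""
    return any(job.get("device") == device and job.get("ready", False)
               for job in reply.get("return", []))

if __name__ == "__main__":
    # Step 1: first disk; step 3 repeats this with the second disk.
    # Both targets below are placeholders, not real endpoints.
    first = drive_mirror_cmd("drive-disk0",
                             "nbd:target.example:10809:exportname=drive-disk0")
    second = drive_mirror_cmd("drive-disk1",
                              "nbd:target.example:10809:exportname=drive-disk1")
    for cmd in (first, query_block_jobs_cmd(), second):
        print(json.dumps(cmd))
```

Step 2 corresponds to polling `query-block-jobs` until `is_ready` returns true for the first device, without sending a completion command for it.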