Re: [Qemu-devel] [PATCH v8 7/7] virtio-console: Enable port throttling when chardev is slow to consume data
But that's entirely in guest memory, so it's limited to the amount of RAM that has been allocated to the guest.

Exactly. The guest can cause ram_size * nr_ports of additional host memory to be allocated. Not acceptable.

OK -- so this is how it adds up:

- guest vq
- virtio-serial-bus converts iov to buf

This is an unbelievably lame piece of code.

I doubt it's 'unbelievably lame' just because of the copy. Care to expand?

Specifically, that we are allocating a host buffer of guest-specified size to hold that copy. There's absolutely no reason to copy the data into a linear buffer. You should just be iterating over the elements of the sglist.

OK, but that can be done in a separate patch series.

I suspect you'll actually find it easier to fix that first. Otherwise you're going to end up having to rewrite your own code.

- qemu-char stores the buf in case it wasn't able to send

but then, since it's all async, we have:

- virtio-serial-bus frees the buf
- guest deletes the buf and removes it from the vq

So what's left is only the data in qemu-char's buf. Now this can be (buf_size - 1) * nr_ports in the worst case.

Add at least another buf_size because you have to allocate the qemu-char buffer before you free the virtio-serial buffer. We can expect that buf_size ~= guest ram size [1], so for practical purposes it may as well be unbounded.

Now this only happens when the host chardev is slow or isn't being read from. So it's not really a guest causing a host DoS, but a guest causing itself some harm.

No. It causes qemu to allocate and use an arbitrarily large amount of additional RAM on the host. This is likely to affect the whole host machine, not just the problematic guest. You can hope the OOM killer happens to pick the right guest, but I wouldn't bet on it.

You're right that the allocations happen one after the other, and the freeing happens later, so there is a time when 2 or 3 times the buf_size is needed.
However, once qemu_chr_write() returns, there could be just one copy lying around; things are freed elsewhere.

One copy (multiplied by the number of ports) is more than enough to cause serious problems.

but then that depends on qemu getting async support - separating out the qemu_chr_write() into a separate thread and allowing vcpu and chr io operations to be run simultaneously.

You don't need any special async char API or threads. Normal unix write semantics (i.e. short writes and EAGAIN) plus the unblock hook are sufficient. As mentioned above, the virtio-serial code should be iterating over the sglist. If the host won't accept all the data immediately then just remember how much has been sent, and resume iteration when the unblock hook is called.

Yes, I've been thinking about this as well. But the problem is some kernel versions spin in the guest code till the buffer is placed back in the vq (signalling it's done using it). This is a problem for the virtio-console (hvc) that does writes with spinlocks held, so allocating new buffers, etc., isn't really possible easily.

That's a guest bug, plain and simple. I'm pretty sure such guests will still lose after your patch. All you're doing is delaying the inevitable slightly. i.e. if a guest happens to submit another block before the first has been flushed then it will spin in exactly the same way.

Paul
Re: [Qemu-devel] [PATCH v8 7/7] virtio-console: Enable port throttling when chardev is slow to consume data
Sure. My proposal is for qemu_chr_write() to succeed all the time. If the backend can block, and the caller can handle it, it can get a -EAGAIN (or WSAEWOULDBLOCK) return. When the backend becomes writable, the chardev can call the ->writes_unblocked() callback for that caller. Individual callers need not bother about re-submitting partial writes, since the buffering can be done in common code in one place (qemu-char.c).

That's only OK if you assume it's OK to buffer all but one byte of the transmit request.

Is that a fair assumption to make?

No. See below.

But that's entirely in guest memory, so it's limited to the amount of RAM that has been allocated to the guest.

Exactly. The guest can cause ram_size * nr_ports of additional host memory to be allocated. Not acceptable.

OK -- so this is how it adds up:

- guest vq
- virtio-serial-bus converts iov to buf

This is an unbelievably lame piece of code. There's absolutely no reason to copy the data into a linear buffer. You should just be iterating over the elements of the sglist.

- qemu-char stores the buf in case it wasn't able to send

but then, since it's all async, we have:

- virtio-serial-bus frees the buf
- guest deletes the buf and removes it from the vq

So what's left is only the data in qemu-char's buf. Now this can be (buf_size - 1) * nr_ports in the worst case.

Add at least another buf_size because you have to allocate the qemu-char buffer before you free the virtio-serial buffer. We can expect that buf_size ~= guest ram size [1], so for practical purposes it may as well be unbounded. Worst case the guest could submit a buffer consisting of many large overlapping regions, giving a buf_size many times greater than guest ram size. Technically such code isn't even doing anything wrong. We're only reading from guest RAM, so in principle the same memory can be submitted multiple times without causing a race condition.

The alternative would be to keep using the guest buffer till all's done,

Yes.
but then that depends on qemu getting async support - separating out the qemu_chr_write() into a separate thread and allowing vcpu and chr io operations to be run simultaneously.

You don't need any special async char API or threads. Normal unix write semantics (i.e. short writes and EAGAIN) plus the unblock hook are sufficient. As mentioned above, the virtio-serial code should be iterating over the sglist. If the host won't accept all the data immediately then just remember how much has been sent, and resume iteration when the unblock hook is called.

Paul

[1] This kind of insanity does actually happen in the real world. e.g. loading a 700Mb ramdisk image via the fw_cfg device, or a kernel panic handler that dumps the contents of RAM to a debug channel (i.e. serial port).
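The scheme described above -- iterate over the guest's sg elements, remember how far you got, and resume from the unblock hook -- can be sketched in isolation roughly as follows. This is only a hedged sketch: `PortWriteState`, `flush_sglist` and `demo_write` are hypothetical names, not the real virtio-serial or chardev API.

```c
/* Sketch: instead of flattening the guest sg list into one malloc'd
 * buffer, walk its elements and remember how far we got, resuming
 * from the unblock hook.  All names here are hypothetical. */
#include <assert.h>   /* for the usage below */
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

typedef struct {
    struct iovec *iov;   /* guest-supplied sg list (read-only) */
    int iov_cnt;
    size_t offset;       /* bytes the backend has already accepted */
} PortWriteState;

/* Demo backend standing in for the chardev: accepts at most 4 bytes
 * per call, i.e. it only ever produces short writes. */
static char sink[64];
static size_t sink_len;
static size_t demo_write(const void *buf, size_t len)
{
    size_t n = len > 4 ? 4 : len;
    memcpy(sink + sink_len, buf, n);
    sink_len += n;
    return n;
}

/* Push as much as the backend will take; returns 1 when the whole
 * sg list has been flushed, 0 if we must wait for the unblock hook. */
static int flush_sglist(PortWriteState *s,
                        size_t (*backend_write)(const void *, size_t))
{
    size_t skip = s->offset;
    for (int i = 0; i < s->iov_cnt; i++) {
        size_t len = s->iov[i].iov_len;
        if (skip >= len) {       /* element already fully sent */
            skip -= len;
            continue;
        }
        const char *base = (const char *)s->iov[i].iov_base + skip;
        size_t want = len - skip;
        size_t done = backend_write(base, want);
        s->offset += done;
        if (done < want) {
            return 0;            /* backend full: resume later */
        }
        skip = 0;
    }
    return 1;
}
```

Note that nothing is heap-allocated: the guest's buffers are reused until the whole request has been consumed, which is exactly what avoids the unbounded host-side copy discussed above.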
Re: [Qemu-devel] [PATCH v8 7/7] virtio-console: Enable port throttling when chardev is slow to consume data
On (Wed) Dec 08 2010 [12:56:33], Paul Brook wrote:

Sure. My proposal is for qemu_chr_write() to succeed all the time. If the backend can block, and the caller can handle it, it can get a -EAGAIN (or WSAEWOULDBLOCK) return. When the backend becomes writable, the chardev can call the ->writes_unblocked() callback for that caller. Individual callers need not bother about re-submitting partial writes, since the buffering can be done in common code in one place (qemu-char.c).

That's only OK if you assume it's OK to buffer all but one byte of the transmit request.

Is that a fair assumption to make?

No. See below.

But that's entirely in guest memory, so it's limited to the amount of RAM that has been allocated to the guest.

Exactly. The guest can cause ram_size * nr_ports of additional host memory to be allocated. Not acceptable.

OK -- so this is how it adds up:

- guest vq
- virtio-serial-bus converts iov to buf

This is an unbelievably lame piece of code.

I doubt it's 'unbelievably lame' just because of the copy. Care to expand?

There's absolutely no reason to copy the data into a linear buffer. You should just be iterating over the elements of the sglist.

OK, but that can be done in a separate patch series.

- qemu-char stores the buf in case it wasn't able to send

but then, since it's all async, we have:

- virtio-serial-bus frees the buf
- guest deletes the buf and removes it from the vq

So what's left is only the data in qemu-char's buf. Now this can be (buf_size - 1) * nr_ports in the worst case.

Add at least another buf_size because you have to allocate the qemu-char buffer before you free the virtio-serial buffer. We can expect that buf_size ~= guest ram size [1], so for practical purposes it may as well be unbounded.

Now this only happens when the host chardev is slow or isn't being read from. So it's not really a guest causing a host DoS, but a guest causing itself some harm.
You're right that the allocations happen one after the other, and the freeing happens later, so there is a time when 2 or 3 times the buf_size is needed. However, once qemu_chr_write() returns, there could be just one copy lying around; things are freed elsewhere.

Worst case the guest could submit a buffer consisting of many large overlapping regions, giving a buf_size many times greater than guest ram size. Technically such code isn't even doing anything wrong. We're only reading from guest RAM, so in principle the same memory can be submitted multiple times without causing a race condition.

Interesting.

The alternative would be to keep using the guest buffer till all's done,

Yes.

but then that depends on qemu getting async support - separating out the qemu_chr_write() into a separate thread and allowing vcpu and chr io operations to be run simultaneously.

You don't need any special async char API or threads. Normal unix write semantics (i.e. short writes and EAGAIN) plus the unblock hook are sufficient. As mentioned above, the virtio-serial code should be iterating over the sglist. If the host won't accept all the data immediately then just remember how much has been sent, and resume iteration when the unblock hook is called.

Yes, I've been thinking about this as well. But the problem is some kernel versions spin in the guest code till the buffer is placed back in the vq (signalling it's done using it). This is a problem for the virtio-console (hvc) that does writes with spinlocks held, so allocating new buffers, etc., isn't really possible easily.

Amit
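The "overlapping regions" worst case Paul describes is easy to see with a little arithmetic: a flattening copy (as iov_to_buf does) charges the host buffer for every sg element, even when all the elements alias the same guest memory. A hedged illustration (`flattened_size` is a made-up helper, not qemu code):

```c
/* The arithmetic behind the overlapping-regions worst case: summing
 * element lengths counts aliased guest memory once per element, so
 * the flattened host copy can far exceed the guest RAM actually
 * referenced.  flattened_size is hypothetical, for illustration. */
#include <assert.h>   /* for the usage below */
#include <stddef.h>
#include <sys/uio.h>

static size_t flattened_size(const struct iovec *iov, int cnt)
{
    size_t total = 0;
    for (int i = 0; i < cnt; i++) {
        total += iov[i].iov_len;   /* overlaps are counted each time */
    }
    return total;
}
```

With 128 ring entries all pointing at one 4 KiB page, the guest commits only 4 KiB of its own RAM yet makes the host allocate 512 KiB for the copy -- and nothing in the virtqueue format forbids this.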
Re: [Qemu-devel] [PATCH v8 7/7] virtio-console: Enable port throttling when chardev is slow to consume data
On (Thu) Dec 02 2010 [17:31:36], Paul Brook wrote:

when there's a partial write, it tries to do a write again, which will fail with -EAGAIN.

Doesn't that cause the first partial chunk to be incorrectly transmitted twice? You may only return EAGAIN if no data was transmitted.

Except for the fact that no caller of qemu_chr_write() resubmits (or even checks) partial writes.

I don't buy this argument. The current implementation of qemu_chr_write never generates transient failures, so they don't need to.

And applying this patch won't change the situation.

Sure it will. The whole point of the patch is to allow transient failures (i.e. avoid blocking) when writing to char backends. You should expect to have to modify the device code to cope with this. As with the DMA interface added a while ago, I believe it's important to get these APIs right. Quick hacks to support limited use-cases just end up needing a complete rewrite (or even worse multiple concurrent APIs/implementations) once we actually start using them seriously. I'm extremely reluctant to add a new layer of buffering that is not visible to either the kernel or the device. In practice we still need to be able to split oversize requests, so handling small split requests should be pretty much free.

What I proposed in the earlier mail was to buffer only the data that has to be re-submitted in case the caller is capable of stopping further output till the char layer indicates it's free to start again. That's case (b) below.

Once data has been transmitted, we have three options:

a) Block until the write completes. This makes the whole patch fairly pointless as host and guest block boundaries are unlikely to align.

This is what currently happens and will remain so for callers of qemu_chr_write() which don't have a .write_unblocked() pointer assigned in the char dev struct.

Obviously if the device doesn't supply an unblocked() hook then the behavior is unchanged. That's trivially uninteresting.
I'm talking about devices that do provide the unblocked() hook.

b) Store the data on the side somewhere. Tell the device all data has been sent, and arrange for this data to be flushed before accepting any more data. This is bad because it allows the guest to allocate arbitrarily large[1] buffers on the host. i.e. a fairly easily exploitable DoS attack.

With virtio-serial, this is what's in use. The buffer is limited to the length of the vq (which is a compile-time constant) and there also is the virtio_serial_throttle_port() call that tells the guest to not send any more data to the host till the char layer indicates it's OK to send more data.

No. Firstly you're assuming all users are virtio based. That may be all you care about, but is not acceptable if you want to get this code merged. Secondly, the virtqueue only restricts the number of direct ring buffer entries. It does not restrict the quantity of data each ring entry points to. As a side note, I notice that the virtio-serial-bus code is already allocating buffers and calling iov_to_buf on arbitrary sized requests. This is wrong for the same reason. Don't do it.

c) Return a partial write to the guest. The guest already has to handle retries due to EAGAIN, and DMA capable devices already have to handle partial mappings, so this doesn't seem too onerous a requirement. This is not a new concept, it's the same as the unix write(2)/send(2) functions.

This isn't possible with the current vq design.

You need to fix that then. I'm fairly sure it must be possible as virtio-blk has to handle similar problems.

Paul
Re: [Qemu-devel] [PATCH v8 7/7] virtio-console: Enable port throttling when chardev is slow to consume data
On (Mon) Dec 06 2010 [09:35:10], Paul Brook wrote:

On (Thu) Dec 02 2010 [17:31:36], Paul Brook wrote:

when there's a partial write, it tries to do a write again, which will fail with -EAGAIN.

Doesn't that cause the first partial chunk to be incorrectly transmitted twice? You may only return EAGAIN if no data was transmitted.

Except for the fact that no caller of qemu_chr_write() resubmits (or even checks) partial writes.

I don't buy this argument. The current implementation of qemu_chr_write never generates transient failures, so they don't need to.

And applying this patch won't change the situation.

Sure it will. The whole point of the patch is to allow transient failures (i.e. avoid blocking) when writing to char backends. You should expect to have to modify the device code to cope with this.

Looks like we're talking of two different cases. I'm talking here of current code that uses qemu chardevs and that it'll continue working fine with this patchset (ie. changes required only to code that wants -EAGAIN returns).

As with the DMA interface added a while ago, I believe it's important to get these APIs right. Quick hacks to support limited use-cases just end up needing a complete rewrite (or even worse multiple concurrent APIs/implementations) once we actually start using them seriously.

Sure. My proposal is for qemu_chr_write() to succeed all the time. If the backend can block, and the caller can handle it, it can get a -EAGAIN (or WSAEWOULDBLOCK) return. When the backend becomes writable, the chardev can call the ->writes_unblocked() callback for that caller. Individual callers need not bother about re-submitting partial writes, since the buffering can be done in common code in one place (qemu-char.c). My previous implementation for leaving out the buffering details to individual users of qemu chardevs was OK'ed by you but not Anthony.

I'm extremely reluctant to add a new layer of buffering that is not visible to either the kernel or the device.
In practice we still need to be able to split oversize requests, so handling small split requests should be pretty much free.

So do you propose to propagate this -EAGAIN error all the way to the guest? That won't work for older virtio guests (for virtio, host and guest changes will be needed).

What I proposed in the earlier mail was to buffer only the data that has to be re-submitted in case the caller is capable of stopping further output till the char layer indicates it's free to start again. That's case (b) below.

Once data has been transmitted, we have three options:

a) Block until the write completes. This makes the whole patch fairly pointless as host and guest block boundaries are unlikely to align.

This is what currently happens and will remain so for callers of qemu_chr_write() which don't have a .write_unblocked() pointer assigned in the char dev struct.

Obviously if the device doesn't supply an unblocked() hook then the behavior is unchanged. That's trivially uninteresting. I'm talking about devices that do provide the unblocked() hook.

b) Store the data on the side somewhere. Tell the device all data has been sent, and arrange for this data to be flushed before accepting any more data. This is bad because it allows the guest to allocate arbitrarily large[1] buffers on the host. i.e. a fairly easily exploitable DoS attack.

With virtio-serial, this is what's in use. The buffer is limited to the length of the vq (which is a compile-time constant) and there also is the virtio_serial_throttle_port() call that tells the guest to not send any more data to the host till the char layer indicates it's OK to send more data.

No. Firstly you're assuming all users are virtio based. That may be all you care about, but is not acceptable if you want to get this code merged.

OK, but it's assumed that once a -EAGAIN is returned, the caller will take appropriate actions to restrict the data sent.
Especially, send_all has:

    if (chr->write_blocked) {
        /*
         * We don't handle this situation: the caller should not send
         * us data while we're blocked.
         *
         * We could buffer this data here but that'll only encourage
         * bad behaviour on part of the callers.
         *
         * Also, the data already in fd's buffers isn't easily
         * migratable. If we want full migration support, all the
         * data landing here needs to be buffered and on migration,
         * anything that's unsent needs to be transferred to the
         * dest. machine (which again isn't a very good way of solving
         * the problem, as the src may become writable just during
         * migration and the reader could receive some data twice,
         * essentially corrupting the data).
         */
        return -1;
    }

Secondly, the virtqueue only restricts
Re: [Qemu-devel] [PATCH v8 7/7] virtio-console: Enable port throttling when chardev is slow to consume data
As with the DMA interface added a while ago, I believe it's important to get these APIs right. Quick hacks to support limited use-cases just end up needing a complete rewrite (or even worse multiple concurrent APIs/implementations) once we actually start using them seriously.

Sure. My proposal is for qemu_chr_write() to succeed all the time. If the backend can block, and the caller can handle it, it can get a -EAGAIN (or WSAEWOULDBLOCK) return. When the backend becomes writable, the chardev can call the ->writes_unblocked() callback for that caller. Individual callers need not bother about re-submitting partial writes, since the buffering can be done in common code in one place (qemu-char.c).

That's only OK if you assume it's OK to buffer all but one byte of the transmit request. I'm extremely reluctant to add a new layer of buffering that is not visible to either the kernel or the device. In practice we still need to be able to split oversize requests, so handling small split requests should be pretty much free.

So do you propose to propagate this -EAGAIN error all the way to the guest? That won't work for older virtio guests (for virtio, host and guest changes will be needed).

Huh? That doesn't make any sense. The guest is already using an asynchronous submission mechanism. With a virtio device the status of a buffer becomes indeterminate once it has been placed into the queue. Only when it is removed from the queue do we know that it has completed. The device may transfer all or part of that buffer at any time in between.

b) Store the data on the side somewhere. Tell the device all data has been sent, and arrange for this data to be flushed before accepting any more data. This is bad because it allows the guest to allocate arbitrarily large[1] buffers on the host. i.e. a fairly easily exploitable DoS attack.

With virtio-serial, this is what's in use.
The buffer is limited to the length of the vq (which is a compile-time constant) and there also is the virtio_serial_throttle_port() call that tells the guest to not send any more data to the host till the char layer indicates it's OK to send more data.

No. Firstly you're assuming all users are virtio based. That may be all you care about, but is not acceptable if you want to get this code merged.

OK, but it's assumed that once a -EAGAIN is returned, the caller will take appropriate actions to restrict the data sent. Especially, send_all has:

    if (chr->write_blocked) {
        /*
         * We don't handle this situation: the caller should not send
         * us data while we're blocked.
         *
         * We could buffer this data here but that'll only encourage
         * bad behaviour on part of the callers.
         */
        return -1;
    }

If you're being draconian about this then do it properly and make this an abort. Otherwise return -EAGAIN. Returning a random error seems like the worst of both worlds. Your code results in spurious guest errors (or lost data) with no real indication that this is actually a qemu device emulation bug.

Secondly, the virtqueue only restricts the number of direct ring buffer entries. It does not restrict the quantity of data each ring entry points to.

But that's entirely in guest memory, so it's limited to the amount of RAM that has been allocated to the guest.

Exactly. The guest can cause ram_size * nr_ports of additional host memory to be allocated. Not acceptable.

c) Return a partial write to the guest. The guest already has to handle retries due to EAGAIN, and DMA capable devices already have to handle partial mappings, so this doesn't seem too onerous a requirement. This is not a new concept, it's the same as the unix write(2)/send(2) functions.

This isn't possible with the current vq design.

You need to fix that then. I'm fairly sure it must be possible as virtio-blk has to handle similar problems.
This was one of the items that was debated during the lead-up to virtio-serial merge: whether a write() call in the guest means data has been flushed out to the host chardev or if the guest has just passed it on to the host to take care of it. We merged the latter approach. Ensuring that data has actually reached the endpoint (v.s. being in a queue for transmission at the next available point) is a different problem, and probably one better solved by higher level protocols. Char devices are fundamentally stream based devices with no frame boundaries (or alternatively a fixed frame size of 1 byte). Your device only reports progress to the guest in virtqueue-entry sized chunks. However that places no real restrictions on how the data is passed to the host. You can choose to pass a single queue entry to the host in several smaller chunks, or even pass several queue entries to the host in a single request. Paul
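Paul's point above -- progress is reported to the guest only in virtqueue-entry-sized chunks, while the host side may consume an entry in arbitrarily smaller pieces -- can be modelled with a toy helper. This is a sketch with invented names (`QueueEntry`, `push_entry`), not the virtio ring API:

```c
/* Toy model: a queue entry is "completed" back to the guest only
 * once every byte of it has been flushed, regardless of the chunk
 * sizes the host side chose to use.  Names are hypothetical. */
#include <assert.h>   /* for the usage below */
#include <stddef.h>

typedef struct {
    size_t len;       /* size of the guest's queue entry */
    size_t flushed;   /* bytes already accepted by the backend */
} QueueEntry;

/* Push up to 'budget' bytes of the entry to the backend; return 1
 * (i.e. return the entry to the guest) only when all of it went. */
static int push_entry(QueueEntry *e, size_t budget)
{
    size_t want = e->len - e->flushed;
    e->flushed += want < budget ? want : budget;
    return e->flushed == e->len;
}
```

The guest never observes the intermediate chunking; it only sees the entry come back when `push_entry` finally reports completion, which is exactly the freedom Paul describes.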
Re: [Qemu-devel] [PATCH v8 7/7] virtio-console: Enable port throttling when chardev is slow to consume data
On (Mon) Dec 06 2010 [13:23:50], Paul Brook wrote:

Sure. My proposal is for qemu_chr_write() to succeed all the time. If the backend can block, and the caller can handle it, it can get a -EAGAIN (or WSAEWOULDBLOCK) return. When the backend becomes writable, the chardev can call the ->writes_unblocked() callback for that caller. Individual callers need not bother about re-submitting partial writes, since the buffering can be done in common code in one place (qemu-char.c).

That's only OK if you assume it's OK to buffer all but one byte of the transmit request.

Is that a fair assumption to make?

I'm extremely reluctant to add a new layer of buffering that is not visible to either the kernel or the device. In practice we still need to be able to split oversize requests, so handling small split requests should be pretty much free.

So do you propose to propagate this -EAGAIN error all the way to the guest? That won't work for older virtio guests (for virtio, host and guest changes will be needed).

Huh? That doesn't make any sense. The guest is already using an asynchronous submission mechanism. With a virtio device the status of a buffer becomes indeterminate once it has been placed into the queue. Only when it is removed from the queue do we know that it has completed. The device may transfer all or part of that buffer at any time in between.

Yes, just making sure this isn't what you're talking about.

b) Store the data on the side somewhere. Tell the device all data has been sent, and arrange for this data to be flushed before accepting any more data. This is bad because it allows the guest to allocate arbitrarily large[1] buffers on the host. i.e. a fairly easily exploitable DoS attack.

With virtio-serial, this is what's in use.
The buffer is limited to the length of the vq (which is a compile-time constant) and there also is the virtio_serial_throttle_port() call that tells the guest to not send any more data to the host till the char layer indicates it's OK to send more data.

No. Firstly you're assuming all users are virtio based. That may be all you care about, but is not acceptable if you want to get this code merged.

OK, but it's assumed that once a -EAGAIN is returned, the caller will take appropriate actions to restrict the data sent. Especially, send_all has:

    if (chr->write_blocked) {
        /*
         * We don't handle this situation: the caller should not send
         * us data while we're blocked.
         *
         * We could buffer this data here but that'll only encourage
         * bad behaviour on part of the callers.
         */
        return -1;
    }

If you're being draconian about this then do it properly and make this an abort. Otherwise return -EAGAIN. Returning a random error seems like the worst of both worlds. Your code results in spurious guest errors (or lost data) with no real indication that this is actually a qemu device emulation bug.

Yeah; abort() is a good idea. (BTW the previous send_all() routine just returned -1 for anything other than -EINTR and -EAGAIN, so I went for consistency.)

Secondly, the virtqueue only restricts the number of direct ring buffer entries. It does not restrict the quantity of data each ring entry points to.

But that's entirely in guest memory, so it's limited to the amount of RAM that has been allocated to the guest.

Exactly. The guest can cause ram_size * nr_ports of additional host memory to be allocated. Not acceptable.

OK -- so this is how it adds up:

- guest vq
- virtio-serial-bus converts iov to buf
- qemu-char stores the buf in case it wasn't able to send

but then, since it's all async, we have:

- virtio-serial-bus frees the buf
- guest deletes the buf and removes it from the vq

So what's left is only the data in qemu-char's buf. Now this can be (buf_size - 1) * nr_ports in the worst case.
The alternative would be to keep using the guest buffer till all's done, but then that depends on qemu getting async support - separating out the qemu_chr_write() into a separate thread and allowing vcpu and chr io operations to be run simultaneously. Amit
Re: [Qemu-devel] [PATCH v8 7/7] virtio-console: Enable port throttling when chardev is slow to consume data
On (Thu) Dec 02 2010 [17:31:36], Paul Brook wrote: when there's a partial write, it tries to do a write again, which will fail with -EAGAIN. Doesn't that cause the first partial chunk to be incorrectly transmitted twice? You may only return EAGAIN if no data was transmitted. Except for the fact that no caller of qemu_chr_write() resubmits (or even checks) partial writes. I don't buy this argument. The current implementation of qemu_chr_write never generates transient failures, so they don't need to. And applying this patch won't change the situation. What I proposed in the earlier mail was to buffer only the data that has to be re-submitted in case the caller is capable of stopping further output till the char layer indicates it's free to start again. Once data has been transmitted, we have three options: a) Block until the write completes. This makes the whole patch fairly pointless as host and guest block boundaries are unlikely to align. This is what currently happens and will remain so for callers of qemu_chr_write() which don't have a .write_unblocked() pointer assigned in the char dev struct. b) Store the data on the side somewhere. Tell the device all data has been sent, and arrange for this data to be flushed before accepting any more data. This is bad because it allows the guest to allocate arbitrarily large[1] buffers on the host. i.e. a fairly easily exploitable DoS attack. With virtio-serial, this is what's in use. The buffer is limited to the length of the vq (which is a compile-time constant) and there also is the virtio_serial_throttle_port() call that tells the guest to not send any more data to the host till the char layer indicates it's OK to send more data. c) Return a partial write to the guest. The guest already has to handle retries due to EAGAIN, and DMA capable devices already have to handle partial mappings, so this doesn't seem too onerous a requirement. This is not a new concept, it's the same as the unix write(2)/send(2) functions. 
This isn't possible with the current vq design. Amit
Re: [Qemu-devel] [PATCH v8 7/7] virtio-console: Enable port throttling when chardev is slow to consume data
On (Wed) Dec 01 2010 [13:08:26], Paul Brook wrote:

On (Wed) Dec 01 2010 [11:59:35], Paul Brook wrote:

    -qemu_chr_write(vcon->chr, buf, len);
    +ret = qemu_chr_write(vcon->chr, buf, len);
    +if (ret == -EAGAIN) {
    +    virtio_serial_throttle_port(port, true);
    +}
     }

This looks wrong. It will lose data in the case of a partial write (i.e. ret < len)

That doesn't happen currently (qemu_chr_write doesn't return a value >= 0 but < len). I had code in there to handle it, but that would change behaviour for current users of qemu_chr_write(), which is a risk.

Doesn't that make the code almost completely pointless?

Not really -- I did have code for partial writes, but removed it before this submission (had it in previous versions). The (new) do_send loop:

    len = len1;
    while (len > 0) {
        ret = write(fd, buf, len);
        if (ret < 0) {
            if (errno == EAGAIN && nonblock) {
                return -EAGAIN;
            }
            if (errno != EINTR && errno != EAGAIN) {
                return -1;
            }
        } else if (ret == 0) {
            break;
        } else {
            buf += ret;
            len -= ret;
        }
    }

when there's a partial write, it tries to do a write again, which will fail with -EAGAIN.

Doesn't that cause the first partial chunk to be incorrectly transmitted twice? You may only return EAGAIN if no data was transmitted.

Except for the fact that no caller of qemu_chr_write() resubmits (or even checks) partial writes. I think the only way to solve this in a caller-neutral way while keeping the current semantics is to buffer all the data that comes in for qemu_chr_write() and re-submit partial writes in the unblock routine (ie, what I did in virtio-console.c in v7 is to be done in qemu-char.c now). That OK?

Amit
Re: [Qemu-devel] [PATCH v8 7/7] virtio-console: Enable port throttling when chardev is slow to consume data
when there's a partial write, it tries to do a write again, which will fail with -EAGAIN. Doesn't that cause the first partial chunk to be incorrectly transmitted twice? You may only return EAGAIN if no data was transmitted. Except for the fact that no caller of qemu_chr_write() resubmits (or even checks) partial writes. I don't buy this argument. The current implementation of qemu_chr_write never generates transient failures, so they don't need to. If any data has been written then you must not return -EAGAIN. Doing so will cause data corruption - the device will retry the transmit later (e.g. when the socket becomes unblocked) and duplicate data will be output. Once data has been transmitted, we have three options: a) Block until the write completes. This makes the whole patch fairly pointless as host and guest block boundaries are unlikely to align. b) Store the data on the side somewhere. Tell the device all data has been sent, and arrange for this data to be flushed before accepting any more data. This is bad because it allows the guest to allocate arbitrarily large[1] buffers on the host. i.e. a fairly easily exploitable DoS attack. c) Return a partial write to the guest. The guest already has to handle retries due to EAGAIN, and DMA capable devices already have to handle partial mappings, so this doesn't seem too onerous a requirement. This is not a new concept, it's the same as the unix write(2)/send(2) functions. Paul [1] At least as large as guest RAM, per port.
[Qemu-devel] [PATCH v8 7/7] virtio-console: Enable port throttling when chardev is slow to consume data
When a chardev indicates it can't accept more data, we tell the
virtio-serial code to stop sending us any more data till we tell
otherwise. This helps guests continue to run normally while the vq keeps
getting full; eventually the guest stops queueing more data. As soon as
the chardev indicates it can accept more data, start pushing!

Signed-off-by: Amit Shah amit.s...@redhat.com
---
 hw/virtio-console.c |   17 ++++++++++++++++-
 1 files changed, 16 insertions(+), 1 deletions(-)

diff --git a/hw/virtio-console.c b/hw/virtio-console.c
index 749ed59..ec85ad5 100644
--- a/hw/virtio-console.c
+++ b/hw/virtio-console.c
@@ -18,13 +18,27 @@ typedef struct VirtConsole {
     CharDriverState *chr;
 } VirtConsole;
 
+/*
+ * Callback function that's called from chardevs when backend becomes
+ * writable.
+ */
+static void chr_write_unblocked(void *opaque)
+{
+    VirtConsole *vcon = opaque;
+
+    virtio_serial_throttle_port(vcon->port, false);
+}
+
 /* Callback function that's called when the guest sends us data */
 static void flush_buf(VirtIOSerialPort *port, const uint8_t *buf, size_t len)
 {
     VirtConsole *vcon = DO_UPCAST(VirtConsole, port, port);
+    int ret;
 
-    qemu_chr_write(vcon->chr, buf, len);
+    ret = qemu_chr_write(vcon->chr, buf, len);
+    if (ret == -EAGAIN) {
+        virtio_serial_throttle_port(port, true);
+    }
 }
 
 /* Readiness of the guest to accept data on a port */
@@ -62,6 +76,7 @@ static QemuChrHandlers chr_handlers = {
     .fd_can_read = chr_can_read,
     .fd_read = chr_read,
     .fd_event = chr_event,
+    .fd_write_unblocked = chr_write_unblocked,
 };
 
 static int generic_port_init(VirtConsole *vcon, VirtIOSerialDevice *dev)
-- 
1.7.3.2
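The throttle round-trip this patch introduces (throttle the port when the chardev returns -EAGAIN, unthrottle from the writable callback) can be modelled with a few stubs. This is a sketch only: `throttle_port_stub` and `chr_write_stub` are hypothetical stand-ins for `virtio_serial_throttle_port` and `qemu_chr_write`.

```c
/* Minimal model of the throttle/unthrottle flow in this patch. */
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

static bool throttled;
static bool backend_busy = true;

static void throttle_port_stub(bool throttle)
{
    throttled = throttle;   /* real code stops/starts vq processing */
}

static int chr_write_stub(const char *buf, size_t len)
{
    (void)buf;
    return backend_busy ? -EAGAIN : (int)len;
}

/* Guest-data path, shaped like flush_buf() in the patch. */
static void flush_buf_sketch(const char *buf, size_t len)
{
    if (chr_write_stub(buf, len) == -EAGAIN) {
        throttle_port_stub(true);
    }
}

/* Chardev-writable callback, shaped like chr_write_unblocked(). */
static void unblocked_sketch(void)
{
    backend_busy = false;
    throttle_port_stub(false);
}

/* A busy backend throttles the port; the callback releases it. */
static int throttle_scenario(void)
{
    flush_buf_sketch("x", 1);
    int was_throttled = throttled;
    unblocked_sketch();
    return was_throttled && !throttled;
}
```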
Re: [Qemu-devel] [PATCH v8 7/7] virtio-console: Enable port throttling when chardev is slow to consume data
On (Wed) Dec 01 2010 [11:59:35], Paul Brook wrote:
> > > > -    qemu_chr_write(vcon->chr, buf, len);
> > > > +    ret = qemu_chr_write(vcon->chr, buf, len);
> > > > +    if (ret == -EAGAIN) {
> > > > +        virtio_serial_throttle_port(port, true);
> > > > +    }
> > > >  }
> > >
> > > This looks wrong. It will lose data in the case of a partial write
> > > (i.e. ret < len).
> >
> > That doesn't happen currently (qemu_chr_write doesn't return a value
> > >= 0 but < len). I had code in there to handle it, but that would
> > change behaviour for current users of qemu_chr_write(), which is a
> > risk.
>
> Doesn't that make the code almost completely pointless?

Not really -- I did have code for partial writes, but removed it before
this submission (had it in previous versions). The (new) do_send loop:

    len = len1;
    while (len > 0) {
        ret = write(fd, buf, len);
        if (ret < 0) {
            if (errno == EAGAIN && nonblock) {
                return -EAGAIN;
            }
            if (errno != EINTR && errno != EAGAIN) {
                return -1;
            }
        } else if (ret == 0) {
            break;
        } else {
            buf += ret;
            len -= ret;
        }
    }

when there's a partial write, it tries to do a write again, which will
fail with -EAGAIN.

(However I might be completely missing the data that didn't get written
as a result of that partial write, so the output gets corrupted. I guess
I still need to handle partial writes for that.)

		Amit
Re: [Qemu-devel] [PATCH v8 7/7] virtio-console: Enable port throttling when chardev is slow to consume data
> > > -    qemu_chr_write(vcon->chr, buf, len);
> > > +    ret = qemu_chr_write(vcon->chr, buf, len);
> > > +    if (ret == -EAGAIN) {
> > > +        virtio_serial_throttle_port(port, true);
> > > +    }
> > >  }
> >
> > This looks wrong. It will lose data in the case of a partial write
> > (i.e. ret < len).
>
> That doesn't happen currently (qemu_chr_write doesn't return a value
> >= 0 but < len). I had code in there to handle it, but that would
> change behaviour for current users of qemu_chr_write(), which is a
> risk.

Doesn't that make the code almost completely pointless?

Allowing EAGAIN without allowing short writes means that the writes will
still block for arbitrarily long periods of time. You're relying on the
kernel blocking at the same point the guest happens to split a DMA
block. If you're transferring any significant quantities of data the
chances of that happening seem pretty slim.

Paul
Re: [Qemu-devel] [PATCH v8 7/7] virtio-console: Enable port throttling when chardev is slow to consume data
>  /* Callback function that's called when the guest sends us data */
>  static void flush_buf(VirtIOSerialPort *port, const uint8_t *buf, size_t len)
>  {
>      VirtConsole *vcon = DO_UPCAST(VirtConsole, port, port);
> +    int ret;
>
> -    qemu_chr_write(vcon->chr, buf, len);
> +    ret = qemu_chr_write(vcon->chr, buf, len);
> +    if (ret == -EAGAIN) {
> +        virtio_serial_throttle_port(port, true);
> +    }
>  }

This looks wrong. It will lose data in the case of a partial write
(i.e. ret < len).

Paul
Re: [Qemu-devel] [PATCH v8 7/7] virtio-console: Enable port throttling when chardev is slow to consume data
On (Wed) Dec 01 2010 [11:30:58], Paul Brook wrote:
> >  /* Callback function that's called when the guest sends us data */
> >  static void flush_buf(VirtIOSerialPort *port, const uint8_t *buf, size_t len)
> >  {
> >      VirtConsole *vcon = DO_UPCAST(VirtConsole, port, port);
> > +    int ret;
> >
> > -    qemu_chr_write(vcon->chr, buf, len);
> > +    ret = qemu_chr_write(vcon->chr, buf, len);
> > +    if (ret == -EAGAIN) {
> > +        virtio_serial_throttle_port(port, true);
> > +    }
> >  }
>
> This looks wrong. It will lose data in the case of a partial write
> (i.e. ret < len).

That doesn't happen currently (qemu_chr_write doesn't return a value
>= 0 but < len). I had code in there to handle it, but that would change
behaviour for current users of qemu_chr_write(), which is a risk.

		Amit
Re: [Qemu-devel] [PATCH v8 7/7] virtio-console: Enable port throttling when chardev is slow to consume data
On (Wed) Dec 01 2010 [11:59:35], Amit Shah wrote:
> > > > > -    qemu_chr_write(vcon->chr, buf, len);
> > > > > +    ret = qemu_chr_write(vcon->chr, buf, len);
> > > > > +    if (ret == -EAGAIN) {
> > > > > +        virtio_serial_throttle_port(port, true);
> > > > > +    }
> > > > >  }
> > > >
> > > > This looks wrong. It will lose data in the case of a partial
> > > > write (i.e. ret < len).
> > >
> > > That doesn't happen currently (qemu_chr_write doesn't return a
> > > value >= 0 but < len). I had code in there to handle it, but that
> > > would change behaviour for current users of qemu_chr_write(), which
> > > is a risk.
> >
> > Doesn't that make the code almost completely pointless?
>
> Not really -- I did have code for partial writes, but removed it before
> this submission (had it in previous versions). The (new) do_send loop:
>
>     len = len1;
>     while (len > 0) {
>         ret = write(fd, buf, len);
>         if (ret < 0) {
>             if (errno == EAGAIN && nonblock) {
>                 return -EAGAIN;
>             }
>             if (errno != EINTR && errno != EAGAIN) {
>                 return -1;
>             }
>         } else if (ret == 0) {
>             break;
>         } else {
>             buf += ret;
>             len -= ret;
>         }
>     }
>
> when there's a partial write, it tries to do a write again, which will
> fail with -EAGAIN.

Doesn't that cause the first partial chunk to be incorrectly transmitted
twice? You may only return EAGAIN if no data was transmitted.

Paul
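The duplication Paul describes can be reproduced with a small model of the do_send loop. This is an illustration, not QEMU code: the names and the 4-byte budget are hypothetical stand-ins for a socket buffer that fills up mid-write.

```c
/* Model of the do_send loop quoted in the mail, showing how returning
 * -EAGAIN after a partial write duplicates data on retry. */
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

static char out[64];        /* everything the backend actually emitted */
static size_t out_len;
static size_t budget = 4;   /* fake backend: 4 bytes, then EAGAIN */

static ssize_t backend_write(const char *buf, size_t len)
{
    if (budget == 0) {
        errno = EAGAIN;
        return -1;
    }
    size_t n = len < budget ? len : budget;
    memcpy(out + out_len, buf, n);
    out_len += n;
    budget -= n;
    return (ssize_t)n;
}

/* The loop from the mail: after a partial write it retries, hits EAGAIN,
 * and returns -EAGAIN even though some bytes already went out. */
static int do_send_buggy(const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t ret = backend_write(buf, len);
        if (ret < 0) {
            if (errno == EAGAIN) {
                return -EAGAIN;     /* bytes already sent are forgotten */
            }
            return -1;
        }
        buf += ret;
        len -= (size_t)ret;
    }
    return 0;
}

/* A caller that treats -EAGAIN as "nothing sent" retries the whole
 * buffer once the backend unblocks, so the first chunk goes out twice. */
static int reproduces_duplication(void)
{
    int first = do_send_buggy("ABCDEFGH", 8);   /* "ABCD" out, -EAGAIN */
    budget = 64;                                /* backend unblocks */
    int second = do_send_buggy("ABCDEFGH", 8);  /* naive full retry */
    return first == -EAGAIN && second == 0 &&
           out_len == 12 && memcmp(out, "ABCDABCDEFGH", 12) == 0;
}
```

The output stream ends up as "ABCDABCDEFGH": the 4-byte prefix is transmitted twice, which is exactly why -EAGAIN may only be returned when no data was consumed.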