Re: [Qemu-devel] [PATCH v8 7/7] virtio-console: Enable port throttling when chardev is slow to consume data

2010-12-09 Thread Paul Brook
 But that's entirely in guest memory, so it's limited to the amount
 of RAM that has been allocated to the guest.

Exactly. The guest can cause ram_size * nr_ports of additional host
memory to be allocated.  Not acceptable.
   
   OK -- so this is how it adds up:
   
   - guest vq
   - virtio-serial-bus converts iov to buf
  
  This is an unbelievably lame piece of code.
 
 I doubt it's 'unbelievably lame' just because of the copy.  Care to
 expand?

Specifically that we are allocating a host buffer of guest-specified size to 
hold that copy.
 
  There's absolutely no reason to
  copy the data into a linear buffer. You should just be iterating over the
  elements of the sglist.
 
 OK, but that can be done in a separate patch series.

I suspect you'll actually find it easier to fix that first. Otherwise you're 
going end up having to rewrite your own code.

   - qemu-char stores the buf in case it wasn't able to send
   
   but then, since it's all async, we have:
   
   - virtio-serial-bus frees the buf
   - guest deletes the buf and removes it from the vq
   
   So what's left is only the data in qemu-char's buf.  Now this can be
   (buf_size - 1) * nr_ports in the worst case.
  
  Add at least another buf_size because you have to allocate the qemu-char
  buffer before you free the virtio-serial buffer. We can expect that
  buf_size ~= guest ram size [1], so for practical purposes it may as well
  be unbounded.
 
 Now this only happens when the host chardev is slow or isn't being read
 from.  So it's not really a guest causing a host DoS, but a guest
 causing itself some harm.  

No. It causes qemu to allocate and use an arbitrarily large amount of 
additional ram on the host. This is likely to effect the whole host machine, 
not just the problematic guest.  You can hope the OOM killer happens to pick 
the right guest, but I wouldn't bet on it.

 You're right that the allocations happen one
 after the other, and the freeing happens later, so there is a time when
 2 or 3 times the buf_size is needed.
 
 However, once qemu_chr_write() returns, there could be just one copy
 lying around, things are freed elsewhere.

One copy (multiplied by the number of ports) is more than enough to cause 
serious problems.

   but then that depends on qemu getting async support - separating out
   the qemu_chr_write() into a separate thread and allowing vcpu and chr
   io operations to be run simultaneously.
  
  You don't need any special async char API or threads.  Normal unix write
  semantics (i.e. short writes and EAGAIN) plus the unblock hook are
  sufficient. As mentioned above, the virtio-serial code should be
  iterating over the sglist.  If the host won't accept all the data
  immediately then just remember how much has been sent, and resume
  iteration when the unblock hook is called.
 
 Yes I've been thinking about this as well.  But the problem is some
 kernel versions spin in the guest code till the buffer is placed back
 in the vq (signalling it's done using it).  This is a problem for the
 virtio-console (hvc) that does writes with spinlocks held, so allocating
 new buffers, etc., isn't really -- possible easily.

That's a guest bug, plain and simple.
I'm pretty sure such guests will still loose after your patch. All you're 
doing is delaying the inevitable slightly. i.e. if a guest happens to submit 
another block before the first has been flushed then it will spin in exactly 
the same way.

Paul



Re: [Qemu-devel] [PATCH v8 7/7] virtio-console: Enable port throttling when chardev is slow to consume data

2010-12-08 Thread Paul Brook
   Sure.  My proposal is for qemu_chr_write() to succeed all the time.  If
   the backend can block, and the caller can handle it, it can get a
   -EAGAIN (or WSAEWOULDBLOCK) return.  When the backend becomes writable,
   the chardev can call the -writes_unblocked() callback for that caller.
   Individual callers need not bother about re-submitting partial writes,
   since the buffering can be done in common code in one place
   (qemu-char.c).
  
  That's only OK if you assume it's OK to buffer all but one byte of the
  transmit request.
 
 Is that a fair assumption to make?

No. See below.

   But that's entirely in guest memory, so it's limited to the amount of
   RAM that has been allocated to the guest.
  
  Exactly. The guest can cause ram_size * nr_ports of additional host
  memory to be allocated.  Not acceptable.
 
 OK -- so this is how it adds up:
 
 - guest vq
 - virtio-serial-bus converts iov to buf

This is an unbelievably lame piece of code. There's absolutely no reason to 
copy the data into a linear buffer. You should just be iterating over the 
elements of the sglist.

 - qemu-char stores the buf in case it wasn't able to send

 but then, since it's all async, we have:
 
 - virtio-serial-bus frees the buf
 - guest deletes the buf and removes it from the vq

 So what's left is only the data in qemu-char's buf.  Now this can be
 (buf_size - 1) * nr_ports in the worst case.

Add at least another buf_size because you have to allocate the qemu-char 
buffer before you free the virtio-serial buffer. We can expect that
buf_size ~= guest ram size [1], so for practical purposes it may as well be 
unbounded.

Worst case the guest could submit a buffer consisting of many large 
overlapping regions, giving a buf_size many times greater then guest ram size.  
Technically such code isn't even doing anything wrong. We're only reading from 
guest RAM, so in principle the same memory can be submitted multiple times 
without causing a race condition.

 The alternative would be to keep using the guest buffer till all's done,

Yes.

 but then that depends on qemu getting async support - separating out the
 qemu_chr_write() into a separate thread and allowing vcpu and chr io
 operations to be run simultaneously.

You don't need any special async char API or threads.  Normal unix write 
semantics (i.e. short writes and EAGAIN) plus the unblock hook are sufficient.
As mentioned above, the virtio-serial code should be iterating over the 
sglist.  If the host won't accept all the data immediately then just remember 
how much has been sent, and resume iteration when the unblock hook is called.

Paul

[1] This kind of insanity does actually happen in the real world.  e.g. 
loading a 700Mb ramdisk image via the fw_cfg device, or a kernel panic handler 
that dumps the contents of RAM to a debug channel (i.e. serial port).



Re: [Qemu-devel] [PATCH v8 7/7] virtio-console: Enable port throttling when chardev is slow to consume data

2010-12-08 Thread Amit Shah
On (Wed) Dec 08 2010 [12:56:33], Paul Brook wrote:
Sure.  My proposal is for qemu_chr_write() to succeed all the time.  If
the backend can block, and the caller can handle it, it can get a
-EAGAIN (or WSAEWOULDBLOCK) return.  When the backend becomes writable,
the chardev can call the -writes_unblocked() callback for that caller.
Individual callers need not bother about re-submitting partial writes,
since the buffering can be done in common code in one place
(qemu-char.c).
   
   That's only OK if you assume it's OK to buffer all but one byte of the
   transmit request.
  
  Is that a fair assumption to make?
 
 No. See below.
 
But that's entirely in guest memory, so it's limited to the amount of
RAM that has been allocated to the guest.
   
   Exactly. The guest can cause ram_size * nr_ports of additional host
   memory to be allocated.  Not acceptable.
  
  OK -- so this is how it adds up:
  
  - guest vq
  - virtio-serial-bus converts iov to buf
 
 This is an unbelievably lame piece of code.

I doubt it's 'unbelievably lame' just because of the copy.  Care to
expand?

 There's absolutely no reason to 
 copy the data into a linear buffer. You should just be iterating over the 
 elements of the sglist.

OK, but that can be done in a separate patch series.

  - qemu-char stores the buf in case it wasn't able to send
 
  but then, since it's all async, we have:
  
  - virtio-serial-bus frees the buf
  - guest deletes the buf and removes it from the vq
 
  So what's left is only the data in qemu-char's buf.  Now this can be
  (buf_size - 1) * nr_ports in the worst case.
 
 Add at least another buf_size because you have to allocate the qemu-char 
 buffer before you free the virtio-serial buffer. We can expect that
 buf_size ~= guest ram size [1], so for practical purposes it may as well be 
 unbounded.

Now this only happens when the host chardev is slow or isn't being read
from.  So it's not really a guest causing a host DoS, but a guest
causing itself some harm.  You're right that the allocations happen one
after the other, and the freeing happens later, so there is a time when
2 or 3 times the buf_size is needed.

However, once qemu_chr_write() returns, there could be just one copy
lying around, things are freed elsewhere.

 Worst case the guest could submit a buffer consisting of many large 
 overlapping regions, giving a buf_size many times greater then guest ram 
 size.  
 Technically such code isn't even doing anything wrong. We're only reading 
 from 
 guest RAM, so in principle the same memory can be submitted multiple times 
 without causing a race condition.

Interesting.

  The alternative would be to keep using the guest buffer till all's done,
 
 Yes.
 
  but then that depends on qemu getting async support - separating out the
  qemu_chr_write() into a separate thread and allowing vcpu and chr io
  operations to be run simultaneously.
 
 You don't need any special async char API or threads.  Normal unix write 
 semantics (i.e. short writes and EAGAIN) plus the unblock hook are sufficient.
 As mentioned above, the virtio-serial code should be iterating over the 
 sglist.  If the host won't accept all the data immediately then just remember 
 how much has been sent, and resume iteration when the unblock hook is called.

Yes I've been thinking about this as well.  But the problem is some
kernel versions spin in the guest code till the buffer is placed back
in the vq (signalling it's done using it).  This is a problem for the
virtio-console (hvc) that does writes with spinlocks held, so allocating
new buffers, etc., isn't really -- possible easily.

Amit



Re: [Qemu-devel] [PATCH v8 7/7] virtio-console: Enable port throttling when chardev is slow to consume data

2010-12-06 Thread Paul Brook
 On (Thu) Dec 02 2010 [17:31:36], Paul Brook wrote:
 when there's a partial write, it tries to do a write again, which
 will fail with -EAGAIN.

Doesn't that cause the first partial chunk to be incorrectly
transmitted twice? You may only return EAGAIN if no data was
transmitted.
   
   Except for the fact that no caller of qemu_chr_write() resubmits (or
   even checks) partial writes.
  
  I don't buy this argument. The current implementation of qemu_chr_write
  never generates transient failures, so they don't need to.
 
 And applying this patch won't change the situation.

Sure it will. The whole point of the patch is to allow transient failures 
(i.e. avoid blocking) when writing to char backends.  You should expect to 
have to modify the device code to cope with this.

As with the DMA interface added a while ago, I believe it's important to get 
these APIs right.  Quick hacks to support limited use-cases just end up 
needing a complete rewrite (or even worse multiple concurrent 
APIs/implementations) once we actually start using them seriously.

I'm extremely reluctant to add a new layer of buffering that is not visible to 
ether the kernel or the device.  In practice we still need to be able to split 
oversize requests, so handling small split requests should be pretty much 
free.

 What I proposed in the earlier mail was to buffer only the data that has
 to be re-submitted in case the caller is capable of stopping further
 output till the char layer indicates it's free to start again.

That's case (b) below.

  Once data has been transmitted, we have three options:
  
  a) Block until the write completes. This makes the whole patch fairly
  pointless as host and guest block boundaries are unlikely to align.
 
 This is what currently happens and will remain so for callers of
 qemu_chr_write() which don't have a .write_unblocked() pointer assigned
 in the char dev struct.

Obviously if the device doesn't supply an unbocked() hook then the behavior is 
unchanged.  That's trivially uninteresting.  I'm talking about devices that do 
provide the unblocked() hook.

  b) Store the data on the side somewhere. Tell the device all data has
  been sent, and arrange for this data to be flushed before accepting any
  more data. This is bad because it allows the guest to allocate
  arbitrarily large[1] buffers on the host. i.e. a fairly easily
  exploitable DoS attack.
 
 With virtio-serial, this is what's in use.  The buffer is limited to the
 length of the vq (which is a compile-time constant) and there also is
 the virtio_serial_throttle_port() call that tells the guest to not send
 any more data to the host till the char layer indicates it's OK to send
 more data.

No.

Firstly you're assuming all users are virtio based. That may be all you care 
about, but is not acceptable if you want to get this code merged.

Secondly, the virtqueue only restricts the number of direct ring buffer 
entries. It does not restrict the quantity of data each ring entry points to.

As a side note, I notice that the virtio-serial-buf code is already allocating 
buffers and calling iov_to_buf on arbitrary sized requests. This is wrong for 
the same reason. Don't do it.

  c) Return a partial write to the guest. The guest already has to handle
  retries due to EAGAIN, and DMA capable devices already have to handle
  partial mappings, so this doesn't seem too onerous a requirement. This
  is not a new concept, it's the same as the unix write(2)/send(2)
  functions.
 
 This isn't possible with the current vq design.

You need to fix that then.  I'm fairly sure it must be possible as virtio-blk 
has to handle similar problems.

Paul



Re: [Qemu-devel] [PATCH v8 7/7] virtio-console: Enable port throttling when chardev is slow to consume data

2010-12-06 Thread Amit Shah
On (Mon) Dec 06 2010 [09:35:10], Paul Brook wrote:
  On (Thu) Dec 02 2010 [17:31:36], Paul Brook wrote:
  when there's a partial write, it tries to do a write again, which
  will fail with -EAGAIN.
 
 Doesn't that cause the first partial chunk to be incorrectly
 transmitted twice? You may only return EAGAIN if no data was
 transmitted.

Except for the fact that no caller of qemu_chr_write() resubmits (or
even checks) partial writes.
   
   I don't buy this argument. The current implementation of qemu_chr_write
   never generates transient failures, so they don't need to.
  
  And applying this patch won't change the situation.
 
 Sure it will. The whole point of the patch is to allow transient failures 
 (i.e. avoid blocking) when writing to char backends.  You should expect to 
 have to modify the device code to cope with this.

Looks like we're talking of two different cases.  I'm talking here of
current code that uses qemu chardevs and that it'll continue working
fine with this patchset (ie. changes required only to code that wants
-EAGAIN returns).

 As with the DMA interface added a while ago, I believe it's important to get 
 these APIs right.  Quick hacks to support limited use-cases just end up 
 needing a complete rewrite (or even worse multiple concurrent 
 APIs/implementations) once we actually start using them seriously.

Sure.  My proposal is for qemu_chr_write() to succeed all the time.  If
the backend can block, and the caller can handle it, it can get a
-EAGAIN (or WSAEWOULDBLOCK) return.  When the backend becomes writable,
the chardev can call the -writes_unblocked() callback for that caller.
Individual callers need not bother about re-submitting partial writes,
since the buffering can be done in common code in one place
(qemu-char.c).

My previous implementation for leaving out the buffering details to
individual users of qemu chardevs was OK'ed by you but not Anthony.

 I'm extremely reluctant to add a new layer of buffering that is not visible 
 to 
 ether the kernel or the device.  In practice we still need to be able to 
 split 
 oversize requests, so handling small split requests should be pretty much 
 free.

So do you propose to propagate this -EAGAIN error all the way to the
guest?  That won't work for older virtio guests (for virtio, host and
guest changes will be needed).

  What I proposed in the earlier mail was to buffer only the data that has
  to be re-submitted in case the caller is capable of stopping further
  output till the char layer indicates it's free to start again.
 
 That's case (b) below.
 
   Once data has been transmitted, we have three options:
   
   a) Block until the write completes. This makes the whole patch fairly
   pointless as host and guest block boundaries are unlikely to align.
  
  This is what currently happens and will remain so for callers of
  qemu_chr_write() which don't have a .write_unblocked() pointer assigned
  in the char dev struct.
 
 Obviously if the device doesn't supply an unbocked() hook then the behavior 
 is 
 unchanged.  That's trivially uninteresting.  I'm talking about devices that 
 do 
 provide the unblocked() hook.
 
   b) Store the data on the side somewhere. Tell the device all data has
   been sent, and arrange for this data to be flushed before accepting any
   more data. This is bad because it allows the guest to allocate
   arbitrarily large[1] buffers on the host. i.e. a fairly easily
   exploitable DoS attack.
  
  With virtio-serial, this is what's in use.  The buffer is limited to the
  length of the vq (which is a compile-time constant) and there also is
  the virtio_serial_throttle_port() call that tells the guest to not send
  any more data to the host till the char layer indicates it's OK to send
  more data.
 
 No.
 
 Firstly you're assuming all users are virtio based. That may be all you care 
 about, but is not acceptable if you want to get this code merged.

OK, but it's assumed that once a -EAGAIN is returned, the caller will
take appropriate actions to restrict the data sent.  Especially,
send_all has:

if (chr-write_blocked) {
/*
 * We don't handle this situation: the caller should not send
 * us data while we're blocked.
 *
 * We could buffer this data here but that'll only encourage
 * bad behaviour on part of the callers.
 *
 * Also, the data already in fd's buffers isn't easily
 * migratable.  If we want full migration support, all the
 * data landing here needs to be buffered and on migration,
 * anything that's unsent needs to be transferred to the
 * dest. machine (which again isn't a very good way of solving
 * the problem, as the src may become writable just during
 * migration and the reader could receive some data twice,
 * essentially corrupting the data).
 */
return -1;
}

 Secondly, the virtqueue only restricts 

Re: [Qemu-devel] [PATCH v8 7/7] virtio-console: Enable port throttling when chardev is slow to consume data

2010-12-06 Thread Paul Brook
  As with the DMA interface added a while ago, I believe it's important to
  get these APIs right.  Quick hacks to support limited use-cases just end
  up needing a complete rewrite (or even worse multiple concurrent
  APIs/implementations) once we actually start using them seriously.
 
 Sure.  My proposal is for qemu_chr_write() to succeed all the time.  If
 the backend can block, and the caller can handle it, it can get a
 -EAGAIN (or WSAEWOULDBLOCK) return.  When the backend becomes writable,
 the chardev can call the -writes_unblocked() callback for that caller.
 Individual callers need not bother about re-submitting partial writes,
 since the buffering can be done in common code in one place
 (qemu-char.c).

That's only OK if you assume it's OK to buffer all but one byte of the 
transmit request.

  I'm extremely reluctant to add a new layer of buffering that is not
  visible to ether the kernel or the device.  In practice we still need to
  be able to split oversize requests, so handling small split requests
  should be pretty much free.
 
 So do you propose to propagate this -EAGAIN error all the way to the
 guest?  That won't work for older virtio guests (for virtio, host and
 guest changes will be needed).

Huh? That doesn't make any sense. The guest is already using an asyncronous 
submission mechanism.  
With a virtio device the status of a buffer becomes indeterminate once it has 
been placed into the queue. Only when it is removed from the queue do we know 
that it has completed.  The device may transfer all or part of that buffer at 
any time in between.
 
b) Store the data on the side somewhere. Tell the device all data has
been sent, and arrange for this data to be flushed before accepting
any more data. This is bad because it allows the guest to allocate
arbitrarily large[1] buffers on the host. i.e. a fairly easily
exploitable DoS attack.
   
   With virtio-serial, this is what's in use.  The buffer is limited to
   the length of the vq (which is a compile-time constant) and there also
   is the virtio_serial_throttle_port() call that tells the guest to not
   send any more data to the host till the char layer indicates it's OK
   to send more data.
  
  No.
  
  Firstly you're assuming all users are virtio based. That may be all you
  care about, but is not acceptable if you want to get this code merged.
 
 OK, but it's assumed that once a -EAGAIN is returned, the caller will
 take appropriate actions to restrict the data sent.  Especially,
 send_all has:
 
 if (chr-write_blocked) {
 /*
  * We don't handle this situation: the caller should not send
  * us data while we're blocked.
  *
  * We could buffer this data here but that'll only encourage
  * bad behaviour on part of the callers.

  */
 return -1;
 }

If you're being draconian about this then do it properly and make this an 
abort. Otherwise return -EAGAIN. Returning a random error seems like the worst 
of both worlds.  Your code results in spurious guest errors (or lost data) 
with real indication that this is actually a qemu device emulation bug.
 
  Secondly, the virtqueue only restricts the number of direct ring buffer
  entries. It does not restrict the quantity of data each ring entry points
  to.
 
 But that's entirely in guest memory, so it's limited to the amount of
 RAM that has been allocated to the guest.

Exactly. The guest can cause ram_size * nr_ports of additional host memory to 
be allocated.  Not acceptable. 

c) Return a partial write to the guest. The guest already has to
handle retries due to EAGAIN, and DMA capable devices already have
to handle partial mappings, so this doesn't seem too onerous a
requirement. This is not a new concept, it's the same as the unix
write(2)/send(2) functions.
   
   This isn't possible with the current vq design.
  
  You need to fix that then.  I'm fairly sure it must be possible as
  virtio-blk has to handle similar problems.
 
 This was one of the items that was debated during the lead-up to
 virtio-serial merge:  whether a write() call in the guest means data has
 been flushed out to the host chardev or if the guest has just passed it
 on to the host to take care of it.  We merged the latter approach.

Ensuring that data has actually reached the endpoint (v.s. being in a queue 
for transmission at the next available point) is a different problem, and 
probably one better solved by higher level protocols.


Char devices are fundamentally stream based devices with no frame boundaries 
(or alternatively a fixed frame size of 1 byte).

Your device only reports progress to the guest in virtqueue-entry sized 
chunks. However that places no real restrictions on how the data is passed to 
the host. You can choose to pass a single queue entry to the host in several 
smaller chunks, or even pass several queue entries to the host in a single 
request.

Paul



Re: [Qemu-devel] [PATCH v8 7/7] virtio-console: Enable port throttling when chardev is slow to consume data

2010-12-06 Thread Amit Shah
On (Mon) Dec 06 2010 [13:23:50], Paul Brook wrote:
  Sure.  My proposal is for qemu_chr_write() to succeed all the time.  If
  the backend can block, and the caller can handle it, it can get a
  -EAGAIN (or WSAEWOULDBLOCK) return.  When the backend becomes writable,
  the chardev can call the -writes_unblocked() callback for that caller.
  Individual callers need not bother about re-submitting partial writes,
  since the buffering can be done in common code in one place
  (qemu-char.c).
 
 That's only OK if you assume it's OK to buffer all but one byte of the 
 transmit request.

Is that a fair assumption to make?

   I'm extremely reluctant to add a new layer of buffering that is not
   visible to ether the kernel or the device.  In practice we still need to
   be able to split oversize requests, so handling small split requests
   should be pretty much free.
  
  So do you propose to propagate this -EAGAIN error all the way to the
  guest?  That won't work for older virtio guests (for virtio, host and
  guest changes will be needed).
 
 Huh? That doesn't make any sense. The guest is already using an asyncronous 
 submission mechanism.  
 With a virtio device the status of a buffer becomes indeterminate once it has 
 been placed into the queue. Only when it is removed from the queue do we know 
 that it has completed.  The device may transfer all or part of that buffer at 
 any time in between.

Yes, just making sure this isn't what you're talking about.

 b) Store the data on the side somewhere. Tell the device all data has
 been sent, and arrange for this data to be flushed before accepting
 any more data. This is bad because it allows the guest to allocate
 arbitrarily large[1] buffers on the host. i.e. a fairly easily
 exploitable DoS attack.

With virtio-serial, this is what's in use.  The buffer is limited to
the length of the vq (which is a compile-time constant) and there also
is the virtio_serial_throttle_port() call that tells the guest to not
send any more data to the host till the char layer indicates it's OK
to send more data.
   
   No.
   
   Firstly you're assuming all users are virtio based. That may be all you
   care about, but is not acceptable if you want to get this code merged.
  
  OK, but it's assumed that once a -EAGAIN is returned, the caller will
  take appropriate actions to restrict the data sent.  Especially,
  send_all has:
  
  if (chr-write_blocked) {
  /*
   * We don't handle this situation: the caller should not send
   * us data while we're blocked.
   *
   * We could buffer this data here but that'll only encourage
   * bad behaviour on part of the callers.
 
   */
  return -1;
  }
 
 If you're being draconian about this then do it properly and make this an 
 abort. Otherwise return -EAGAIN. Returning a random error seems like the 
 worst 
 of both worlds.  Your code results in spurious guest errors (or lost data) 
 with real indication that this is actually a qemu device emulation bug.

Yeah; abort() is a good idea.

(BTW the previous send_all() routine just returned -1 for anything other
than -EINTR and -EAGAIN, so I went for consistency.)

   Secondly, the virtqueue only restricts the number of direct ring buffer
   entries. It does not restrict the quantity of data each ring entry points
   to.
  
  But that's entirely in guest memory, so it's limited to the amount of
  RAM that has been allocated to the guest.
 
 Exactly. The guest can cause ram_size * nr_ports of additional host memory to 
 be allocated.  Not acceptable. 

OK -- so this is how it adds up:

- guest vq
- virtio-serial-bus converts iov to buf
- qemu-char stores the buf in case it wasn't able to send

but then, since it's all async, we have:

- virtio-serial-bus frees the buf
- guest deletes the buf and removes it from the vq

So what's left is only the data in qemu-char's buf.  Now this can be
(buf_size - 1) * nr_ports in the worst case.

The alternative would be to keep using the guest buffer till all's done,
but then that depends on qemu getting async support - separating out the
qemu_chr_write() into a separate thread and allowing vcpu and chr io
operations to be run simultaneously.

Amit



Re: [Qemu-devel] [PATCH v8 7/7] virtio-console: Enable port throttling when chardev is slow to consume data

2010-12-05 Thread Amit Shah
On (Thu) Dec 02 2010 [17:31:36], Paul Brook wrote:
when there's a partial write, it tries to do a write again, which will
fail with -EAGAIN.
   
   Doesn't that cause the first partial chunk to be incorrectly transmitted
   twice? You may only return EAGAIN if no data was transmitted.
  
  Except for the fact that no caller of qemu_chr_write() resubmits (or
  even checks) partial writes.
 
 I don't buy this argument. The current implementation of qemu_chr_write never 
 generates transient failures, so they don't need to.

And applying this patch won't change the situation.

What I proposed in the earlier mail was to buffer only the data that has
to be re-submitted in case the caller is capable of stopping further
output till the char layer indicates it's free to start again.

 Once data has been transmitted, we have three options:
 
 a) Block until the write completes. This makes the whole patch fairly 
 pointless as host and guest block boundaries are unlikely to align.

This is what currently happens and will remain so for callers of
qemu_chr_write() which don't have a .write_unblocked() pointer assigned
in the char dev struct.

 b) Store the data on the side somewhere. Tell the device all data has been 
 sent, and arrange for this data to be flushed before accepting any more data. 
  
 This is bad because it allows the guest to allocate arbitrarily large[1] 
 buffers on the host. i.e. a fairly easily exploitable DoS attack.

With virtio-serial, this is what's in use.  The buffer is limited to the
length of the vq (which is a compile-time constant) and there also is
the virtio_serial_throttle_port() call that tells the guest to not send
any more data to the host till the char layer indicates it's OK to send
more data.

 c) Return a partial write to the guest. The guest already has to handle 
 retries due to EAGAIN, and DMA capable devices already have to handle partial 
 mappings, so this doesn't seem too onerous a requirement. This is not a new 
 concept, it's the same as the unix write(2)/send(2) functions.

This isn't possible with the current vq design.

Amit



Re: [Qemu-devel] [PATCH v8 7/7] virtio-console: Enable port throttling when chardev is slow to consume data

2010-12-02 Thread Amit Shah
On (Wed) Dec 01 2010 [13:08:26], Paul Brook wrote:
  On (Wed) Dec 01 2010 [11:59:35], Paul Brook wrote:
  -qemu_chr_write(vcon-chr, buf, len);
  +ret = qemu_chr_write(vcon-chr, buf, len);
  +if (ret == -EAGAIN) {
  +virtio_serial_throttle_port(port, true);
  +}
  
   }
 
 This looks wrong. It will loose data in the case of a partial write
 (i.e. ret  len)

That doesn't happen currently (qemu_chr_write doesn't return a value 
0 but  len).

I had code in there to handle it, but that would change behaviour for
current users of qemu_chr_write(), which is a risk.
   
   Doesn't that make the code almost completely pointless?
  
  Not really -- I did have code for partial writes, but removed it before
  this submission (had it in previous versions).
  
  The (new) do_send loop:
  
  len = len1;
  while (len  0) {
  ret = write(fd, buf, len);
  if (ret  0) {
  if (errno == EAGAIN  nonblock) {
  return -EAGAIN;
  }
  if (errno != EINTR  errno != EAGAIN) {
  return -1;
  }
  } else if (ret == 0) {
  break;
  } else {
  buf += ret;
  len -= ret;
  }
  }
  
  when there's a partial write, it tries to do a write again, which will
  fail with -EAGAIN.
 
 Doesn't that cause the first partial chunk to be incorrectly transmitted 
 twice? You may only return EAGAIN if no data was transmitted.

Except for the fact that no caller of qemu_chr_write() resubmits (or
even checks) partial writes.

I think the only way to solve this in a caller-neutral way while keeping
the current semantics is to buffer all the data that comes in for
qemu_chr_write() and re-submit partial writes in the unblock routine
(ie, what I did in virtio-console.c in v7 is to be done in qemu-char.c
now).

That OK?

Amit



Re: [Qemu-devel] [PATCH v8 7/7] virtio-console: Enable port throttling when chardev is slow to consume data

2010-12-02 Thread Paul Brook
   when there's a partial write, it tries to do a write again, which will
   fail with -EAGAIN.
  
  Doesn't that cause the first partial chunk to be incorrectly transmitted
  twice? You may only return EAGAIN if no data was transmitted.
 
 Except for the fact that no caller of qemu_chr_write() resubmits (or
 even checks) partial writes.

I don't buy this argument. The current implementation of qemu_chr_write never 
generates transient failures, so they don't need to.

If any data has been written then you must not return -EAGAIN.  Doing so will 
cause data corruption - the device will retry the transmit later (e.g. when 
the socket becomes unblocked) and duplicate data will be output.

Once data has been transmitted, we have three options:

a) Block until the write completes. This makes the whole patch fairly 
pointless as host and guest block boundaries are unlikely to align.

b) Store the data on the side somewhere. Tell the device all data has been 
sent, and arrange for this data to be flushed before accepting any more data.  
This is bad because it allows the guest to allocate arbitrarily large[1] 
buffers on the host. i.e. a fairly easily exploitable DoS attack.

c) Return a partial write to the guest. The guest already has to handle 
retries due to EAGAIN, and DMA capable devices already have to handle partial 
mappings, so this doesn't seem too onerous a requirement. This is not a new 
concept, it's the same as the unix write(2)/send(2) functions.

Paul

[1] At least as large as guest RAM, per port.



[Qemu-devel] [PATCH v8 7/7] virtio-console: Enable port throttling when chardev is slow to consume data

2010-12-01 Thread Amit Shah
When a chardev indicates it can't accept more data, we tell the
virtio-serial code to stop sending us any more data till we tell
otherwise.  This helps in guests continuing to run normally while the vq
keeps getting full and eventually the guest stops queueing more data.
As soon as the chardev indicates it can accept more data, start pushing!

Signed-off-by: Amit Shah amit.s...@redhat.com
---
 hw/virtio-console.c |   17 -
 1 files changed, 16 insertions(+), 1 deletions(-)

diff --git a/hw/virtio-console.c b/hw/virtio-console.c
index 749ed59..ec85ad5 100644
--- a/hw/virtio-console.c
+++ b/hw/virtio-console.c
@@ -18,13 +18,27 @@ typedef struct VirtConsole {
 CharDriverState *chr;
 } VirtConsole;
 
+/*
+ * Callback function that's called from chardevs when backend becomes
+ * writable.
+ */
+static void chr_write_unblocked(void *opaque)
+{
+VirtConsole *vcon = opaque;
+
+virtio_serial_throttle_port(vcon-port, false);
+}
 
 /* Callback function that's called when the guest sends us data */
 static void flush_buf(VirtIOSerialPort *port, const uint8_t *buf, size_t len)
 {
 VirtConsole *vcon = DO_UPCAST(VirtConsole, port, port);
+int ret;
 
-qemu_chr_write(vcon-chr, buf, len);
+ret = qemu_chr_write(vcon-chr, buf, len);
+if (ret == -EAGAIN) {
+virtio_serial_throttle_port(port, true);
+}
 }
 
 /* Readiness of the guest to accept data on a port */
@@ -62,6 +76,7 @@ static QemuChrHandlers chr_handlers = {
 .fd_can_read = chr_can_read,
 .fd_read = chr_read,
 .fd_event = chr_event,
+.fd_write_unblocked = chr_write_unblocked,
 };
 
 static int generic_port_init(VirtConsole *vcon, VirtIOSerialDevice *dev)
-- 
1.7.3.2




Re: [Qemu-devel] [PATCH v8 7/7] virtio-console: Enable port throttling when chardev is slow to consume data

2010-12-01 Thread Amit Shah
On (Wed) Dec 01 2010 [11:59:35], Paul Brook wrote:
-qemu_chr_write(vcon-chr, buf, len);
+ret = qemu_chr_write(vcon-chr, buf, len);
+if (ret == -EAGAIN) {
+virtio_serial_throttle_port(port, true);
+}

 }
   
   This looks wrong. It will loose data in the case of a partial write
   (i.e. ret  len)
  
  That doesn't happen currently (qemu_chr_write doesn't return a value  0
  but  len).
  
  I had code in there to handle it, but that would change behaviour for
  current users of qemu_chr_write(), which is a risk.
 
 Doesn't that make the code almost completely pointless?

Not really -- I did have code for partial writes, but removed it before
this submission (had it in previous versions).

The (new) do_send loop:

len = len1;
while (len  0) {
ret = write(fd, buf, len);
if (ret  0) {
if (errno == EAGAIN  nonblock) {
return -EAGAIN;
}
if (errno != EINTR  errno != EAGAIN) {
return -1;
}
} else if (ret == 0) {
break;
} else {
buf += ret;
len -= ret;
}
}

when there's a partial write, it tries to do a write again, which will
fail with -EAGAIN.

(However I might be completely missing the data that didn't get written
as a result of that partial write, so the output gets corrupted.  I
guess I still need to handle partial writes for that.)

Amit



Re: [Qemu-devel] [PATCH v8 7/7] virtio-console: Enable port throttling when chardev is slow to consume data

2010-12-01 Thread Paul Brook
   -qemu_chr_write(vcon-chr, buf, len);
   +ret = qemu_chr_write(vcon-chr, buf, len);
   +if (ret == -EAGAIN) {
   +virtio_serial_throttle_port(port, true);
   +}
   
}
  
  This looks wrong. It will loose data in the case of a partial write
  (i.e. ret  len)
 
 That doesn't happen currently (qemu_chr_write doesn't return a value  0
 but  len).
 
 I had code in there to handle it, but that would change behaviour for
 current users of qemu_chr_write(), which is a risk.

Doesn't that make the code almost completely pointless?

Allowing EAGAIN without allowing short writes means that the writes will still 
block for arbitrarily long periods of time.  You're relying on the kernel 
blocking at the same point the guest happens to split a DMA block. If you're 
transferring any significant quantities of data the chances of that happening 
seem pretty slim.

Paul



Re: [Qemu-devel] [PATCH v8 7/7] virtio-console: Enable port throttling when chardev is slow to consume data

2010-12-01 Thread Paul Brook
  /* Callback function that's called when the guest sends us data */
  static void flush_buf(VirtIOSerialPort *port, const uint8_t *buf, size_t
 len) {
  VirtConsole *vcon = DO_UPCAST(VirtConsole, port, port);
 +int ret;
 
 -qemu_chr_write(vcon-chr, buf, len);
 +ret = qemu_chr_write(vcon-chr, buf, len);
 +if (ret == -EAGAIN) {
 +virtio_serial_throttle_port(port, true);
 +}
  }

This looks wrong. It will loose data in the case of a partial write
(i.e. ret  len)

Paul



Re: [Qemu-devel] [PATCH v8 7/7] virtio-console: Enable port throttling when chardev is slow to consume data

2010-12-01 Thread Amit Shah
On (Wed) Dec 01 2010 [11:30:58], Paul Brook wrote:
   /* Callback function that's called when the guest sends us data */
   static void flush_buf(VirtIOSerialPort *port, const uint8_t *buf, size_t
  len) {
   VirtConsole *vcon = DO_UPCAST(VirtConsole, port, port);
  +int ret;
  
  -qemu_chr_write(vcon-chr, buf, len);
  +ret = qemu_chr_write(vcon-chr, buf, len);
  +if (ret == -EAGAIN) {
  +virtio_serial_throttle_port(port, true);
  +}
   }
 
 This looks wrong. It will loose data in the case of a partial write
 (i.e. ret  len)

That doesn't happen currently (qemu_chr_write doesn't return a value  0
but  len).

I had code in there to handle it, but that would change behaviour for
current users of qemu_chr_write(), which is a risk.

Amit



Re: [Qemu-devel] [PATCH v8 7/7] virtio-console: Enable port throttling when chardev is slow to consume data

2010-12-01 Thread Paul Brook
 On (Wed) Dec 01 2010 [11:59:35], Paul Brook wrote:
 -qemu_chr_write(vcon-chr, buf, len);
 +ret = qemu_chr_write(vcon-chr, buf, len);
 +if (ret == -EAGAIN) {
 +virtio_serial_throttle_port(port, true);
 +}
 
  }

This looks wrong. It will loose data in the case of a partial write
(i.e. ret  len)
   
   That doesn't happen currently (qemu_chr_write doesn't return a value 
   0 but  len).
   
   I had code in there to handle it, but that would change behaviour for
   current users of qemu_chr_write(), which is a risk.
  
  Doesn't that make the code almost completely pointless?
 
 Not really -- I did have code for partial writes, but removed it before
 this submission (had it in previous versions).
 
 The (new) do_send loop:
 
 len = len1;
 while (len  0) {
 ret = write(fd, buf, len);
 if (ret  0) {
 if (errno == EAGAIN  nonblock) {
 return -EAGAIN;
 }
 if (errno != EINTR  errno != EAGAIN) {
 return -1;
 }
 } else if (ret == 0) {
 break;
 } else {
 buf += ret;
 len -= ret;
 }
 }
 
 when there's a partial write, it tries to do a write again, which will
 fail with -EAGAIN.

Doesn't that cause the first partial chunk to be incorrectly transmitted 
twice? You may only return EAGAIN if no data was transmitted.

Paul