Re: [PATCH 05/24] compatfd is included before, and it is compiled unconditionally

2009-10-01 Thread Juan Quintela
Christoph Hellwig h...@infradead.org wrote:
 On Tue, Sep 22, 2009 at 03:25:13PM +0200, Juan Quintela wrote:
 Christoph Hellwig h...@infradead.org wrote:
  Btw, what's the state of getting compatfd upstream?  It's a pretty
  annoying difference between qemu upstream and qemu-kvm.
 
 I haven't tried.  I can try to send a patch.  Do you have any use case
 that will help the cause?

 Well, the eventfd compat is used in the thread pool AIO code.  I don't
 know what difference it makes, but I really hate this code being
 different in both trees.  I want to see compatfd used either in both or
 none.

I discussed it with Anthony.  signalfd is complicated for qemu
upstream (too difficult to use properly), and eventfd ...  The current
eventfd emulation is worse than the pipe code that it substitutes.

His suggestion here was to create a new abstraction with an API like:

push_notify()

pop_notify()

and then you can implement it with eventfd(), pipes, or whatever.
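
A minimal sketch of such an abstraction, here backed by a non-blocking
pipe.  Only the names push_notify() and pop_notify() come from this
thread; struct notifier, notifier_init(), the signatures and the return
conventions are assumptions made for illustration, not QEMU code.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical notifier: the main loop polls rfd, producers write to wfd. */
struct notifier {
    int rfd;
    int wfd;
};

/* Create the pipe and make both ends non-blocking. */
int notifier_init(struct notifier *n)
{
    int fds[2];

    assert(n != NULL);
    if (pipe(fds) < 0)
        return -1;
    for (int i = 0; i < 2; i++) {
        int fl = fcntl(fds[i], F_GETFL);
        if (fl < 0 || fcntl(fds[i], F_SETFL, fl | O_NONBLOCK) < 0) {
            close(fds[0]);
            close(fds[1]);
            return -1;
        }
    }
    n->rfd = fds[0];
    n->wfd = fds[1];
    return 0;
}

/* Mark the notifier as pending.  A full pipe (EAGAIN) already means
 * "pending", so it is treated as success. */
int push_notify(struct notifier *n)
{
    const char byte = 1;
    ssize_t r;

    do {
        r = write(n->wfd, &byte, 1);
    } while (r < 0 && errno == EINTR);
    return (r == 1 || (r < 0 && errno == EAGAIN)) ? 0 : -1;
}

/* Drain everything that is pending.
 * Returns 1 if at least one notification was pending, 0 if none, -1 on error. */
int pop_notify(struct notifier *n)
{
    char buf[64];
    int pending = 0;

    for (;;) {
        ssize_t r = read(n->rfd, buf, sizeof(buf));
        if (r > 0)
            pending = 1;
        else if (r < 0 && errno == EINTR)
            continue;
        else if (r == 0 || errno == EAGAIN)
            return pending;     /* EOF or pipe empty */
        else
            return -1;
    }
}
```

The interface deliberately exposes only "something is pending", not a
count, so an eventfd-backed version could sit behind the same two calls.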

What were you missing from compatfd: qemu_eventfd/qemu_signalfd?
Would push_notify()/pop_notify() work for you?

Later, Juan.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 05/24] compatfd is included before, and it is compiled unconditionally

2009-10-01 Thread Christoph Hellwig
On Thu, Oct 01, 2009 at 01:58:10PM +0200, Juan Quintela wrote:
 I discussed it with Anthony.  signalfd is complicated for qemu
 upstream (too difficult to use properly), and eventfd ...  The current
 eventfd emulation is worse than the pipe code that it substitutes.
 
 His suggestion here was to create a new abstraction with an API like:
 
 push_notify()
 
 pop_notify()
 
 and then you can implement it with eventfd(), pipes, or whatever.
 
 What were you missing from compatfd: qemu_eventfd/qemu_signalfd?
 Would push_notify()/pop_notify() work for you?

I don't desperately want to use it myself anyway.  I just want to get
rid of the highly annoying spurious differences in the AIO code due
to use of compatfd.  I would be perfectly fine with just killing this
use of eventfd in qemu-kvm.


Re: [PATCH 05/24] compatfd is included before, and it is compiled unconditionally

2009-10-01 Thread Anthony Liguori

Christoph Hellwig wrote:

On Thu, Oct 01, 2009 at 01:58:10PM +0200, Juan Quintela wrote:
  

I discussed it with Anthony.  signalfd is complicated for qemu
upstream (too difficult to use properly), and eventfd ...  The current
eventfd emulation is worse than the pipe code that it substitutes.

His suggestion here was to create a new abstraction with an API like:

push_notify()

pop_notify()

and then you can implement it with eventfd(), pipes, or whatever.

What were you missing from compatfd: qemu_eventfd/qemu_signalfd?
Would push_notify()/pop_notify() work for you?



I don't desperately want to use it myself anyway.  I just want to get
rid of the highly annoying spurious differences in the AIO code due
to use of compatfd.  I would be perfectly fine with just killing this
use of eventfd in qemu-kvm.
  


That's what I'd suggest.  The use of eventfd in qemu-kvm is wrong 
because the compat function is not implemented correctly.


Regards,

Anthony Liguori


Re: [PATCH 05/24] compatfd is included before, and it is compiled unconditionally

2009-10-01 Thread Anthony Liguori

Juan Quintela wrote:

I discussed it with Anthony.  signalfd is complicated for qemu
upstream (too difficult to use properly),


It's not an issue of being difficult.

To emulate signalfd, we need to create a thread that writes to a pipe 
from a signal handler.  The problem is that a write() can return a 
partial result and following the partial result, we can end up getting 
an EAGAIN.  We have no way to queue signals beyond that point and we 
have no sane way to deal with partial writes.


Instead, how we do this in upstream QEMU is that we install a signal 
handler and write one byte to the fd.  If we get EAGAIN, that's fine 
because all we care about is that at least one byte exists in the fd's 
buffer.  This requires that we use an fd-per-signal which means we end 
up with a different model than signalfd.


The reason to use signalfd over what we do in upstream QEMU is that 
signalfd can allow us to mask the signals which means fewer EINTRs.  I 
don't think that's a huge advantage and the inability to do backwards 
compatibility in a sane way means that emulated signalfd is not workable.


We could possibly introduce a higher level interface that only required 
one fd per signal and that had a function that drained the signals from 
the fd without returning any special information.


The same is generally true for eventfd.

Regards,

Anthony Liguori


Re: [PATCH 05/24] compatfd is included before, and it is compiled unconditionally

2009-10-01 Thread Avi Kivity

On 10/01/2009 04:23 PM, Anthony Liguori wrote:

Juan Quintela wrote:

I discussed it with Anthony.  signalfd is complicated for qemu
upstream (too difficult to use properly),


It's not an issue of being difficult.

To emulate signalfd, we need to create a thread that writes to a pipe 
from a signal handler.  The problem is that a write() can return a 
partial result and following the partial result, we can end up getting 
an EAGAIN.  We have no way to queue signals beyond that point and we 
have no sane way to deal with partial writes.


Pipe buffers are multiples of the signalfd size.  As long as we read 
and write signalfd-sized blocks, we won't get partial writes.  It's true 
that depending on an implementation detail is bad practice, but this is 
emulation code, and if it helps simplify everything else, I think it's 
fine to use it.


Hmm, pipe(7) says writes of up to PIPE_BUF bytes are atomic:

   O_NONBLOCK enabled, n <= PIPE_BUF
          If there is room to write n bytes to the pipe, then write(2)
          succeeds immediately, writing all n bytes; otherwise write(2)
          fails, with errno set to EAGAIN.

so it seems this practice has been blessed by POSIX.

Instead, how we do this in upstream QEMU is that we install a signal 
handler and write one byte to the fd.  If we get EAGAIN, that's fine 
because all we care about is that at least one byte exists in the fd's 
buffer.  This requires that we use an fd-per-signal which means we end 
up with a different model than signalfd.


The reason to use signalfd over what we do in upstream QEMU is that 
signalfd can allow us to mask the signals which means fewer EINTRs.  I 
don't think that's a huge advantage and the inability to do backwards 
compatibility in a sane way means that emulated signalfd is not workable.


signalfd is several microseconds faster than signals + pipes.  Do we 
have so much performance we can throw some of it away?



The same is generally true for eventfd.


eventfd emulation will also never get partial writes.

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [PATCH 05/24] compatfd is included before, and it is compiled unconditionally

2009-10-01 Thread Anthony Liguori

Avi Kivity wrote:

On 10/01/2009 04:23 PM, Anthony Liguori wrote:

Juan Quintela wrote:

I discussed it with Anthony.  signalfd is complicated for qemu
upstream (too difficult to use properly),


It's not an issue of being difficult.

To emulate signalfd, we need to create a thread that writes to a pipe 
from a signal handler.  The problem is that a write() can return a 
partial result and following the partial result, we can end up 
getting an EAGAIN.  We have no way to queue signals beyond that point 
and we have no sane way to deal with partial writes.


Pipe buffers are multiples of the signalfd size.  As long as we 
read and write signalfd-sized blocks, we won't get partial writes.  
It's true that depending on an implementation detail is bad practice, 
but this is emulation code, and if it helps simplify everything else, 
I think it's fine to use it.


That's a pretty hairy detail to rely upon..

Instead, how we do this in upstream QEMU is that we install a signal 
handler and write one byte to the fd.  If we get EAGAIN, that's fine 
because all we care about is that at least one byte exists in the 
fd's buffer.  This requires that we use an fd-per-signal which means 
we end up with a different model than signalfd.


The reason to use signalfd over what we do in upstream QEMU is that 
signalfd can allow us to mask the signals which means fewer EINTRs.  I 
don't think that's a huge advantage and the inability to do backwards 
compatibility in a sane way means that emulated signalfd is not 
workable.


signalfd is several microseconds faster than signals + pipes.  Do we 
have so much performance we can throw some of it away?


Do we have any indication that this difference is actually observable?  
This seems like very premature optimization.



The same is generally true for eventfd.


eventfd emulation will also never get partial writes.


But you cannot emulate eventfd faithfully because eventfd is supposed to 
be additive.  If you write 1 to an eventfd 50 times, you should be able 
to read a set of integers that add up to 50.  If you hit EAGAIN in a 
signal handler, you have no way of handling that.
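
For reference, the additive behaviour of a real (Linux) eventfd can be
seen with a small sketch like the one below.  The function name
eventfd_sum_of_ones() is hypothetical and exists only for this
illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Write the value 1 to a fresh eventfd `times` times, then read once.
 * Returns the counter value read back, or 0 on any error. */
uint64_t eventfd_sum_of_ones(unsigned times)
{
    uint64_t one = 1;
    uint64_t total = 0;
    int fd;

    assert(times > 0);
    fd = eventfd(0, EFD_NONBLOCK);
    if (fd < 0)
        return 0;

    for (unsigned i = 0; i < times; i++) {
        /* Each write adds its 8-byte value to the kernel-side counter. */
        if (write(fd, &one, sizeof(one)) != (ssize_t)sizeof(one)) {
            close(fd);
            return 0;
        }
    }

    /* Without EFD_SEMAPHORE, one read returns the sum and resets it. */
    if (read(fd, &total, sizeof(total)) != (ssize_t)sizeof(total))
        total = 0;
    close(fd);
    return total;
}
```

A pipe-based emulation that writes one byte per event loses this sum as
soon as a write hits EAGAIN, which is the case being objected to here.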


As I said earlier, the better thing to do is have a higher level 
interface that has a subset of the behavior of eventfd/signalfd that we 
can emulate correctly.


Regards,

Anthony Liguori


Re: [PATCH 05/24] compatfd is included before, and it is compiled unconditionally

2009-10-01 Thread Avi Kivity

On 10/01/2009 06:50 PM, Anthony Liguori wrote:

Avi Kivity wrote:

On 10/01/2009 04:23 PM, Anthony Liguori wrote:

Juan Quintela wrote:

I discussed it with Anthony.  signalfd is complicated for qemu
upstream (too difficult to use properly),


It's not an issue of being difficult.

To emulate signalfd, we need to create a thread that writes to a 
pipe from a signal handler.  The problem is that a write() can 
return a partial result and following the partial result, we can end 
up getting an EAGAIN.  We have no way to queue signals beyond that 
point and we have no sane way to deal with partial writes.


Pipe buffers are multiples of the signalfd size.  As long as we 
read and write signalfd-sized blocks, we won't get partial writes.  
It's true that depending on an implementation detail is bad practice, 
but this is emulation code, and if it helps simplify everything else, 
I think it's fine to use it.


That's a pretty hairy detail to rely upon..


Well, it's a POSIX detail, as I quoted below.  I'm not in love with it 
but it should work.




Instead, how we do this in upstream QEMU is that we install a signal 
handler and write one byte to the fd.  If we get EAGAIN, that's fine 
because all we care about is that at least one byte exists in the 
fd's buffer.  This requires that we use an fd-per-signal which means 
we end up with a different model than signalfd.


The reason to use signalfd over what we do in upstream QEMU is that 
signalfd can allow us to mask the signals which means fewer EINTRs.  
I don't think that's a huge advantage and the inability to do 
backwards compatibility in a sane way means that emulated signalfd 
is not workable.


signalfd is several microseconds faster than signals + pipes.  Do we 
have so much performance we can throw some of it away?


Do we have any indication that this difference is actually 
observable?  This seems like very premature optimization.


Multiply the signal rate by a few microseconds; if you get more than 
0.1% CPU, it's worthwhile in my opinion.  The code is localized, and 
signalfd is a better interface than signals.
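
The rule of thumb above is simple arithmetic, sketched here with
hypothetical helper names.  The thread only says "several microseconds";
any concrete per-signal cost plugged in (3 us in the note below) is an
assumption, not a measurement.

```c
#include <assert.h>

/* Fraction of one CPU consumed by the extra per-signal cost.
 * signals_per_sec: observed signal rate; extra_cost_us: added cost of
 * signals + pipes over signalfd, in microseconds (assumed, not measured). */
double signal_overhead_fraction(double signals_per_sec, double extra_cost_us)
{
    assert(signals_per_sec >= 0.0 && extra_cost_us >= 0.0);
    return signals_per_sec * extra_cost_us / 1e6;
}

/* Signal rate at which the overhead reaches threshold_fraction
 * (0.001 corresponds to the 0.1% CPU figure mentioned above). */
double break_even_rate(double threshold_fraction, double extra_cost_us)
{
    assert(extra_cost_us > 0.0);
    return threshold_fraction * 1e6 / extra_cost_us;
}
```

With an assumed 3 us per signal, break_even_rate(0.001, 3.0) comes out
at roughly 333 signals per second; whether real workloads reach that
rate is exactly the open question in this exchange.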





The same is generally true for eventfd.


eventfd emulation will also never get partial writes.


But you cannot emulate eventfd faithfully because eventfd is supposed 
to be additive.  If you write 1 to an eventfd 50 times, you should be 
able to read a set of integers that add up to 50.  If you hit EAGAIN 
in a signal handler, you have no way of handling that.


We never rely on the count anyway.  You can simply ignore EAGAIN.

As I said earlier, the better thing to do is have a higher level 
interface that has a subset of the behavior of eventfd/signalfd that 
we can emulate correctly.


Sure, but it's more work.  Copying an existing interface is easier.  
It's not like there's no other work in qemu left to be done.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [PATCH 05/24] compatfd is included before, and it is compiled unconditionally

2009-09-22 Thread Christoph Hellwig
Btw, what's the state of getting compatfd upstream?  It's a pretty
annoying difference between qemu upstream and qemu-kvm.



Re: [PATCH 05/24] compatfd is included before, and it is compiled unconditionally

2009-09-22 Thread Juan Quintela
Christoph Hellwig h...@infradead.org wrote:
 Btw, what's the state of getting compatfd upstream?  It's a pretty
 annoying difference between qemu upstream and qemu-kvm.

I haven't tried.  I can try to send a patch.  Do you have any use case
that will help the cause?

Later, Juan.


Re: [PATCH 05/24] compatfd is included before, and it is compiled unconditionally

2009-09-22 Thread Christoph Hellwig
On Tue, Sep 22, 2009 at 03:25:13PM +0200, Juan Quintela wrote:
 Christoph Hellwig h...@infradead.org wrote:
  Btw, what's the state of getting compatfd upstream?  It's a pretty
  annoying difference between qemu upstream and qemu-kvm.
 
 I haven't tried.  I can try to send a patch.  Do you have any use case
 that will help the cause?

Well, the eventfd compat is used in the thread pool AIO code.  I don't
know what difference it makes, but I really hate this code being
different in both trees.  I want to see compatfd used either in both or
none.
