Re: [PATCH net] Revert "vhost: cache used event for better performance"

2017-08-13 Thread Koichiro Den
Sorry I mistakenly focused on NET case, please pass it over. I will do any
relevant suggestion in patch-based way. Thanks.

On Sun, 2017-08-13 at 23:11 +0900, Koichiro Den wrote:
> Thanks for your comments, Michael and Jason. And I'm sorry about late
> response.
> To be honest, I am on a summer vacation until next Tuesday.
> 
> I noticed that what I wrote was not sufficient. Regardless of caching
> mechanism
> existence, the "event" could legitimately be at any point out of the latest
> interval, which vhost_notify checks it against, meaning that if it's out of
> the
> interval we cannot distinguish whether or not it lags behind or has a
> lead. And
> it seems to conform to the spec. Thanks for leading me to the spec. The corner
> case I point out here is:
> (0) event idx feature turned on + TX napi turned off
> -> (1) guest-side TX traffic bursting occurs and delayed callback set
> -> (2) some interruption triggers skb_xmit_done
> -> (3) guest-side modest traffic makes the interval proceed to unbounded
> extent
> without updating "event" since NO_INTERRUPT continues to be set on its shadow
> flag.
> 
> IMHO, if you plan to make TX napi the only choice, doing this sort of
> optimisation beforehand seems likely to be in vain.
> 
> So, if the none-TX napi case continues to coexist, what I would like to
> suggest
> is not just the sort of my last email, but like making maximum staleness of
> "event" less than or equal to vq.num, and afterward caching suggestion.
> Otherwise, I guess I should not repost my last email since it would make
> matters
>  too complicated even though it will soon be removed when TX-napi becomes the
> only choice.
> 
> Thanks!
> 
> 
> On Wed, 2017-08-09 at 07:37 +0300, Michael S. Tsirkin wrote:
> > On Wed, Aug 09, 2017 at 10:38:10AM +0800, Jason Wang wrote:
> > > I think don't think current code can work well if vq.num is grater than
> > > 2^15. Since all cached idx is u16. This looks like a bug which needs to be
> > > fixed.
> > 
> > That's a limitation of virtio 1.0.
> > 
> > > > * else if the interval of vq.num is [2^15, 2^16):
> > > > the logic in the original patch (809ecb9bca6a9) suffices
> > > > * else (= less than 2^15) (optional):
> > > > checking only (vring_need_event(vq->last_used_event, new + vq->num, new)
> > > > would suffice.
> > > > 
> > > > Am I missing something, or is this irrelevant?
> > 
> > Could you pls repost the suggestion copying virtio-dev mailing list
> > (subscriber only, sorry about that, but host/guest ABI discussions
> > need to copy that list)?
> > 
> > > Looks not, I think this may work. Let me do some test.
> > > 
> > > Thanks
> > 
> > I think that at this point it's prudent to add a feature bit
> > as the virtio spec does not require to never move the event index back.
> > 



Re: [PATCH net] Revert "vhost: cache used event for better performance"

2017-08-13 Thread Koichiro Den
Sorry I mistakenly focused on NET case, please pass it over. I will do any
relevant suggestion in patch-based way. Thanks.

On Sun, 2017-08-13 at 23:11 +0900, Koichiro Den wrote:
> Thanks for your comments, Michael and Jason. And I'm sorry about late
> response.
> To be honest, I am on a summer vacation until next Tuesday.
> 
> I noticed that what I wrote was not sufficient. Regardless of caching
> mechanism
> existence, the "event" could legitimately be at any point out of the latest
> interval, which vhost_notify checks it against, meaning that if it's out of
> the
> interval we cannot distinguish whether or not it lags behind or has a
> lead. And
> it seems to conform to the spec. Thanks for leading me to the spec. The corner
> case I point out here is:
> (0) event idx feature turned on + TX napi turned off
> -> (1) guest-side TX traffic bursting occurs and delayed callback set
> -> (2) some interruption triggers skb_xmit_done
> -> (3) guest-side modest traffic makes the interval proceed to unbounded
> extent
> without updating "event" since NO_INTERRUPT continues to be set on its shadow
> flag.
> 
> IMHO, if you plan to make TX napi the only choice, doing this sort of
> optimisation beforehand seems likely to be in vain.
> 
> So, if the none-TX napi case continues to coexist, what I would like to
> suggest
> is not just the sort of my last email, but like making maximum staleness of
> "event" less than or equal to vq.num, and afterward caching suggestion.
> Otherwise, I guess I should not repost my last email since it would make
> matters
>  too complicated even though it will soon be removed when TX-napi becomes the
> only choice.
> 
> Thanks!
> 
> 
> On Wed, 2017-08-09 at 07:37 +0300, Michael S. Tsirkin wrote:
> > On Wed, Aug 09, 2017 at 10:38:10AM +0800, Jason Wang wrote:
> > > I think don't think current code can work well if vq.num is grater than
> > > 2^15. Since all cached idx is u16. This looks like a bug which needs to be
> > > fixed.
> > 
> > That's a limitation of virtio 1.0.
> > 
> > > > * else if the interval of vq.num is [2^15, 2^16):
> > > > the logic in the original patch (809ecb9bca6a9) suffices
> > > > * else (= less than 2^15) (optional):
> > > > checking only (vring_need_event(vq->last_used_event, new + vq->num, new)
> > > > would suffice.
> > > > 
> > > > Am I missing something, or is this irrelevant?
> > 
> > Could you pls repost the suggestion copying virtio-dev mailing list
> > (subscriber only, sorry about that, but host/guest ABI discussions
> > need to copy that list)?
> > 
> > > Looks not, I think this may work. Let me do some test.
> > > 
> > > Thanks
> > 
> > I think that at this point it's prudent to add a feature bit
> > as the virtio spec does not require to never move the event index back.
> > 



Re: [PATCH net] Revert "vhost: cache used event for better performance"

2017-08-13 Thread Koichiro Den
Thanks for your comments, Michael and Jason. And I'm sorry about late response.
To be honest, I am on a summer vacation until next Tuesday.

I noticed that what I wrote was not sufficient. Regardless of caching mechanism
existence, the "event" could legitimately be at any point out of the latest
interval, which vhost_notify checks it against, meaning that if it's out of the
interval we cannot distinguish whether or not it lags behind or has a lead. And
it seems to conform to the spec. Thanks for leading me to the spec. The corner
case I point out here is:
(0) event idx feature turned on + TX napi turned off
-> (1) guest-side TX traffic bursting occurs and delayed callback set
-> (2) some interruption triggers skb_xmit_done
-> (3) guest-side modest traffic makes the interval proceed to unbounded extent
without updating "event" since NO_INTERRUPT continues to be set on its shadow
flag.

IMHO, if you plan to make TX napi the only choice, doing this sort of
optimisation beforehand seems likely to be in vain.

So, if the none-TX napi case continues to coexist, what I would like to suggest
is not just the sort of my last email, but like making maximum staleness of
"event" less than or equal to vq.num, and afterward caching suggestion.
Otherwise, I guess I should not repost my last email since it would make matters
 too complicated even though it will soon be removed when TX-napi becomes the
only choice.

Thanks!


On Wed, 2017-08-09 at 07:37 +0300, Michael S. Tsirkin wrote:
> On Wed, Aug 09, 2017 at 10:38:10AM +0800, Jason Wang wrote:
> > I think don't think current code can work well if vq.num is grater than
> > 2^15. Since all cached idx is u16. This looks like a bug which needs to be
> > fixed.
> 
> That's a limitation of virtio 1.0.
> 
> > > * else if the interval of vq.num is [2^15, 2^16):
> > > the logic in the original patch (809ecb9bca6a9) suffices
> > > * else (= less than 2^15) (optional):
> > > checking only (vring_need_event(vq->last_used_event, new + vq->num, new)
> > > would suffice.
> > > 
> > > Am I missing something, or is this irrelevant?
> 
> Could you pls repost the suggestion copying virtio-dev mailing list
> (subscriber only, sorry about that, but host/guest ABI discussions
> need to copy that list)?
> 
> > Looks not, I think this may work. Let me do some test.
> > 
> > Thanks
> 
> I think that at this point it's prudent to add a feature bit
> as the virtio spec does not require to never move the event index back.
> 



Re: [PATCH net] Revert "vhost: cache used event for better performance"

2017-08-13 Thread Koichiro Den
Thanks for your comments, Michael and Jason. And I'm sorry about late response.
To be honest, I am on a summer vacation until next Tuesday.

I noticed that what I wrote was not sufficient. Regardless of caching mechanism
existence, the "event" could legitimately be at any point out of the latest
interval, which vhost_notify checks it against, meaning that if it's out of the
interval we cannot distinguish whether or not it lags behind or has a lead. And
it seems to conform to the spec. Thanks for leading me to the spec. The corner
case I point out here is:
(0) event idx feature turned on + TX napi turned off
-> (1) guest-side TX traffic bursting occurs and delayed callback set
-> (2) some interruption triggers skb_xmit_done
-> (3) guest-side modest traffic makes the interval proceed to unbounded extent
without updating "event" since NO_INTERRUPT continues to be set on its shadow
flag.

IMHO, if you plan to make TX napi the only choice, doing this sort of
optimisation beforehand seems likely to be in vain.

So, if the none-TX napi case continues to coexist, what I would like to suggest
is not just the sort of my last email, but like making maximum staleness of
"event" less than or equal to vq.num, and afterward caching suggestion.
Otherwise, I guess I should not repost my last email since it would make matters
 too complicated even though it will soon be removed when TX-napi becomes the
only choice.

Thanks!


On Wed, 2017-08-09 at 07:37 +0300, Michael S. Tsirkin wrote:
> On Wed, Aug 09, 2017 at 10:38:10AM +0800, Jason Wang wrote:
> > I think don't think current code can work well if vq.num is grater than
> > 2^15. Since all cached idx is u16. This looks like a bug which needs to be
> > fixed.
> 
> That's a limitation of virtio 1.0.
> 
> > > * else if the interval of vq.num is [2^15, 2^16):
> > > the logic in the original patch (809ecb9bca6a9) suffices
> > > * else (= less than 2^15) (optional):
> > > checking only (vring_need_event(vq->last_used_event, new + vq->num, new)
> > > would suffice.
> > > 
> > > Am I missing something, or is this irrelevant?
> 
> Could you pls repost the suggestion copying virtio-dev mailing list
> (subscriber only, sorry about that, but host/guest ABI discussions
> need to copy that list)?
> 
> > Looks not, I think this may work. Let me do some test.
> > 
> > Thanks
> 
> I think that at this point it's prudent to add a feature bit
> as the virtio spec does not require to never move the event index back.
> 



Re: [PATCH net] Revert "vhost: cache used event for better performance"

2017-08-08 Thread Michael S. Tsirkin
On Wed, Aug 09, 2017 at 10:38:10AM +0800, Jason Wang wrote:
> I think don't think current code can work well if vq.num is grater than
> 2^15. Since all cached idx is u16. This looks like a bug which needs to be
> fixed.

That's a limitation of virtio 1.0.

> > * else if the interval of vq.num is [2^15, 2^16):
> > the logic in the original patch (809ecb9bca6a9) suffices
> > * else (= less than 2^15) (optional):
> > checking only (vring_need_event(vq->last_used_event, new + vq->num, new)
> > would suffice.
> > 
> > Am I missing something, or is this irrelevant?

Could you pls repost the suggestion copying virtio-dev mailing list
(subscriber only, sorry about that, but host/guest ABI discussions
need to copy that list)?

> Looks not, I think this may work. Let me do some test.
> 
> Thanks

I think that at this point it's prudent to add a feature bit
as the virtio spec does not require to never move the event index back.



Re: [PATCH net] Revert "vhost: cache used event for better performance"

2017-08-08 Thread Michael S. Tsirkin
On Wed, Aug 09, 2017 at 10:38:10AM +0800, Jason Wang wrote:
> I think don't think current code can work well if vq.num is grater than
> 2^15. Since all cached idx is u16. This looks like a bug which needs to be
> fixed.

That's a limitation of virtio 1.0.

> > * else if the interval of vq.num is [2^15, 2^16):
> > the logic in the original patch (809ecb9bca6a9) suffices
> > * else (= less than 2^15) (optional):
> > checking only (vring_need_event(vq->last_used_event, new + vq->num, new)
> > would suffice.
> > 
> > Am I missing something, or is this irrelevant?

Could you pls repost the suggestion copying virtio-dev mailing list
(subscriber only, sorry about that, but host/guest ABI discussions
need to copy that list)?

> Looks not, I think this may work. Let me do some test.
> 
> Thanks

I think that at this point it's prudent to add a feature bit
as the virtio spec does not require to never move the event index back.



Re: [PATCH net] Revert "vhost: cache used event for better performance"

2017-08-08 Thread Jason Wang



On 2017年07月30日 14:26, K. Den wrote:

On Wed, 2017-07-26 at 19:08 +0300, Michael S. Tsirkin wrote:

On Wed, Jul 26, 2017 at 09:37:15PM +0800, Jason Wang wrote:


On 2017年07月26日 21:18, Jason Wang wrote:


On 2017年07月26日 20:57, Michael S. Tsirkin wrote:

On Wed, Jul 26, 2017 at 04:03:17PM +0800, Jason Wang wrote:

This reverts commit 809ecb9bca6a9424ccd392d67e368160f8b76c92. Since it
was reported to break vhost_net. We want to cache used event and use
it to check for notification. We try to valid cached used event by
checking whether or not it was ahead of new, but this is not correct
all the time, it could be stale and there's no way to know about this.

Signed-off-by: Jason Wang

Could you supply a bit more data here please?  How does it get stale?
What does guest need to do to make it stale?  This will be helpful if
anyone wants to bring it back, or if we want to extend the protocol.


The problem we don't know whether or not guest has published a new used
event. The check vring_need_event(vq->last_used_event, new + vq->num,
new) is not sufficient to check for this.

Thanks

More notes, the previous assumption is that we don't move used event back,
but this could happen in fact if idx is wrapper around.

You mean if the 16 bit index wraps around after 64K entries.
Makes sense.


Will repost and add
this into commit log.

Thanks

Hi,


Hi, sorry for the late reply, was on vacation last week.



I am just curious but I have got a question:
AFAIU, if you wanted to keep the caching mechanism alive in the code base,
the following two changes could clear off the issue, or not?:
(1) Always fetch the latest event value from guest when signalled_used event is
invalid, which includes last_used_idx wraps-around case. Otherwise we might need
changes which would complicate too much the logic to properly decide whether or
not to skip signalling in the next vhost_notify round.
(2) On top of that, split the signal-postponing logic to three cases like:
* if the interval of vq.num is [2^16, UINT_MAX]:
any cached event is in should-postpone-signalling interval, so paradoxically
must always do signalling.


I think don't think current code can work well if vq.num is grater than 
2^15. Since all cached idx is u16. This looks like a bug which needs to 
be fixed.



* else if the interval of vq.num is [2^15, 2^16):
the logic in the original patch (809ecb9bca6a9) suffices
* else (= less than 2^15) (optional):
checking only (vring_need_event(vq->last_used_event, new + vq->num, new)
would suffice.

Am I missing something, or is this irrelevant?


Looks not, I think this may work. Let me do some test.

Thanks


I would appreciate if you could elaborate a bit more how the situation where
event idx wraps around and moves back would make trouble.

Thanks.






Re: [PATCH net] Revert "vhost: cache used event for better performance"

2017-08-08 Thread Jason Wang



On 2017年07月30日 14:26, K. Den wrote:

On Wed, 2017-07-26 at 19:08 +0300, Michael S. Tsirkin wrote:

On Wed, Jul 26, 2017 at 09:37:15PM +0800, Jason Wang wrote:


On 2017年07月26日 21:18, Jason Wang wrote:


On 2017年07月26日 20:57, Michael S. Tsirkin wrote:

On Wed, Jul 26, 2017 at 04:03:17PM +0800, Jason Wang wrote:

This reverts commit 809ecb9bca6a9424ccd392d67e368160f8b76c92. Since it
was reported to break vhost_net. We want to cache used event and use
it to check for notification. We try to valid cached used event by
checking whether or not it was ahead of new, but this is not correct
all the time, it could be stale and there's no way to know about this.

Signed-off-by: Jason Wang

Could you supply a bit more data here please?  How does it get stale?
What does guest need to do to make it stale?  This will be helpful if
anyone wants to bring it back, or if we want to extend the protocol.


The problem we don't know whether or not guest has published a new used
event. The check vring_need_event(vq->last_used_event, new + vq->num,
new) is not sufficient to check for this.

Thanks

More notes, the previous assumption is that we don't move used event back,
but this could happen in fact if idx is wrapper around.

You mean if the 16 bit index wraps around after 64K entries.
Makes sense.


Will repost and add
this into commit log.

Thanks

Hi,


Hi, sorry for the late reply, was on vacation last week.



I am just curious but I have got a question:
AFAIU, if you wanted to keep the caching mechanism alive in the code base,
the following two changes could clear off the issue, or not?:
(1) Always fetch the latest event value from guest when signalled_used event is
invalid, which includes last_used_idx wraps-around case. Otherwise we might need
changes which would complicate too much the logic to properly decide whether or
not to skip signalling in the next vhost_notify round.
(2) On top of that, split the signal-postponing logic to three cases like:
* if the interval of vq.num is [2^16, UINT_MAX]:
any cached event is in should-postpone-signalling interval, so paradoxically
must always do signalling.


I think don't think current code can work well if vq.num is grater than 
2^15. Since all cached idx is u16. This looks like a bug which needs to 
be fixed.



* else if the interval of vq.num is [2^15, 2^16):
the logic in the original patch (809ecb9bca6a9) suffices
* else (= less than 2^15) (optional):
checking only (vring_need_event(vq->last_used_event, new + vq->num, new)
would suffice.

Am I missing something, or is this irrelevant?


Looks not, I think this may work. Let me do some test.

Thanks


I would appreciate if you could elaborate a bit more how the situation where
event idx wraps around and moves back would make trouble.

Thanks.






Re: [PATCH net] Revert "vhost: cache used event for better performance"

2017-07-30 Thread K. Den
On Wed, 2017-07-26 at 19:08 +0300, Michael S. Tsirkin wrote:
> On Wed, Jul 26, 2017 at 09:37:15PM +0800, Jason Wang wrote:
> > 
> > 
> > On 2017年07月26日 21:18, Jason Wang wrote:
> > > 
> > > 
> > > On 2017年07月26日 20:57, Michael S. Tsirkin wrote:
> > > > On Wed, Jul 26, 2017 at 04:03:17PM +0800, Jason Wang wrote:
> > > > > This reverts commit 809ecb9bca6a9424ccd392d67e368160f8b76c92. Since it
> > > > > was reported to break vhost_net. We want to cache used event and use
> > > > > it to check for notification. We try to valid cached used event by
> > > > > checking whether or not it was ahead of new, but this is not correct
> > > > > all the time, it could be stale and there's no way to know about this.
> > > > > 
> > > > > Signed-off-by: Jason Wang
> > > > 
> > > > Could you supply a bit more data here please?  How does it get stale?
> > > > What does guest need to do to make it stale?  This will be helpful if
> > > > anyone wants to bring it back, or if we want to extend the protocol.
> > > > 
> > > 
> > > The problem we don't know whether or not guest has published a new used
> > > event. The check vring_need_event(vq->last_used_event, new + vq->num,
> > > new) is not sufficient to check for this.
> > > 
> > > Thanks
> > 
> > More notes, the previous assumption is that we don't move used event back,
> > but this could happen in fact if idx is wrapper around.
> 
> You mean if the 16 bit index wraps around after 64K entries.
> Makes sense.
> 
> > Will repost and add
> > this into commit log.
> > 
> > Thanks

Hi,

I am just curious but I have got a question:
AFAIU, if you wanted to keep the caching mechanism alive in the code base,
the following two changes could clear off the issue, or not?:
(1) Always fetch the latest event value from guest when signalled_used event is
invalid, which includes last_used_idx wraps-around case. Otherwise we might need
changes which would complicate too much the logic to properly decide whether or
not to skip signalling in the next vhost_notify round.
(2) On top of that, split the signal-postponing logic to three cases like:
* if the interval of vq.num is [2^16, UINT_MAX]:
any cached event is in should-postpone-signalling interval, so paradoxically
must always do signalling.
* else if the interval of vq.num is [2^15, 2^16):
the logic in the original patch (809ecb9bca6a9) suffices
* else (= less than 2^15) (optional):
checking only (vring_need_event(vq->last_used_event, new + vq->num, new)
would suffice.

Am I missing something, or is this irrelevant?
I would appreciate if you could elaborate a bit more how the situation where
event idx wraps around and moves back would make trouble.

Thanks.



Re: [PATCH net] Revert "vhost: cache used event for better performance"

2017-07-30 Thread K. Den
On Wed, 2017-07-26 at 19:08 +0300, Michael S. Tsirkin wrote:
> On Wed, Jul 26, 2017 at 09:37:15PM +0800, Jason Wang wrote:
> > 
> > 
> > On 2017年07月26日 21:18, Jason Wang wrote:
> > > 
> > > 
> > > On 2017年07月26日 20:57, Michael S. Tsirkin wrote:
> > > > On Wed, Jul 26, 2017 at 04:03:17PM +0800, Jason Wang wrote:
> > > > > This reverts commit 809ecb9bca6a9424ccd392d67e368160f8b76c92. Since it
> > > > > was reported to break vhost_net. We want to cache used event and use
> > > > > it to check for notification. We try to valid cached used event by
> > > > > checking whether or not it was ahead of new, but this is not correct
> > > > > all the time, it could be stale and there's no way to know about this.
> > > > > 
> > > > > Signed-off-by: Jason Wang
> > > > 
> > > > Could you supply a bit more data here please?  How does it get stale?
> > > > What does guest need to do to make it stale?  This will be helpful if
> > > > anyone wants to bring it back, or if we want to extend the protocol.
> > > > 
> > > 
> > > The problem we don't know whether or not guest has published a new used
> > > event. The check vring_need_event(vq->last_used_event, new + vq->num,
> > > new) is not sufficient to check for this.
> > > 
> > > Thanks
> > 
> > More notes, the previous assumption is that we don't move used event back,
> > but this could happen in fact if idx is wrapper around.
> 
> You mean if the 16 bit index wraps around after 64K entries.
> Makes sense.
> 
> > Will repost and add
> > this into commit log.
> > 
> > Thanks

Hi,

I am just curious but I have got a question:
AFAIU, if you wanted to keep the caching mechanism alive in the code base,
the following two changes could clear off the issue, or not?:
(1) Always fetch the latest event value from guest when signalled_used event is
invalid, which includes last_used_idx wraps-around case. Otherwise we might need
changes which would complicate too much the logic to properly decide whether or
not to skip signalling in the next vhost_notify round.
(2) On top of that, split the signal-postponing logic to three cases like:
* if the interval of vq.num is [2^16, UINT_MAX]:
any cached event is in should-postpone-signalling interval, so paradoxically
must always do signalling.
* else if the interval of vq.num is [2^15, 2^16):
the logic in the original patch (809ecb9bca6a9) suffices
* else (= less than 2^15) (optional):
checking only (vring_need_event(vq->last_used_event, new + vq->num, new)
would suffice.

Am I missing something, or is this irrelevant?
I would appreciate if you could elaborate a bit more how the situation where
event idx wraps around and moves back would make trouble.

Thanks.



Re: [PATCH net] Revert "vhost: cache used event for better performance"

2017-07-26 Thread Michael S. Tsirkin
On Wed, Jul 26, 2017 at 09:37:15PM +0800, Jason Wang wrote:
> 
> 
> On 2017年07月26日 21:18, Jason Wang wrote:
> > 
> > 
> > On 2017年07月26日 20:57, Michael S. Tsirkin wrote:
> > > On Wed, Jul 26, 2017 at 04:03:17PM +0800, Jason Wang wrote:
> > > > This reverts commit 809ecb9bca6a9424ccd392d67e368160f8b76c92. Since it
> > > > was reported to break vhost_net. We want to cache used event and use
> > > > it to check for notification. We try to valid cached used event by
> > > > checking whether or not it was ahead of new, but this is not correct
> > > > all the time, it could be stale and there's no way to know about this.
> > > > 
> > > > Signed-off-by: Jason Wang
> > > Could you supply a bit more data here please?  How does it get stale?
> > > What does guest need to do to make it stale?  This will be helpful if
> > > anyone wants to bring it back, or if we want to extend the protocol.
> > > 
> > 
> > The problem we don't know whether or not guest has published a new used
> > event. The check vring_need_event(vq->last_used_event, new + vq->num,
> > new) is not sufficient to check for this.
> > 
> > Thanks
> 
> More notes, the previous assumption is that we don't move used event back,
> but this could happen in fact if idx is wrapper around.

You mean if the 16 bit index wraps around after 64K entries.
Makes sense.

> Will repost and add
> this into commit log.
> 
> Thanks


Re: [PATCH net] Revert "vhost: cache used event for better performance"

2017-07-26 Thread Michael S. Tsirkin
On Wed, Jul 26, 2017 at 09:37:15PM +0800, Jason Wang wrote:
> 
> 
> On 2017年07月26日 21:18, Jason Wang wrote:
> > 
> > 
> > On 2017年07月26日 20:57, Michael S. Tsirkin wrote:
> > > On Wed, Jul 26, 2017 at 04:03:17PM +0800, Jason Wang wrote:
> > > > This reverts commit 809ecb9bca6a9424ccd392d67e368160f8b76c92. Since it
> > > > was reported to break vhost_net. We want to cache used event and use
> > > > it to check for notification. We try to valid cached used event by
> > > > checking whether or not it was ahead of new, but this is not correct
> > > > all the time, it could be stale and there's no way to know about this.
> > > > 
> > > > Signed-off-by: Jason Wang
> > > Could you supply a bit more data here please?  How does it get stale?
> > > What does guest need to do to make it stale?  This will be helpful if
> > > anyone wants to bring it back, or if we want to extend the protocol.
> > > 
> > 
> > The problem we don't know whether or not guest has published a new used
> > event. The check vring_need_event(vq->last_used_event, new + vq->num,
> > new) is not sufficient to check for this.
> > 
> > Thanks
> 
> More notes, the previous assumption is that we don't move used event back,
> but this could happen in fact if idx is wrapper around.

You mean if the 16 bit index wraps around after 64K entries.
Makes sense.

> Will repost and add
> this into commit log.
> 
> Thanks


Re: [PATCH net] Revert "vhost: cache used event for better performance"

2017-07-26 Thread Michael S. Tsirkin
On Wed, Jul 26, 2017 at 09:18:02PM +0800, Jason Wang wrote:
> 
> 
> On 2017年07月26日 20:57, Michael S. Tsirkin wrote:
> > On Wed, Jul 26, 2017 at 04:03:17PM +0800, Jason Wang wrote:
> > > This reverts commit 809ecb9bca6a9424ccd392d67e368160f8b76c92. Since it
> > > was reported to break vhost_net. We want to cache used event and use
> > > it to check for notification. We try to valid cached used event by
> > > checking whether or not it was ahead of new, but this is not correct
> > > all the time, it could be stale and there's no way to know about this.
> > > 
> > > Signed-off-by: Jason Wang
> > Could you supply a bit more data here please?  How does it get stale?
> > What does guest need to do to make it stale?  This will be helpful if
> > anyone wants to bring it back, or if we want to extend the protocol.
> > 
> 
> The problem we don't know whether or not guest has published a new used
> event. The check vring_need_event(vq->last_used_event, new + vq->num, new)
> is not sufficient to check for this.
> 
> Thanks

Figured it out after an offline discussion.


Re: [PATCH net] Revert "vhost: cache used event for better performance"

2017-07-26 Thread Michael S. Tsirkin
On Wed, Jul 26, 2017 at 09:18:02PM +0800, Jason Wang wrote:
> 
> 
> On 2017年07月26日 20:57, Michael S. Tsirkin wrote:
> > On Wed, Jul 26, 2017 at 04:03:17PM +0800, Jason Wang wrote:
> > > This reverts commit 809ecb9bca6a9424ccd392d67e368160f8b76c92. Since it
> > > was reported to break vhost_net. We want to cache used event and use
> > > it to check for notification. We try to valid cached used event by
> > > checking whether or not it was ahead of new, but this is not correct
> > > all the time, it could be stale and there's no way to know about this.
> > > 
> > > Signed-off-by: Jason Wang
> > Could you supply a bit more data here please?  How does it get stale?
> > What does guest need to do to make it stale?  This will be helpful if
> > anyone wants to bring it back, or if we want to extend the protocol.
> > 
> 
> The problem we don't know whether or not guest has published a new used
> event. The check vring_need_event(vq->last_used_event, new + vq->num, new)
> is not sufficient to check for this.
> 
> Thanks

Figured it out after an offline discussion.


Re: [PATCH net] Revert "vhost: cache used event for better performance"

2017-07-26 Thread Jason Wang



On 2017年07月26日 21:18, Jason Wang wrote:



On 2017年07月26日 20:57, Michael S. Tsirkin wrote:

On Wed, Jul 26, 2017 at 04:03:17PM +0800, Jason Wang wrote:

This reverts commit 809ecb9bca6a9424ccd392d67e368160f8b76c92. Since it
was reported to break vhost_net. We want to cache used event and use
it to check for notification. We try to valid cached used event by
checking whether or not it was ahead of new, but this is not correct
all the time, it could be stale and there's no way to know about this.

Signed-off-by: Jason Wang

Could you supply a bit more data here please?  How does it get stale?
What does guest need to do to make it stale?  This will be helpful if
anyone wants to bring it back, or if we want to extend the protocol.



The problem we don't know whether or not guest has published a new 
used event. The check vring_need_event(vq->last_used_event, new + 
vq->num, new) is not sufficient to check for this.


Thanks


More notes, the previous assumption is that we don't move used event 
back, but this could happen in fact if idx is wrapper around. Will 
repost and add this into commit log.


Thanks


Re: [PATCH net] Revert "vhost: cache used event for better performance"

2017-07-26 Thread Jason Wang



On 2017年07月26日 21:18, Jason Wang wrote:



On 2017年07月26日 20:57, Michael S. Tsirkin wrote:

On Wed, Jul 26, 2017 at 04:03:17PM +0800, Jason Wang wrote:

This reverts commit 809ecb9bca6a9424ccd392d67e368160f8b76c92. Since it
was reported to break vhost_net. We want to cache used event and use
it to check for notification. We try to valid cached used event by
checking whether or not it was ahead of new, but this is not correct
all the time, it could be stale and there's no way to know about this.

Signed-off-by: Jason Wang

Could you supply a bit more data here please?  How does it get stale?
What does guest need to do to make it stale?  This will be helpful if
anyone wants to bring it back, or if we want to extend the protocol.



The problem we don't know whether or not guest has published a new 
used event. The check vring_need_event(vq->last_used_event, new + 
vq->num, new) is not sufficient to check for this.


Thanks


More notes, the previous assumption is that we don't move used event 
back, but this could happen in fact if idx is wrapper around. Will 
repost and add this into commit log.


Thanks


Re: [PATCH net] Revert "vhost: cache used event for better performance"

2017-07-26 Thread Jason Wang



On 2017年07月26日 20:57, Michael S. Tsirkin wrote:

On Wed, Jul 26, 2017 at 04:03:17PM +0800, Jason Wang wrote:

This reverts commit 809ecb9bca6a9424ccd392d67e368160f8b76c92. Since it
was reported to break vhost_net. We want to cache used event and use
it to check for notification. We try to valid cached used event by
checking whether or not it was ahead of new, but this is not correct
all the time, it could be stale and there's no way to know about this.

Signed-off-by: Jason Wang

Could you supply a bit more data here please?  How does it get stale?
What does guest need to do to make it stale?  This will be helpful if
anyone wants to bring it back, or if we want to extend the protocol.



The problem we don't know whether or not guest has published a new used 
event. The check vring_need_event(vq->last_used_event, new + vq->num, 
new) is not sufficient to check for this.


Thanks


Re: [PATCH net] Revert "vhost: cache used event for better performance"

2017-07-26 Thread Jason Wang



On 2017年07月26日 20:57, Michael S. Tsirkin wrote:

On Wed, Jul 26, 2017 at 04:03:17PM +0800, Jason Wang wrote:

This reverts commit 809ecb9bca6a9424ccd392d67e368160f8b76c92. Since it
was reported to break vhost_net. We want to cache used event and use
it to check for notification. We try to valid cached used event by
checking whether or not it was ahead of new, but this is not correct
all the time, it could be stale and there's no way to know about this.

Signed-off-by: Jason Wang

Could you supply a bit more data here please?  How does it get stale?
What does guest need to do to make it stale?  This will be helpful if
anyone wants to bring it back, or if we want to extend the protocol.



The problem we don't know whether or not guest has published a new used 
event. The check vring_need_event(vq->last_used_event, new + vq->num, 
new) is not sufficient to check for this.


Thanks


Re: [PATCH net] Revert "vhost: cache used event for better performance"

2017-07-26 Thread Michael S. Tsirkin
On Wed, Jul 26, 2017 at 04:03:17PM +0800, Jason Wang wrote:
> This reverts commit 809ecb9bca6a9424ccd392d67e368160f8b76c92. Since it
> was reported to break vhost_net. We want to cache used event and use
> it to check for notification. We try to valid cached used event by
> checking whether or not it was ahead of new, but this is not correct
> all the time, it could be stale and there's no way to know about this.
> 
> Signed-off-by: Jason Wang 


Could you supply a bit more data here please?  How does it get stale?
What does guest need to do to make it stale?  This will be helpful if
anyone wants to bring it back, or if we want to extend the protocol.

> ---
>  drivers/vhost/vhost.c | 28 ++--
>  drivers/vhost/vhost.h |  3 ---
>  2 files changed, 6 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index e4613a3..9cb3f72 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -308,7 +308,6 @@ static void vhost_vq_reset(struct vhost_dev *dev,
>   vq->avail = NULL;
>   vq->used = NULL;
>   vq->last_avail_idx = 0;
> - vq->last_used_event = 0;
>   vq->avail_idx = 0;
>   vq->last_used_idx = 0;
>   vq->signalled_used = 0;
> @@ -1402,7 +1401,7 @@ long vhost_vring_ioctl(struct vhost_dev *d, int ioctl, 
> void __user *argp)
>   r = -EINVAL;
>   break;
>   }
> - vq->last_avail_idx = vq->last_used_event = s.num;
> + vq->last_avail_idx = s.num;
>   /* Forget the cached index value. */
>   vq->avail_idx = vq->last_avail_idx;
>   break;
> @@ -2241,6 +2240,10 @@ static bool vhost_notify(struct vhost_dev *dev, struct 
> vhost_virtqueue *vq)
>   __u16 old, new;
>   __virtio16 event;
>   bool v;
> + /* Flush out used index updates. This is paired
> +  * with the barrier that the Guest executes when enabling
> +  * interrupts. */
> + smp_mb();
>  
>   if (vhost_has_feature(vq, VIRTIO_F_NOTIFY_ON_EMPTY) &&
>   unlikely(vq->avail_idx == vq->last_avail_idx))
> @@ -2248,10 +2251,6 @@ static bool vhost_notify(struct vhost_dev *dev, struct 
> vhost_virtqueue *vq)
>  
>   if (!vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX)) {
>   __virtio16 flags;
> - /* Flush out used index updates. This is paired
> -  * with the barrier that the Guest executes when enabling
> -  * interrupts. */
> - smp_mb();
>   if (vhost_get_avail(vq, flags, >avail->flags)) {
>   vq_err(vq, "Failed to get flags");
>   return true;
> @@ -2266,26 +2265,11 @@ static bool vhost_notify(struct vhost_dev *dev, 
> struct vhost_virtqueue *vq)
>   if (unlikely(!v))
>   return true;
>  
> - /* We're sure if the following conditions are met, there's no
> -  * need to notify guest:
> -  * 1) cached used event is ahead of new
> -  * 2) old to new updating does not cross cached used event. */
> - if (vring_need_event(vq->last_used_event, new + vq->num, new) &&
> - !vring_need_event(vq->last_used_event, new, old))
> - return false;
> -
> - /* Flush out used index updates. This is paired
> -  * with the barrier that the Guest executes when enabling
> -  * interrupts. */
> - smp_mb();
> -
>   if (vhost_get_avail(vq, event, vhost_used_event(vq))) {
>   vq_err(vq, "Failed to get used event idx");
>   return true;
>   }
> - vq->last_used_event = vhost16_to_cpu(vq, event);
> -
> - return vring_need_event(vq->last_used_event, new, old);
> + return vring_need_event(vhost16_to_cpu(vq, event), new, old);
>  }
>  
>  /* This actually signals the guest, using eventfd. */
> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> index f720958..bb7c29b 100644
> --- a/drivers/vhost/vhost.h
> +++ b/drivers/vhost/vhost.h
> @@ -115,9 +115,6 @@ struct vhost_virtqueue {
>   /* Last index we used. */
>   u16 last_used_idx;
>  
> - /* Last used evet we've seen */
> - u16 last_used_event;
> -
>   /* Used flags */
>   u16 used_flags;
>  
> -- 
> 2.7.4


Re: [PATCH net] Revert "vhost: cache used event for better performance"

2017-07-26 Thread Michael S. Tsirkin
On Wed, Jul 26, 2017 at 04:03:17PM +0800, Jason Wang wrote:
> This reverts commit 809ecb9bca6a9424ccd392d67e368160f8b76c92. Since it
> was reported to break vhost_net. We want to cache used event and use
> it to check for notification. We try to valid cached used event by
> checking whether or not it was ahead of new, but this is not correct
> all the time, it could be stale and there's no way to know about this.
> 
> Signed-off-by: Jason Wang 


Could you supply a bit more data here please?  How does it get stale?
What does guest need to do to make it stale?  This will be helpful if
anyone wants to bring it back, or if we want to extend the protocol.

> ---
>  drivers/vhost/vhost.c | 28 ++--
>  drivers/vhost/vhost.h |  3 ---
>  2 files changed, 6 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index e4613a3..9cb3f72 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -308,7 +308,6 @@ static void vhost_vq_reset(struct vhost_dev *dev,
>   vq->avail = NULL;
>   vq->used = NULL;
>   vq->last_avail_idx = 0;
> - vq->last_used_event = 0;
>   vq->avail_idx = 0;
>   vq->last_used_idx = 0;
>   vq->signalled_used = 0;
> @@ -1402,7 +1401,7 @@ long vhost_vring_ioctl(struct vhost_dev *d, int ioctl, 
> void __user *argp)
>   r = -EINVAL;
>   break;
>   }
> - vq->last_avail_idx = vq->last_used_event = s.num;
> + vq->last_avail_idx = s.num;
>   /* Forget the cached index value. */
>   vq->avail_idx = vq->last_avail_idx;
>   break;
> @@ -2241,6 +2240,10 @@ static bool vhost_notify(struct vhost_dev *dev, struct 
> vhost_virtqueue *vq)
>   __u16 old, new;
>   __virtio16 event;
>   bool v;
> + /* Flush out used index updates. This is paired
> +  * with the barrier that the Guest executes when enabling
> +  * interrupts. */
> + smp_mb();
>  
>   if (vhost_has_feature(vq, VIRTIO_F_NOTIFY_ON_EMPTY) &&
>   unlikely(vq->avail_idx == vq->last_avail_idx))
> @@ -2248,10 +2251,6 @@ static bool vhost_notify(struct vhost_dev *dev, struct 
> vhost_virtqueue *vq)
>  
>   if (!vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX)) {
>   __virtio16 flags;
> - /* Flush out used index updates. This is paired
> -  * with the barrier that the Guest executes when enabling
> -  * interrupts. */
> - smp_mb();
>   if (vhost_get_avail(vq, flags, >avail->flags)) {
>   vq_err(vq, "Failed to get flags");
>   return true;
> @@ -2266,26 +2265,11 @@ static bool vhost_notify(struct vhost_dev *dev, 
> struct vhost_virtqueue *vq)
>   if (unlikely(!v))
>   return true;
>  
> - /* We're sure if the following conditions are met, there's no
> -  * need to notify guest:
> -  * 1) cached used event is ahead of new
> -  * 2) old to new updating does not cross cached used event. */
> - if (vring_need_event(vq->last_used_event, new + vq->num, new) &&
> - !vring_need_event(vq->last_used_event, new, old))
> - return false;
> -
> - /* Flush out used index updates. This is paired
> -  * with the barrier that the Guest executes when enabling
> -  * interrupts. */
> - smp_mb();
> -
>   if (vhost_get_avail(vq, event, vhost_used_event(vq))) {
>   vq_err(vq, "Failed to get used event idx");
>   return true;
>   }
> - vq->last_used_event = vhost16_to_cpu(vq, event);
> -
> - return vring_need_event(vq->last_used_event, new, old);
> + return vring_need_event(vhost16_to_cpu(vq, event), new, old);
>  }
>  
>  /* This actually signals the guest, using eventfd. */
> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> index f720958..bb7c29b 100644
> --- a/drivers/vhost/vhost.h
> +++ b/drivers/vhost/vhost.h
> @@ -115,9 +115,6 @@ struct vhost_virtqueue {
>   /* Last index we used. */
>   u16 last_used_idx;
>  
> - /* Last used evet we've seen */
> - u16 last_used_event;
> -
>   /* Used flags */
>   u16 used_flags;
>  
> -- 
> 2.7.4


Re: [PATCH net] Revert "vhost: cache used event for better performance"

2017-07-26 Thread Jason Wang



On 2017年07月26日 18:53, Christian Borntraeger wrote:

On 07/26/2017 10:03 AM, Jason Wang wrote:

This reverts commit 809ecb9bca6a9424ccd392d67e368160f8b76c92. Since it
was reported to break vhost_net. We want to cache used event and use
it to check for notification. We try to valid cached used event by
checking whether or not it was ahead of new, but this is not correct
all the time, it could be stale and there's no way to know about this.

Signed-off-by: Jason Wang 

This would then qualify for stable ?



Yes it is.

Thanks


Re: [PATCH net] Revert "vhost: cache used event for better performance"

2017-07-26 Thread Jason Wang



On 2017年07月26日 18:53, Christian Borntraeger wrote:

On 07/26/2017 10:03 AM, Jason Wang wrote:

This reverts commit 809ecb9bca6a9424ccd392d67e368160f8b76c92. Since it
was reported to break vhost_net. We want to cache used event and use
it to check for notification. We try to valid cached used event by
checking whether or not it was ahead of new, but this is not correct
all the time, it could be stale and there's no way to know about this.

Signed-off-by: Jason Wang 

This would then qualify for stable ?



Yes it is.

Thanks


Re: [PATCH net] Revert "vhost: cache used event for better performance"

2017-07-26 Thread Christian Borntraeger
On 07/26/2017 10:03 AM, Jason Wang wrote:
> This reverts commit 809ecb9bca6a9424ccd392d67e368160f8b76c92. Since it
> was reported to break vhost_net. We want to cache used event and use
> it to check for notification. We try to valid cached used event by
> checking whether or not it was ahead of new, but this is not correct
> all the time, it could be stale and there's no way to know about this.
> 
> Signed-off-by: Jason Wang 

This would then qualify for stable ?



Re: [PATCH net] Revert "vhost: cache used event for better performance"

2017-07-26 Thread Christian Borntraeger
On 07/26/2017 10:03 AM, Jason Wang wrote:
> This reverts commit 809ecb9bca6a9424ccd392d67e368160f8b76c92. Since it
> was reported to break vhost_net. We want to cache used event and use
> it to check for notification. We try to valid cached used event by
> checking whether or not it was ahead of new, but this is not correct
> all the time, it could be stale and there's no way to know about this.
> 
> Signed-off-by: Jason Wang 

This would then qualify for stable ?