Re: Potential lost receive WCs (was [PATCH WIP 38/43])

2015-07-30 Thread Chuck Lever
On Jul 30, 2015, at 3:00 AM, Sagi Grimberg sa...@dev.mellanox.co.il wrote: The drivers we have that don't dequeue all the CQEs are doing something like NAPI polling and have other mechanisms to guarantee progress. Don't copy something like budget without copying the other mechanisms :)

Re: Potential lost receive WCs (was [PATCH WIP 38/43])

2015-07-30 Thread Sagi Grimberg
The drivers we have that don't dequeue all the CQEs are doing something like NAPI polling and have other mechanisms to guarantee progress. Don't copy something like budget without copying the other mechanisms :) OK, that makes total sense. Thanks for clarifying. IIRC NAPI is soft-IRQ which

Re: Potential lost receive WCs (was [PATCH WIP 38/43])

2015-07-29 Thread Jason Gunthorpe
On Wed, Jul 29, 2015 at 04:47:59PM -0400, Chuck Lever wrote: Apparently this is true for some providers, and not for others, and I misunderstood that when I put this together last year. Really? In kernel providers? Interesting, those are probably wrong... The idea that you can completely

Re: Potential lost receive WCs (was [PATCH WIP 38/43])

2015-07-29 Thread Chuck Lever
On Jul 29, 2015, at 5:15 PM, Jason Gunthorpe jguntho...@obsidianresearch.com wrote: On Wed, Jul 29, 2015 at 04:47:59PM -0400, Chuck Lever wrote: Apparently this is true for some providers, and not for others, and I misunderstood that when I put this together last year. Really? In kernel

Re: Potential lost receive WCs (was [PATCH WIP 38/43])

2015-07-29 Thread Chuck Lever
Hi Jason- On Jul 24, 2015, at 4:46 PM, Jason Gunthorpe jguntho...@obsidianresearch.com wrote: On Fri, Jul 24, 2015 at 04:26:00PM -0400, Chuck Lever wrote: Basically RPC work flow stopped because an RPC reply never arrived. Oh, that is what I expect to see.. Remember the cq upcall is edge

Potential lost receive WCs (was [PATCH WIP 38/43])

2015-07-24 Thread Chuck Lever
During some other testing I found that when a completion upcall returns to the provider leaving CQEs still on the completion queue, there is a non-zero probability that a completion will be lost. What does lost mean? Lost means a WC in the CQ is skipped by ib_poll_cq(). In other

Re: Potential lost receive WCs (was [PATCH WIP 38/43])

2015-07-24 Thread Jason Gunthorpe
On Fri, Jul 24, 2015 at 04:26:00PM -0400, Chuck Lever wrote: Basically RPC work flow stopped because an RPC reply never arrived. Oh, that is what I expect to see.. Remember the cq upcall is edge triggered, so if you leave stuff in the cq then you don't get another upcall until another CQE is
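
The edge-triggered behaviour described in this message can be illustrated with a small toy model. This is only a sketch: `ToyCQ`, `post_burst`, `budget_handler` and `drain_handler` are hypothetical names invented for the illustration, not part of the kernel verbs API, and the model is single-threaded, so it says nothing about how any particular provider behaves. It assumes only what the message states: an upcall fires when a new CQE arrives on an armed CQ, and not for CQEs that were already queued when the handler returned.

```python
from collections import deque


class ToyCQ:
    """Toy edge-triggered completion queue (hypothetical, not the verbs API)."""

    def __init__(self, handler):
        self.entries = deque()
        self.armed = True
        self.handler = handler
        self.upcalls = 0

    def post_burst(self, wcs):
        # New completions arrive together; the upcall fires once, and only
        # if the CQ is currently armed. Firing disarms the CQ.
        self.entries.extend(wcs)
        if self.armed:
            self.armed = False
            self.upcalls += 1
            self.handler(self)

    def poll(self):
        # Return the next completion, or None if the queue is empty.
        return self.entries.popleft() if self.entries else None

    def rearm(self):
        self.armed = True


def budget_handler(budget, seen):
    """Handler that polls at most `budget` entries, then rearms and returns."""
    def handler(cq):
        for _ in range(budget):
            wc = cq.poll()
            if wc is None:
                break
            seen.append(wc)
        cq.rearm()  # anything still queued waits for the next arrival
    return handler


def drain_handler(seen):
    """Handler that polls until empty, rearms, then re-checks the queue."""
    def handler(cq):
        while True:
            wc = cq.poll()
            while wc is not None:
                seen.append(wc)
                wc = cq.poll()
            cq.rearm()
            if not cq.entries:  # nothing slipped in before the rearm
                break
    return handler


# Budgeted handler: the third completion stays queued with no upcall pending.
seen_budget = []
cq_budget = ToyCQ(budget_handler(2, seen_budget))
cq_budget.post_burst(["wc1", "wc2", "wc3"])

# Draining handler: everything is consumed in the single upcall.
seen_drain = []
cq_drain = ToyCQ(drain_handler(seen_drain))
cq_drain.post_burst(["wc1", "wc2", "wc3"])
```

In this model the budgeted handler leaves `wc3` on the queue until some later completion triggers another upcall, which would look like a stalled reply to a consumer waiting on it. The draining handler avoids that by emptying the queue and checking it again after rearming.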