Re: issues with commit c1da59dad0eb ("cdc-wdm: Clear read pipeline in case of error")

2017-04-26 Thread Oliver Neukum
Am Dienstag, den 25.04.2017, 13:07 +0200 schrieb Bjørn Mork:
> 
> No, I am not sure.  But I don't see how we can both cache the error and
> continue to read new data. The error is irrelevant after the next URB
> has completed.  If we decide that every error should be delivered to
> userspace, then we either have to stop transmission or maintain a queue
> of device reads, keeping data and errors in sync.  Delivering an old
> error to userspace along with newer data makes no sense.
> 
> Also note that this code path is skipped entirely for several common
> errors. So they are never reported to userspace.

Hi,

OK.

First my apology. I should have asked for clarification before accepting this 
patch.
I have learned from this.

Second, as your code expects the old behavior, a revert is the correct course
of action for now.

Third, if I approach your argument from a principal view point, I see a couple
of issues. Primarily, how important is the error code at all? And secondly
it seems to me that a device reset must be reported at all cost.

Regards
Oliver
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: issues with commit c1da59dad0eb ("cdc-wdm: Clear read pipeline in case of error")

2017-04-25 Thread Bjørn Mork
Oliver Neukum  writes:

> Am Donnerstag, den 20.04.2017, 10:32 +0200 schrieb Bjørn Mork:
>> > @@ -189,7 +192,13 @@ static void wdm_in_callback(struct urb *urb)
>> >   }
>> >   }
>> >  
>> > - desc->rerr = status;
>> > + /*
>> > +  * only set a new error if there is no previous error.
>> > +  * Errors are only cleared during read/open
>> > +  */
>> > + if (desc->rerr  == 0)
>> > + desc->rerr = status;
>> > +
>> >   if (length + desc->length > desc->wMaxCommand) {
>> >   /* The buffer would overflow */
>> >   set_bit(WDM_OVERFLOW, >flags);
>> > 
>> 
>> 
>> This is minor, but in my opinion this change is flawed.  It changes the
>> meaning of "desc->rerr" from the original "last status code" to the new
>> "first error since last userspace read".
>
> Indeed. And that will preserve the initial cause of the problem better.
> We could run into the zero length read case. Are you sure you want the
> old behavior back?

No, I am not sure.  But I don't see how we can both cache the error and
continue to read new data. The error is irrelevant after the next URB
has completed.  If we decide that every error should be delivered to
userspace, then we either have to stop transmission or maintain a queue
of device reads, keeping data and errors in sync.  Delivering an old
error to userspace along with newer data makes no sense.

Also note that this code path is skipped entirely for several common
errors. So they are never reported to userspace.


Bjørn
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: issues with commit c1da59dad0eb ("cdc-wdm: Clear read pipeline in case of error")

2017-04-25 Thread Oliver Neukum
Am Donnerstag, den 20.04.2017, 10:32 +0200 schrieb Bjørn Mork:
> > @@ -189,7 +192,13 @@ static void wdm_in_callback(struct urb *urb)
> >   }
> >   }
> >  
> > - desc->rerr = status;
> > + /*
> > +  * only set a new error if there is no previous error.
> > +  * Errors are only cleared during read/open
> > +  */
> > + if (desc->rerr  == 0)
> > + desc->rerr = status;
> > +
> >   if (length + desc->length > desc->wMaxCommand) {
> >   /* The buffer would overflow */
> >   set_bit(WDM_OVERFLOW, >flags);
> > 
> 
> 
> This is minor, but in my opinion this change is flawed.  It changes the
> meaning of "desc->rerr" from the original "last status code" to the new
> "first error since last userspace read".

Indeed. And that will preserve the initial cause of the problem better.
We could run into the zero length read case. Are you sure you want the
old behavior back?

Regards
Oliver

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: issues with commit c1da59dad0eb ("cdc-wdm: Clear read pipeline in case of error")

2017-04-25 Thread Bjørn Mork
Robert Foss  writes:
> On 2017-04-24 08:02 AM, Bjørn Mork wrote:
>> Aleksander Morgado  writes:
>>> On Thu, Apr 20, 2017 at 10:32 AM, Bjørn Mork  wrote:
 Sorry for being much too late here, but during recent attemts to debug
 issues caused by my commit 833415a3e781 ("cdc-wdm: fix "out-of-sync" due
 to missing notifications") I believe I found a couple of issues with
 commit c1da59dad0eb. At least one of them is serious (potentional
 GPF_KERNEL allocation in interrupt context).

 But I don't know if I understand the problem the commit set out to
 solves, which is why I have no proposed patch here.
>>>
>>> Just to be clear, the issues found with both commits are completely
>>> unrelated one from the other, right?
>>
>> No, they do interact and will most likely affect each other.
>>
>> But this would not be an issue if commit 833415a3e781 didn't cause
>> unexpected errors, so that commit must be reverted in any case.
>
> Alright, so assuming reversions are the way to go forward,
> how would you prefer to go about cleaning this up?

I've alreasy sent a revert for commit 833415a3e781, but it might have
been timed badly related to the soon opening merge window.  See
https://www.spinics.net/lists/linux-usb/msg156311.html

Any further commits should probably be based on top of this, assuming
that it will be accepted.  But if it was sent too late for Greg to
include it in his v4.12-rc1 batch, then I guess there is no real
hurry...

In any case, if you are going for a revert then I suggest using the same
stable CC to avoid the v4.9 LTS branch diverging unneceessarily.



Bjørn
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: issues with commit c1da59dad0eb ("cdc-wdm: Clear read pipeline in case of error")

2017-04-24 Thread Robert Foss



On 2017-04-24 08:02 AM, Bjørn Mork wrote:

Aleksander Morgado  writes:

On Thu, Apr 20, 2017 at 10:32 AM, Bjørn Mork  wrote:

Sorry for being much too late here, but during recent attemts to debug
issues caused by my commit 833415a3e781 ("cdc-wdm: fix "out-of-sync" due
to missing notifications") I believe I found a couple of issues with
commit c1da59dad0eb. At least one of them is serious (potentional
GPF_KERNEL allocation in interrupt context).

But I don't know if I understand the problem the commit set out to
solves, which is why I have no proposed patch here.


Just to be clear, the issues found with both commits are completely
unrelated one from the other, right?


No, they do interact and will most likely affect each other.

But this would not be an issue if commit 833415a3e781 didn't cause
unexpected errors, so that commit must be reverted in any case.


Alright, so assuming reversions are the way to go forward,
how would you prefer to go about cleaning this up?


Rob.
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: issues with commit c1da59dad0eb ("cdc-wdm: Clear read pipeline in case of error")

2017-04-24 Thread Bjørn Mork
Aleksander Morgado  writes:
> On Thu, Apr 20, 2017 at 10:32 AM, Bjørn Mork  wrote:
>> Sorry for being much too late here, but during recent attemts to debug
>> issues caused by my commit 833415a3e781 ("cdc-wdm: fix "out-of-sync" due
>> to missing notifications") I believe I found a couple of issues with
>> commit c1da59dad0eb. At least one of them is serious (potentional
>> GPF_KERNEL allocation in interrupt context).
>>
>> But I don't know if I understand the problem the commit set out to
>> solves, which is why I have no proposed patch here.
>
> Just to be clear, the issues found with both commits are completely
> unrelated one from the other, right?

No, they do interact and will most likely affect each other.  

But this would not be an issue if commit 833415a3e781 didn't cause
unexpected errors, so that commit must be reverted in any case.


Bjørn
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: issues with commit c1da59dad0eb ("cdc-wdm: Clear read pipeline in case of error")

2017-04-24 Thread Aleksander Morgado
Hey Bjørn,

On Thu, Apr 20, 2017 at 10:32 AM, Bjørn Mork  wrote:
> Sorry for being much too late here, but during recent attemts to debug
> issues caused by my commit 833415a3e781 ("cdc-wdm: fix "out-of-sync" due
> to missing notifications") I believe I found a couple of issues with
> commit c1da59dad0eb. At least one of them is serious (potentional
> GPF_KERNEL allocation in interrupt context).
>
> But I don't know if I understand the problem the commit set out to
> solves, which is why I have no proposed patch here.

Just to be clear, the issues found with both commits are completely
unrelated one from the other, right?

-- 
Aleksander
https://aleksander.es
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: issues with commit c1da59dad0eb ("cdc-wdm: Clear read pipeline in case of error")

2017-04-21 Thread Guenter Roeck
On Thu, Apr 20, 2017 at 2:18 PM, Robert Foss  wrote:
> Hi Björn,
>
> Thanks for the thorough and explicit feedback, it was rather helpful.
>
>
> On 2017-04-20 04:32 AM, Bjørn Mork wrote:
>>
>> Hello Robert,
>>
>> Sorry for being much too late here, but during recent attemts to debug
>> issues caused by my commit 833415a3e781 ("cdc-wdm: fix "out-of-sync" due
>> to missing notifications") I believe I found a couple of issues with
>> commit c1da59dad0eb. At least one of them is serious (potentional
>> GPF_KERNEL allocation in interrupt context).
>>
>> But I don't know if I understand the problem the commit set out to
>> solves, which is why I have no proposed patch here.
>
>
>
> The description of the situation that I received was then one
> from the commit message:
>
> "In case of a read error, userspace may not actually read the data, since
> the
> driver returns an error through wdm_poll. After this, the underlying device
> may
> attempt to send us more data, but the queue is not processed. While
> userspace is
> also blocked, because the read error is never cleared."
>
> So the way I interpret it is that after an error condition the device can
> get prevented from sending data, which in turn can cause userspace to not
> issue a read (which would have cleared the error).
>
> I could certainly be wrong in this interpretation or be misunderstanding
> something.
>
> Prathmesh, Guenter: Do you have any comments?
>

I would not want to claim to fully understand the problem, but from
Bjørn's comments it really looks like the patch should be reverted.

Thanks,
Guenter

>>
>>
>>> @@ -189,7 +192,13 @@ static void wdm_in_callback(struct urb *urb)
>>> }
>>> }
>>>
>>> -   desc->rerr = status;
>>> +   /*
>>> +* only set a new error if there is no previous error.
>>> +* Errors are only cleared during read/open
>>> +*/
>>> +   if (desc->rerr  == 0)
>>> +   desc->rerr = status;
>>> +
>>> if (length + desc->length > desc->wMaxCommand) {
>>> /* The buffer would overflow */
>>> set_bit(WDM_OVERFLOW, >flags);
>>>
>>
>>
>> This is minor, but in my opinion this change is flawed.  It changes the
>> meaning of "desc->rerr" from the original "last status code" to the new
>> "first error since last userspace read".
>>
>> wdm_read will only return and clear such errors if desc->length == 0 on
>> entry.  This is up to the userspace application to decide. I claim that
>> any error should either be reported immediately or not at all.  Caching
>> errors forever like this, and then reporting them to userspace at some
>> arbitrary later event, is pointless.
>>
>> The existing code did the correct thing in the absence of proper event
>> queing.  The only real alternative would be to completely stop
>> processing device events until userspace has seen the error.  But that
>> would just make things fail completely for no good reason.
>>
>> We should probably consider making a queue/list of the read buffers and
>> keep each status code with the associated list element.  Then we could
>> report back the error when userspace reads the element that caused it.
>
>
> I see your point about changing the semantics of desc->rerr, and agree that
> it probably isn't desirable.
>
> So an event queue is starting to look like the way forward.
>
>>
>>> @@ -221,9 +230,19 @@ static void wdm_in_callback(struct urb *urb)
>>> }
>>>
>>>  skip_error:
>>> +   set_bit(WDM_READ, >flags);
>>> wake_up(>wait);
>>>
>>> -   set_bit(WDM_READ, >flags);
>>> +   if (desc->rerr) {
>>> +   /*
>>> +* Since there was an error, userspace may decide to not
>>> read
>>> +* any data after poll'ing.
>>> +* We should respond to further attempts from the device
>>> to send
>>> +* data, so that we can get unstuck.
>>> +*/
>>> +   service_outstanding_interrupt(desc);
>>> +   }
>>> +
>>>  unlock:
>>> spin_unlock(>iuspin);
>>>   }
>>
>>
>> We cannot do this.  This is an URB callback, and
>> service_outstanding_interrupt()
>> calls  usb_submit_urb(...,GFP_KERNEL).
>>
>> This issue could of course be avoided by simply changing the allocation
>> flags. But looking at the comment here, I am pretty sure that this is
>> the wrong solution to the unanticipated userspace behaviour.  If
>> userspace decides not to read() if poll() sets POLLERR, then it *should
>> not* expect the error condition to be cleared. Or am I missing something?
>
>
> The allocation flags aside, calling poll() and it returning POLLERR should
> not clear the error, so the behavior of the
>
>>
>> In any case: This will not unstick anything, given the previous problem
>> with desc->rerr not being updated unless userspace reads.  You will just
>> flush the device queue, but the error condition is still there. So
>> poll() will still set POLLERR.  If 

Re: issues with commit c1da59dad0eb ("cdc-wdm: Clear read pipeline in case of error")

2017-04-20 Thread Robert Foss

Hi Björn,

Thanks for the thorough and explicit feedback, it was rather helpful.


On 2017-04-20 04:32 AM, Bjørn Mork wrote:

Hello Robert,

Sorry for being much too late here, but during recent attemts to debug
issues caused by my commit 833415a3e781 ("cdc-wdm: fix "out-of-sync" due
to missing notifications") I believe I found a couple of issues with
commit c1da59dad0eb. At least one of them is serious (potentional
GPF_KERNEL allocation in interrupt context).

But I don't know if I understand the problem the commit set out to
solves, which is why I have no proposed patch here.



The description of the situation that I received was then one
from the commit message:

"In case of a read error, userspace may not actually read the data, since the
driver returns an error through wdm_poll. After this, the underlying device may
attempt to send us more data, but the queue is not processed. While userspace is
also blocked, because the read error is never cleared."

So the way I interpret it is that after an error condition the device can get 
prevented from sending data, which in turn can cause userspace to not issue a 
read (which would have cleared the error).


I could certainly be wrong in this interpretation or be misunderstanding 
something.

Prathmesh, Guenter: Do you have any comments?





@@ -189,7 +192,13 @@ static void wdm_in_callback(struct urb *urb)
}
}

-   desc->rerr = status;
+   /*
+* only set a new error if there is no previous error.
+* Errors are only cleared during read/open
+*/
+   if (desc->rerr  == 0)
+   desc->rerr = status;
+
if (length + desc->length > desc->wMaxCommand) {
/* The buffer would overflow */
set_bit(WDM_OVERFLOW, >flags);




This is minor, but in my opinion this change is flawed.  It changes the
meaning of "desc->rerr" from the original "last status code" to the new
"first error since last userspace read".

wdm_read will only return and clear such errors if desc->length == 0 on
entry.  This is up to the userspace application to decide. I claim that
any error should either be reported immediately or not at all.  Caching
errors forever like this, and then reporting them to userspace at some
arbitrary later event, is pointless.

The existing code did the correct thing in the absence of proper event
queing.  The only real alternative would be to completely stop
processing device events until userspace has seen the error.  But that
would just make things fail completely for no good reason.

We should probably consider making a queue/list of the read buffers and
keep each status code with the associated list element.  Then we could
report back the error when userspace reads the element that caused it.


I see your point about changing the semantics of desc->rerr, and agree that it 
probably isn't desirable.


So an event queue is starting to look like the way forward.




@@ -221,9 +230,19 @@ static void wdm_in_callback(struct urb *urb)
}

 skip_error:
+   set_bit(WDM_READ, >flags);
wake_up(>wait);

-   set_bit(WDM_READ, >flags);
+   if (desc->rerr) {
+   /*
+* Since there was an error, userspace may decide to not read
+* any data after poll'ing.
+* We should respond to further attempts from the device to send
+* data, so that we can get unstuck.
+*/
+   service_outstanding_interrupt(desc);
+   }
+
 unlock:
spin_unlock(>iuspin);
  }


We cannot do this.  This is an URB callback, and service_outstanding_interrupt()
calls  usb_submit_urb(...,GFP_KERNEL).

This issue could of course be avoided by simply changing the allocation
flags. But looking at the comment here, I am pretty sure that this is
the wrong solution to the unanticipated userspace behaviour.  If
userspace decides not to read() if poll() sets POLLERR, then it *should
not* expect the error condition to be cleared. Or am I missing something?


The allocation flags aside, calling poll() and it returning POLLERR should
not clear the error, so the behavior of the



In any case: This will not unstick anything, given the previous problem
with desc->rerr not being updated unless userspace reads.  You will just
flush the device queue, but the error condition is still there. So
poll() will still set POLLERR.  If the comment is correct, this is a
permanent deadlock.

userspace can avoid tge issue by reading the descriptor on POLLERR. If
that is not the way this is expected to work (I am by no means a POSIX
expert), then I believe the correct fix must be to change wdm_poll(),
either to clear the error condition or to not set POLLERR in this case.


So, in lieu of an actual event queue, maybe this patch should be reverted.
Thoughts?


Rob.
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo 

issues with commit c1da59dad0eb ("cdc-wdm: Clear read pipeline in case of error")

2017-04-20 Thread Bjørn Mork
Hello Robert,

Sorry for being much too late here, but during recent attemts to debug
issues caused by my commit 833415a3e781 ("cdc-wdm: fix "out-of-sync" due
to missing notifications") I believe I found a couple of issues with
commit c1da59dad0eb. At least one of them is serious (potentional
GPF_KERNEL allocation in interrupt context).

But I don't know if I understand the problem the commit set out to
solves, which is why I have no proposed patch here.


> @@ -189,7 +192,13 @@ static void wdm_in_callback(struct urb *urb)
>   }
>   }
>  
> - desc->rerr = status;
> + /*
> +  * only set a new error if there is no previous error.
> +  * Errors are only cleared during read/open
> +  */
> + if (desc->rerr  == 0)
> + desc->rerr = status;
> +
>   if (length + desc->length > desc->wMaxCommand) {
>   /* The buffer would overflow */
>   set_bit(WDM_OVERFLOW, >flags);
> 


This is minor, but in my opinion this change is flawed.  It changes the
meaning of "desc->rerr" from the original "last status code" to the new
"first error since last userspace read".

wdm_read will only return and clear such errors if desc->length == 0 on
entry.  This is up to the userspace application to decide. I claim that
any error should either be reported immediately or not at all.  Caching
errors forever like this, and then reporting them to userspace at some
arbitrary later event, is pointless. 

The existing code did the correct thing in the absence of proper event
queing.  The only real alternative would be to completely stop
processing device events until userspace has seen the error.  But that
would just make things fail completely for no good reason.

We should probably consider making a queue/list of the read buffers and
keep each status code with the associated list element.  Then we could
report back the error when userspace reads the element that caused it.

> @@ -221,9 +230,19 @@ static void wdm_in_callback(struct urb *urb)
>   }
>  
>  skip_error:
> + set_bit(WDM_READ, >flags);
>   wake_up(>wait);
>  
> - set_bit(WDM_READ, >flags);
> + if (desc->rerr) {
> + /*
> +  * Since there was an error, userspace may decide to not read
> +  * any data after poll'ing.
> +  * We should respond to further attempts from the device to send
> +  * data, so that we can get unstuck.
> +  */
> + service_outstanding_interrupt(desc);
> + }
> +
>  unlock:
>   spin_unlock(>iuspin);
>   }

We cannot do this.  This is an URB callback, and service_outstanding_interrupt()
calls  usb_submit_urb(...,GFP_KERNEL).

This issue could of course be avoided by simply changing the allocation
flags. But looking at the comment here, I am pretty sure that this is
the wrong solution to the unanticipated userspace behaviour.  If
userspace decides not to read() if poll() sets POLLERR, then it *should
not* expect the error condition to be cleared. Or am I missing something?

In any case: This will not unstick anything, given the previous problem
with desc->rerr not being updated unless userspace reads.  You will just
flush the device queue, but the error condition is still there. So
poll() will still set POLLERR.  If the comment is correct, this is a
permanent deadlock.

userspace can avoid tge issue by reading the descriptor on POLLERR. If
that is not the way this is expected to work (I am by no means a POSIX
expert), then I believe the correct fix must be to change wdm_poll(),
either to clear the error condition or to not set POLLERR in this case.




Bjørn
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html