Re: issues with commit c1da59dad0eb ("cdc-wdm: Clear read pipeline in case of error")
Am Dienstag, den 25.04.2017, 13:07 +0200 schrieb Bjørn Mork: > > No, I am not sure. But I don't see how we can both cache the error and > continue to read new data. The error is irrelevant after the next URB > has completed. If we decide that every error should be delivered to > userspace, then we either have to stop transmission or maintain a queue > of device reads, keeping data and errors in sync. Delivering an old > error to userspace along with newer data makes no sense. > > Also note that this code path is skipped entirely for several common > errors. So they are never reported to userspace. Hi, OK. First my apology. I should have asked for clarification before accepting this patch. I have learned from this. Second, as your code expects the old behavior, a revert is the correct course of action for now. Third, if I approach your argument from a principal view point, I see a couple of issues. Primarily, how important is the error code at all? And secondly it seems to me that a device reset must be reported at all cost. Regards Oliver -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: issues with commit c1da59dad0eb ("cdc-wdm: Clear read pipeline in case of error")
Oliver Neukumwrites: > Am Donnerstag, den 20.04.2017, 10:32 +0200 schrieb Bjørn Mork: >> > @@ -189,7 +192,13 @@ static void wdm_in_callback(struct urb *urb) >> > } >> > } >> > >> > - desc->rerr = status; >> > + /* >> > + * only set a new error if there is no previous error. >> > + * Errors are only cleared during read/open >> > + */ >> > + if (desc->rerr == 0) >> > + desc->rerr = status; >> > + >> > if (length + desc->length > desc->wMaxCommand) { >> > /* The buffer would overflow */ >> > set_bit(WDM_OVERFLOW, >flags); >> > >> >> >> This is minor, but in my opinion this change is flawed. It changes the >> meaning of "desc->rerr" from the original "last status code" to the new >> "first error since last userspace read". > > Indeed. And that will preserve the initial cause of the problem better. > We could run into the zero length read case. Are you sure you want the > old behavior back? No, I am not sure. But I don't see how we can both cache the error and continue to read new data. The error is irrelevant after the next URB has completed. If we decide that every error should be delivered to userspace, then we either have to stop transmission or maintain a queue of device reads, keeping data and errors in sync. Delivering an old error to userspace along with newer data makes no sense. Also note that this code path is skipped entirely for several common errors. So they are never reported to userspace. Bjørn -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: issues with commit c1da59dad0eb ("cdc-wdm: Clear read pipeline in case of error")
Am Donnerstag, den 20.04.2017, 10:32 +0200 schrieb Bjørn Mork: > > @@ -189,7 +192,13 @@ static void wdm_in_callback(struct urb *urb) > > } > > } > > > > - desc->rerr = status; > > + /* > > + * only set a new error if there is no previous error. > > + * Errors are only cleared during read/open > > + */ > > + if (desc->rerr == 0) > > + desc->rerr = status; > > + > > if (length + desc->length > desc->wMaxCommand) { > > /* The buffer would overflow */ > > set_bit(WDM_OVERFLOW, >flags); > > > > > This is minor, but in my opinion this change is flawed. It changes the > meaning of "desc->rerr" from the original "last status code" to the new > "first error since last userspace read". Indeed. And that will preserve the initial cause of the problem better. We could run into the zero length read case. Are you sure you want the old behavior back? Regards Oliver -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: issues with commit c1da59dad0eb ("cdc-wdm: Clear read pipeline in case of error")
Robert Fosswrites: > On 2017-04-24 08:02 AM, Bjørn Mork wrote: >> Aleksander Morgado writes: >>> On Thu, Apr 20, 2017 at 10:32 AM, Bjørn Mork wrote: Sorry for being much too late here, but during recent attemts to debug issues caused by my commit 833415a3e781 ("cdc-wdm: fix "out-of-sync" due to missing notifications") I believe I found a couple of issues with commit c1da59dad0eb. At least one of them is serious (potentional GPF_KERNEL allocation in interrupt context). But I don't know if I understand the problem the commit set out to solves, which is why I have no proposed patch here. >>> >>> Just to be clear, the issues found with both commits are completely >>> unrelated one from the other, right? >> >> No, they do interact and will most likely affect each other. >> >> But this would not be an issue if commit 833415a3e781 didn't cause >> unexpected errors, so that commit must be reverted in any case. > > Alright, so assuming reversions are the way to go forward, > how would you prefer to go about cleaning this up? I've alreasy sent a revert for commit 833415a3e781, but it might have been timed badly related to the soon opening merge window. See https://www.spinics.net/lists/linux-usb/msg156311.html Any further commits should probably be based on top of this, assuming that it will be accepted. But if it was sent too late for Greg to include it in his v4.12-rc1 batch, then I guess there is no real hurry... In any case, if you are going for a revert then I suggest using the same stable CC to avoid the v4.9 LTS branch diverging unneceessarily. Bjørn -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: issues with commit c1da59dad0eb ("cdc-wdm: Clear read pipeline in case of error")
On 2017-04-24 08:02 AM, Bjørn Mork wrote: Aleksander Morgadowrites: On Thu, Apr 20, 2017 at 10:32 AM, Bjørn Mork wrote: Sorry for being much too late here, but during recent attemts to debug issues caused by my commit 833415a3e781 ("cdc-wdm: fix "out-of-sync" due to missing notifications") I believe I found a couple of issues with commit c1da59dad0eb. At least one of them is serious (potentional GPF_KERNEL allocation in interrupt context). But I don't know if I understand the problem the commit set out to solves, which is why I have no proposed patch here. Just to be clear, the issues found with both commits are completely unrelated one from the other, right? No, they do interact and will most likely affect each other. But this would not be an issue if commit 833415a3e781 didn't cause unexpected errors, so that commit must be reverted in any case. Alright, so assuming reversions are the way to go forward, how would you prefer to go about cleaning this up? Rob. -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: issues with commit c1da59dad0eb ("cdc-wdm: Clear read pipeline in case of error")
Aleksander Morgadowrites: > On Thu, Apr 20, 2017 at 10:32 AM, Bjørn Mork wrote: >> Sorry for being much too late here, but during recent attemts to debug >> issues caused by my commit 833415a3e781 ("cdc-wdm: fix "out-of-sync" due >> to missing notifications") I believe I found a couple of issues with >> commit c1da59dad0eb. At least one of them is serious (potentional >> GPF_KERNEL allocation in interrupt context). >> >> But I don't know if I understand the problem the commit set out to >> solves, which is why I have no proposed patch here. > > Just to be clear, the issues found with both commits are completely > unrelated one from the other, right? No, they do interact and will most likely affect each other. But this would not be an issue if commit 833415a3e781 didn't cause unexpected errors, so that commit must be reverted in any case. Bjørn -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: issues with commit c1da59dad0eb ("cdc-wdm: Clear read pipeline in case of error")
Hey Bjørn, On Thu, Apr 20, 2017 at 10:32 AM, Bjørn Morkwrote: > Sorry for being much too late here, but during recent attemts to debug > issues caused by my commit 833415a3e781 ("cdc-wdm: fix "out-of-sync" due > to missing notifications") I believe I found a couple of issues with > commit c1da59dad0eb. At least one of them is serious (potentional > GPF_KERNEL allocation in interrupt context). > > But I don't know if I understand the problem the commit set out to > solves, which is why I have no proposed patch here. Just to be clear, the issues found with both commits are completely unrelated one from the other, right? -- Aleksander https://aleksander.es -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: issues with commit c1da59dad0eb ("cdc-wdm: Clear read pipeline in case of error")
On Thu, Apr 20, 2017 at 2:18 PM, Robert Fosswrote: > Hi Björn, > > Thanks for the thorough and explicit feedback, it was rather helpful. > > > On 2017-04-20 04:32 AM, Bjørn Mork wrote: >> >> Hello Robert, >> >> Sorry for being much too late here, but during recent attemts to debug >> issues caused by my commit 833415a3e781 ("cdc-wdm: fix "out-of-sync" due >> to missing notifications") I believe I found a couple of issues with >> commit c1da59dad0eb. At least one of them is serious (potentional >> GPF_KERNEL allocation in interrupt context). >> >> But I don't know if I understand the problem the commit set out to >> solves, which is why I have no proposed patch here. > > > > The description of the situation that I received was then one > from the commit message: > > "In case of a read error, userspace may not actually read the data, since > the > driver returns an error through wdm_poll. After this, the underlying device > may > attempt to send us more data, but the queue is not processed. While > userspace is > also blocked, because the read error is never cleared." > > So the way I interpret it is that after an error condition the device can > get prevented from sending data, which in turn can cause userspace to not > issue a read (which would have cleared the error). > > I could certainly be wrong in this interpretation or be misunderstanding > something. > > Prathmesh, Guenter: Do you have any comments? > I would not want to claim to fully understand the problem, but from Bjørn's comments it really looks like the patch should be reverted. Thanks, Guenter >> >> >>> @@ -189,7 +192,13 @@ static void wdm_in_callback(struct urb *urb) >>> } >>> } >>> >>> - desc->rerr = status; >>> + /* >>> +* only set a new error if there is no previous error. >>> +* Errors are only cleared during read/open >>> +*/ >>> + if (desc->rerr == 0) >>> + desc->rerr = status; >>> + >>> if (length + desc->length > desc->wMaxCommand) { >>> /* The buffer would overflow */ >>> set_bit(WDM_OVERFLOW, >flags); >>> >> >> >> This is minor, but in my opinion this change is flawed. It changes the >> meaning of "desc->rerr" from the original "last status code" to the new >> "first error since last userspace read". >> >> wdm_read will only return and clear such errors if desc->length == 0 on >> entry. This is up to the userspace application to decide. I claim that >> any error should either be reported immediately or not at all. Caching >> errors forever like this, and then reporting them to userspace at some >> arbitrary later event, is pointless. >> >> The existing code did the correct thing in the absence of proper event >> queing. The only real alternative would be to completely stop >> processing device events until userspace has seen the error. But that >> would just make things fail completely for no good reason. >> >> We should probably consider making a queue/list of the read buffers and >> keep each status code with the associated list element. Then we could >> report back the error when userspace reads the element that caused it. > > > I see your point about changing the semantics of desc->rerr, and agree that > it probably isn't desirable. > > So an event queue is starting to look like the way forward. > >> >>> @@ -221,9 +230,19 @@ static void wdm_in_callback(struct urb *urb) >>> } >>> >>> skip_error: >>> + set_bit(WDM_READ, >flags); >>> wake_up(>wait); >>> >>> - set_bit(WDM_READ, >flags); >>> + if (desc->rerr) { >>> + /* >>> +* Since there was an error, userspace may decide to not >>> read >>> +* any data after poll'ing. >>> +* We should respond to further attempts from the device >>> to send >>> +* data, so that we can get unstuck. >>> +*/ >>> + service_outstanding_interrupt(desc); >>> + } >>> + >>> unlock: >>> spin_unlock(>iuspin); >>> } >> >> >> We cannot do this. This is an URB callback, and >> service_outstanding_interrupt() >> calls usb_submit_urb(...,GFP_KERNEL). >> >> This issue could of course be avoided by simply changing the allocation >> flags. But looking at the comment here, I am pretty sure that this is >> the wrong solution to the unanticipated userspace behaviour. If >> userspace decides not to read() if poll() sets POLLERR, then it *should >> not* expect the error condition to be cleared. Or am I missing something? > > > The allocation flags aside, calling poll() and it returning POLLERR should > not clear the error, so the behavior of the > >> >> In any case: This will not unstick anything, given the previous problem >> with desc->rerr not being updated unless userspace reads. You will just >> flush the device queue, but the error condition is still there. So >> poll() will still set POLLERR. If
Re: issues with commit c1da59dad0eb ("cdc-wdm: Clear read pipeline in case of error")
Hi Björn, Thanks for the thorough and explicit feedback, it was rather helpful. On 2017-04-20 04:32 AM, Bjørn Mork wrote: Hello Robert, Sorry for being much too late here, but during recent attemts to debug issues caused by my commit 833415a3e781 ("cdc-wdm: fix "out-of-sync" due to missing notifications") I believe I found a couple of issues with commit c1da59dad0eb. At least one of them is serious (potentional GPF_KERNEL allocation in interrupt context). But I don't know if I understand the problem the commit set out to solves, which is why I have no proposed patch here. The description of the situation that I received was then one from the commit message: "In case of a read error, userspace may not actually read the data, since the driver returns an error through wdm_poll. After this, the underlying device may attempt to send us more data, but the queue is not processed. While userspace is also blocked, because the read error is never cleared." So the way I interpret it is that after an error condition the device can get prevented from sending data, which in turn can cause userspace to not issue a read (which would have cleared the error). I could certainly be wrong in this interpretation or be misunderstanding something. Prathmesh, Guenter: Do you have any comments? @@ -189,7 +192,13 @@ static void wdm_in_callback(struct urb *urb) } } - desc->rerr = status; + /* +* only set a new error if there is no previous error. +* Errors are only cleared during read/open +*/ + if (desc->rerr == 0) + desc->rerr = status; + if (length + desc->length > desc->wMaxCommand) { /* The buffer would overflow */ set_bit(WDM_OVERFLOW, >flags); This is minor, but in my opinion this change is flawed. It changes the meaning of "desc->rerr" from the original "last status code" to the new "first error since last userspace read". wdm_read will only return and clear such errors if desc->length == 0 on entry. This is up to the userspace application to decide. I claim that any error should either be reported immediately or not at all. Caching errors forever like this, and then reporting them to userspace at some arbitrary later event, is pointless. The existing code did the correct thing in the absence of proper event queing. The only real alternative would be to completely stop processing device events until userspace has seen the error. But that would just make things fail completely for no good reason. We should probably consider making a queue/list of the read buffers and keep each status code with the associated list element. Then we could report back the error when userspace reads the element that caused it. I see your point about changing the semantics of desc->rerr, and agree that it probably isn't desirable. So an event queue is starting to look like the way forward. @@ -221,9 +230,19 @@ static void wdm_in_callback(struct urb *urb) } skip_error: + set_bit(WDM_READ, >flags); wake_up(>wait); - set_bit(WDM_READ, >flags); + if (desc->rerr) { + /* +* Since there was an error, userspace may decide to not read +* any data after poll'ing. +* We should respond to further attempts from the device to send +* data, so that we can get unstuck. +*/ + service_outstanding_interrupt(desc); + } + unlock: spin_unlock(>iuspin); } We cannot do this. This is an URB callback, and service_outstanding_interrupt() calls usb_submit_urb(...,GFP_KERNEL). This issue could of course be avoided by simply changing the allocation flags. But looking at the comment here, I am pretty sure that this is the wrong solution to the unanticipated userspace behaviour. If userspace decides not to read() if poll() sets POLLERR, then it *should not* expect the error condition to be cleared. Or am I missing something? The allocation flags aside, calling poll() and it returning POLLERR should not clear the error, so the behavior of the In any case: This will not unstick anything, given the previous problem with desc->rerr not being updated unless userspace reads. You will just flush the device queue, but the error condition is still there. So poll() will still set POLLERR. If the comment is correct, this is a permanent deadlock. userspace can avoid tge issue by reading the descriptor on POLLERR. If that is not the way this is expected to work (I am by no means a POSIX expert), then I believe the correct fix must be to change wdm_poll(), either to clear the error condition or to not set POLLERR in this case. So, in lieu of an actual event queue, maybe this patch should be reverted. Thoughts? Rob. -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo
issues with commit c1da59dad0eb ("cdc-wdm: Clear read pipeline in case of error")
Hello Robert, Sorry for being much too late here, but during recent attemts to debug issues caused by my commit 833415a3e781 ("cdc-wdm: fix "out-of-sync" due to missing notifications") I believe I found a couple of issues with commit c1da59dad0eb. At least one of them is serious (potentional GPF_KERNEL allocation in interrupt context). But I don't know if I understand the problem the commit set out to solves, which is why I have no proposed patch here. > @@ -189,7 +192,13 @@ static void wdm_in_callback(struct urb *urb) > } > } > > - desc->rerr = status; > + /* > + * only set a new error if there is no previous error. > + * Errors are only cleared during read/open > + */ > + if (desc->rerr == 0) > + desc->rerr = status; > + > if (length + desc->length > desc->wMaxCommand) { > /* The buffer would overflow */ > set_bit(WDM_OVERFLOW, >flags); > This is minor, but in my opinion this change is flawed. It changes the meaning of "desc->rerr" from the original "last status code" to the new "first error since last userspace read". wdm_read will only return and clear such errors if desc->length == 0 on entry. This is up to the userspace application to decide. I claim that any error should either be reported immediately or not at all. Caching errors forever like this, and then reporting them to userspace at some arbitrary later event, is pointless. The existing code did the correct thing in the absence of proper event queing. The only real alternative would be to completely stop processing device events until userspace has seen the error. But that would just make things fail completely for no good reason. We should probably consider making a queue/list of the read buffers and keep each status code with the associated list element. Then we could report back the error when userspace reads the element that caused it. > @@ -221,9 +230,19 @@ static void wdm_in_callback(struct urb *urb) > } > > skip_error: > + set_bit(WDM_READ, >flags); > wake_up(>wait); > > - set_bit(WDM_READ, >flags); > + if (desc->rerr) { > + /* > + * Since there was an error, userspace may decide to not read > + * any data after poll'ing. > + * We should respond to further attempts from the device to send > + * data, so that we can get unstuck. > + */ > + service_outstanding_interrupt(desc); > + } > + > unlock: > spin_unlock(>iuspin); > } We cannot do this. This is an URB callback, and service_outstanding_interrupt() calls usb_submit_urb(...,GFP_KERNEL). This issue could of course be avoided by simply changing the allocation flags. But looking at the comment here, I am pretty sure that this is the wrong solution to the unanticipated userspace behaviour. If userspace decides not to read() if poll() sets POLLERR, then it *should not* expect the error condition to be cleared. Or am I missing something? In any case: This will not unstick anything, given the previous problem with desc->rerr not being updated unless userspace reads. You will just flush the device queue, but the error condition is still there. So poll() will still set POLLERR. If the comment is correct, this is a permanent deadlock. userspace can avoid tge issue by reading the descriptor on POLLERR. If that is not the way this is expected to work (I am by no means a POSIX expert), then I believe the correct fix must be to change wdm_poll(), either to clear the error condition or to not set POLLERR in this case. Bjørn -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html