Re: btusb "firmware request while host is not available" at resume

2017-10-05 Thread Kai-Heng Feng
Hi Luis,

On Wed, Oct 4, 2017 at 8:20 AM, Luis R. Rodriguez  wrote:
> So please retest now that the revert happened: commit
> f007cad159e99fa2acd3b2e9364fbb32ad28b971 ("Revert "firmware: add sanity check
> on shutdown/suspend").

I can confirm the issue is gone after the commit.
Also, thanks for your detailed explanation.


Kai-Heng

>
>   Luis


Re: btusb "firmware request while host is not available" at resume

2017-10-05 Thread Kai-Heng Feng
Hi Luis,

On Wed, Oct 4, 2017 at 8:20 AM, Luis R. Rodriguez  wrote:
> So please retest now that the revert happened: commit
> f007cad159e99fa2acd3b2e9364fbb32ad28b971 ("Revert "firmware: add sanity check
> on shutdown/suspend").

I can confirm the issue is gone after the commit.
Also, thanks for your detailed explanation.


Kai-Heng

>
>   Luis


Re: btusb "firmware request while host is not available" at resume

2017-10-03 Thread Luis R. Rodriguez
On Tue, Oct 3, 2017 at 5:20 PM, Luis R. Rodriguez  wrote:
> the ordering devised currently there is:
>
>   o device driver pm ops called
>   o notifier for suspend issued - *going to suspend*
>   o userspace frozen
>   o filesystem freeze
>
> On the way back up this order is inverted:
>
>   o filesystem freeze
>   o userspace frozen
>   o notifier for suspend issued - *going to suspend*
>   o device driver pm ops called

Fortunately I had it a tad bit wrong, but in a good way. Our ordering
on our way down is:

 o notifier for suspend issued - *going to suspend*
 o userspace frozen
 o filesystem freeze (new, being proposed)
 o device driver pm ops called

Then on our way up:

  o device driver pm ops called
  o filesystem thaw
  o userspace thaw
  o notifier for resume issued - *thawing*

So the driver callbacks get called *later*, so anything called in
notifiers do get a chance to quiesce things properly.

  Luis


Re: btusb "firmware request while host is not available" at resume

2017-10-03 Thread Luis R. Rodriguez
On Tue, Oct 3, 2017 at 5:20 PM, Luis R. Rodriguez  wrote:
> the ordering devised currently there is:
>
>   o device driver pm ops called
>   o notifier for suspend issued - *going to suspend*
>   o userspace frozen
>   o filesystem freeze
>
> On the way back up this order is inverted:
>
>   o filesystem freeze
>   o userspace frozen
>   o notifier for suspend issued - *going to suspend*
>   o device driver pm ops called

Fortunately I had it a tad bit wrong, but in a good way. Our ordering
on our way down is:

 o notifier for suspend issued - *going to suspend*
 o userspace frozen
 o filesystem freeze (new, being proposed)
 o device driver pm ops called

Then on our way up:

  o device driver pm ops called
  o filesystem thaw
  o userspace thaw
  o notifier for resume issued - *thawing*

So the driver callbacks get called *later*, so anything called in
notifiers do get a chance to quiesce things properly.

  Luis


Re: btusb "firmware request while host is not available" at resume

2017-10-03 Thread Luis R. Rodriguez
On Mon, Oct 02, 2017 at 04:34:11PM +0800, Kai-Heng Feng wrote:
> Hi Luis,
> 
> On Thu, Sep 14, 2017 at 1:39 AM, Luis R. Rodriguez  wrote:
> [snipped]
> > Would a fw_cache_hint(device, name_list) be reasonable then sometime 
> > *before*
> > suspend? All this would do is ask the firmware API to extend the fw cache
> > list with the entries. It would not load firmware immediately, instead this
> > would trigger a request for the hinted firmware to become available for
> > suspend. Then these drivers could request for firmware at resume and it
> > will pick up the cached firmware.
> 
> I think I am the author the patch [1] mentioned by Marcel. I have to admit,
> it's quite clunky in it's current form.  So yes, the new API you proposed is
> definitely better to solve the issue. I'll send new patch for btusb once we
> have the new API to use.

Note that Linus' recent revert to ensure we can keep the old behaviour
of failing to fetch firmware if we're on our way to suspend means that if
you use direct filesystem firmware fetch today on upstream kernels
you will get an attempt to actually look for the file. That is the old
failures you used to see should be gone.

Although at first I was cautious to avoid moving in this direction it was
the right direction to move in anyway long term. The old mecehanism was
put in place to avoid having the UMH call out given we are freezing
userspace and we'd deadlock as the userspace helper would already be
frozen. The UMH helpers *used* to be our default firmware fetchter,
but the API grew to just do a direct filesystem lookup.

If there are issues or races with suspend / resume those need to be
addressed at an ordering level at the core of the suspend framework.
I just actually proposed *how* to deal with proper filesystem suspend
races today [0], the ordering devised currently there is:

  o device driver pm ops called
  o notifier for suspend issued - *going to suspend*
  o userspace frozen
  o filesystem freeze

On the way back up this order is inverted:

  o filesystem freeze
  o userspace frozen
  o notifier for suspend issued - *going to suspend*
  o device driver pm ops called

With this order in place filesystems should in theory be available
on resume ASAP, any issues that could creep up implicate having
to address and review the above ordering.

As-is upstream though, things should be fine and the race you used to
see should no longer be possible. This does mean older kernels don't
have these fixes, as they were not intended to be fixes. The patch
Linus reverted was the patch I kept purposely to *avoid* us having
to move light years forward in one release, but alas, we've moved
forward. So the way to look at it, in the end, is the work to move
the UMH lock outside of direct FS lookups fixed this issue now.

[0] https://lkml.kernel.org/r/20171003185313.1017-1-mcg...@kernel.org

> Also, with patch [1], the "firmware request while host is not
> available" may still get hit
> on some corner cases. I proposed another patch [2] to tackle the edge
> case, can you
> take a look?
> 
> [1] https://lkml.org/lkml/2017/8/25/58
> [2] https://lkml.org/lkml/2017/8/22/123

All this crap should not be needed anymore as the cause of the issue,
the UMH lock on direct-fs-lookups, is long gone on direct-fs lookups.

The only case and kernel config that will run into this issue now
is those calls that want to skip the direct-fs-lookup first, and
there are only two stupid device drivers upstream that do this:
dell rbu and some LED crap driver.

So please retest now that the revert happened: commit
f007cad159e99fa2acd3b2e9364fbb32ad28b971 ("Revert "firmware: add sanity check
on shutdown/suspend").

  Luis


Re: btusb "firmware request while host is not available" at resume

2017-10-03 Thread Luis R. Rodriguez
On Mon, Oct 02, 2017 at 04:34:11PM +0800, Kai-Heng Feng wrote:
> Hi Luis,
> 
> On Thu, Sep 14, 2017 at 1:39 AM, Luis R. Rodriguez  wrote:
> [snipped]
> > Would a fw_cache_hint(device, name_list) be reasonable then sometime 
> > *before*
> > suspend? All this would do is ask the firmware API to extend the fw cache
> > list with the entries. It would not load firmware immediately, instead this
> > would trigger a request for the hinted firmware to become available for
> > suspend. Then these drivers could request for firmware at resume and it
> > will pick up the cached firmware.
> 
> I think I am the author the patch [1] mentioned by Marcel. I have to admit,
> it's quite clunky in it's current form.  So yes, the new API you proposed is
> definitely better to solve the issue. I'll send new patch for btusb once we
> have the new API to use.

Note that Linus' recent revert to ensure we can keep the old behaviour
of failing to fetch firmware if we're on our way to suspend means that if
you use direct filesystem firmware fetch today on upstream kernels
you will get an attempt to actually look for the file. That is the old
failures you used to see should be gone.

Although at first I was cautious to avoid moving in this direction it was
the right direction to move in anyway long term. The old mecehanism was
put in place to avoid having the UMH call out given we are freezing
userspace and we'd deadlock as the userspace helper would already be
frozen. The UMH helpers *used* to be our default firmware fetchter,
but the API grew to just do a direct filesystem lookup.

If there are issues or races with suspend / resume those need to be
addressed at an ordering level at the core of the suspend framework.
I just actually proposed *how* to deal with proper filesystem suspend
races today [0], the ordering devised currently there is:

  o device driver pm ops called
  o notifier for suspend issued - *going to suspend*
  o userspace frozen
  o filesystem freeze

On the way back up this order is inverted:

  o filesystem freeze
  o userspace frozen
  o notifier for suspend issued - *going to suspend*
  o device driver pm ops called

With this order in place filesystems should in theory be available
on resume ASAP, any issues that could creep up implicate having
to address and review the above ordering.

As-is upstream though, things should be fine and the race you used to
see should no longer be possible. This does mean older kernels don't
have these fixes, as they were not intended to be fixes. The patch
Linus reverted was the patch I kept purposely to *avoid* us having
to move light years forward in one release, but alas, we've moved
forward. So the way to look at it, in the end, is the work to move
the UMH lock outside of direct FS lookups fixed this issue now.

[0] https://lkml.kernel.org/r/20171003185313.1017-1-mcg...@kernel.org

> Also, with patch [1], the "firmware request while host is not
> available" may still get hit
> on some corner cases. I proposed another patch [2] to tackle the edge
> case, can you
> take a look?
> 
> [1] https://lkml.org/lkml/2017/8/25/58
> [2] https://lkml.org/lkml/2017/8/22/123

All this crap should not be needed anymore as the cause of the issue,
the UMH lock on direct-fs-lookups, is long gone on direct-fs lookups.

The only case and kernel config that will run into this issue now
is those calls that want to skip the direct-fs-lookup first, and
there are only two stupid device drivers upstream that do this:
dell rbu and some LED crap driver.

So please retest now that the revert happened: commit
f007cad159e99fa2acd3b2e9364fbb32ad28b971 ("Revert "firmware: add sanity check
on shutdown/suspend").

  Luis


Re: btusb "firmware request while host is not available" at resume

2017-10-02 Thread Kai-Heng Feng
Hi Luis,

On Thu, Sep 14, 2017 at 1:39 AM, Luis R. Rodriguez  wrote:
[snipped]
> Would a fw_cache_hint(device, name_list) be reasonable then sometime *before*
> suspend? All this would do is ask the firmware API to extend the fw cache
> list with the entries. It would not load firmware immediately, instead this
> would trigger a request for the hinted firmware to become available for
> suspend. Then these drivers could request for firmware at resume and it
> will pick up the cached firmware.

I think I am the author the patch [1] mentioned by Marcel. I have to
admit, it's quite
clunky in it's current form.
So yes, the new API you proposed is definitely better to solve the
issue. I'll send
new patch for btusb once we have the new API to use.

Also, with patch [1], the "firmware request while host is not
available" may still get hit
on some corner cases. I proposed another patch [2] to tackle the edge
case, can you
take a look?

[1] https://lkml.org/lkml/2017/8/25/58
[2] https://lkml.org/lkml/2017/8/22/123

Kai-Heng

>
>   Luis
> --
> To unsubscribe from this list: send the line "unsubscribe linux-bluetooth" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btusb "firmware request while host is not available" at resume

2017-10-02 Thread Kai-Heng Feng
Hi Luis,

On Thu, Sep 14, 2017 at 1:39 AM, Luis R. Rodriguez  wrote:
[snipped]
> Would a fw_cache_hint(device, name_list) be reasonable then sometime *before*
> suspend? All this would do is ask the firmware API to extend the fw cache
> list with the entries. It would not load firmware immediately, instead this
> would trigger a request for the hinted firmware to become available for
> suspend. Then these drivers could request for firmware at resume and it
> will pick up the cached firmware.

I think I am the author the patch [1] mentioned by Marcel. I have to
admit, it's quite
clunky in it's current form.
So yes, the new API you proposed is definitely better to solve the
issue. I'll send
new patch for btusb once we have the new API to use.

Also, with patch [1], the "firmware request while host is not
available" may still get hit
on some corner cases. I proposed another patch [2] to tackle the edge
case, can you
take a look?

[1] https://lkml.org/lkml/2017/8/25/58
[2] https://lkml.org/lkml/2017/8/22/123

Kai-Heng

>
>   Luis
> --
> To unsubscribe from this list: send the line "unsubscribe linux-bluetooth" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btusb "firmware request while host is not available" at resume

2017-09-13 Thread Luis R. Rodriguez
On Wed, Sep 13, 2017 at 08:52:15AM +0200, Marcel Holtmann wrote:
> > I checked and prior to commit 81f95076281f ("firmware: add sanity check on
> > shutdown/suspend") and commit 06a45a93e7d34aa ("firmware: move umh try locks
> > into the umh code") I believe we could end up also failing at a firmware
> > request as well. Can you confirm? I see one bug report which confirms this
> > since v3.13:
> > 
> > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1356076

Have you seen these sorts of issues before BTW?

> >> So yes, we would have to redo parts of the vendor specific handling to 
> >> always
> >> request the firmware, even if we do not need it right now. And frankly that
> >> is not obvious to anybody.
> > 
> > I agree. I thought a bit about the above and tried co come to a clever 
> > resolution,
> > for instance one resolution could be to use the MODULE_FIRMWARE() 
> > declarations
> > of sorts to then let the firmware API do the pre-fetching for you at init. 
> > This
> > would just require a 1 line change to drivers and some already have this.
> > There are two issues with this though, one is that the firmware cache works 
> > with
> > devres and devres requires a device already present. Second is firmware 
> > names
> > can be very dynamic, so one can only know the firmware name later at boot.
> 
> I don’t like that idea since we have drivers like btusb.ko that support
> devices from many manufactures. So the only difference is the firmware
> loading and setup. After that they behave all the same. So pre-loading all
> Bluetooth firmwares is pretty crazy.

Makes sense. My point was also to explain how the dynamic aspect of the strings
for firmware make that approach not practical.

> > Another issues is that most driver developers use the firmware API and 
> > without
> > knowing *are* creating a firmware cache entry, whereas a few times they 
> > don't
> > need a firmware cache entry. This is more of performance / optimization 
> > thing,
> > but something to consider long term as well.
> > 
> >> We seem to have some patches for doing exactly
> >> that, but I have not gotten to review them in detail since they deal with
> >> vendor specific complex setup handling.

Do you happen to know if some of this code implemented because of the obscure
and perhaps random issues as noted in the above bug URL. If you are not sure,
who can I ask to see what the original motivation was?

> > Correct me if I'm wrong but some of this work very likely came from the 
> > above
> > old errors, not the new ones?
> > 
> >> Also this affects more than just Intel since all hardware where firmware
> >> loading is skipped if there is already firmware loaded are affected.
> > 
> > Right, its a core firmware API issue, but this issue has been present for a
> > while, it was however in different form. I'm glad we're able to think ahead
> > and address this now though, it would seem there are quite a bit of 
> > intrusive
> > changes required to use the API right, I'd hope we could help on the 
> > firmware
> > API front to make these changes smaller.
> > 
> > First, I'd like to understand a bit more how a device ends up with firmware
> > loaded, without having gone through the firmware API.
> > 
> > Second, while requesting firmware at probe seems not necessary, would it
> > at least be reasonable to require a struct device and a list of possible
> > firmware that could be used be made available? If so a separate hook could
> > be provided to help pre-load these only onto the firmware cache. Ie, we 
> > would
> > not actually look for the firmware but just create a firmware cache instance
> > so that when we suspend we *do* go fetch for these so that they are ready 
> > and
> > available upon resume.
> 
> The devices come with boatloader mode and operational mode and both are
> independent. And most of them only fallback when fully power cycling the
> device. So a simple boot linux and reboot linux would get you into the
> situation that the firmware is already loaded. There are also some UEFI
> bioses that can load the firmware. So there are multitudes of possibilities
> where we can end up with a firmware loaded.

I see thanks.

Would a fw_cache_hint(device, name_list) be reasonable then sometime *before*
suspend? All this would do is ask the firmware API to extend the fw cache
list with the entries. It would not load firmware immediately, instead this
would trigger a request for the hinted firmware to become available for
suspend. Then these drivers could request for firmware at resume and it
will pick up the cached firmware.

  Luis


Re: btusb "firmware request while host is not available" at resume

2017-09-13 Thread Luis R. Rodriguez
On Wed, Sep 13, 2017 at 08:52:15AM +0200, Marcel Holtmann wrote:
> > I checked and prior to commit 81f95076281f ("firmware: add sanity check on
> > shutdown/suspend") and commit 06a45a93e7d34aa ("firmware: move umh try locks
> > into the umh code") I believe we could end up also failing at a firmware
> > request as well. Can you confirm? I see one bug report which confirms this
> > since v3.13:
> > 
> > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1356076

Have you seen these sorts of issues before BTW?

> >> So yes, we would have to redo parts of the vendor specific handling to 
> >> always
> >> request the firmware, even if we do not need it right now. And frankly that
> >> is not obvious to anybody.
> > 
> > I agree. I thought a bit about the above and tried co come to a clever 
> > resolution,
> > for instance one resolution could be to use the MODULE_FIRMWARE() 
> > declarations
> > of sorts to then let the firmware API do the pre-fetching for you at init. 
> > This
> > would just require a 1 line change to drivers and some already have this.
> > There are two issues with this though, one is that the firmware cache works 
> > with
> > devres and devres requires a device already present. Second is firmware 
> > names
> > can be very dynamic, so one can only know the firmware name later at boot.
> 
> I don’t like that idea since we have drivers like btusb.ko that support
> devices from many manufactures. So the only difference is the firmware
> loading and setup. After that they behave all the same. So pre-loading all
> Bluetooth firmwares is pretty crazy.

Makes sense. My point was also to explain how the dynamic aspect of the strings
for firmware make that approach not practical.

> > Another issues is that most driver developers use the firmware API and 
> > without
> > knowing *are* creating a firmware cache entry, whereas a few times they 
> > don't
> > need a firmware cache entry. This is more of performance / optimization 
> > thing,
> > but something to consider long term as well.
> > 
> >> We seem to have some patches for doing exactly
> >> that, but I have not gotten to review them in detail since they deal with
> >> vendor specific complex setup handling.

Do you happen to know if some of this code implemented because of the obscure
and perhaps random issues as noted in the above bug URL. If you are not sure,
who can I ask to see what the original motivation was?

> > Correct me if I'm wrong but some of this work very likely came from the 
> > above
> > old errors, not the new ones?
> > 
> >> Also this affects more than just Intel since all hardware where firmware
> >> loading is skipped if there is already firmware loaded are affected.
> > 
> > Right, its a core firmware API issue, but this issue has been present for a
> > while, it was however in different form. I'm glad we're able to think ahead
> > and address this now though, it would seem there are quite a bit of 
> > intrusive
> > changes required to use the API right, I'd hope we could help on the 
> > firmware
> > API front to make these changes smaller.
> > 
> > First, I'd like to understand a bit more how a device ends up with firmware
> > loaded, without having gone through the firmware API.
> > 
> > Second, while requesting firmware at probe seems not necessary, would it
> > at least be reasonable to require a struct device and a list of possible
> > firmware that could be used be made available? If so a separate hook could
> > be provided to help pre-load these only onto the firmware cache. Ie, we 
> > would
> > not actually look for the firmware but just create a firmware cache instance
> > so that when we suspend we *do* go fetch for these so that they are ready 
> > and
> > available upon resume.
> 
> The devices come with boatloader mode and operational mode and both are
> independent. And most of them only fallback when fully power cycling the
> device. So a simple boot linux and reboot linux would get you into the
> situation that the firmware is already loaded. There are also some UEFI
> bioses that can load the firmware. So there are multitudes of possibilities
> where we can end up with a firmware loaded.

I see thanks.

Would a fw_cache_hint(device, name_list) be reasonable then sometime *before*
suspend? All this would do is ask the firmware API to extend the fw cache
list with the entries. It would not load firmware immediately, instead this
would trigger a request for the hinted firmware to become available for
suspend. Then these drivers could request for firmware at resume and it
will pick up the cached firmware.

  Luis


Re: btusb "firmware request while host is not available" at resume

2017-09-13 Thread Marcel Holtmann
Hi Luis,

 If something needs to be fixed, can you make a patch showing that?  Or
 do we also need to revert anything else as well to get back to a "better
 working" state?
>>> 
>>> I took a look at the driver and it seems that btusb_setup_intel_new() is
>>> not called after the driver is initialized, rather its called only when
>>> hci_dev_do_open() is called. Its not clear to me how you can end up calling
>>> this on resume but not prior to this on a running system. Feedback from
>>> someone more familiar with bt would be useful.
>>> 
>>> I'd have the call for firmware on probe, no processing would be needed, just
>>> a load to kick the cache into effect would suffice. This may require a bit
>>> of code shift so its best someone more familiar do this.
>>> 
>>> If it confirmed this information is helping avoid these races we can 
>>> reconsider
>>> re-instating the warn as a firmware dev debugging aid for developers.
>>> 
>>> If the race this warning complained about is indeed possible the same race 
>>> is
>>> also possible for other usermode helpers. Its *why* the UMH lock was
>>> implemented, it however was never generalized.
>> 
>> we can not load firmware on probe() in most cases. The btusb.ko driver
>> establishes the HCI transport. It is setup in probe() and then started in
>> hci_dev_do_open() and if there is a vendor setup routine like
>> btsub_setup_intel_new(), then it is executed. Most Bluetooth controllers (not
>> all, but most) are doing firmware loading over the HCI transport with vendor
>> specific commands.
>> 
>> And yes, if the firmware was already loaded, we would skip requesting it at
>> all. Which means after suspend/resume cycle where the power to the controller
>> is cut, then we are missing the firmware from the cache. Since we never
>> loaded it in the first place.
> 
> I checked and prior to commit 81f95076281f ("firmware: add sanity check on
> shutdown/suspend") and commit 06a45a93e7d34aa ("firmware: move umh try locks
> into the umh code") I believe we could end up also failing at a firmware
> request as well. Can you confirm? I see one bug report which confirms this
> since v3.13:
> 
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1356076
> 
> For the sync case we had:
> 
> -   ret = usermodehelper_read_trylock();
> -   if (WARN_ON(ret)) {
> -   dev_err(device, "firmware: %s will not be loaded\n",
> -   name);
> -   goto out;
> -   }
> 
> The warning splat in launchpad bug 1356076 is the above case.
> 
> For the async case we had:
> 
> -   timeout = usermodehelper_read_lock_wait(timeout);
> -   if (!timeout) {
> -   dev_dbg(device, "firmware: %s loading timed out\n",
> -   name);
> -   ret = -EBUSY;
> -   goto out;
> 
> This would be the more silent case.
> 
> If this is accurate than the error case has just been modified to be
> a bit clearer, and without being implicated by the UMH lock. This was
> the design change after all. Nothing more *broken* should have happened.
> 
> In other words if fixing your issues goes away by reverting commit
> 81f95076281f, to be fair you should also try reverting commit 06a45a93e7d34aa
> which did the actual removal of the UMH lock when we don't use the UMH. *That*
> was previously the failure trigger point.
> 
>> So yes, we would have to redo parts of the vendor specific handling to always
>> request the firmware, even if we do not need it right now. And frankly that
>> is not obvious to anybody.
> 
> I agree. I thought a bit about the above and tried co come to a clever 
> resolution,
> for instance one resolution could be to use the MODULE_FIRMWARE() declarations
> of sorts to then let the firmware API do the pre-fetching for you at init. 
> This
> would just require a 1 line change to drivers and some already have this.
> There are two issues with this though, one is that the firmware cache works 
> with
> devres and devres requires a device already present. Second is firmware names
> can be very dynamic, so one can only know the firmware name later at boot.

I don’t like that idea since we have drivers like btusb.ko that support devices 
from many manufactures. So the only difference is the firmware loading and 
setup. After that they behave all the same. So pre-loading all Bluetooth 
firmwares is pretty crazy.

> Another issues is that most driver developers use the firmware API and without
> knowing *are* creating a firmware cache entry, whereas a few times they don't
> need a firmware cache entry. This is more of performance / optimization thing,
> but something to consider long term as well.
> 
>> We seem to have some patches for doing exactly
>> that, but I have not gotten to review them in detail since they deal with
>> vendor specific complex setup handling.
> 
> Correct me if I'm wrong but some 

Re: btusb "firmware request while host is not available" at resume

2017-09-13 Thread Marcel Holtmann
Hi Luis,

 If something needs to be fixed, can you make a patch showing that?  Or
 do we also need to revert anything else as well to get back to a "better
 working" state?
>>> 
>>> I took a look at the driver and it seems that btusb_setup_intel_new() is
>>> not called after the driver is initialized, rather its called only when
>>> hci_dev_do_open() is called. Its not clear to me how you can end up calling
>>> this on resume but not prior to this on a running system. Feedback from
>>> someone more familiar with bt would be useful.
>>> 
>>> I'd have the call for firmware on probe, no processing would be needed, just
>>> a load to kick the cache into effect would suffice. This may require a bit
>>> of code shift so its best someone more familiar do this.
>>> 
>>> If it confirmed this information is helping avoid these races we can 
>>> reconsider
>>> re-instating the warn as a firmware dev debugging aid for developers.
>>> 
>>> If the race this warning complained about is indeed possible the same race 
>>> is
>>> also possible for other usermode helpers. Its *why* the UMH lock was
>>> implemented, it however was never generalized.
>> 
>> we can not load firmware on probe() in most cases. The btusb.ko driver
>> establishes the HCI transport. It is setup in probe() and then started in
>> hci_dev_do_open() and if there is a vendor setup routine like
>> btsub_setup_intel_new(), then it is executed. Most Bluetooth controllers (not
>> all, but most) are doing firmware loading over the HCI transport with vendor
>> specific commands.
>> 
>> And yes, if the firmware was already loaded, we would skip requesting it at
>> all. Which means after suspend/resume cycle where the power to the controller
>> is cut, then we are missing the firmware from the cache. Since we never
>> loaded it in the first place.
> 
> I checked and prior to commit 81f95076281f ("firmware: add sanity check on
> shutdown/suspend") and commit 06a45a93e7d34aa ("firmware: move umh try locks
> into the umh code") I believe we could end up also failing at a firmware
> request as well. Can you confirm? I see one bug report which confirms this
> since v3.13:
> 
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1356076
> 
> For the sync case we had:
> 
> -   ret = usermodehelper_read_trylock();
> -   if (WARN_ON(ret)) {
> -   dev_err(device, "firmware: %s will not be loaded\n",
> -   name);
> -   goto out;
> -   }
> 
> The warning splat in launchpad bug 1356076 is the above case.
> 
> For the async case we had:
> 
> -   timeout = usermodehelper_read_lock_wait(timeout);
> -   if (!timeout) {
> -   dev_dbg(device, "firmware: %s loading timed out\n",
> -   name);
> -   ret = -EBUSY;
> -   goto out;
> 
> This would be the more silent case.
> 
> If this is accurate than the error case has just been modified to be
> a bit clearer, and without being implicated by the UMH lock. This was
> the design change after all. Nothing more *broken* should have happened.
> 
> In other words if fixing your issues goes away by reverting commit
> 81f95076281f, to be fair you should also try reverting commit 06a45a93e7d34aa
> which did the actual removal of the UMH lock when we don't use the UMH. *That*
> was previously the failure trigger point.
> 
>> So yes, we would have to redo parts of the vendor specific handling to always
>> request the firmware, even if we do not need it right now. And frankly that
>> is not obvious to anybody.
> 
> I agree. I thought a bit about the above and tried co come to a clever 
> resolution,
> for instance one resolution could be to use the MODULE_FIRMWARE() declarations
> of sorts to then let the firmware API do the pre-fetching for you at init. 
> This
> would just require a 1 line change to drivers and some already have this.
> There are two issues with this though, one is that the firmware cache works 
> with
> devres and devres requires a device already present. Second is firmware names
> can be very dynamic, so one can only know the firmware name later at boot.

I don’t like that idea since we have drivers like btusb.ko that support devices 
from many manufactures. So the only difference is the firmware loading and 
setup. After that they behave all the same. So pre-loading all Bluetooth 
firmwares is pretty crazy.

> Another issues is that most driver developers use the firmware API and without
> knowing *are* creating a firmware cache entry, whereas a few times they don't
> need a firmware cache entry. This is more of performance / optimization thing,
> but something to consider long term as well.
> 
>> We seem to have some patches for doing exactly
>> that, but I have not gotten to review them in detail since they deal with
>> vendor specific complex setup handling.
> 
> Correct me if I'm wrong but some 

Re: btusb "firmware request while host is not available" at resume

2017-09-12 Thread Luis R. Rodriguez
On Mon, Sep 11, 2017 at 05:48:52PM -0700, Greg Kroah-Hartman wrote:
> On Mon, Sep 11, 2017 at 10:06:51PM +0200, Luis R. Rodriguez wrote:
> > On Mon, Sep 11, 2017 at 12:29:55PM -0700, Greg Kroah-Hartman wrote:
> > > On Mon, Sep 11, 2017 at 07:11:38PM +0200, Luis R. Rodriguez wrote:
> > > > On Mon, Sep 11, 2017 at 06:46:47AM -0700, Greg Kroah-Hartman wrote:
> > > > > To confirm, reverting this fixes the problem I was seeing in 4.13.  
> > > > > I've
> > > > > queued it up for the next 4.13-stable release as well.
> > > > 
> > > > Commit 81f95076281f ("firmware: add sanity check on shutdown/suspend") 
> > > > may
> > > > seem kludgy but the reason for it was to cleanup the horrible forced and
> > > > required UMH lock even when the UMH lock was *not* even needed, which 
> > > > was later
> > > > removed via commit 06a45a93e7d34aa ("firmware: move umh try locks into 
> > > > the umh
> > > > code").
> > > 
> > > So what does this mean now that it is reverted?
> > 
> > We discuss what we should do about upkeeping a warning in the future, as
> > I think technically the warning was still valid and it could help avoid
> > racy lookups with the filesystem which otherwise could have gone unnoticed.
> 
> Given that things that used to work before the patch, and then didn't
> work after the patch was added, but then worked again after the patch
> was reverted, I don't think that adding the warning back is good, if it
> still breaks things...

As I noted to Marcel, it would seem this issue was present since before, the
warning either was changed or was more silent before. If reverting commit
81f95076281f "firmware: add sanity check on shutdown/suspend") then one should
consider reverting 06a45a93e7d34aa ("firmware: move umh try locks into the umh
code") as otherwise we end up in a situation where we loose determinism to be
sure of what will happen. From what I gather prior to these two patches we
either got a warning on sync case, or an error on async, after it we get a
warning on either and it should be clearer what the issue should be. I believe
that prior to these two commits the reason why things were failing could have
been pretty obscure. With just commit 81f95076281f reverted we then really
change the logic, and *that* was something I did want to avoid, given I
suspected that kernel was already implicitly depending on the functionality
which the UMH lock had brought upon the non-UMH code path. My suspicions seem
to have been correct.

With just commit 81f95076281f reverted, we may find firmware sometimes, but
we may also just as easily not find firmware some other times. It will depend
on the system.

> > > > Removing the old UMH lock even when the UMH lock was *not* needed was 
> > > > the right
> > > > thing to do but commit 81f95076281f (("firmware: add sanity check on
> > > > shutdown/suspend") was put in place as a safe guard as the lock was also
> > > > placing an implicit sanity check on the API. It ensures the API with 
> > > > the cache
> > > > was used as designed, otherwise you do run the risk of *not getting the
> > > > firmware you may need* -- Marcel seems to acknowledge this possibility.
> > > > 
> > > > It may be possible for us to already have in place safeguards so that 
> > > > upon
> > > > resume we are ensuring the path to the firmware *is* available, so IMHO 
> > > > we
> > > > should remove this *iff* we can provide this guarantee.  Otherwise the 
> > > > check is
> > > > valid. You see, although the UMH lock was bogus, it did implicitly ask 
> > > > the
> > > > question: is it safe for *any* helper to run and make assumptions on the
> > > > filesystem then?
> > > > 
> > > > In lieu of this question being answered the warning is valid given the 
> > > > design
> > > > of the firmware API and the having the cache available as a measure to 
> > > > resolve
> > > > this race.
> > > 
> > > I don't understand what you are trying to say here at all.
> > > 
> > > To be specific, what, if anything, is a problem with the current state
> > > of Linus's tree (and the next 4.13-stable release)?
> > 
> > The warning is issued when drivers issue *new* firmware requests on resume. 
> > The
> > firmware API cache was designed to enable drivers to easily be able to 
> > request
> > firmware on resume without concern about races against the filesystem, but 
> > in order
> > for this to work the drivers must have requested the firmware prior to 
> > suspend.
> 
> Ok, then how is this all working today?

Prior to the above *two* mentioned commits a UMH lock was used, and so similar 
but
different errors should have come up. With just 81f95076281f "firmware: add
sanity check on shutdown/suspend") reverted there is no suspend heuristic
check and we just let the users fend for themselves, it may or may not work
this will very depending on the system.

> I think much more is going on here, sorry, this doesn't make sense.

The UMH lock on the non UMH path was nasty to begin with. We've long depended
on it though, 

Re: btusb "firmware request while host is not available" at resume

2017-09-12 Thread Luis R. Rodriguez
On Mon, Sep 11, 2017 at 05:48:52PM -0700, Greg Kroah-Hartman wrote:
> On Mon, Sep 11, 2017 at 10:06:51PM +0200, Luis R. Rodriguez wrote:
> > On Mon, Sep 11, 2017 at 12:29:55PM -0700, Greg Kroah-Hartman wrote:
> > > On Mon, Sep 11, 2017 at 07:11:38PM +0200, Luis R. Rodriguez wrote:
> > > > On Mon, Sep 11, 2017 at 06:46:47AM -0700, Greg Kroah-Hartman wrote:
> > > > > To confirm, reverting this fixes the problem I was seeing in 4.13.  
> > > > > I've
> > > > > queued it up for the next 4.13-stable release as well.
> > > > 
> > > > Commit 81f95076281f ("firmware: add sanity check on shutdown/suspend") 
> > > > may
> > > > seem kludgy but the reason for it was to cleanup the horrible forced and
> > > > required UMH lock even when the UMH lock was *not* even needed, which 
> > > > was later
> > > > removed via commit 06a45a93e7d34aa ("firmware: move umh try locks into 
> > > > the umh
> > > > code").
> > > 
> > > So what does this mean now that it is reverted?
> > 
> > We discuss what we should do about upkeeping a warning in the future, as
> > I think technically the warning was still valid and it could help avoid
> > racy lookups with the filesystem which otherwise could have gone unnoticed.
> 
> Given that things that used to work before the patch, and then didn't
> work after the patch was added, but then worked again after the patch
> was reverted, I don't think that adding the warning back is good, if it
> still breaks things...

As I noted to Marcel, it would seem this issue was present since before, the
warning either was changed or was more silent before. If reverting commit
81f95076281f "firmware: add sanity check on shutdown/suspend") then one should
consider reverting 06a45a93e7d34aa ("firmware: move umh try locks into the umh
code") as otherwise we end up in a situation where we loose determinism to be
sure of what will happen. From what I gather prior to these two patches we
either got a warning on sync case, or an error on async, after it we get a
warning on either and it should be clearer what the issue should be. I believe
that prior to these two commits the reason why things were failing could have
been pretty obscure. With just commit 81f95076281f reverted we then really
change the logic, and *that* was something I did want to avoid, given I
suspected that kernel was already implicitly depending on the functionality
which the UMH lock had brought upon the non-UMH code path. My suspicions seem
to have been correct.

With just commit 81f95076281f reverted, we may find firmware sometimes, but
we may also just as easily not find firmware some other times. It will depend
on the system.

> > > > Removing the old UMH lock even when the UMH lock was *not* needed was 
> > > > the right
> > > > thing to do but commit 81f95076281f (("firmware: add sanity check on
> > > > shutdown/suspend") was put in place as a safe guard as the lock was also
> > > > placing an implicit sanity check on the API. It ensures the API with 
> > > > the cache
> > > > was used as designed, otherwise you do run the risk of *not getting the
> > > > firmware you may need* -- Marcel seems to acknowledge this possibility.
> > > > 
> > > > It may be possible for us to already have in place safeguards so that 
> > > > upon
> > > > resume we are ensuring the path to the firmware *is* available, so IMHO 
> > > > we
> > > > should remove this *iff* we can provide this guarantee.  Otherwise the 
> > > > check is
> > > > valid. You see, although the UMH lock was bogus, it did implicitly ask 
> > > > the
> > > > question: is it safe for *any* helper to run and make assumptions on the
> > > > filesystem then?
> > > > 
> > > > In lieu of this question being answered the warning is valid given the 
> > > > design
> > > > of the firmware API and the having the cache available as a measure to 
> > > > resolve
> > > > this race.
> > > 
> > > I don't understand what you are trying to say here at all.
> > > 
> > > To be specific, what, if anything, is a problem with the current state
> > > of Linus's tree (and the next 4.13-stable release)?
> > 
> > The warning is issued when drivers issue *new* firmware requests on resume. 
> > The
> > firmware API cache was designed to enable drivers to easily be able to 
> > request
> > firmware on resume without concern about races against the filesystem, but 
> > in order
> > for this to work the drivers must have requested the firmware prior to 
> > suspend.
> 
> Ok, then how is this all working today?

Prior to the above *two* mentioned commits a UMH lock was used, and so similar 
but
different errors should have come up. With just 81f95076281f "firmware: add
sanity check on shutdown/suspend") reverted there is no suspend heuristic
check and we just let the users fend for themselves, it may or may not work
this will very depending on the system.

> I think much more is going on here, sorry, this doesn't make sense.

The UMH lock on the non UMH path was nasty to begin with. We've long depended
on it though, 

Re: btusb "firmware request while host is not available" at resume

2017-09-12 Thread Luis R. Rodriguez
On Tue, Sep 12, 2017 at 07:13:42AM +0200, Marcel Holtmann wrote:
> >> If something needs to be fixed, can you make a patch showing that?  Or
> >> do we also need to revert anything else as well to get back to a "better
> >> working" state?
> > 
> > I took a look at the driver and it seems that btusb_setup_intel_new() is
> > not called after the driver is initialized, rather its called only when
> > hci_dev_do_open() is called. Its not clear to me how you can end up calling
> > this on resume but not prior to this on a running system. Feedback from
> > someone more familiar with bt would be useful.
> > 
> > I'd have the call for firmware on probe, no processing would be needed, just
> > a load to kick the cache into effect would suffice. This may require a bit
> > of code shift so its best someone more familiar do this.
> > 
> > If it confirmed this information is helping avoid these races we can 
> > reconsider
> > re-instating the warn as a firmware dev debugging aid for developers.
> > 
> > If the race this warning complained about is indeed possible the same race 
> > is
> > also possible for other usermode helpers. Its *why* the UMH lock was
> > implemented, it however was never generalized.
> 
> we can not load firmware on probe() in most cases. The btusb.ko driver
> establishes the HCI transport. It is setup in probe() and then started in
> hci_dev_do_open() and if there is a vendor setup routine like
> btsub_setup_intel_new(), then it is executed. Most Bluetooth controllers (not
> all, but most) are doing firmware loading over the HCI transport with vendor
> specific commands.
> 
> And yes, if the firmware was already loaded, we would skip requesting it at
> all. Which means after suspend/resume cycle where the power to the controller
> is cut, then we are missing the firmware from the cache. Since we never
> loaded it in the first place.

I checked and prior to commit 81f95076281f ("firmware: add sanity check on
shutdown/suspend") and commit 06a45a93e7d34aa ("firmware: move umh try locks
into the umh code") I believe we could end up also failing at a firmware
request as well. Can you confirm? I see one bug report which confirms this
since v3.13:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1356076

For the sync case we had:

-   ret = usermodehelper_read_trylock();
-   if (WARN_ON(ret)) {
-   dev_err(device, "firmware: %s will not be loaded\n",
-   name);
-   goto out;
-   }

The warning splat in launchpad bug 1356076 is the above case.

For the async case we had:

-   timeout = usermodehelper_read_lock_wait(timeout);
-   if (!timeout) {
-   dev_dbg(device, "firmware: %s loading timed out\n",
-   name);
-   ret = -EBUSY;
-   goto out;

This would be the more silent case.

If this is accurate than the error case has just been modified to be
a bit clearer, and without being implicated by the UMH lock. This was
the design change after all. Nothing more *broken* should have happened.

In other words if fixing your issues goes away by reverting commit
81f95076281f, to be fair you should also try reverting commit 06a45a93e7d34aa
which did the actual removal of the UMH lock when we don't use the UMH. *That*
was previously the failure trigger point.

> So yes, we would have to redo parts of the vendor specific handling to always
> request the firmware, even if we do not need it right now. And frankly that
> is not obvious to anybody.

I agree. I thought a bit about the above and tried co come to a clever 
resolution,
for instance one resolution could be to use the MODULE_FIRMWARE() declarations
of sorts to then let the firmware API do the pre-fetching for you at init. This
would just require a 1 line change to drivers and some already have this.
There are two issues with this though, one is that the firmware cache works with
devres and devres requires a device already present. Second is firmware names
can be very dynamic, so one can only know the firmware name later at boot.

Another issues is that most driver developers use the firmware API and without
knowing *are* creating a firmware cache entry, whereas a few times they don't
need a firmware cache entry. This is more of performance / optimization thing,
but something to consider long term as well.

> We seem to have some patches for doing exactly
> that, but I have not gotten to review them in detail since they deal with
> vendor specific complex setup handling.

Correct me if I'm wrong but some of this work very likely came from the above
old errors, not the new ones?

> Also this affects more than just Intel since all hardware where firmware
> loading is skipped if there is already firmware loaded are affected.

Right, its a core firmware API issue, but this issue has been present for a
while, it was however in different form. 

Re: btusb "firmware request while host is not available" at resume

2017-09-12 Thread Luis R. Rodriguez
On Tue, Sep 12, 2017 at 07:13:42AM +0200, Marcel Holtmann wrote:
> >> If something needs to be fixed, can you make a patch showing that?  Or
> >> do we also need to revert anything else as well to get back to a "better
> >> working" state?
> > 
> > I took a look at the driver and it seems that btusb_setup_intel_new() is
> > not called after the driver is initialized, rather its called only when
> > hci_dev_do_open() is called. Its not clear to me how you can end up calling
> > this on resume but not prior to this on a running system. Feedback from
> > someone more familiar with bt would be useful.
> > 
> > I'd have the call for firmware on probe, no processing would be needed, just
> > a load to kick the cache into effect would suffice. This may require a bit
> > of code shift so its best someone more familiar do this.
> > 
> > If it confirmed this information is helping avoid these races we can 
> > reconsider
> > re-instating the warn as a firmware dev debugging aid for developers.
> > 
> > If the race this warning complained about is indeed possible the same race 
> > is
> > also possible for other usermode helpers. Its *why* the UMH lock was
> > implemented, it however was never generalized.
> 
> we can not load firmware on probe() in most cases. The btusb.ko driver
> establishes the HCI transport. It is setup in probe() and then started in
> hci_dev_do_open() and if there is a vendor setup routine like
> btsub_setup_intel_new(), then it is executed. Most Bluetooth controllers (not
> all, but most) are doing firmware loading over the HCI transport with vendor
> specific commands.
> 
> And yes, if the firmware was already loaded, we would skip requesting it at
> all. Which means after suspend/resume cycle where the power to the controller
> is cut, then we are missing the firmware from the cache. Since we never
> loaded it in the first place.

I checked and prior to commit 81f95076281f ("firmware: add sanity check on
shutdown/suspend") and commit 06a45a93e7d34aa ("firmware: move umh try locks
into the umh code") I believe we could end up also failing at a firmware
request as well. Can you confirm? I see one bug report which confirms this
since v3.13:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1356076

For the sync case we had:

-   ret = usermodehelper_read_trylock();
-   if (WARN_ON(ret)) {
-   dev_err(device, "firmware: %s will not be loaded\n",
-   name);
-   goto out;
-   }

The warning splat in launchpad bug 1356076 is the above case.

For the async case we had:

-   timeout = usermodehelper_read_lock_wait(timeout);
-   if (!timeout) {
-   dev_dbg(device, "firmware: %s loading timed out\n",
-   name);
-   ret = -EBUSY;
-   goto out;

This would be the more silent case.

If this is accurate than the error case has just been modified to be
a bit clearer, and without being implicated by the UMH lock. This was
the design change after all. Nothing more *broken* should have happened.

In other words if fixing your issues goes away by reverting commit
81f95076281f, to be fair you should also try reverting commit 06a45a93e7d34aa
which did the actual removal of the UMH lock when we don't use the UMH. *That*
was previously the failure trigger point.

> So yes, we would have to redo parts of the vendor specific handling to always
> request the firmware, even if we do not need it right now. And frankly that
> is not obvious to anybody.

I agree. I thought a bit about the above and tried co come to a clever 
resolution,
for instance one resolution could be to use the MODULE_FIRMWARE() declarations
of sorts to then let the firmware API do the pre-fetching for you at init. This
would just require a 1 line change to drivers and some already have this.
There are two issues with this though, one is that the firmware cache works with
devres and devres requires a device already present. Second is firmware names
can be very dynamic, so one can only know the firmware name later at boot.

Another issues is that most driver developers use the firmware API and without
knowing *are* creating a firmware cache entry, whereas a few times they don't
need a firmware cache entry. This is more of performance / optimization thing,
but something to consider long term as well.

> We seem to have some patches for doing exactly
> that, but I have not gotten to review them in detail since they deal with
> vendor specific complex setup handling.

Correct me if I'm wrong but some of this work very likely came from the above
old errors, not the new ones?

> Also this affects more than just Intel since all hardware where firmware
> loading is skipped if there is already firmware loaded are affected.

Right, its a core firmware API issue, but this issue has been present for a
while, it was however in different form. 

Re: btusb "firmware request while host is not available" at resume

2017-09-11 Thread Marcel Holtmann
Hi Luis,

 To confirm, reverting this fixes the problem I was seeing in 4.13.  I've
 queued it up for the next 4.13-stable release as well.
>>> 
>>> Commit 81f95076281f ("firmware: add sanity check on shutdown/suspend") may
>>> seem kludgy but the reason for it was to cleanup the horrible forced and
>>> required UMH lock even when the UMH lock was *not* even needed, which was 
>>> later
>>> removed via commit 06a45a93e7d34aa ("firmware: move umh try locks into the 
>>> umh
>>> code").
>> 
>> So what does this mean now that it is reverted?
> 
> We discuss what we should do about upkeeping a warning in the future, as
> I think technically the warning was still valid and it could help avoid
> racy lookups with the filesystem which otherwise could have gone unnoticed.
> 
>>> Removing the old UMH lock even when the UMH lock was *not* needed was the 
>>> right
>>> thing to do but commit 81f95076281f (("firmware: add sanity check on
>>> shutdown/suspend") was put in place as a safe guard as the lock was also
>>> placing an implicit sanity check on the API. It ensures the API with the 
>>> cache
>>> was used as designed, otherwise you do run the risk of *not getting the
>>> firmware you may need* -- Marcel seems to acknowledge this possibility.
>>> 
>>> It may be possible for us to already have in place safeguards so that upon
>>> resume we are ensuring the path to the firmware *is* available, so IMHO we
>>> should remove this *iff* we can provide this guarantee.  Otherwise the 
>>> check is
>>> valid. You see, although the UMH lock was bogus, it did implicitly ask the
>>> question: is it safe for *any* helper to run and make assumptions on the
>>> filesystem then?
>>> 
>>> In lieu of this question being answered the warning is valid given the 
>>> design
>>> of the firmware API and the having the cache available as a measure to 
>>> resolve
>>> this race.
>> 
>> I don't understand what you are trying to say here at all.
>> 
>> To be specific, what, if anything, is a problem with the current state
>> of Linus's tree (and the next 4.13-stable release)?
> 
> The warning is issued when drivers issue *new* firmware requests on resume. 
> The
> firmware API cache was designed to enable drivers to easily be able to request
> firmware on resume without concern about races against the filesystem, but in 
> order
> for this to work the drivers must have requested the firmware prior to 
> suspend.
> 
>> If something needs to be fixed, can you make a patch showing that?  Or
>> do we also need to revert anything else as well to get back to a "better
>> working" state?
> 
> I took a look at the driver and it seems that btusb_setup_intel_new() is
> not called after the driver is initialized, rather its called only when
> hci_dev_do_open() is called. Its not clear to me how you can end up calling
> this on resume but not prior to this on a running system. Feedback from
> someone more familiar with bt would be useful.
> 
> I'd have the call for firmware on probe, no processing would be needed, just
> a load to kick the cache into effect would suffice. This may require a bit
> of code shift so its best someone more familiar do this.
> 
> If it confirmed this information is helping avoid these races we can 
> reconsider
> re-instating the warn as a firmware dev debugging aid for developers.
> 
> If the race this warning complained about is indeed possible the same race is
> also possible for other usermode helpers. Its *why* the UMH lock was
> implemented, it however was never generalized.

we can not load firmware on probe() in most cases. The btusb.ko driver 
establishes the HCI transport. It is setup in probe() and then started in 
hci_dev_do_open() and if there is a vendor setup routine like 
btsub_setup_intel_new(), then it is executed. Most Bluetooth controllers (not 
all, but most) are doing firmware loading over the HCI transport with vendor 
specific commands.

And yes, if the firmware was already loaded, we would skip requesting it at 
all. Which means after suspend/resume cycle where the power to the controller 
is cut, then we are missing the firmware from the cache. Since we never loaded 
it in the first place.

So yes, we would have to redo parts of the vendor specific handling to always 
request the firmware, even if we do not need it right now. And frankly that is 
not obvious to anybody. We seem to have some patches for doing exactly that, 
but I have not gotten to review them in detail since they deal with vendor 
specific complex setup handling. Also this affects more than just Intel since 
all hardware where firmware loading is skipped if there is already firmware 
loaded are affected.

Regards

Marcel



Re: btusb "firmware request while host is not available" at resume

2017-09-11 Thread Marcel Holtmann
Hi Luis,

 To confirm, reverting this fixes the problem I was seeing in 4.13.  I've
 queued it up for the next 4.13-stable release as well.
>>> 
>>> Commit 81f95076281f ("firmware: add sanity check on shutdown/suspend") may
>>> seem kludgy but the reason for it was to cleanup the horrible forced and
>>> required UMH lock even when the UMH lock was *not* even needed, which was 
>>> later
>>> removed via commit 06a45a93e7d34aa ("firmware: move umh try locks into the 
>>> umh
>>> code").
>> 
>> So what does this mean now that it is reverted?
> 
> We discuss what we should do about upkeeping a warning in the future, as
> I think technically the warning was still valid and it could help avoid
> racy lookups with the filesystem which otherwise could have gone unnoticed.
> 
>>> Removing the old UMH lock even when the UMH lock was *not* needed was the 
>>> right
>>> thing to do but commit 81f95076281f (("firmware: add sanity check on
>>> shutdown/suspend") was put in place as a safe guard as the lock was also
>>> placing an implicit sanity check on the API. It ensures the API with the 
>>> cache
>>> was used as designed, otherwise you do run the risk of *not getting the
>>> firmware you may need* -- Marcel seems to acknowledge this possibility.
>>> 
>>> It may be possible for us to already have in place safeguards so that upon
>>> resume we are ensuring the path to the firmware *is* available, so IMHO we
>>> should remove this *iff* we can provide this guarantee.  Otherwise the 
>>> check is
>>> valid. You see, although the UMH lock was bogus, it did implicitly ask the
>>> question: is it safe for *any* helper to run and make assumptions on the
>>> filesystem then?
>>> 
>>> In lieu of this question being answered the warning is valid given the 
>>> design
>>> of the firmware API and the having the cache available as a measure to 
>>> resolve
>>> this race.
>> 
>> I don't understand what you are trying to say here at all.
>> 
>> To be specific, what, if anything, is a problem with the current state
>> of Linus's tree (and the next 4.13-stable release)?
> 
> The warning is issued when drivers issue *new* firmware requests on resume. 
> The
> firmware API cache was designed to enable drivers to easily be able to request
> firmware on resume without concern about races against the filesystem, but in 
> order
> for this to work the drivers must have requested the firmware prior to 
> suspend.
> 
>> If something needs to be fixed, can you make a patch showing that?  Or
>> do we also need to revert anything else as well to get back to a "better
>> working" state?
> 
> I took a look at the driver and it seems that btusb_setup_intel_new() is
> not called after the driver is initialized, rather its called only when
> hci_dev_do_open() is called. Its not clear to me how you can end up calling
> this on resume but not prior to this on a running system. Feedback from
> someone more familiar with bt would be useful.
> 
> I'd have the call for firmware on probe, no processing would be needed, just
> a load to kick the cache into effect would suffice. This may require a bit
> of code shift so its best someone more familiar do this.
> 
> If it confirmed this information is helping avoid these races we can 
> reconsider
> re-instating the warn as a firmware dev debugging aid for developers.
> 
> If the race this warning complained about is indeed possible the same race is
> also possible for other usermode helpers. Its *why* the UMH lock was
> implemented, it however was never generalized.

we can not load firmware on probe() in most cases. The btusb.ko driver 
establishes the HCI transport. It is setup in probe() and then started in 
hci_dev_do_open() and if there is a vendor setup routine like 
btsub_setup_intel_new(), then it is executed. Most Bluetooth controllers (not 
all, but most) are doing firmware loading over the HCI transport with vendor 
specific commands.

And yes, if the firmware was already loaded, we would skip requesting it at 
all. Which means after suspend/resume cycle where the power to the controller 
is cut, then we are missing the firmware from the cache. Since we never loaded 
it in the first place.

So yes, we would have to redo parts of the vendor specific handling to always 
request the firmware, even if we do not need it right now. And frankly that is 
not obvious to anybody. We seem to have some patches for doing exactly that, 
but I have not gotten to review them in detail since they deal with vendor 
specific complex setup handling. Also this affects more than just Intel since 
all hardware where firmware loading is skipped if there is already firmware 
loaded are affected.

Regards

Marcel



Re: btusb "firmware request while host is not available" at resume

2017-09-11 Thread Greg Kroah-Hartman
On Mon, Sep 11, 2017 at 10:06:51PM +0200, Luis R. Rodriguez wrote:
> On Mon, Sep 11, 2017 at 12:29:55PM -0700, Greg Kroah-Hartman wrote:
> > On Mon, Sep 11, 2017 at 07:11:38PM +0200, Luis R. Rodriguez wrote:
> > > On Mon, Sep 11, 2017 at 06:46:47AM -0700, Greg Kroah-Hartman wrote:
> > > > To confirm, reverting this fixes the problem I was seeing in 4.13.  I've
> > > > queued it up for the next 4.13-stable release as well.
> > > 
> > > Commit 81f95076281f ("firmware: add sanity check on shutdown/suspend") may
> > > seem kludgy but the reason for it was to cleanup the horrible forced and
> > > required UMH lock even when the UMH lock was *not* even needed, which was 
> > > later
> > > removed via commit 06a45a93e7d34aa ("firmware: move umh try locks into 
> > > the umh
> > > code").
> > 
> > So what does this mean now that it is reverted?
> 
> We discuss what we should do about upkeeping a warning in the future, as
> I think technically the warning was still valid and it could help avoid
> racy lookups with the filesystem which otherwise could have gone unnoticed.

Given that things that used to work before the patch, and then didn't
work after the patch was added, but then worked again after the patch
was reverted, I don't think that adding the warning back is good, if it
still breaks things...

> > > Removing the old UMH lock even when the UMH lock was *not* needed was the 
> > > right
> > > thing to do but commit 81f95076281f (("firmware: add sanity check on
> > > shutdown/suspend") was put in place as a safe guard as the lock was also
> > > placing an implicit sanity check on the API. It ensures the API with the 
> > > cache
> > > was used as designed, otherwise you do run the risk of *not getting the
> > > firmware you may need* -- Marcel seems to acknowledge this possibility.
> > > 
> > > It may be possible for us to already have in place safeguards so that upon
> > > resume we are ensuring the path to the firmware *is* available, so IMHO we
> > > should remove this *iff* we can provide this guarantee.  Otherwise the 
> > > check is
> > > valid. You see, although the UMH lock was bogus, it did implicitly ask the
> > > question: is it safe for *any* helper to run and make assumptions on the
> > > filesystem then?
> > > 
> > > In lieu of this question being answered the warning is valid given the 
> > > design
> > > of the firmware API and the having the cache available as a measure to 
> > > resolve
> > > this race.
> > 
> > I don't understand what you are trying to say here at all.
> > 
> > To be specific, what, if anything, is a problem with the current state
> > of Linus's tree (and the next 4.13-stable release)?
> 
> The warning is issued when drivers issue *new* firmware requests on resume. 
> The
> firmware API cache was designed to enable drivers to easily be able to request
> firmware on resume without concern about races against the filesystem, but in 
> order
> for this to work the drivers must have requested the firmware prior to 
> suspend.

Ok, then how is this all working today?

I think much more is going on here, sorry, this doesn't make sense.

> > If something needs to be fixed, can you make a patch showing that?  Or
> > do we also need to revert anything else as well to get back to a "better
> > working" state?
> 
> I took a look at the driver and it seems that btusb_setup_intel_new() is
> not called after the driver is initialized, rather its called only when
> hci_dev_do_open() is called. Its not clear to me how you can end up calling
> this on resume but not prior to this on a running system. Feedback from
> someone more familiar with bt would be useful.

open is a great place to want to load firmware, that makes sense, you
get to defer loading stuff until you actually need it.  I see nothing
wrong here (again, especially as it _does_ work...)

> If it confirmed this information is helping avoid these races we can 
> reconsider
> re-instating the warn as a firmware dev debugging aid for developers.

Again, obviously this was not just a "warning" message, you changed the
logic here.

Anyway, all is good now, I guess we don't have to worry about it :)

thanks,

greg k-h


Re: btusb "firmware request while host is not available" at resume

2017-09-11 Thread Greg Kroah-Hartman
On Mon, Sep 11, 2017 at 10:06:51PM +0200, Luis R. Rodriguez wrote:
> On Mon, Sep 11, 2017 at 12:29:55PM -0700, Greg Kroah-Hartman wrote:
> > On Mon, Sep 11, 2017 at 07:11:38PM +0200, Luis R. Rodriguez wrote:
> > > On Mon, Sep 11, 2017 at 06:46:47AM -0700, Greg Kroah-Hartman wrote:
> > > > To confirm, reverting this fixes the problem I was seeing in 4.13.  I've
> > > > queued it up for the next 4.13-stable release as well.
> > > 
> > > Commit 81f95076281f ("firmware: add sanity check on shutdown/suspend") may
> > > seem kludgy but the reason for it was to cleanup the horrible forced and
> > > required UMH lock even when the UMH lock was *not* even needed, which was 
> > > later
> > > removed via commit 06a45a93e7d34aa ("firmware: move umh try locks into 
> > > the umh
> > > code").
> > 
> > So what does this mean now that it is reverted?
> 
> We discuss what we should do about upkeeping a warning in the future, as
> I think technically the warning was still valid and it could help avoid
> racy lookups with the filesystem which otherwise could have gone unnoticed.

Given that things that used to work before the patch, and then didn't
work after the patch was added, but then worked again after the patch
was reverted, I don't think that adding the warning back is good, if it
still breaks things...

> > > Removing the old UMH lock even when the UMH lock was *not* needed was the 
> > > right
> > > thing to do but commit 81f95076281f (("firmware: add sanity check on
> > > shutdown/suspend") was put in place as a safe guard as the lock was also
> > > placing an implicit sanity check on the API. It ensures the API with the 
> > > cache
> > > was used as designed, otherwise you do run the risk of *not getting the
> > > firmware you may need* -- Marcel seems to acknowledge this possibility.
> > > 
> > > It may be possible for us to already have in place safeguards so that upon
> > > resume we are ensuring the path to the firmware *is* available, so IMHO we
> > > should remove this *iff* we can provide this guarantee.  Otherwise the 
> > > check is
> > > valid. You see, although the UMH lock was bogus, it did implicitly ask the
> > > question: is it safe for *any* helper to run and make assumptions on the
> > > filesystem then?
> > > 
> > > In lieu of this question being answered the warning is valid given the 
> > > design
> > > of the firmware API and the having the cache available as a measure to 
> > > resolve
> > > this race.
> > 
> > I don't understand what you are trying to say here at all.
> > 
> > To be specific, what, if anything, is a problem with the current state
> > of Linus's tree (and the next 4.13-stable release)?
> 
> The warning is issued when drivers issue *new* firmware requests on resume. 
> The
> firmware API cache was designed to enable drivers to easily be able to request
> firmware on resume without concern about races against the filesystem, but in 
> order
> for this to work the drivers must have requested the firmware prior to 
> suspend.

Ok, then how is this all working today?

I think much more is going on here, sorry, this doesn't make sense.

> > If something needs to be fixed, can you make a patch showing that?  Or
> > do we also need to revert anything else as well to get back to a "better
> > working" state?
> 
> I took a look at the driver and it seems that btusb_setup_intel_new() is
> not called after the driver is initialized, rather its called only when
> hci_dev_do_open() is called. Its not clear to me how you can end up calling
> this on resume but not prior to this on a running system. Feedback from
> someone more familiar with bt would be useful.

open is a great place to want to load firmware, that makes sense, you
get to defer loading stuff until you actually need it.  I see nothing
wrong here (again, especially as it _does_ work...)

> If it confirmed this information is helping avoid these races we can 
> reconsider
> re-instating the warn as a firmware dev debugging aid for developers.

Again, obviously this was not just a "warning" message, you changed the
logic here.

Anyway, all is good now, I guess we don't have to worry about it :)

thanks,

greg k-h


Re: btusb "firmware request while host is not available" at resume

2017-09-11 Thread Luis R. Rodriguez
On Mon, Sep 11, 2017 at 5:13 PM, Gabriel C  wrote:
> On 11.09.2017 22:06, Luis R. Rodriguez wrote:
>>
>> On Mon, Sep 11, 2017 at 12:29:55PM -0700, Greg Kroah-Hartman wrote:
>>>
>>> On Mon, Sep 11, 2017 at 07:11:38PM +0200, Luis R. Rodriguez wrote:

 On Mon, Sep 11, 2017 at 06:46:47AM -0700, Greg Kroah-Hartman wrote:
>
> To confirm, reverting this fixes the problem I was seeing in 4.13.
> I've
> queued it up for the next 4.13-stable release as well.


 Commit 81f95076281f ("firmware: add sanity check on shutdown/suspend")
 may
 seem kludgy but the reason for it was to cleanup the horrible forced and
 required UMH lock even when the UMH lock was *not* even needed, which
 was later
 removed via commit 06a45a93e7d34aa ("firmware: move umh try locks into
 the umh
 code").
>>>
>>>
>>> So what does this mean now that it is reverted?
>>
>>
>> We discuss what we should do about upkeeping a warning in the future, as
>> I think technically the warning was still valid and it could help avoid
>> racy lookups with the filesystem which otherwise could have gone
>> unnoticed.
>>
 Removing the old UMH lock even when the UMH lock was *not* needed was
 the right
 thing to do but commit 81f95076281f (("firmware: add sanity check on
 shutdown/suspend") was put in place as a safe guard as the lock was also
 placing an implicit sanity check on the API. It ensures the API with the
 cache
 was used as designed, otherwise you do run the risk of *not getting the
 firmware you may need* -- Marcel seems to acknowledge this possibility.

 It may be possible for us to already have in place safeguards so that
 upon
 resume we are ensuring the path to the firmware *is* available, so IMHO
 we
 should remove this *iff* we can provide this guarantee.  Otherwise the
 check is
 valid. You see, although the UMH lock was bogus, it did implicitly ask
 the
 question: is it safe for *any* helper to run and make assumptions on the
 filesystem then?

 In lieu of this question being answered the warning is valid given the
 design
 of the firmware API and the having the cache available as a measure to
 resolve
 this race.
>>>
>>>
>>> I don't understand what you are trying to say here at all.
>>>
>>> To be specific, what, if anything, is a problem with the current state
>>> of Linus's tree (and the next 4.13-stable release)?
>>
>>
>> The warning is issued when drivers issue *new* firmware requests on
>> resume. The
>> firmware API cache was designed to enable drivers to easily be able to
>> request
>> firmware on resume without concern about races against the filesystem, but
>> in order
>> for this to work the drivers must have requested the firmware prior to
>> suspend.
>>
>>> If something needs to be fixed, can you make a patch showing that?  Or
>>> do we also need to revert anything else as well to get back to a "better
>>> working" state?
>>
>>
>> I took a look at the driver and it seems that btusb_setup_intel_new() is
>> not called after the driver is initialized, rather its called only when
>> hci_dev_do_open() is called. Its not clear to me how you can end up
>> calling
>> this on resume but not prior to this on a running system. Feedback from
>> someone more familiar with bt would be useful.
>
>
> While I really don't know that code I can tell anything about .. however
> this is _NOT_
> about Intel Hardware only .. I trigger that on laptops with Atheros..

If the Atheros driver works in the same way then the same race exists
there as well.

  Luis


Re: btusb "firmware request while host is not available" at resume

2017-09-11 Thread Luis R. Rodriguez
On Mon, Sep 11, 2017 at 5:13 PM, Gabriel C  wrote:
> On 11.09.2017 22:06, Luis R. Rodriguez wrote:
>>
>> On Mon, Sep 11, 2017 at 12:29:55PM -0700, Greg Kroah-Hartman wrote:
>>>
>>> On Mon, Sep 11, 2017 at 07:11:38PM +0200, Luis R. Rodriguez wrote:

 On Mon, Sep 11, 2017 at 06:46:47AM -0700, Greg Kroah-Hartman wrote:
>
> To confirm, reverting this fixes the problem I was seeing in 4.13.
> I've
> queued it up for the next 4.13-stable release as well.


 Commit 81f95076281f ("firmware: add sanity check on shutdown/suspend")
 may
 seem kludgy but the reason for it was to cleanup the horrible forced and
 required UMH lock even when the UMH lock was *not* even needed, which
 was later
 removed via commit 06a45a93e7d34aa ("firmware: move umh try locks into
 the umh
 code").
>>>
>>>
>>> So what does this mean now that it is reverted?
>>
>>
>> We discuss what we should do about upkeeping a warning in the future, as
>> I think technically the warning was still valid and it could help avoid
>> racy lookups with the filesystem which otherwise could have gone
>> unnoticed.
>>
 Removing the old UMH lock even when the UMH lock was *not* needed was
 the right
 thing to do but commit 81f95076281f (("firmware: add sanity check on
 shutdown/suspend") was put in place as a safe guard as the lock was also
 placing an implicit sanity check on the API. It ensures the API with the
 cache
 was used as designed, otherwise you do run the risk of *not getting the
 firmware you may need* -- Marcel seems to acknowledge this possibility.

 It may be possible for us to already have in place safeguards so that
 upon
 resume we are ensuring the path to the firmware *is* available, so IMHO
 we
 should remove this *iff* we can provide this guarantee.  Otherwise the
 check is
 valid. You see, although the UMH lock was bogus, it did implicitly ask
 the
 question: is it safe for *any* helper to run and make assumptions on the
 filesystem then?

 In lieu of this question being answered the warning is valid given the
 design
 of the firmware API and the having the cache available as a measure to
 resolve
 this race.
>>>
>>>
>>> I don't understand what you are trying to say here at all.
>>>
>>> To be specific, what, if anything, is a problem with the current state
>>> of Linus's tree (and the next 4.13-stable release)?
>>
>>
>> The warning is issued when drivers issue *new* firmware requests on
>> resume. The
>> firmware API cache was designed to enable drivers to easily be able to
>> request
>> firmware on resume without concern about races against the filesystem, but
>> in order
>> for this to work the drivers must have requested the firmware prior to
>> suspend.
>>
>>> If something needs to be fixed, can you make a patch showing that?  Or
>>> do we also need to revert anything else as well to get back to a "better
>>> working" state?
>>
>>
>> I took a look at the driver and it seems that btusb_setup_intel_new() is
>> not called after the driver is initialized, rather its called only when
>> hci_dev_do_open() is called. Its not clear to me how you can end up
>> calling
>> this on resume but not prior to this on a running system. Feedback from
>> someone more familiar with bt would be useful.
>
>
> While I really don't know that code I can tell anything about .. however
> this is _NOT_
> about Intel Hardware only .. I trigger that on laptops with Atheros..

If the Atheros driver works in the same way then the same race exists
there as well.

  Luis


Re: btusb "firmware request while host is not available" at resume

2017-09-11 Thread Gabriel C

On 11.09.2017 22:06, Luis R. Rodriguez wrote:

On Mon, Sep 11, 2017 at 12:29:55PM -0700, Greg Kroah-Hartman wrote:

On Mon, Sep 11, 2017 at 07:11:38PM +0200, Luis R. Rodriguez wrote:

On Mon, Sep 11, 2017 at 06:46:47AM -0700, Greg Kroah-Hartman wrote:

To confirm, reverting this fixes the problem I was seeing in 4.13.  I've
queued it up for the next 4.13-stable release as well.


Commit 81f95076281f ("firmware: add sanity check on shutdown/suspend") may
seem kludgy but the reason for it was to cleanup the horrible forced and
required UMH lock even when the UMH lock was *not* even needed, which was later
removed via commit 06a45a93e7d34aa ("firmware: move umh try locks into the umh
code").


So what does this mean now that it is reverted?


We discuss what we should do about upkeeping a warning in the future, as
I think technically the warning was still valid and it could help avoid
racy lookups with the filesystem which otherwise could have gone unnoticed.


Removing the old UMH lock even when the UMH lock was *not* needed was the right
thing to do but commit 81f95076281f (("firmware: add sanity check on
shutdown/suspend") was put in place as a safe guard as the lock was also
placing an implicit sanity check on the API. It ensures the API with the cache
was used as designed, otherwise you do run the risk of *not getting the
firmware you may need* -- Marcel seems to acknowledge this possibility.

It may be possible for us to already have in place safeguards so that upon
resume we are ensuring the path to the firmware *is* available, so IMHO we
should remove this *iff* we can provide this guarantee.  Otherwise the check is
valid. You see, although the UMH lock was bogus, it did implicitly ask the
question: is it safe for *any* helper to run and make assumptions on the
filesystem then?

In lieu of this question being answered the warning is valid given the design
of the firmware API and the having the cache available as a measure to resolve
this race.


I don't understand what you are trying to say here at all.

To be specific, what, if anything, is a problem with the current state
of Linus's tree (and the next 4.13-stable release)?


The warning is issued when drivers issue *new* firmware requests on resume. The
firmware API cache was designed to enable drivers to easily be able to request
firmware on resume without concern about races against the filesystem, but in 
order
for this to work the drivers must have requested the firmware prior to suspend.


If something needs to be fixed, can you make a patch showing that?  Or
do we also need to revert anything else as well to get back to a "better
working" state?


I took a look at the driver and it seems that btusb_setup_intel_new() is
not called after the driver is initialized, rather its called only when
hci_dev_do_open() is called. Its not clear to me how you can end up calling
this on resume but not prior to this on a running system. Feedback from
someone more familiar with bt would be useful.


While I really don't know that code I can tell anything about .. however this 
is _NOT_
about Intel Hardware only .. I trigger that on laptops with Atheros..

So seems like some driver need be fixed before trying something like this again?

Regards,

Gabriel C


Re: btusb "firmware request while host is not available" at resume

2017-09-11 Thread Gabriel C

On 11.09.2017 22:06, Luis R. Rodriguez wrote:

On Mon, Sep 11, 2017 at 12:29:55PM -0700, Greg Kroah-Hartman wrote:

On Mon, Sep 11, 2017 at 07:11:38PM +0200, Luis R. Rodriguez wrote:

On Mon, Sep 11, 2017 at 06:46:47AM -0700, Greg Kroah-Hartman wrote:

To confirm, reverting this fixes the problem I was seeing in 4.13.  I've
queued it up for the next 4.13-stable release as well.


Commit 81f95076281f ("firmware: add sanity check on shutdown/suspend") may
seem kludgy but the reason for it was to cleanup the horrible forced and
required UMH lock even when the UMH lock was *not* even needed, which was later
removed via commit 06a45a93e7d34aa ("firmware: move umh try locks into the umh
code").


So what does this mean now that it is reverted?


We discuss what we should do about upkeeping a warning in the future, as
I think technically the warning was still valid and it could help avoid
racy lookups with the filesystem which otherwise could have gone unnoticed.


Removing the old UMH lock even when the UMH lock was *not* needed was the right
thing to do but commit 81f95076281f (("firmware: add sanity check on
shutdown/suspend") was put in place as a safe guard as the lock was also
placing an implicit sanity check on the API. It ensures the API with the cache
was used as designed, otherwise you do run the risk of *not getting the
firmware you may need* -- Marcel seems to acknowledge this possibility.

It may be possible for us to already have in place safeguards so that upon
resume we are ensuring the path to the firmware *is* available, so IMHO we
should remove this *iff* we can provide this guarantee.  Otherwise the check is
valid. You see, although the UMH lock was bogus, it did implicitly ask the
question: is it safe for *any* helper to run and make assumptions on the
filesystem then?

In lieu of this question being answered the warning is valid given the design
of the firmware API and the having the cache available as a measure to resolve
this race.


I don't understand what you are trying to say here at all.

To be specific, what, if anything, is a problem with the current state
of Linus's tree (and the next 4.13-stable release)?


The warning is issued when drivers issue *new* firmware requests on resume. The
firmware API cache was designed to enable drivers to easily be able to request
firmware on resume without concern about races against the filesystem, but in 
order
for this to work the drivers must have requested the firmware prior to suspend.


If something needs to be fixed, can you make a patch showing that?  Or
do we also need to revert anything else as well to get back to a "better
working" state?


I took a look at the driver and it seems that btusb_setup_intel_new() is
not called after the driver is initialized, rather its called only when
hci_dev_do_open() is called. Its not clear to me how you can end up calling
this on resume but not prior to this on a running system. Feedback from
someone more familiar with bt would be useful.


While I really don't know that code I can tell anything about .. however this 
is _NOT_
about Intel Hardware only .. I trigger that on laptops with Atheros..

So seems like some driver need be fixed before trying something like this again?

Regards,

Gabriel C


Re: btusb "firmware request while host is not available" at resume

2017-09-11 Thread Luis R. Rodriguez
On Mon, Sep 11, 2017 at 12:29:55PM -0700, Greg Kroah-Hartman wrote:
> On Mon, Sep 11, 2017 at 07:11:38PM +0200, Luis R. Rodriguez wrote:
> > On Mon, Sep 11, 2017 at 06:46:47AM -0700, Greg Kroah-Hartman wrote:
> > > To confirm, reverting this fixes the problem I was seeing in 4.13.  I've
> > > queued it up for the next 4.13-stable release as well.
> > 
> > Commit 81f95076281f ("firmware: add sanity check on shutdown/suspend") may
> > seem kludgy but the reason for it was to cleanup the horrible forced and
> > required UMH lock even when the UMH lock was *not* even needed, which was 
> > later
> > removed via commit 06a45a93e7d34aa ("firmware: move umh try locks into the 
> > umh
> > code").
> 
> So what does this mean now that it is reverted?

We discuss what we should do about upkeeping a warning in the future, as
I think technically the warning was still valid and it could help avoid
racy lookups with the filesystem which otherwise could have gone unnoticed.

> > Removing the old UMH lock even when the UMH lock was *not* needed was the 
> > right
> > thing to do but commit 81f95076281f (("firmware: add sanity check on
> > shutdown/suspend") was put in place as a safe guard as the lock was also
> > placing an implicit sanity check on the API. It ensures the API with the 
> > cache
> > was used as designed, otherwise you do run the risk of *not getting the
> > firmware you may need* -- Marcel seems to acknowledge this possibility.
> > 
> > It may be possible for us to already have in place safeguards so that upon
> > resume we are ensuring the path to the firmware *is* available, so IMHO we
> > should remove this *iff* we can provide this guarantee.  Otherwise the 
> > check is
> > valid. You see, although the UMH lock was bogus, it did implicitly ask the
> > question: is it safe for *any* helper to run and make assumptions on the
> > filesystem then?
> > 
> > In lieu of this question being answered the warning is valid given the 
> > design
> > of the firmware API and the having the cache available as a measure to 
> > resolve
> > this race.
> 
> I don't understand what you are trying to say here at all.
> 
> To be specific, what, if anything, is a problem with the current state
> of Linus's tree (and the next 4.13-stable release)?

The warning is issued when drivers issue *new* firmware requests on resume. The
firmware API cache was designed to enable drivers to easily be able to request
firmware on resume without concern about races against the filesystem, but in 
order
for this to work the drivers must have requested the firmware prior to suspend.

> If something needs to be fixed, can you make a patch showing that?  Or
> do we also need to revert anything else as well to get back to a "better
> working" state?

I took a look at the driver and it seems that btusb_setup_intel_new() is
not called after the driver is initialized, rather its called only when
hci_dev_do_open() is called. Its not clear to me how you can end up calling
this on resume but not prior to this on a running system. Feedback from
someone more familiar with bt would be useful.

I'd have the call for firmware on probe, no processing would be needed, just
a load to kick the cache into effect would suffice. This may require a bit
of code shift so its best someone more familiar do this.

If it confirmed this information is helping avoid these races we can reconsider
re-instating the warn as a firmware dev debugging aid for developers.

If the race this warning complained about is indeed possible the same race is
also possible for other usermode helpers. Its *why* the UMH lock was
implemented, it however was never generalized.

  Luis


Re: btusb "firmware request while host is not available" at resume

2017-09-11 Thread Luis R. Rodriguez
On Mon, Sep 11, 2017 at 12:29:55PM -0700, Greg Kroah-Hartman wrote:
> On Mon, Sep 11, 2017 at 07:11:38PM +0200, Luis R. Rodriguez wrote:
> > On Mon, Sep 11, 2017 at 06:46:47AM -0700, Greg Kroah-Hartman wrote:
> > > To confirm, reverting this fixes the problem I was seeing in 4.13.  I've
> > > queued it up for the next 4.13-stable release as well.
> > 
> > Commit 81f95076281f ("firmware: add sanity check on shutdown/suspend") may
> > seem kludgy but the reason for it was to cleanup the horrible forced and
> > required UMH lock even when the UMH lock was *not* even needed, which was 
> > later
> > removed via commit 06a45a93e7d34aa ("firmware: move umh try locks into the 
> > umh
> > code").
> 
> So what does this mean now that it is reverted?

We discuss what we should do about upkeeping a warning in the future, as
I think technically the warning was still valid and it could help avoid
racy lookups with the filesystem which otherwise could have gone unnoticed.

> > Removing the old UMH lock even when the UMH lock was *not* needed was the 
> > right
> > thing to do but commit 81f95076281f (("firmware: add sanity check on
> > shutdown/suspend") was put in place as a safe guard as the lock was also
> > placing an implicit sanity check on the API. It ensures the API with the 
> > cache
> > was used as designed, otherwise you do run the risk of *not getting the
> > firmware you may need* -- Marcel seems to acknowledge this possibility.
> > 
> > It may be possible for us to already have in place safeguards so that upon
> > resume we are ensuring the path to the firmware *is* available, so IMHO we
> > should remove this *iff* we can provide this guarantee.  Otherwise the 
> > check is
> > valid. You see, although the UMH lock was bogus, it did implicitly ask the
> > question: is it safe for *any* helper to run and make assumptions on the
> > filesystem then?
> > 
> > In lieu of this question being answered the warning is valid given the 
> > design
> > of the firmware API and the having the cache available as a measure to 
> > resolve
> > this race.
> 
> I don't understand what you are trying to say here at all.
> 
> To be specific, what, if anything, is a problem with the current state
> of Linus's tree (and the next 4.13-stable release)?

The warning is issued when drivers issue *new* firmware requests on resume. The
firmware API cache was designed to enable drivers to easily be able to request
firmware on resume without concern about races against the filesystem, but in 
order
for this to work the drivers must have requested the firmware prior to suspend.

> If something needs to be fixed, can you make a patch showing that?  Or
> do we also need to revert anything else as well to get back to a "better
> working" state?

I took a look at the driver and it seems that btusb_setup_intel_new() is
not called after the driver is initialized, rather its called only when
hci_dev_do_open() is called. Its not clear to me how you can end up calling
this on resume but not prior to this on a running system. Feedback from
someone more familiar with bt would be useful.

I'd have the call for firmware on probe, no processing would be needed, just
a load to kick the cache into effect would suffice. This may require a bit
of code shift so its best someone more familiar do this.

If it confirmed this information is helping avoid these races we can reconsider
re-instating the warn as a firmware dev debugging aid for developers.

If the race this warning complained about is indeed possible the same race is
also possible for other usermode helpers. Its *why* the UMH lock was
implemented, it however was never generalized.

  Luis


Re: btusb "firmware request while host is not available" at resume

2017-09-11 Thread Greg Kroah-Hartman
On Mon, Sep 11, 2017 at 07:11:38PM +0200, Luis R. Rodriguez wrote:
> On Mon, Sep 11, 2017 at 06:46:47AM -0700, Greg Kroah-Hartman wrote:
> > To confirm, reverting this fixes the problem I was seeing in 4.13.  I've
> > queued it up for the next 4.13-stable release as well.
> 
> Commit 81f95076281f ("firmware: add sanity check on shutdown/suspend") may
> seem kludgy but the reason for it was to cleanup the horrible forced and
> required UMH lock even when the UMH lock was *not* even needed, which was 
> later
> removed via commit 06a45a93e7d34aa ("firmware: move umh try locks into the umh
> code").

So what does this mean now that it is reverted?

> Removing the old UMH lock even when the UMH lock was *not* needed was the 
> right
> thing to do but commit 81f95076281f (("firmware: add sanity check on
> shutdown/suspend") was put in place as a safe guard as the lock was also
> placing an implicit sanity check on the API. It ensures the API with the cache
> was used as designed, otherwise you do run the risk of *not getting the
> firmware you may need* -- Marcel seems to acknowledge this possibility.
> 
> It may be possible for us to already have in place safeguards so that upon
> resume we are ensuring the path to the firmware *is* available, so IMHO we
> should remove this *iff* we can provide this guarantee.  Otherwise the check 
> is
> valid. You see, although the UMH lock was bogus, it did implicitly ask the
> question: is it safe for *any* helper to run and make assumptions on the
> filesystem then?
> 
> In lieu of this question being answered the warning is valid given the design
> of the firmware API and the having the cache available as a measure to resolve
> this race.

I don't understand what you are trying to say here at all.

To be specific, what, if anything, is a problem with the current state
of Linus's tree (and the next 4.13-stable release)?

If something needs to be fixed, can you make a patch showing that?  Or
do we also need to revert anything else as well to get back to a "better
working" state?

thanks,

greg k-h


Re: btusb "firmware request while host is not available" at resume

2017-09-11 Thread Greg Kroah-Hartman
On Mon, Sep 11, 2017 at 07:11:38PM +0200, Luis R. Rodriguez wrote:
> On Mon, Sep 11, 2017 at 06:46:47AM -0700, Greg Kroah-Hartman wrote:
> > To confirm, reverting this fixes the problem I was seeing in 4.13.  I've
> > queued it up for the next 4.13-stable release as well.
> 
> Commit 81f95076281f ("firmware: add sanity check on shutdown/suspend") may
> seem kludgy but the reason for it was to cleanup the horrible forced and
> required UMH lock even when the UMH lock was *not* even needed, which was 
> later
> removed via commit 06a45a93e7d34aa ("firmware: move umh try locks into the umh
> code").

So what does this mean now that it is reverted?

> Removing the old UMH lock even when the UMH lock was *not* needed was the 
> right
> thing to do but commit 81f95076281f (("firmware: add sanity check on
> shutdown/suspend") was put in place as a safe guard as the lock was also
> placing an implicit sanity check on the API. It ensures the API with the cache
> was used as designed, otherwise you do run the risk of *not getting the
> firmware you may need* -- Marcel seems to acknowledge this possibility.
> 
> It may be possible for us to already have in place safeguards so that upon
> resume we are ensuring the path to the firmware *is* available, so IMHO we
> should remove this *iff* we can provide this guarantee.  Otherwise the check 
> is
> valid. You see, although the UMH lock was bogus, it did implicitly ask the
> question: is it safe for *any* helper to run and make assumptions on the
> filesystem then?
> 
> In lieu of this question being answered the warning is valid given the design
> of the firmware API and the having the cache available as a measure to resolve
> this race.

I don't understand what you are trying to say here at all.

To be specific, what, if anything, is a problem with the current state
of Linus's tree (and the next 4.13-stable release)?

If something needs to be fixed, can you make a patch showing that?  Or
do we also need to revert anything else as well to get back to a "better
working" state?

thanks,

greg k-h


Re: btusb "firmware request while host is not available" at resume

2017-09-11 Thread Luis R. Rodriguez
On Mon, Sep 11, 2017 at 06:46:47AM -0700, Greg Kroah-Hartman wrote:
> On Mon, Sep 11, 2017 at 07:12:44AM +0200, Marcel Holtmann wrote:
> > Hi Linus,
> > 
> > >> Yes 81f95076281f is to blame.. After reverting it all is fine again.
> > >> 
> > >> 15 resume cycles on the one laptop , 10 on the other without to hit the
> > >> trace.
> > > 
> > > Yeah, I think that disable/enable_firmware in the suspend/resume path
> > > is basically just completely random code. There  is nothing that
> > > serializes with anything else, and it has no actual basis for saying
> > > "I am now disabled/enabled" except for some entirely random time of
> > > whenever the suspend/resume callbacks happen.
> > > 
> > > I'm going to revert it.
> > > 
> > > I wonder why 4.13 seemed to work for me. Or maybe it didn't, and I
> > > just didn't notice, because I didn't use bluetooth and I wasn't
> > > traveling. I test my laptop every release, but I don't necessarily
> > > always _use_ it.
> > 
> > we have a bug report that might go into the similar direction. So the
> > problem is that modern Bluetooth controller require full firmware
> > loading (not just ROM patching) and if the controller has the firmware
> > somehow already loaded, but then looses power and needs a reload, then
> > it is not cached (since it was never requested in the first place).

You mean that on resume there may be a requirement to load a different file
than on boot? If so you can just request it proactively at boot.  Note that the
cache is associated for the device with devres, and even if you call
release_firmware() on the firmware it will be cached for you.  The firmware
cache works with the devres entry, and the firmware devres entry is maintained
throughout the lifetime of the device. This means that even if you
release_firmware() the firmware cache will still be used on resume from
suspend.

> > Of course if the reload triggers during resume when not file system is
> > available, it can not request the firmware. That will just fail and
> > thus you might see this issue.

One of the designs of the firmware cache is to circumvent these concerns,
so you just want to request what you need for suspend early on in the
boot cycle, the firmware API will do what it needs for you.

Unfortunately a lot of this was just not well documented, I hoped that
recent love in this are would have helped.

> > We have a few RFC patches on the
> > mailing list that I have to review. It is however not that easy all
> > the time to find the right firmware file (at least not for Intel
> > hardware) since the boot loader provides different information than
> > the fully operational firmware. So I need to make sure that request
> > the right firmware (if we do not need it initially) so that it gets
> > cached.
> 
> To confirm, reverting this fixes the problem I was seeing in 4.13.  I've
> queued it up for the next 4.13-stable release as well.

Commit 81f95076281f ("firmware: add sanity check on shutdown/suspend") may
seem kludgy but the reason for it was to cleanup the horrible forced and
required UMH lock even when the UMH lock was *not* even needed, which was later
removed via commit 06a45a93e7d34aa ("firmware: move umh try locks into the umh
code").

Removing the old UMH lock even when the UMH lock was *not* needed was the right
thing to do but commit 81f95076281f (("firmware: add sanity check on
shutdown/suspend") was put in place as a safe guard as the lock was also
placing an implicit sanity check on the API. It ensures the API with the cache
was used as designed, otherwise you do run the risk of *not getting the
firmware you may need* -- Marcel seems to acknowledge this possibility.

It may be possible for us to already have in place safeguards so that upon
resume we are ensuring the path to the firmware *is* available, so IMHO we
should remove this *iff* we can provide this guarantee.  Otherwise the check is
valid. You see, although the UMH lock was bogus, it did implicitly ask the
question: is it safe for *any* helper to run and make assumptions on the
filesystem then?

In lieu of this question being answered the warning is valid given the design
of the firmware API and the having the cache available as a measure to resolve
this race.

  Luis


Re: btusb "firmware request while host is not available" at resume

2017-09-11 Thread Luis R. Rodriguez
On Mon, Sep 11, 2017 at 06:46:47AM -0700, Greg Kroah-Hartman wrote:
> On Mon, Sep 11, 2017 at 07:12:44AM +0200, Marcel Holtmann wrote:
> > Hi Linus,
> > 
> > >> Yes 81f95076281f is to blame.. After reverting it all is fine again.
> > >> 
> > >> 15 resume cycles on the one laptop , 10 on the other without to hit the
> > >> trace.
> > > 
> > > Yeah, I think that disable/enable_firmware in the suspend/resume path
> > > is basically just completely random code. There  is nothing that
> > > serializes with anything else, and it has no actual basis for saying
> > > "I am now disabled/enabled" except for some entirely random time of
> > > whenever the suspend/resume callbacks happen.
> > > 
> > > I'm going to revert it.
> > > 
> > > I wonder why 4.13 seemed to work for me. Or maybe it didn't, and I
> > > just didn't notice, because I didn't use bluetooth and I wasn't
> > > traveling. I test my laptop every release, but I don't necessarily
> > > always _use_ it.
> > 
> > we have a bug report that might go into the similar direction. So the
> > problem is that modern Bluetooth controller require full firmware
> > loading (not just ROM patching) and if the controller has the firmware
> > somehow already loaded, but then looses power and needs a reload, then
> > it is not cached (since it was never requested in the first place).

You mean that on resume there may be a requirement to load a different file
than on boot? If so you can just request it proactively at boot.  Note that the
cache is associated for the device with devres, and even if you call
release_firmware() on the firmware it will be cached for you.  The firmware
cache works with the devres entry, and the firmware devres entry is maintained
throughout the lifetime of the device. This means that even if you
release_firmware() the firmware cache will still be used on resume from
suspend.

> > Of course if the reload triggers during resume when not file system is
> > available, it can not request the firmware. That will just fail and
> > thus you might see this issue.

One of the designs of the firmware cache is to circumvent these concerns,
so you just want to request what you need for suspend early on in the
boot cycle, the firmware API will do what it needs for you.

Unfortunately a lot of this was just not well documented, I hoped that
recent love in this are would have helped.

> > We have a few RFC patches on the
> > mailing list that I have to review. It is however not that easy all
> > the time to find the right firmware file (at least not for Intel
> > hardware) since the boot loader provides different information than
> > the fully operational firmware. So I need to make sure that request
> > the right firmware (if we do not need it initially) so that it gets
> > cached.
> 
> To confirm, reverting this fixes the problem I was seeing in 4.13.  I've
> queued it up for the next 4.13-stable release as well.

Commit 81f95076281f ("firmware: add sanity check on shutdown/suspend") may
seem kludgy but the reason for it was to cleanup the horrible forced and
required UMH lock even when the UMH lock was *not* even needed, which was later
removed via commit 06a45a93e7d34aa ("firmware: move umh try locks into the umh
code").

Removing the old UMH lock even when the UMH lock was *not* needed was the right
thing to do but commit 81f95076281f (("firmware: add sanity check on
shutdown/suspend") was put in place as a safe guard as the lock was also
placing an implicit sanity check on the API. It ensures the API with the cache
was used as designed, otherwise you do run the risk of *not getting the
firmware you may need* -- Marcel seems to acknowledge this possibility.

It may be possible for us to already have in place safeguards so that upon
resume we are ensuring the path to the firmware *is* available, so IMHO we
should remove this *iff* we can provide this guarantee.  Otherwise the check is
valid. You see, although the UMH lock was bogus, it did implicitly ask the
question: is it safe for *any* helper to run and make assumptions on the
filesystem then?

In lieu of this question being answered the warning is valid given the design
of the firmware API and the having the cache available as a measure to resolve
this race.

  Luis


Re: btusb "firmware request while host is not available" at resume

2017-09-11 Thread Greg Kroah-Hartman
On Mon, Sep 11, 2017 at 07:12:44AM +0200, Marcel Holtmann wrote:
> Hi Linus,
> 
> >> Yes 81f95076281f is to blame.. After reverting it all is fine again.
> >> 
> >> 15 resume cycles on the one laptop , 10 on the other without to hit the
> >> trace.
> > 
> > Yeah, I think that disable/enable_firmware in the suspend/resume path
> > is basically just completely random code. There  is nothing that
> > serializes with anything else, and it has no actual basis for saying
> > "I am now disabled/enabled" except for some entirely random time of
> > whenever the suspend/resume callbacks happen.
> > 
> > I'm going to revert it.
> > 
> > I wonder why 4.13 seemed to work for me. Or maybe it didn't, and I
> > just didn't notice, because I didn't use bluetooth and I wasn't
> > traveling. I test my laptop every release, but I don't necessarily
> > always _use_ it.
> 
> we have a bug report that might go into the similar direction. So the
> problem is that modern Bluetooth controller require full firmware
> loading (not just ROM patching) and if the controller has the firmware
> somehow already loaded, but then looses power and needs a reload, then
> it is not cached (since it was never requested in the first place).
> 
> Of course if the reload triggers during resume when not file system is
> available, it can not request the firmware. That will just fail and
> thus you might see this issue. We have a few RFC patches on the
> mailing list that I have to review. It is however not that easy all
> the time to find the right firmware file (at least not for Intel
> hardware) since the boot loader provides different information than
> the fully operational firmware. So I need to make sure that request
> the right firmware (if we do not need it initially) so that it gets
> cached.

To confirm, reverting this fixes the problem I was seeing in 4.13.  I've
queued it up for the next 4.13-stable release as well.

thanks,

greg k-h


Re: btusb "firmware request while host is not available" at resume

2017-09-11 Thread Greg Kroah-Hartman
On Mon, Sep 11, 2017 at 07:12:44AM +0200, Marcel Holtmann wrote:
> Hi Linus,
> 
> >> Yes 81f95076281f is to blame.. After reverting it all is fine again.
> >> 
> >> 15 resume cycles on the one laptop , 10 on the other without to hit the
> >> trace.
> > 
> > Yeah, I think that disable/enable_firmware in the suspend/resume path
> > is basically just completely random code. There  is nothing that
> > serializes with anything else, and it has no actual basis for saying
> > "I am now disabled/enabled" except for some entirely random time of
> > whenever the suspend/resume callbacks happen.
> > 
> > I'm going to revert it.
> > 
> > I wonder why 4.13 seemed to work for me. Or maybe it didn't, and I
> > just didn't notice, because I didn't use bluetooth and I wasn't
> > traveling. I test my laptop every release, but I don't necessarily
> > always _use_ it.
> 
> we have a bug report that might go into the similar direction. So the
> problem is that modern Bluetooth controller require full firmware
> loading (not just ROM patching) and if the controller has the firmware
> somehow already loaded, but then looses power and needs a reload, then
> it is not cached (since it was never requested in the first place).
> 
> Of course if the reload triggers during resume when not file system is
> available, it can not request the firmware. That will just fail and
> thus you might see this issue. We have a few RFC patches on the
> mailing list that I have to review. It is however not that easy all
> the time to find the right firmware file (at least not for Intel
> hardware) since the boot loader provides different information than
> the fully operational firmware. So I need to make sure that request
> the right firmware (if we do not need it initially) so that it gets
> cached.

To confirm, reverting this fixes the problem I was seeing in 4.13.  I've
queued it up for the next 4.13-stable release as well.

thanks,

greg k-h


Re: btusb "firmware request while host is not available" at resume

2017-09-10 Thread Marcel Holtmann
Hi Linus,

>> Yes 81f95076281f is to blame.. After reverting it all is fine again.
>> 
>> 15 resume cycles on the one laptop , 10 on the other without to hit the
>> trace.
> 
> Yeah, I think that disable/enable_firmware in the suspend/resume path
> is basically just completely random code. There  is nothing that
> serializes with anything else, and it has no actual basis for saying
> "I am now disabled/enabled" except for some entirely random time of
> whenever the suspend/resume callbacks happen.
> 
> I'm going to revert it.
> 
> I wonder why 4.13 seemed to work for me. Or maybe it didn't, and I
> just didn't notice, because I didn't use bluetooth and I wasn't
> traveling. I test my laptop every release, but I don't necessarily
> always _use_ it.

we have a bug report that might go into the similar direction. So the problem 
is that modern Bluetooth controller require full firmware loading (not just ROM 
patching) and if the controller has the firmware somehow already loaded, but 
then looses power and needs a reload, then it is not cached (since it was never 
requested in the first place).

Of course if the reload triggers during resume when not file system is 
available, it can not request the firmware. That will just fail and thus you 
might see this issue. We have a few RFC patches on the mailing list that I have 
to review. It is however not that easy all the time to find the right firmware 
file (at least not for Intel hardware) since the boot loader provides different 
information than the fully operational firmware. So I need to make sure that 
request the right firmware (if we do not need it initially) so that it gets 
cached.

Regards

Marcel



Re: btusb "firmware request while host is not available" at resume

2017-09-10 Thread Marcel Holtmann
Hi Linus,

>> Yes 81f95076281f is to blame.. After reverting it all is fine again.
>> 
>> 15 resume cycles on the one laptop , 10 on the other without to hit the
>> trace.
> 
> Yeah, I think that disable/enable_firmware in the suspend/resume path
> is basically just completely random code. There  is nothing that
> serializes with anything else, and it has no actual basis for saying
> "I am now disabled/enabled" except for some entirely random time of
> whenever the suspend/resume callbacks happen.
> 
> I'm going to revert it.
> 
> I wonder why 4.13 seemed to work for me. Or maybe it didn't, and I
> just didn't notice, because I didn't use bluetooth and I wasn't
> traveling. I test my laptop every release, but I don't necessarily
> always _use_ it.

we have a bug report that might go into the similar direction. So the problem 
is that modern Bluetooth controller require full firmware loading (not just ROM 
patching) and if the controller has the firmware somehow already loaded, but 
then looses power and needs a reload, then it is not cached (since it was never 
requested in the first place).

Of course if the reload triggers during resume when not file system is 
available, it can not request the firmware. That will just fail and thus you 
might see this issue. We have a few RFC patches on the mailing list that I have 
to review. It is however not that easy all the time to find the right firmware 
file (at least not for Intel hardware) since the boot loader provides different 
information than the fully operational firmware. So I need to make sure that 
request the right firmware (if we do not need it initially) so that it gets 
cached.

Regards

Marcel



Re: btusb "firmware request while host is not available" at resume

2017-09-10 Thread Linus Torvalds
On Sun, Sep 10, 2017 at 8:49 PM, Gabriel C  wrote:
>
> Yes 81f95076281f is to blame.. After reverting it all is fine again.
>
> 15 resume cycles on the one laptop , 10 on the other without to hit the
> trace.

Yeah, I think that disable/enable_firmware in the suspend/resume path
is basically just completely random code. There  is nothing that
serializes with anything else, and it has no actual basis for saying
"I am now disabled/enabled" except for some entirely random time of
whenever the suspend/resume callbacks happen.

I'm going to revert it.

I wonder why 4.13 seemed to work for me. Or maybe it didn't, and I
just didn't notice, because I didn't use bluetooth and I wasn't
traveling. I test my laptop every release, but I don't necessarily
always _use_ it.

   Linus


Re: btusb "firmware request while host is not available" at resume

2017-09-10 Thread Linus Torvalds
On Sun, Sep 10, 2017 at 8:49 PM, Gabriel C  wrote:
>
> Yes 81f95076281f is to blame.. After reverting it all is fine again.
>
> 15 resume cycles on the one laptop , 10 on the other without to hit the
> trace.

Yeah, I think that disable/enable_firmware in the suspend/resume path
is basically just completely random code. There  is nothing that
serializes with anything else, and it has no actual basis for saying
"I am now disabled/enabled" except for some entirely random time of
whenever the suspend/resume callbacks happen.

I'm going to revert it.

I wonder why 4.13 seemed to work for me. Or maybe it didn't, and I
just didn't notice, because I didn't use bluetooth and I wasn't
traveling. I test my laptop every release, but I don't necessarily
always _use_ it.

   Linus


Re: btusb "firmware request while host is not available" at resume

2017-09-10 Thread Gabriel C

On 11.09.2017 05:15, Gabriel C wrote:

On 11.09.2017 03:25, Greg Kroah-Hartman wrote:

On Sun, Sep 10, 2017 at 12:26:02PM -0700, Linus Torvalds wrote:

This seems to be a new problem at resume for the Intel btusb driver,
but I'm not seeing anything in that driver itself that looks like a
likely trigger, so I wonder if it's some driver core change, a generic
resume path issue, or a workqueue change that has made it trigger for
me.

It might also just be a timing difference, maybe it's always been there?

Does anybody have any ideas? It does't happen on every resume, and the
machine works despite this (but no bluetooth - the *next* resume might
bring it back, though).


Ah, it's not just me having this problem.  I don't see it happening in
4.12, and haven't had the time to bisect it.  I seem to be able to
trigger it every suspend/resume cycle, so I don't know if it's a timing
issue.



I see the same problem with QCA hardware.. but a bit different.

On first resume cycle the firmware call is fine but the adapter dies a bit 
later with :


  'Bluetooth: hci0: Failed to send body at 4 of 1857 (-110)'

On second resume cycle I hit the trace too.






  Linus

--

   ACPI: Low-level resume complete
   ACPI: EC: EC started
   PM: Restoring platform NVS memory
   Enabling non-boot CPUs ...
   x86: Booting SMP configuration:
   smpboot: Booting Node 0 Processor 1 APIC 0x2
    cache: parent cpu1 should not be sleeping
   CPU1 is up
   smpboot: Booting Node 0 Processor 2 APIC 0x1
    cache: parent cpu2 should not be sleeping
   CPU2 is up
   smpboot: Booting Node 0 Processor 3 APIC 0x3
    cache: parent cpu3 should not be sleeping
   CPU3 is up
   ACPI: Waking up from system sleep state S3
   ACPI: EC: event unblocked
   usb 1-3: reset full-speed USB device number 2 using xhci_hcd
   usb 1-4: reset full-speed USB device number 3 using xhci_hcd
   usb 1-5: reset high-speed USB device number 4 using xhci_hcd
   usb 1-3:1.0: rebind failed: -517
   usb 1-3:1.1: rebind failed: -517
   Bluetooth: hci0: Bootloader revision 0.0 build 2 week 52 2014
   OOM killer enabled.
   Restarting tasks ...
   Bluetooth: hci0: Device revision is 5
   Bluetooth: hci0: Secure boot is enabled
   Bluetooth: hci0: OTP lock is enabled
   Bluetooth: hci0: API lock is enabled
   Bluetooth: hci0: Debug lock is disabled
   Bluetooth: hci0: Minimum firmware build 1 week 10 2014
   firmware request while host is not available
   [ cut here ]
   WARNING: CPU: 2 PID: 621 at drivers/base/firmware_class.c:1250
_request_firmware+0x460/0x790
   CPU: 2 PID: 621 Comm: kworker/u9:2 Not tainted 4.13.0-10313-ge860d2c904d1 #11
   Hardware name: Dell Inc. XPS 13 9350/09JHRY, BIOS 1.4.17 05/10/2017
   Workqueue: hci0 hci_power_on [bluetooth]
   task: 8d3767895ac0 task.stack: 9d3481efc000
   RIP: 0010:_request_firmware+0x460/0x790
   Call Trace:
    request_firmware+0x37/0x50
    btusb_setup_intel_new+0x227/0x7e0 [btusb]
    hci_dev_do_open+0x3da/0x570 [bluetooth]
    hci_power_on+0x52/0x1f0 [bluetooth]
    process_one_work+0x1db/0x3d0
    worker_thread+0x47/0x3e0
    kthread+0x125/0x140
    ret_from_fork+0x22/0x30
   ---[ end trace 007b222491432927 ]---
   Bluetooth: hci0: Failed to load Intel firmware file (-112)
   [drm] RC6 on
   done.
   thermal thermal_zone11: failed to read out thermal zone (-5)
   PM: suspend exit


Ah, I'll blame Luis for this, I think it might be due to 81f95076281f
("firmware: add sanity check on shutdown/suspend")

Luis, any ideas?  I'll try to revert this and try it out tomorrow when
I get a chance.



I can revert it an fire up some testing..



Yes 81f95076281f is to blame.. After reverting it all is fine again.

15 resume cycles on the one laptop , 10 on the other without to hit the trace.

Regards




Re: btusb "firmware request while host is not available" at resume

2017-09-10 Thread Gabriel C

On 11.09.2017 05:15, Gabriel C wrote:

On 11.09.2017 03:25, Greg Kroah-Hartman wrote:

On Sun, Sep 10, 2017 at 12:26:02PM -0700, Linus Torvalds wrote:

This seems to be a new problem at resume for the Intel btusb driver,
but I'm not seeing anything in that driver itself that looks like a
likely trigger, so I wonder if it's some driver core change, a generic
resume path issue, or a workqueue change that has made it trigger for
me.

It might also just be a timing difference, maybe it's always been there?

Does anybody have any ideas? It does't happen on every resume, and the
machine works despite this (but no bluetooth - the *next* resume might
bring it back, though).


Ah, it's not just me having this problem.  I don't see it happening in
4.12, and haven't had the time to bisect it.  I seem to be able to
trigger it every suspend/resume cycle, so I don't know if it's a timing
issue.



I see the same problem with QCA hardware.. but a bit different.

On first resume cycle the firmware call is fine but the adapter dies a bit 
later with :


  'Bluetooth: hci0: Failed to send body at 4 of 1857 (-110)'

On second resume cycle I hit the trace too.






  Linus

--

   ACPI: Low-level resume complete
   ACPI: EC: EC started
   PM: Restoring platform NVS memory
   Enabling non-boot CPUs ...
   x86: Booting SMP configuration:
   smpboot: Booting Node 0 Processor 1 APIC 0x2
    cache: parent cpu1 should not be sleeping
   CPU1 is up
   smpboot: Booting Node 0 Processor 2 APIC 0x1
    cache: parent cpu2 should not be sleeping
   CPU2 is up
   smpboot: Booting Node 0 Processor 3 APIC 0x3
    cache: parent cpu3 should not be sleeping
   CPU3 is up
   ACPI: Waking up from system sleep state S3
   ACPI: EC: event unblocked
   usb 1-3: reset full-speed USB device number 2 using xhci_hcd
   usb 1-4: reset full-speed USB device number 3 using xhci_hcd
   usb 1-5: reset high-speed USB device number 4 using xhci_hcd
   usb 1-3:1.0: rebind failed: -517
   usb 1-3:1.1: rebind failed: -517
   Bluetooth: hci0: Bootloader revision 0.0 build 2 week 52 2014
   OOM killer enabled.
   Restarting tasks ...
   Bluetooth: hci0: Device revision is 5
   Bluetooth: hci0: Secure boot is enabled
   Bluetooth: hci0: OTP lock is enabled
   Bluetooth: hci0: API lock is enabled
   Bluetooth: hci0: Debug lock is disabled
   Bluetooth: hci0: Minimum firmware build 1 week 10 2014
   firmware request while host is not available
   [ cut here ]
   WARNING: CPU: 2 PID: 621 at drivers/base/firmware_class.c:1250
_request_firmware+0x460/0x790
   CPU: 2 PID: 621 Comm: kworker/u9:2 Not tainted 4.13.0-10313-ge860d2c904d1 #11
   Hardware name: Dell Inc. XPS 13 9350/09JHRY, BIOS 1.4.17 05/10/2017
   Workqueue: hci0 hci_power_on [bluetooth]
   task: 8d3767895ac0 task.stack: 9d3481efc000
   RIP: 0010:_request_firmware+0x460/0x790
   Call Trace:
    request_firmware+0x37/0x50
    btusb_setup_intel_new+0x227/0x7e0 [btusb]
    hci_dev_do_open+0x3da/0x570 [bluetooth]
    hci_power_on+0x52/0x1f0 [bluetooth]
    process_one_work+0x1db/0x3d0
    worker_thread+0x47/0x3e0
    kthread+0x125/0x140
    ret_from_fork+0x22/0x30
   ---[ end trace 007b222491432927 ]---
   Bluetooth: hci0: Failed to load Intel firmware file (-112)
   [drm] RC6 on
   done.
   thermal thermal_zone11: failed to read out thermal zone (-5)
   PM: suspend exit


Ah, I'll blame Luis for this, I think it might be due to 81f95076281f
("firmware: add sanity check on shutdown/suspend")

Luis, any ideas?  I'll try to revert this and try it out tomorrow when
I get a chance.



I can revert it an fire up some testing..



Yes 81f95076281f is to blame.. After reverting it all is fine again.

15 resume cycles on the one laptop , 10 on the other without to hit the trace.

Regards




Re: btusb "firmware request while host is not available" at resume

2017-09-10 Thread Gabriel C

On 11.09.2017 03:25, Greg Kroah-Hartman wrote:

On Sun, Sep 10, 2017 at 12:26:02PM -0700, Linus Torvalds wrote:

This seems to be a new problem at resume for the Intel btusb driver,
but I'm not seeing anything in that driver itself that looks like a
likely trigger, so I wonder if it's some driver core change, a generic
resume path issue, or a workqueue change that has made it trigger for
me.

It might also just be a timing difference, maybe it's always been there?

Does anybody have any ideas? It does't happen on every resume, and the
machine works despite this (but no bluetooth - the *next* resume might
bring it back, though).


Ah, it's not just me having this problem.  I don't see it happening in
4.12, and haven't had the time to bisect it.  I seem to be able to
trigger it every suspend/resume cycle, so I don't know if it's a timing
issue.



I see the same problem with QCA hardware.. but a bit different.

On first resume cycle the firmware call is fine but the adapter dies a bit 
later with :


 'Bluetooth: hci0: Failed to send body at 4 of 1857 (-110)'

On second resume cycle I hit the trace too.






  Linus

--

   ACPI: Low-level resume complete
   ACPI: EC: EC started
   PM: Restoring platform NVS memory
   Enabling non-boot CPUs ...
   x86: Booting SMP configuration:
   smpboot: Booting Node 0 Processor 1 APIC 0x2
cache: parent cpu1 should not be sleeping
   CPU1 is up
   smpboot: Booting Node 0 Processor 2 APIC 0x1
cache: parent cpu2 should not be sleeping
   CPU2 is up
   smpboot: Booting Node 0 Processor 3 APIC 0x3
cache: parent cpu3 should not be sleeping
   CPU3 is up
   ACPI: Waking up from system sleep state S3
   ACPI: EC: event unblocked
   usb 1-3: reset full-speed USB device number 2 using xhci_hcd
   usb 1-4: reset full-speed USB device number 3 using xhci_hcd
   usb 1-5: reset high-speed USB device number 4 using xhci_hcd
   usb 1-3:1.0: rebind failed: -517
   usb 1-3:1.1: rebind failed: -517
   Bluetooth: hci0: Bootloader revision 0.0 build 2 week 52 2014
   OOM killer enabled.
   Restarting tasks ...
   Bluetooth: hci0: Device revision is 5
   Bluetooth: hci0: Secure boot is enabled
   Bluetooth: hci0: OTP lock is enabled
   Bluetooth: hci0: API lock is enabled
   Bluetooth: hci0: Debug lock is disabled
   Bluetooth: hci0: Minimum firmware build 1 week 10 2014
   firmware request while host is not available
   [ cut here ]
   WARNING: CPU: 2 PID: 621 at drivers/base/firmware_class.c:1250
_request_firmware+0x460/0x790
   CPU: 2 PID: 621 Comm: kworker/u9:2 Not tainted 4.13.0-10313-ge860d2c904d1 #11
   Hardware name: Dell Inc. XPS 13 9350/09JHRY, BIOS 1.4.17 05/10/2017
   Workqueue: hci0 hci_power_on [bluetooth]
   task: 8d3767895ac0 task.stack: 9d3481efc000
   RIP: 0010:_request_firmware+0x460/0x790
   Call Trace:
request_firmware+0x37/0x50
btusb_setup_intel_new+0x227/0x7e0 [btusb]
hci_dev_do_open+0x3da/0x570 [bluetooth]
hci_power_on+0x52/0x1f0 [bluetooth]
process_one_work+0x1db/0x3d0
worker_thread+0x47/0x3e0
kthread+0x125/0x140
ret_from_fork+0x22/0x30
   ---[ end trace 007b222491432927 ]---
   Bluetooth: hci0: Failed to load Intel firmware file (-112)
   [drm] RC6 on
   done.
   thermal thermal_zone11: failed to read out thermal zone (-5)
   PM: suspend exit


Ah, I'll blame Luis for this, I think it might be due to 81f95076281f
("firmware: add sanity check on shutdown/suspend")

Luis, any ideas?  I'll try to revert this and try it out tomorrow when
I get a chance.



I can revert it an fire up some testing..

This time with 4.13.x I hit all kind bugs on this box anyway :)




Re: btusb "firmware request while host is not available" at resume

2017-09-10 Thread Gabriel C

On 11.09.2017 03:25, Greg Kroah-Hartman wrote:

On Sun, Sep 10, 2017 at 12:26:02PM -0700, Linus Torvalds wrote:

This seems to be a new problem at resume for the Intel btusb driver,
but I'm not seeing anything in that driver itself that looks like a
likely trigger, so I wonder if it's some driver core change, a generic
resume path issue, or a workqueue change that has made it trigger for
me.

It might also just be a timing difference, maybe it's always been there?

Does anybody have any ideas? It does't happen on every resume, and the
machine works despite this (but no bluetooth - the *next* resume might
bring it back, though).


Ah, it's not just me having this problem.  I don't see it happening in
4.12, and haven't had the time to bisect it.  I seem to be able to
trigger it every suspend/resume cycle, so I don't know if it's a timing
issue.



I see the same problem with QCA hardware.. but a bit different.

On first resume cycle the firmware call is fine but the adapter dies a bit 
later with :


 'Bluetooth: hci0: Failed to send body at 4 of 1857 (-110)'

On second resume cycle I hit the trace too.






  Linus

--

   ACPI: Low-level resume complete
   ACPI: EC: EC started
   PM: Restoring platform NVS memory
   Enabling non-boot CPUs ...
   x86: Booting SMP configuration:
   smpboot: Booting Node 0 Processor 1 APIC 0x2
cache: parent cpu1 should not be sleeping
   CPU1 is up
   smpboot: Booting Node 0 Processor 2 APIC 0x1
cache: parent cpu2 should not be sleeping
   CPU2 is up
   smpboot: Booting Node 0 Processor 3 APIC 0x3
cache: parent cpu3 should not be sleeping
   CPU3 is up
   ACPI: Waking up from system sleep state S3
   ACPI: EC: event unblocked
   usb 1-3: reset full-speed USB device number 2 using xhci_hcd
   usb 1-4: reset full-speed USB device number 3 using xhci_hcd
   usb 1-5: reset high-speed USB device number 4 using xhci_hcd
   usb 1-3:1.0: rebind failed: -517
   usb 1-3:1.1: rebind failed: -517
   Bluetooth: hci0: Bootloader revision 0.0 build 2 week 52 2014
   OOM killer enabled.
   Restarting tasks ...
   Bluetooth: hci0: Device revision is 5
   Bluetooth: hci0: Secure boot is enabled
   Bluetooth: hci0: OTP lock is enabled
   Bluetooth: hci0: API lock is enabled
   Bluetooth: hci0: Debug lock is disabled
   Bluetooth: hci0: Minimum firmware build 1 week 10 2014
   firmware request while host is not available
   [ cut here ]
   WARNING: CPU: 2 PID: 621 at drivers/base/firmware_class.c:1250
_request_firmware+0x460/0x790
   CPU: 2 PID: 621 Comm: kworker/u9:2 Not tainted 4.13.0-10313-ge860d2c904d1 #11
   Hardware name: Dell Inc. XPS 13 9350/09JHRY, BIOS 1.4.17 05/10/2017
   Workqueue: hci0 hci_power_on [bluetooth]
   task: 8d3767895ac0 task.stack: 9d3481efc000
   RIP: 0010:_request_firmware+0x460/0x790
   Call Trace:
request_firmware+0x37/0x50
btusb_setup_intel_new+0x227/0x7e0 [btusb]
hci_dev_do_open+0x3da/0x570 [bluetooth]
hci_power_on+0x52/0x1f0 [bluetooth]
process_one_work+0x1db/0x3d0
worker_thread+0x47/0x3e0
kthread+0x125/0x140
ret_from_fork+0x22/0x30
   ---[ end trace 007b222491432927 ]---
   Bluetooth: hci0: Failed to load Intel firmware file (-112)
   [drm] RC6 on
   done.
   thermal thermal_zone11: failed to read out thermal zone (-5)
   PM: suspend exit


Ah, I'll blame Luis for this, I think it might be due to 81f95076281f
("firmware: add sanity check on shutdown/suspend")

Luis, any ideas?  I'll try to revert this and try it out tomorrow when
I get a chance.



I can revert it an fire up some testing..

This time with 4.13.x I hit all kind bugs on this box anyway :)




Re: btusb "firmware request while host is not available" at resume

2017-09-10 Thread Greg Kroah-Hartman
On Sun, Sep 10, 2017 at 12:26:02PM -0700, Linus Torvalds wrote:
> This seems to be a new problem at resume for the Intel btusb driver,
> but I'm not seeing anything in that driver itself that looks like a
> likely trigger, so I wonder if it's some driver core change, a generic
> resume path issue, or a workqueue change that has made it trigger for
> me.
> 
> It might also just be a timing difference, maybe it's always been there?
> 
> Does anybody have any ideas? It does't happen on every resume, and the
> machine works despite this (but no bluetooth - the *next* resume might
> bring it back, though).

Ah, it's not just me having this problem.  I don't see it happening in
4.12, and haven't had the time to bisect it.  I seem to be able to
trigger it every suspend/resume cycle, so I don't know if it's a timing
issue.



> 
>  Linus
> 
> --
> 
>   ACPI: Low-level resume complete
>   ACPI: EC: EC started
>   PM: Restoring platform NVS memory
>   Enabling non-boot CPUs ...
>   x86: Booting SMP configuration:
>   smpboot: Booting Node 0 Processor 1 APIC 0x2
>cache: parent cpu1 should not be sleeping
>   CPU1 is up
>   smpboot: Booting Node 0 Processor 2 APIC 0x1
>cache: parent cpu2 should not be sleeping
>   CPU2 is up
>   smpboot: Booting Node 0 Processor 3 APIC 0x3
>cache: parent cpu3 should not be sleeping
>   CPU3 is up
>   ACPI: Waking up from system sleep state S3
>   ACPI: EC: event unblocked
>   usb 1-3: reset full-speed USB device number 2 using xhci_hcd
>   usb 1-4: reset full-speed USB device number 3 using xhci_hcd
>   usb 1-5: reset high-speed USB device number 4 using xhci_hcd
>   usb 1-3:1.0: rebind failed: -517
>   usb 1-3:1.1: rebind failed: -517
>   Bluetooth: hci0: Bootloader revision 0.0 build 2 week 52 2014
>   OOM killer enabled.
>   Restarting tasks ...
>   Bluetooth: hci0: Device revision is 5
>   Bluetooth: hci0: Secure boot is enabled
>   Bluetooth: hci0: OTP lock is enabled
>   Bluetooth: hci0: API lock is enabled
>   Bluetooth: hci0: Debug lock is disabled
>   Bluetooth: hci0: Minimum firmware build 1 week 10 2014
>   firmware request while host is not available
>   [ cut here ]
>   WARNING: CPU: 2 PID: 621 at drivers/base/firmware_class.c:1250
> _request_firmware+0x460/0x790
>   CPU: 2 PID: 621 Comm: kworker/u9:2 Not tainted 4.13.0-10313-ge860d2c904d1 
> #11
>   Hardware name: Dell Inc. XPS 13 9350/09JHRY, BIOS 1.4.17 05/10/2017
>   Workqueue: hci0 hci_power_on [bluetooth]
>   task: 8d3767895ac0 task.stack: 9d3481efc000
>   RIP: 0010:_request_firmware+0x460/0x790
>   Call Trace:
>request_firmware+0x37/0x50
>btusb_setup_intel_new+0x227/0x7e0 [btusb]
>hci_dev_do_open+0x3da/0x570 [bluetooth]
>hci_power_on+0x52/0x1f0 [bluetooth]
>process_one_work+0x1db/0x3d0
>worker_thread+0x47/0x3e0
>kthread+0x125/0x140
>ret_from_fork+0x22/0x30
>   ---[ end trace 007b222491432927 ]---
>   Bluetooth: hci0: Failed to load Intel firmware file (-112)
>   [drm] RC6 on
>   done.
>   thermal thermal_zone11: failed to read out thermal zone (-5)
>   PM: suspend exit

Ah, I'll blame Luis for this, I think it might be due to 81f95076281f
("firmware: add sanity check on shutdown/suspend")

Luis, any ideas?  I'll try to revert this and try it out tomorrow when
I get a chance.

thanks,

greg k-h


Re: btusb "firmware request while host is not available" at resume

2017-09-10 Thread Greg Kroah-Hartman
On Sun, Sep 10, 2017 at 12:26:02PM -0700, Linus Torvalds wrote:
> This seems to be a new problem at resume for the Intel btusb driver,
> but I'm not seeing anything in that driver itself that looks like a
> likely trigger, so I wonder if it's some driver core change, a generic
> resume path issue, or a workqueue change that has made it trigger for
> me.
> 
> It might also just be a timing difference, maybe it's always been there?
> 
> Does anybody have any ideas? It does't happen on every resume, and the
> machine works despite this (but no bluetooth - the *next* resume might
> bring it back, though).

Ah, it's not just me having this problem.  I don't see it happening in
4.12, and haven't had the time to bisect it.  I seem to be able to
trigger it every suspend/resume cycle, so I don't know if it's a timing
issue.



> 
>  Linus
> 
> --
> 
>   ACPI: Low-level resume complete
>   ACPI: EC: EC started
>   PM: Restoring platform NVS memory
>   Enabling non-boot CPUs ...
>   x86: Booting SMP configuration:
>   smpboot: Booting Node 0 Processor 1 APIC 0x2
>cache: parent cpu1 should not be sleeping
>   CPU1 is up
>   smpboot: Booting Node 0 Processor 2 APIC 0x1
>cache: parent cpu2 should not be sleeping
>   CPU2 is up
>   smpboot: Booting Node 0 Processor 3 APIC 0x3
>cache: parent cpu3 should not be sleeping
>   CPU3 is up
>   ACPI: Waking up from system sleep state S3
>   ACPI: EC: event unblocked
>   usb 1-3: reset full-speed USB device number 2 using xhci_hcd
>   usb 1-4: reset full-speed USB device number 3 using xhci_hcd
>   usb 1-5: reset high-speed USB device number 4 using xhci_hcd
>   usb 1-3:1.0: rebind failed: -517
>   usb 1-3:1.1: rebind failed: -517
>   Bluetooth: hci0: Bootloader revision 0.0 build 2 week 52 2014
>   OOM killer enabled.
>   Restarting tasks ...
>   Bluetooth: hci0: Device revision is 5
>   Bluetooth: hci0: Secure boot is enabled
>   Bluetooth: hci0: OTP lock is enabled
>   Bluetooth: hci0: API lock is enabled
>   Bluetooth: hci0: Debug lock is disabled
>   Bluetooth: hci0: Minimum firmware build 1 week 10 2014
>   firmware request while host is not available
>   [ cut here ]
>   WARNING: CPU: 2 PID: 621 at drivers/base/firmware_class.c:1250
> _request_firmware+0x460/0x790
>   CPU: 2 PID: 621 Comm: kworker/u9:2 Not tainted 4.13.0-10313-ge860d2c904d1 
> #11
>   Hardware name: Dell Inc. XPS 13 9350/09JHRY, BIOS 1.4.17 05/10/2017
>   Workqueue: hci0 hci_power_on [bluetooth]
>   task: 8d3767895ac0 task.stack: 9d3481efc000
>   RIP: 0010:_request_firmware+0x460/0x790
>   Call Trace:
>request_firmware+0x37/0x50
>btusb_setup_intel_new+0x227/0x7e0 [btusb]
>hci_dev_do_open+0x3da/0x570 [bluetooth]
>hci_power_on+0x52/0x1f0 [bluetooth]
>process_one_work+0x1db/0x3d0
>worker_thread+0x47/0x3e0
>kthread+0x125/0x140
>ret_from_fork+0x22/0x30
>   ---[ end trace 007b222491432927 ]---
>   Bluetooth: hci0: Failed to load Intel firmware file (-112)
>   [drm] RC6 on
>   done.
>   thermal thermal_zone11: failed to read out thermal zone (-5)
>   PM: suspend exit

Ah, I'll blame Luis for this, I think it might be due to 81f95076281f
("firmware: add sanity check on shutdown/suspend")

Luis, any ideas?  I'll try to revert this and try it out tomorrow when
I get a chance.

thanks,

greg k-h


btusb "firmware request while host is not available" at resume

2017-09-10 Thread Linus Torvalds
This seems to be a new problem at resume for the Intel btusb driver,
but I'm not seeing anything in that driver itself that looks like a
likely trigger, so I wonder if it's some driver core change, a generic
resume path issue, or a workqueue change that has made it trigger for
me.

It might also just be a timing difference, maybe it's always been there?

Does anybody have any ideas? It does't happen on every resume, and the
machine works despite this (but no bluetooth - the *next* resume might
bring it back, though).

 Linus

--

  ACPI: Low-level resume complete
  ACPI: EC: EC started
  PM: Restoring platform NVS memory
  Enabling non-boot CPUs ...
  x86: Booting SMP configuration:
  smpboot: Booting Node 0 Processor 1 APIC 0x2
   cache: parent cpu1 should not be sleeping
  CPU1 is up
  smpboot: Booting Node 0 Processor 2 APIC 0x1
   cache: parent cpu2 should not be sleeping
  CPU2 is up
  smpboot: Booting Node 0 Processor 3 APIC 0x3
   cache: parent cpu3 should not be sleeping
  CPU3 is up
  ACPI: Waking up from system sleep state S3
  ACPI: EC: event unblocked
  usb 1-3: reset full-speed USB device number 2 using xhci_hcd
  usb 1-4: reset full-speed USB device number 3 using xhci_hcd
  usb 1-5: reset high-speed USB device number 4 using xhci_hcd
  usb 1-3:1.0: rebind failed: -517
  usb 1-3:1.1: rebind failed: -517
  Bluetooth: hci0: Bootloader revision 0.0 build 2 week 52 2014
  OOM killer enabled.
  Restarting tasks ...
  Bluetooth: hci0: Device revision is 5
  Bluetooth: hci0: Secure boot is enabled
  Bluetooth: hci0: OTP lock is enabled
  Bluetooth: hci0: API lock is enabled
  Bluetooth: hci0: Debug lock is disabled
  Bluetooth: hci0: Minimum firmware build 1 week 10 2014
  firmware request while host is not available
  [ cut here ]
  WARNING: CPU: 2 PID: 621 at drivers/base/firmware_class.c:1250
_request_firmware+0x460/0x790
  CPU: 2 PID: 621 Comm: kworker/u9:2 Not tainted 4.13.0-10313-ge860d2c904d1 #11
  Hardware name: Dell Inc. XPS 13 9350/09JHRY, BIOS 1.4.17 05/10/2017
  Workqueue: hci0 hci_power_on [bluetooth]
  task: 8d3767895ac0 task.stack: 9d3481efc000
  RIP: 0010:_request_firmware+0x460/0x790
  Call Trace:
   request_firmware+0x37/0x50
   btusb_setup_intel_new+0x227/0x7e0 [btusb]
   hci_dev_do_open+0x3da/0x570 [bluetooth]
   hci_power_on+0x52/0x1f0 [bluetooth]
   process_one_work+0x1db/0x3d0
   worker_thread+0x47/0x3e0
   kthread+0x125/0x140
   ret_from_fork+0x22/0x30
  ---[ end trace 007b222491432927 ]---
  Bluetooth: hci0: Failed to load Intel firmware file (-112)
  [drm] RC6 on
  done.
  thermal thermal_zone11: failed to read out thermal zone (-5)
  PM: suspend exit


btusb "firmware request while host is not available" at resume

2017-09-10 Thread Linus Torvalds
This seems to be a new problem at resume for the Intel btusb driver,
but I'm not seeing anything in that driver itself that looks like a
likely trigger, so I wonder if it's some driver core change, a generic
resume path issue, or a workqueue change that has made it trigger for
me.

It might also just be a timing difference, maybe it's always been there?

Does anybody have any ideas? It does't happen on every resume, and the
machine works despite this (but no bluetooth - the *next* resume might
bring it back, though).

 Linus

--

  ACPI: Low-level resume complete
  ACPI: EC: EC started
  PM: Restoring platform NVS memory
  Enabling non-boot CPUs ...
  x86: Booting SMP configuration:
  smpboot: Booting Node 0 Processor 1 APIC 0x2
   cache: parent cpu1 should not be sleeping
  CPU1 is up
  smpboot: Booting Node 0 Processor 2 APIC 0x1
   cache: parent cpu2 should not be sleeping
  CPU2 is up
  smpboot: Booting Node 0 Processor 3 APIC 0x3
   cache: parent cpu3 should not be sleeping
  CPU3 is up
  ACPI: Waking up from system sleep state S3
  ACPI: EC: event unblocked
  usb 1-3: reset full-speed USB device number 2 using xhci_hcd
  usb 1-4: reset full-speed USB device number 3 using xhci_hcd
  usb 1-5: reset high-speed USB device number 4 using xhci_hcd
  usb 1-3:1.0: rebind failed: -517
  usb 1-3:1.1: rebind failed: -517
  Bluetooth: hci0: Bootloader revision 0.0 build 2 week 52 2014
  OOM killer enabled.
  Restarting tasks ...
  Bluetooth: hci0: Device revision is 5
  Bluetooth: hci0: Secure boot is enabled
  Bluetooth: hci0: OTP lock is enabled
  Bluetooth: hci0: API lock is enabled
  Bluetooth: hci0: Debug lock is disabled
  Bluetooth: hci0: Minimum firmware build 1 week 10 2014
  firmware request while host is not available
  [ cut here ]
  WARNING: CPU: 2 PID: 621 at drivers/base/firmware_class.c:1250
_request_firmware+0x460/0x790
  CPU: 2 PID: 621 Comm: kworker/u9:2 Not tainted 4.13.0-10313-ge860d2c904d1 #11
  Hardware name: Dell Inc. XPS 13 9350/09JHRY, BIOS 1.4.17 05/10/2017
  Workqueue: hci0 hci_power_on [bluetooth]
  task: 8d3767895ac0 task.stack: 9d3481efc000
  RIP: 0010:_request_firmware+0x460/0x790
  Call Trace:
   request_firmware+0x37/0x50
   btusb_setup_intel_new+0x227/0x7e0 [btusb]
   hci_dev_do_open+0x3da/0x570 [bluetooth]
   hci_power_on+0x52/0x1f0 [bluetooth]
   process_one_work+0x1db/0x3d0
   worker_thread+0x47/0x3e0
   kthread+0x125/0x140
   ret_from_fork+0x22/0x30
  ---[ end trace 007b222491432927 ]---
  Bluetooth: hci0: Failed to load Intel firmware file (-112)
  [drm] RC6 on
  done.
  thermal thermal_zone11: failed to read out thermal zone (-5)
  PM: suspend exit