Re: switcheroo registration vs switching race...

2012-12-05 Thread Daniel J Blueman
On 5 December 2012 00:04, Takashi Iwai  wrote:
> At Tue, 4 Dec 2012 23:54:39 +0800,
> Daniel J Blueman wrote:
>>
>> On 4 December 2012 23:03, Takashi Iwai  wrote:
>> > At Tue, 4 Dec 2012 22:46:47 +0800,
>> > Daniel J Blueman wrote:
>> >>
>> >> On 4 December 2012 21:55, Takashi Iwai  wrote:
>> >> > At Tue, 04 Dec 2012 14:23:05 +0100,
>> >> > Takashi Iwai wrote:
>> >> >>
>> >> >> At Tue, 4 Dec 2012 20:58:55 +0800,
>> >> >> Daniel J Blueman wrote:
>> >> >> >
>> >> >> > On 4 December 2012 01:10, Takashi Iwai  wrote:
>> >> >> > > At Tue, 4 Dec 2012 00:50:56 +0800,
>> >> >> > > Daniel J Blueman wrote:
>> >> >> > >>
>> >> >> > >> On 4 December 2012 00:23, Takashi Iwai  wrote:
>> >> >> > >> > At Mon, 3 Dec 2012 23:08:28 +0800,
>> >> >> > >> > Daniel J Blueman wrote:
>> >> >> > >> >>
>> >> >> > >> >> On 3 December 2012 22:40, Takashi Iwai  wrote:
>> >> >> > >> >> > At Mon, 3 Dec 2012 22:25:52 +0800,
>> >> >> > >> >> > Daniel J Blueman wrote:
>> >> >> > >> >> >>
>> >> >> > >> >> >> On 3 December 2012 19:17, Takashi Iwai  
>> >> >> > >> >> >> wrote:
>> >> >> > >> >> >> > At Wed, 28 Nov 2012 09:45:39 +0100,
>> >> >> > >> >> >> > Takashi Iwai wrote:
>> >> >> > >> >> >> >>
>> >> >> > >> >> >> >> At Wed, 28 Nov 2012 11:45:07 +0800,
>> >> >> > >> >> >> >> Daniel J Blueman wrote:
>> >> >> > >> >> >> >> >
>> >> >> > >> >> >> >> > Hi Seth, Dave, Takashi,
>> >> >> > >> >> >> >> >
>> >> >> > >> >> >> >> > If I power down the unused discrete GPU before lightdm 
>> >> >> > >> >> >> >> > starts by
>> >> >> > >> >> >> >> > fiddling with the sysfs file [1] in the upstart 
>> >> >> > >> >> >> >> > script, I see a race
>> >> >> > >> >> >> >> > manifesting as the discrete GPU's HDA controller 
>> >> >> > >> >> >> >> > timing out to
>> >> >> > >> >> >> >> > commands [2].
>> >> >> > >> >> >> >> >
>> >> >> > >> >> >> >> > Adding some debug, I see that the registered audio 
>> >> >> > >> >> >> >> > devices are put
>> >> >> > >> >> >> >> > into D3 before the GPU is, but it turns out that the 
>> >> >> > >> >> >> >> > discrete (and
>> >> >> > >> >> >> >> > internal) GPU's HDA controller gets registered a bit 
>> >> >> > >> >> >> >> > later, so the
>> >> >> > >> >> >> >> > list is empty. The symptom is since the HDA driver 
>> >> >> > >> >> >> >> > it's talking to
>> >> >> > >> >> >> >> > hardware which is now in D3.
>> >> >> > >> >> >> >> >
>> >> >> > >> >> >> >> > We could add a mutex to nouveau to allow us to wait 
>> >> >> > >> >> >> >> > for the DGPU HDA
>> >> >> > >> >> >> >> > controller, but perhaps this should be solved at a 
>> >> >> > >> >> >> >> > higher level in the
>> >> >> > >> >> >> >> > vgaswitcheroo code; what do you think?
>> >> >> > >> >> >> >>
>> >> >> > >> >> >> >> Maybe it's a side effect for the recent effort to fix 
>> >> >> > >> >> >> >> another race in
>> >> >> > >> >> >> >> the probe.  A part of them problem is that the 
>> >> >> > >> >> >> >> registration is done at
>> >> >> > >> >> >> >> the very last of probing.
>> >> >> > >> >> >> >>
>> >> >> > >> >> >> >> Instead of delaying the registration, how about the 
>> >> >> > >> >> >> >> patch below?
>> >> >> > >> >> >> >
>> >> >> > >> >> >> > Ping.  If this really works, I'd like to queue it for 3.8 
>> >> >> > >> >> >> > merge, at
>> >> >> > >> >> >> > least...
>> >> >> > >> >> >>
>> >> >> > >> >> >> Ping ack; I was trying to find time to understand another 
>> >> >> > >> >> >> race that
>> >> >> > >> >> >> occurs with GPU probing after switching, but is separate 
>> >> >> > >> >> >> from the
>> >> >> > >> >> >> situation before switching, here.
>> >> >> > >> >> >>
>> >> >> > >> >> >> In the context of writing the switch, it looks like struct 
>> >> >> > >> >> >> azx isn't
>> >> >> > >> >> >> allocated by the time azx_vs_set_state accesses it [1,2]; 
>> >> >> > >> >> >> racing with
>> >> >> > >> >> >> azx_codec_create?
>> >> >> > >> >> >
>> >> >> > >> >> > It was allocated, but it wasn't assigned properly in pci 
>> >> >> > >> >> > drvdata.
>> >> >> > >> >> >
>> >> >> > >> >> > Below is the revised patch.  Just moved pci_set_drvdata() 
>> >> >> > >> >> > before
>> >> >> > >> >> > register_vga_switcheroo().  Could you retest with it?
>> >> >> > >> >>
>> >> >> > >> >> Superb; this addresses the oops.
>> >> >> > >> >
>> >> >> > >> > OK, I'll queue it to sound tree for 3.8 kernel with Cc to 
>> >> >> > >> > stable.
>> >> >> > >> >
>> >> >> > >> >> ~1 second after the DGPU is put into D3, I still often see 
>> >> >> > >> >> "hda-intel:
>> >> >> > >> >> spurious response 0x0:0x0, last cmd=0x170500":
>> >> >> > >> >> http://quora.org/2012/hda-switch-spurious.txt
>> >> >> > >> >
>> >> >> > >> > Hm, it's not clear who triggers these messages.  I'll try to 
>> >> >> > >> > check the
>> >> >> > >> > code paths.
>> >> >> > >> >
>> >> >> > >> >> Presumably this implies the read of the ring-buffer pointer 
>> >> >> > >> >> returned
>> >> >> > >> >> 0x, so the HDA driver understands the pointer to have 
>> >> >> > >> >> wrapped
>> >> >> > >> >> and processes the 191 

Re: switcheroo registration vs switching race...

2012-12-05 Thread Daniel J Blueman
On 5 December 2012 00:04, Takashi Iwai ti...@suse.de wrote:
 At Tue, 4 Dec 2012 23:54:39 +0800,
 Daniel J Blueman wrote:

 On 4 December 2012 23:03, Takashi Iwai ti...@suse.de wrote:
  At Tue, 4 Dec 2012 22:46:47 +0800,
  Daniel J Blueman wrote:
 
  On 4 December 2012 21:55, Takashi Iwai ti...@suse.de wrote:
   At Tue, 04 Dec 2012 14:23:05 +0100,
   Takashi Iwai wrote:
  
   At Tue, 4 Dec 2012 20:58:55 +0800,
   Daniel J Blueman wrote:
   
On 4 December 2012 01:10, Takashi Iwai ti...@suse.de wrote:
 At Tue, 4 Dec 2012 00:50:56 +0800,
 Daniel J Blueman wrote:

 On 4 December 2012 00:23, Takashi Iwai ti...@suse.de wrote:
  At Mon, 3 Dec 2012 23:08:28 +0800,
  Daniel J Blueman wrote:
 
  On 3 December 2012 22:40, Takashi Iwai ti...@suse.de wrote:
   At Mon, 3 Dec 2012 22:25:52 +0800,
   Daniel J Blueman wrote:
  
   On 3 December 2012 19:17, Takashi Iwai ti...@suse.de 
   wrote:
At Wed, 28 Nov 2012 09:45:39 +0100,
Takashi Iwai wrote:
   
At Wed, 28 Nov 2012 11:45:07 +0800,
Daniel J Blueman wrote:

 Hi Seth, Dave, Takashi,

 If I power down the unused discrete GPU before lightdm 
 starts by
 fiddling with the sysfs file [1] in the upstart 
 script, I see a race
 manifesting as the discrete GPU's HDA controller 
 timing out to
 commands [2].

 Adding some debug, I see that the registered audio 
 devices are put
 into D3 before the GPU is, but it turns out that the 
 discrete (and
 internal) GPU's HDA controller gets registered a bit 
 later, so the
 list is empty. The symptom is since the HDA driver 
 it's talking to
 hardware which is now in D3.

 We could add a mutex to nouveau to allow us to wait 
 for the DGPU HDA
 controller, but perhaps this should be solved at a 
 higher level in the
 vgaswitcheroo code; what do you think?
   
Maybe it's a side effect for the recent effort to fix 
another race in
the probe.  A part of them problem is that the 
registration is done at
the very last of probing.
   
Instead of delaying the registration, how about the 
patch below?
   
Ping.  If this really works, I'd like to queue it for 3.8 
merge, at
least...
  
   Ping ack; I was trying to find time to understand another 
   race that
   occurs with GPU probing after switching, but is separate 
   from the
   situation before switching, here.
  
   In the context of writing the switch, it looks like struct 
   azx isn't
   allocated by the time azx_vs_set_state accesses it [1,2]; 
   racing with
   azx_codec_create?
  
   It was allocated, but it wasn't assigned properly in pci 
   drvdata.
  
   Below is the revised patch.  Just moved pci_set_drvdata() 
   before
   register_vga_switcheroo().  Could you retest with it?
 
  Superb; this addresses the oops.
 
  OK, I'll queue it to sound tree for 3.8 kernel with Cc to 
  stable.
 
  ~1 second after the DGPU is put into D3, I still often see 
  hda-intel:
  spurious response 0x0:0x0, last cmd=0x170500:
  http://quora.org/2012/hda-switch-spurious.txt
 
  Hm, it's not clear who triggers these messages.  I'll try to 
  check the
  code paths.
 
  Presumably this implies the read of the ring-buffer pointer 
  returned
  0x, so the HDA driver understands the pointer to have 
  wrapped
  and processes the 191 unwritten entries?
 
  Good point.  Actually there is one bug that looks obviously 
  wrong
  (writing 32bit value to CORBWP).  Maybe it has been working just
  because writing CORBRP doesn't influence except for the reset 
  bit.
 
  Reading CORBWP as a byte is OK, but this could be better in a 
  word so
  that we can check 0x as invalid.
 
  A test patch is below.  Hopefully this improves the situation...

 I'll check this out tomorrow and also instrument the code to get a
 backtrace, since there may still be an underlying race with the
 previous patches:
   
[...]
   
 That's odd.  The Cirrus one (:00:1b.0) must be independent from
 the vga switcheroo things for Nvidia...

 The patch below is the revised patch of the first one.  Now the vga
 switcheroo registration is done before the video controller D3 
 check,
 so the race should be smaller.
   
This patch improves things further of course; a major improvement 
over
without. ~15% of the time, I still get the 'spurious response'
messages in this callpath:
  
   A possible scenario is that the graphics went in D3 before probing
   hd-audio, and the 

Re: switcheroo registration vs switching race...

2012-12-04 Thread Takashi Iwai
At Tue, 4 Dec 2012 23:54:39 +0800,
Daniel J Blueman wrote:
> 
> On 4 December 2012 23:03, Takashi Iwai  wrote:
> > At Tue, 4 Dec 2012 22:46:47 +0800,
> > Daniel J Blueman wrote:
> >>
> >> On 4 December 2012 21:55, Takashi Iwai  wrote:
> >> > At Tue, 04 Dec 2012 14:23:05 +0100,
> >> > Takashi Iwai wrote:
> >> >>
> >> >> At Tue, 4 Dec 2012 20:58:55 +0800,
> >> >> Daniel J Blueman wrote:
> >> >> >
> >> >> > On 4 December 2012 01:10, Takashi Iwai  wrote:
> >> >> > > At Tue, 4 Dec 2012 00:50:56 +0800,
> >> >> > > Daniel J Blueman wrote:
> >> >> > >>
> >> >> > >> On 4 December 2012 00:23, Takashi Iwai  wrote:
> >> >> > >> > At Mon, 3 Dec 2012 23:08:28 +0800,
> >> >> > >> > Daniel J Blueman wrote:
> >> >> > >> >>
> >> >> > >> >> On 3 December 2012 22:40, Takashi Iwai  wrote:
> >> >> > >> >> > At Mon, 3 Dec 2012 22:25:52 +0800,
> >> >> > >> >> > Daniel J Blueman wrote:
> >> >> > >> >> >>
> >> >> > >> >> >> On 3 December 2012 19:17, Takashi Iwai  wrote:
> >> >> > >> >> >> > At Wed, 28 Nov 2012 09:45:39 +0100,
> >> >> > >> >> >> > Takashi Iwai wrote:
> >> >> > >> >> >> >>
> >> >> > >> >> >> >> At Wed, 28 Nov 2012 11:45:07 +0800,
> >> >> > >> >> >> >> Daniel J Blueman wrote:
> >> >> > >> >> >> >> >
> >> >> > >> >> >> >> > Hi Seth, Dave, Takashi,
> >> >> > >> >> >> >> >
> >> >> > >> >> >> >> > If I power down the unused discrete GPU before lightdm 
> >> >> > >> >> >> >> > starts by
> >> >> > >> >> >> >> > fiddling with the sysfs file [1] in the upstart script, 
> >> >> > >> >> >> >> > I see a race
> >> >> > >> >> >> >> > manifesting as the discrete GPU's HDA controller timing 
> >> >> > >> >> >> >> > out to
> >> >> > >> >> >> >> > commands [2].
> >> >> > >> >> >> >> >
> >> >> > >> >> >> >> > Adding some debug, I see that the registered audio 
> >> >> > >> >> >> >> > devices are put
> >> >> > >> >> >> >> > into D3 before the GPU is, but it turns out that the 
> >> >> > >> >> >> >> > discrete (and
> >> >> > >> >> >> >> > internal) GPU's HDA controller gets registered a bit 
> >> >> > >> >> >> >> > later, so the
> >> >> > >> >> >> >> > list is empty. The symptom is since the HDA driver it's 
> >> >> > >> >> >> >> > talking to
> >> >> > >> >> >> >> > hardware which is now in D3.
> >> >> > >> >> >> >> >
> >> >> > >> >> >> >> > We could add a mutex to nouveau to allow us to wait for 
> >> >> > >> >> >> >> > the DGPU HDA
> >> >> > >> >> >> >> > controller, but perhaps this should be solved at a 
> >> >> > >> >> >> >> > higher level in the
> >> >> > >> >> >> >> > vgaswitcheroo code; what do you think?
> >> >> > >> >> >> >>
> >> >> > >> >> >> >> Maybe it's a side effect for the recent effort to fix 
> >> >> > >> >> >> >> another race in
> >> >> > >> >> >> >> the probe.  A part of them problem is that the 
> >> >> > >> >> >> >> registration is done at
> >> >> > >> >> >> >> the very last of probing.
> >> >> > >> >> >> >>
> >> >> > >> >> >> >> Instead of delaying the registration, how about the patch 
> >> >> > >> >> >> >> below?
> >> >> > >> >> >> >
> >> >> > >> >> >> > Ping.  If this really works, I'd like to queue it for 3.8 
> >> >> > >> >> >> > merge, at
> >> >> > >> >> >> > least...
> >> >> > >> >> >>
> >> >> > >> >> >> Ping ack; I was trying to find time to understand another 
> >> >> > >> >> >> race that
> >> >> > >> >> >> occurs with GPU probing after switching, but is separate 
> >> >> > >> >> >> from the
> >> >> > >> >> >> situation before switching, here.
> >> >> > >> >> >>
> >> >> > >> >> >> In the context of writing the switch, it looks like struct 
> >> >> > >> >> >> azx isn't
> >> >> > >> >> >> allocated by the time azx_vs_set_state accesses it [1,2]; 
> >> >> > >> >> >> racing with
> >> >> > >> >> >> azx_codec_create?
> >> >> > >> >> >
> >> >> > >> >> > It was allocated, but it wasn't assigned properly in pci 
> >> >> > >> >> > drvdata.
> >> >> > >> >> >
> >> >> > >> >> > Below is the revised patch.  Just moved pci_set_drvdata() 
> >> >> > >> >> > before
> >> >> > >> >> > register_vga_switcheroo().  Could you retest with it?
> >> >> > >> >>
> >> >> > >> >> Superb; this addresses the oops.
> >> >> > >> >
> >> >> > >> > OK, I'll queue it to sound tree for 3.8 kernel with Cc to stable.
> >> >> > >> >
> >> >> > >> >> ~1 second after the DGPU is put into D3, I still often see 
> >> >> > >> >> "hda-intel:
> >> >> > >> >> spurious response 0x0:0x0, last cmd=0x170500":
> >> >> > >> >> http://quora.org/2012/hda-switch-spurious.txt
> >> >> > >> >
> >> >> > >> > Hm, it's not clear who triggers these messages.  I'll try to 
> >> >> > >> > check the
> >> >> > >> > code paths.
> >> >> > >> >
> >> >> > >> >> Presumably this implies the read of the ring-buffer pointer 
> >> >> > >> >> returned
> >> >> > >> >> 0x, so the HDA driver understands the pointer to have 
> >> >> > >> >> wrapped
> >> >> > >> >> and processes the 191 unwritten entries?
> >> >> > >> >
> >> >> > >> > Good point.  Actually there is one bug that looks obviously wrong
> >> >> > >> > (writing 32bit value to CORBWP).  Maybe it has been working just
> >> 

Re: switcheroo registration vs switching race...

2012-12-04 Thread Daniel J Blueman
On 4 December 2012 23:03, Takashi Iwai  wrote:
> At Tue, 4 Dec 2012 22:46:47 +0800,
> Daniel J Blueman wrote:
>>
>> On 4 December 2012 21:55, Takashi Iwai  wrote:
>> > At Tue, 04 Dec 2012 14:23:05 +0100,
>> > Takashi Iwai wrote:
>> >>
>> >> At Tue, 4 Dec 2012 20:58:55 +0800,
>> >> Daniel J Blueman wrote:
>> >> >
>> >> > On 4 December 2012 01:10, Takashi Iwai  wrote:
>> >> > > At Tue, 4 Dec 2012 00:50:56 +0800,
>> >> > > Daniel J Blueman wrote:
>> >> > >>
>> >> > >> On 4 December 2012 00:23, Takashi Iwai  wrote:
>> >> > >> > At Mon, 3 Dec 2012 23:08:28 +0800,
>> >> > >> > Daniel J Blueman wrote:
>> >> > >> >>
>> >> > >> >> On 3 December 2012 22:40, Takashi Iwai  wrote:
>> >> > >> >> > At Mon, 3 Dec 2012 22:25:52 +0800,
>> >> > >> >> > Daniel J Blueman wrote:
>> >> > >> >> >>
>> >> > >> >> >> On 3 December 2012 19:17, Takashi Iwai  wrote:
>> >> > >> >> >> > At Wed, 28 Nov 2012 09:45:39 +0100,
>> >> > >> >> >> > Takashi Iwai wrote:
>> >> > >> >> >> >>
>> >> > >> >> >> >> At Wed, 28 Nov 2012 11:45:07 +0800,
>> >> > >> >> >> >> Daniel J Blueman wrote:
>> >> > >> >> >> >> >
>> >> > >> >> >> >> > Hi Seth, Dave, Takashi,
>> >> > >> >> >> >> >
>> >> > >> >> >> >> > If I power down the unused discrete GPU before lightdm 
>> >> > >> >> >> >> > starts by
>> >> > >> >> >> >> > fiddling with the sysfs file [1] in the upstart script, I 
>> >> > >> >> >> >> > see a race
>> >> > >> >> >> >> > manifesting as the discrete GPU's HDA controller timing 
>> >> > >> >> >> >> > out to
>> >> > >> >> >> >> > commands [2].
>> >> > >> >> >> >> >
>> >> > >> >> >> >> > Adding some debug, I see that the registered audio 
>> >> > >> >> >> >> > devices are put
>> >> > >> >> >> >> > into D3 before the GPU is, but it turns out that the 
>> >> > >> >> >> >> > discrete (and
>> >> > >> >> >> >> > internal) GPU's HDA controller gets registered a bit 
>> >> > >> >> >> >> > later, so the
>> >> > >> >> >> >> > list is empty. The symptom is since the HDA driver it's 
>> >> > >> >> >> >> > talking to
>> >> > >> >> >> >> > hardware which is now in D3.
>> >> > >> >> >> >> >
>> >> > >> >> >> >> > We could add a mutex to nouveau to allow us to wait for 
>> >> > >> >> >> >> > the DGPU HDA
>> >> > >> >> >> >> > controller, but perhaps this should be solved at a higher 
>> >> > >> >> >> >> > level in the
>> >> > >> >> >> >> > vgaswitcheroo code; what do you think?
>> >> > >> >> >> >>
>> >> > >> >> >> >> Maybe it's a side effect for the recent effort to fix 
>> >> > >> >> >> >> another race in
>> >> > >> >> >> >> the probe.  A part of them problem is that the registration 
>> >> > >> >> >> >> is done at
>> >> > >> >> >> >> the very last of probing.
>> >> > >> >> >> >>
>> >> > >> >> >> >> Instead of delaying the registration, how about the patch 
>> >> > >> >> >> >> below?
>> >> > >> >> >> >
>> >> > >> >> >> > Ping.  If this really works, I'd like to queue it for 3.8 
>> >> > >> >> >> > merge, at
>> >> > >> >> >> > least...
>> >> > >> >> >>
>> >> > >> >> >> Ping ack; I was trying to find time to understand another race 
>> >> > >> >> >> that
>> >> > >> >> >> occurs with GPU probing after switching, but is separate from 
>> >> > >> >> >> the
>> >> > >> >> >> situation before switching, here.
>> >> > >> >> >>
>> >> > >> >> >> In the context of writing the switch, it looks like struct azx 
>> >> > >> >> >> isn't
>> >> > >> >> >> allocated by the time azx_vs_set_state accesses it [1,2]; 
>> >> > >> >> >> racing with
>> >> > >> >> >> azx_codec_create?
>> >> > >> >> >
>> >> > >> >> > It was allocated, but it wasn't assigned properly in pci 
>> >> > >> >> > drvdata.
>> >> > >> >> >
>> >> > >> >> > Below is the revised patch.  Just moved pci_set_drvdata() before
>> >> > >> >> > register_vga_switcheroo().  Could you retest with it?
>> >> > >> >>
>> >> > >> >> Superb; this addresses the oops.
>> >> > >> >
>> >> > >> > OK, I'll queue it to sound tree for 3.8 kernel with Cc to stable.
>> >> > >> >
>> >> > >> >> ~1 second after the DGPU is put into D3, I still often see 
>> >> > >> >> "hda-intel:
>> >> > >> >> spurious response 0x0:0x0, last cmd=0x170500":
>> >> > >> >> http://quora.org/2012/hda-switch-spurious.txt
>> >> > >> >
>> >> > >> > Hm, it's not clear who triggers these messages.  I'll try to check 
>> >> > >> > the
>> >> > >> > code paths.
>> >> > >> >
>> >> > >> >> Presumably this implies the read of the ring-buffer pointer 
>> >> > >> >> returned
>> >> > >> >> 0x, so the HDA driver understands the pointer to have 
>> >> > >> >> wrapped
>> >> > >> >> and processes the 191 unwritten entries?
>> >> > >> >
>> >> > >> > Good point.  Actually there is one bug that looks obviously wrong
>> >> > >> > (writing 32bit value to CORBWP).  Maybe it has been working just
>> >> > >> > because writing CORBRP doesn't influence except for the reset bit.
>> >> > >> >
>> >> > >> > Reading CORBWP as a byte is OK, but this could be better in a word 
>> >> > >> > so
>> >> > >> > that we can check 0x as invalid.
>> >> > >> >
>> >> > >> > A test patch is below.  Hopefully 

Re: switcheroo registration vs switching race...

2012-12-04 Thread Takashi Iwai
At Tue, 4 Dec 2012 22:46:47 +0800,
Daniel J Blueman wrote:
> 
> On 4 December 2012 21:55, Takashi Iwai  wrote:
> > At Tue, 04 Dec 2012 14:23:05 +0100,
> > Takashi Iwai wrote:
> >>
> >> At Tue, 4 Dec 2012 20:58:55 +0800,
> >> Daniel J Blueman wrote:
> >> >
> >> > On 4 December 2012 01:10, Takashi Iwai  wrote:
> >> > > At Tue, 4 Dec 2012 00:50:56 +0800,
> >> > > Daniel J Blueman wrote:
> >> > >>
> >> > >> On 4 December 2012 00:23, Takashi Iwai  wrote:
> >> > >> > At Mon, 3 Dec 2012 23:08:28 +0800,
> >> > >> > Daniel J Blueman wrote:
> >> > >> >>
> >> > >> >> On 3 December 2012 22:40, Takashi Iwai  wrote:
> >> > >> >> > At Mon, 3 Dec 2012 22:25:52 +0800,
> >> > >> >> > Daniel J Blueman wrote:
> >> > >> >> >>
> >> > >> >> >> On 3 December 2012 19:17, Takashi Iwai  wrote:
> >> > >> >> >> > At Wed, 28 Nov 2012 09:45:39 +0100,
> >> > >> >> >> > Takashi Iwai wrote:
> >> > >> >> >> >>
> >> > >> >> >> >> At Wed, 28 Nov 2012 11:45:07 +0800,
> >> > >> >> >> >> Daniel J Blueman wrote:
> >> > >> >> >> >> >
> >> > >> >> >> >> > Hi Seth, Dave, Takashi,
> >> > >> >> >> >> >
> >> > >> >> >> >> > If I power down the unused discrete GPU before lightdm 
> >> > >> >> >> >> > starts by
> >> > >> >> >> >> > fiddling with the sysfs file [1] in the upstart script, I 
> >> > >> >> >> >> > see a race
> >> > >> >> >> >> > manifesting as the discrete GPU's HDA controller timing 
> >> > >> >> >> >> > out to
> >> > >> >> >> >> > commands [2].
> >> > >> >> >> >> >
> >> > >> >> >> >> > Adding some debug, I see that the registered audio devices 
> >> > >> >> >> >> > are put
> >> > >> >> >> >> > into D3 before the GPU is, but it turns out that the 
> >> > >> >> >> >> > discrete (and
> >> > >> >> >> >> > internal) GPU's HDA controller gets registered a bit 
> >> > >> >> >> >> > later, so the
> >> > >> >> >> >> > list is empty. The symptom is since the HDA driver it's 
> >> > >> >> >> >> > talking to
> >> > >> >> >> >> > hardware which is now in D3.
> >> > >> >> >> >> >
> >> > >> >> >> >> > We could add a mutex to nouveau to allow us to wait for 
> >> > >> >> >> >> > the DGPU HDA
> >> > >> >> >> >> > controller, but perhaps this should be solved at a higher 
> >> > >> >> >> >> > level in the
> >> > >> >> >> >> > vgaswitcheroo code; what do you think?
> >> > >> >> >> >>
> >> > >> >> >> >> Maybe it's a side effect for the recent effort to fix 
> >> > >> >> >> >> another race in
> >> > >> >> >> >> the probe.  A part of them problem is that the registration 
> >> > >> >> >> >> is done at
> >> > >> >> >> >> the very last of probing.
> >> > >> >> >> >>
> >> > >> >> >> >> Instead of delaying the registration, how about the patch 
> >> > >> >> >> >> below?
> >> > >> >> >> >
> >> > >> >> >> > Ping.  If this really works, I'd like to queue it for 3.8 
> >> > >> >> >> > merge, at
> >> > >> >> >> > least...
> >> > >> >> >>
> >> > >> >> >> Ping ack; I was trying to find time to understand another race 
> >> > >> >> >> that
> >> > >> >> >> occurs with GPU probing after switching, but is separate from 
> >> > >> >> >> the
> >> > >> >> >> situation before switching, here.
> >> > >> >> >>
> >> > >> >> >> In the context of writing the switch, it looks like struct azx 
> >> > >> >> >> isn't
> >> > >> >> >> allocated by the time azx_vs_set_state accesses it [1,2]; 
> >> > >> >> >> racing with
> >> > >> >> >> azx_codec_create?
> >> > >> >> >
> >> > >> >> > It was allocated, but it wasn't assigned properly in pci drvdata.
> >> > >> >> >
> >> > >> >> > Below is the revised patch.  Just moved pci_set_drvdata() before
> >> > >> >> > register_vga_switcheroo().  Could you retest with it?
> >> > >> >>
> >> > >> >> Superb; this addresses the oops.
> >> > >> >
> >> > >> > OK, I'll queue it to sound tree for 3.8 kernel with Cc to stable.
> >> > >> >
> >> > >> >> ~1 second after the DGPU is put into D3, I still often see 
> >> > >> >> "hda-intel:
> >> > >> >> spurious response 0x0:0x0, last cmd=0x170500":
> >> > >> >> http://quora.org/2012/hda-switch-spurious.txt
> >> > >> >
> >> > >> > Hm, it's not clear who triggers these messages.  I'll try to check 
> >> > >> > the
> >> > >> > code paths.
> >> > >> >
> >> > >> >> Presumably this implies the read of the ring-buffer pointer 
> >> > >> >> returned
> >> > >> >> 0x, so the HDA driver understands the pointer to have 
> >> > >> >> wrapped
> >> > >> >> and processes the 191 unwritten entries?
> >> > >> >
> >> > >> > Good point.  Actually there is one bug that looks obviously wrong
> >> > >> > (writing 32bit value to CORBWP).  Maybe it has been working just
> >> > >> > because writing CORBRP doesn't influence except for the reset bit.
> >> > >> >
> >> > >> > Reading CORBWP as a byte is OK, but this could be better in a word 
> >> > >> > so
> >> > >> > that we can check 0x as invalid.
> >> > >> >
> >> > >> > A test patch is below.  Hopefully this improves the situation...
> >> > >>
> >> > >> I'll check this out tomorrow and also instrument the code to get a
> >> > >> backtrace, since there may still be an underlying 

Re: switcheroo registration vs switching race...

2012-12-04 Thread Daniel J Blueman
On 4 December 2012 21:55, Takashi Iwai  wrote:
> At Tue, 04 Dec 2012 14:23:05 +0100,
> Takashi Iwai wrote:
>>
>> At Tue, 4 Dec 2012 20:58:55 +0800,
>> Daniel J Blueman wrote:
>> >
>> > On 4 December 2012 01:10, Takashi Iwai  wrote:
>> > > At Tue, 4 Dec 2012 00:50:56 +0800,
>> > > Daniel J Blueman wrote:
>> > >>
>> > >> On 4 December 2012 00:23, Takashi Iwai  wrote:
>> > >> > At Mon, 3 Dec 2012 23:08:28 +0800,
>> > >> > Daniel J Blueman wrote:
>> > >> >>
>> > >> >> On 3 December 2012 22:40, Takashi Iwai  wrote:
>> > >> >> > At Mon, 3 Dec 2012 22:25:52 +0800,
>> > >> >> > Daniel J Blueman wrote:
>> > >> >> >>
>> > >> >> >> On 3 December 2012 19:17, Takashi Iwai  wrote:
>> > >> >> >> > At Wed, 28 Nov 2012 09:45:39 +0100,
>> > >> >> >> > Takashi Iwai wrote:
>> > >> >> >> >>
>> > >> >> >> >> At Wed, 28 Nov 2012 11:45:07 +0800,
>> > >> >> >> >> Daniel J Blueman wrote:
>> > >> >> >> >> >
>> > >> >> >> >> > Hi Seth, Dave, Takashi,
>> > >> >> >> >> >
>> > >> >> >> >> > If I power down the unused discrete GPU before lightdm 
>> > >> >> >> >> > starts by
>> > >> >> >> >> > fiddling with the sysfs file [1] in the upstart script, I 
>> > >> >> >> >> > see a race
>> > >> >> >> >> > manifesting as the discrete GPU's HDA controller timing out 
>> > >> >> >> >> > to
>> > >> >> >> >> > commands [2].
>> > >> >> >> >> >
>> > >> >> >> >> > Adding some debug, I see that the registered audio devices 
>> > >> >> >> >> > are put
>> > >> >> >> >> > into D3 before the GPU is, but it turns out that the 
>> > >> >> >> >> > discrete (and
>> > >> >> >> >> > internal) GPU's HDA controller gets registered a bit later, 
>> > >> >> >> >> > so the
>> > >> >> >> >> > list is empty. The symptom is since the HDA driver it's 
>> > >> >> >> >> > talking to
>> > >> >> >> >> > hardware which is now in D3.
>> > >> >> >> >> >
>> > >> >> >> >> > We could add a mutex to nouveau to allow us to wait for the 
>> > >> >> >> >> > DGPU HDA
>> > >> >> >> >> > controller, but perhaps this should be solved at a higher 
>> > >> >> >> >> > level in the
>> > >> >> >> >> > vgaswitcheroo code; what do you think?
>> > >> >> >> >>
>> > >> >> >> >> Maybe it's a side effect for the recent effort to fix another 
>> > >> >> >> >> race in
>> > >> >> >> >> the probe.  A part of them problem is that the registration is 
>> > >> >> >> >> done at
>> > >> >> >> >> the very last of probing.
>> > >> >> >> >>
>> > >> >> >> >> Instead of delaying the registration, how about the patch 
>> > >> >> >> >> below?
>> > >> >> >> >
>> > >> >> >> > Ping.  If this really works, I'd like to queue it for 3.8 
>> > >> >> >> > merge, at
>> > >> >> >> > least...
>> > >> >> >>
>> > >> >> >> Ping ack; I was trying to find time to understand another race 
>> > >> >> >> that
>> > >> >> >> occurs with GPU probing after switching, but is separate from the
>> > >> >> >> situation before switching, here.
>> > >> >> >>
>> > >> >> >> In the context of writing the switch, it looks like struct azx 
>> > >> >> >> isn't
>> > >> >> >> allocated by the time azx_vs_set_state accesses it [1,2]; racing 
>> > >> >> >> with
>> > >> >> >> azx_codec_create?
>> > >> >> >
>> > >> >> > It was allocated, but it wasn't assigned properly in pci drvdata.
>> > >> >> >
>> > >> >> > Below is the revised patch.  Just moved pci_set_drvdata() before
>> > >> >> > register_vga_switcheroo().  Could you retest with it?
>> > >> >>
>> > >> >> Superb; this addresses the oops.
>> > >> >
>> > >> > OK, I'll queue it to sound tree for 3.8 kernel with Cc to stable.
>> > >> >
>> > >> >> ~1 second after the DGPU is put into D3, I still often see 
>> > >> >> "hda-intel:
>> > >> >> spurious response 0x0:0x0, last cmd=0x170500":
>> > >> >> http://quora.org/2012/hda-switch-spurious.txt
>> > >> >
>> > >> > Hm, it's not clear who triggers these messages.  I'll try to check the
>> > >> > code paths.
>> > >> >
>> > >> >> Presumably this implies the read of the ring-buffer pointer returned
>> > >> >> 0x, so the HDA driver understands the pointer to have wrapped
>> > >> >> and processes the 191 unwritten entries?
>> > >> >
>> > >> > Good point.  Actually there is one bug that looks obviously wrong
>> > >> > (writing 32bit value to CORBWP).  Maybe it has been working just
>> > >> > because writing CORBRP doesn't influence except for the reset bit.
>> > >> >
>> > >> > Reading CORBWP as a byte is OK, but this could be better in a word so
>> > >> > that we can check 0x as invalid.
>> > >> >
>> > >> > A test patch is below.  Hopefully this improves the situation...
>> > >>
>> > >> I'll check this out tomorrow and also instrument the code to get a
>> > >> backtrace, since there may still be an underlying race with the
>> > >> previous patches:
>> >
>> > [...]
>> >
>> > > That's odd.  The Cirrus one (:00:1b.0) must be independent from
>> > > the vga switcheroo things for Nvidia...
>> > >
>> > > The patch below is the revised patch of the first one.  Now the vga
>> > > switcheroo registration is done before the video controller D3 check,
>> > > 

Re: switcheroo registration vs switching race...

2012-12-04 Thread Takashi Iwai
At Tue, 04 Dec 2012 14:23:05 +0100,
Takashi Iwai wrote:
> 
> At Tue, 4 Dec 2012 20:58:55 +0800,
> Daniel J Blueman wrote:
> > 
> > On 4 December 2012 01:10, Takashi Iwai  wrote:
> > > At Tue, 4 Dec 2012 00:50:56 +0800,
> > > Daniel J Blueman wrote:
> > >>
> > >> On 4 December 2012 00:23, Takashi Iwai  wrote:
> > >> > At Mon, 3 Dec 2012 23:08:28 +0800,
> > >> > Daniel J Blueman wrote:
> > >> >>
> > >> >> On 3 December 2012 22:40, Takashi Iwai  wrote:
> > >> >> > At Mon, 3 Dec 2012 22:25:52 +0800,
> > >> >> > Daniel J Blueman wrote:
> > >> >> >>
> > >> >> >> On 3 December 2012 19:17, Takashi Iwai  wrote:
> > >> >> >> > At Wed, 28 Nov 2012 09:45:39 +0100,
> > >> >> >> > Takashi Iwai wrote:
> > >> >> >> >>
> > >> >> >> >> At Wed, 28 Nov 2012 11:45:07 +0800,
> > >> >> >> >> Daniel J Blueman wrote:
> > >> >> >> >> >
> > >> >> >> >> > Hi Seth, Dave, Takashi,
> > >> >> >> >> >
> > >> >> >> >> > If I power down the unused discrete GPU before lightdm starts 
> > >> >> >> >> > by
> > >> >> >> >> > fiddling with the sysfs file [1] in the upstart script, I see 
> > >> >> >> >> > a race
> > >> >> >> >> > manifesting as the discrete GPU's HDA controller timing out to
> > >> >> >> >> > commands [2].
> > >> >> >> >> >
> > >> >> >> >> > Adding some debug, I see that the registered audio devices 
> > >> >> >> >> > are put
> > >> >> >> >> > into D3 before the GPU is, but it turns out that the discrete 
> > >> >> >> >> > (and
> > >> >> >> >> > internal) GPU's HDA controller gets registered a bit later, 
> > >> >> >> >> > so the
> > >> >> >> >> > list is empty. The symptom is since the HDA driver it's 
> > >> >> >> >> > talking to
> > >> >> >> >> > hardware which is now in D3.
> > >> >> >> >> >
> > >> >> >> >> > We could add a mutex to nouveau to allow us to wait for the 
> > >> >> >> >> > DGPU HDA
> > >> >> >> >> > controller, but perhaps this should be solved at a higher 
> > >> >> >> >> > level in the
> > >> >> >> >> > vgaswitcheroo code; what do you think?
> > >> >> >> >>
> > >> >> >> >> Maybe it's a side effect for the recent effort to fix another 
> > >> >> >> >> race in
> > >> >> >> >> the probe.  A part of them problem is that the registration is 
> > >> >> >> >> done at
> > >> >> >> >> the very last of probing.
> > >> >> >> >>
> > >> >> >> >> Instead of delaying the registration, how about the patch below?
> > >> >> >> >
> > >> >> >> > Ping.  If this really works, I'd like to queue it for 3.8 merge, 
> > >> >> >> > at
> > >> >> >> > least...
> > >> >> >>
> > >> >> >> Ping ack; I was trying to find time to understand another race that
> > >> >> >> occurs with GPU probing after switching, but is separate from the
> > >> >> >> situation before switching, here.
> > >> >> >>
> > >> >> >> In the context of writing the switch, it looks like struct azx 
> > >> >> >> isn't
> > >> >> >> allocated by the time azx_vs_set_state accesses it [1,2]; racing 
> > >> >> >> with
> > >> >> >> azx_codec_create?
> > >> >> >
> > >> >> > It was allocated, but it wasn't assigned properly in pci drvdata.
> > >> >> >
> > >> >> > Below is the revised patch.  Just moved pci_set_drvdata() before
> > >> >> > register_vga_switcheroo().  Could you retest with it?
> > >> >>
> > >> >> Superb; this addresses the oops.
> > >> >
> > >> > OK, I'll queue it to sound tree for 3.8 kernel with Cc to stable.
> > >> >
> > >> >> ~1 second after the DGPU is put into D3, I still often see "hda-intel:
> > >> >> spurious response 0x0:0x0, last cmd=0x170500":
> > >> >> http://quora.org/2012/hda-switch-spurious.txt
> > >> >
> > >> > Hm, it's not clear who triggers these messages.  I'll try to check the
> > >> > code paths.
> > >> >
> > >> >> Presumably this implies the read of the ring-buffer pointer returned
> > >> >> 0x, so the HDA driver understands the pointer to have wrapped
> > >> >> and processes the 191 unwritten entries?
> > >> >
> > >> > Good point.  Actually there is one bug that looks obviously wrong
> > >> > (writing 32bit value to CORBWP).  Maybe it has been working just
> > >> > because writing CORBRP doesn't influence except for the reset bit.
> > >> >
> > >> > Reading CORBWP as a byte is OK, but this could be better in a word so
> > >> > that we can check 0x as invalid.
> > >> >
> > >> > A test patch is below.  Hopefully this improves the situation...
> > >>
> > >> I'll check this out tomorrow and also instrument the code to get a
> > >> backtrace, since there may still be an underlying race with the
> > >> previous patches:
> > 
> > [...]
> > 
> > > That's odd.  The Cirrus one (:00:1b.0) must be independent from
> > > the vga switcheroo things for Nvidia...
> > >
> > > The patch below is the revised patch of the first one.  Now the vga
> > > switcheroo registration is done before the video controller D3 check,
> > > so the race should be smaller.
> > 
> > This patch improves things further of course; a major improvement over
> > without. ~15% of the time, I still get the 'spurious response'
> > messages in this callpath:
> 
> A possible 

Re: switcheroo registration vs switching race...

2012-12-04 Thread Takashi Iwai
At Tue, 4 Dec 2012 20:58:55 +0800,
Daniel J Blueman wrote:
> 
> On 4 December 2012 01:10, Takashi Iwai  wrote:
> > At Tue, 4 Dec 2012 00:50:56 +0800,
> > Daniel J Blueman wrote:
> >>
> >> On 4 December 2012 00:23, Takashi Iwai  wrote:
> >> > At Mon, 3 Dec 2012 23:08:28 +0800,
> >> > Daniel J Blueman wrote:
> >> >>
> >> >> On 3 December 2012 22:40, Takashi Iwai  wrote:
> >> >> > At Mon, 3 Dec 2012 22:25:52 +0800,
> >> >> > Daniel J Blueman wrote:
> >> >> >>
> >> >> >> On 3 December 2012 19:17, Takashi Iwai  wrote:
> >> >> >> > At Wed, 28 Nov 2012 09:45:39 +0100,
> >> >> >> > Takashi Iwai wrote:
> >> >> >> >>
> >> >> >> >> At Wed, 28 Nov 2012 11:45:07 +0800,
> >> >> >> >> Daniel J Blueman wrote:
> >> >> >> >> >
> >> >> >> >> > Hi Seth, Dave, Takashi,
> >> >> >> >> >
> >> >> >> >> > If I power down the unused discrete GPU before lightdm starts by
> >> >> >> >> > fiddling with the sysfs file [1] in the upstart script, I see a 
> >> >> >> >> > race
> >> >> >> >> > manifesting as the discrete GPU's HDA controller timing out to
> >> >> >> >> > commands [2].
> >> >> >> >> >
> >> >> >> >> > Adding some debug, I see that the registered audio devices are 
> >> >> >> >> > put
> >> >> >> >> > into D3 before the GPU is, but it turns out that the discrete 
> >> >> >> >> > (and
> >> >> >> >> > internal) GPU's HDA controller gets registered a bit later, so 
> >> >> >> >> > the
> >> >> >> >> > list is empty. The symptom is since the HDA driver it's talking 
> >> >> >> >> > to
> >> >> >> >> > hardware which is now in D3.
> >> >> >> >> >
> >> >> >> >> > We could add a mutex to nouveau to allow us to wait for the 
> >> >> >> >> > DGPU HDA
> >> >> >> >> > controller, but perhaps this should be solved at a higher level 
> >> >> >> >> > in the
> >> >> >> >> > vgaswitcheroo code; what do you think?
> >> >> >> >>
> >> >> >> >> Maybe it's a side effect for the recent effort to fix another 
> >> >> >> >> race in
> >> >> >> >> the probe.  A part of them problem is that the registration is 
> >> >> >> >> done at
> >> >> >> >> the very last of probing.
> >> >> >> >>
> >> >> >> >> Instead of delaying the registration, how about the patch below?
> >> >> >> >
> >> >> >> > Ping.  If this really works, I'd like to queue it for 3.8 merge, at
> >> >> >> > least...
> >> >> >>
> >> >> >> Ping ack; I was trying to find time to understand another race that
> >> >> >> occurs with GPU probing after switching, but is separate from the
> >> >> >> situation before switching, here.
> >> >> >>
> >> >> >> In the context of writing the switch, it looks like struct azx isn't
> >> >> >> allocated by the time azx_vs_set_state accesses it [1,2]; racing with
> >> >> >> azx_codec_create?
> >> >> >
> >> >> > It was allocated, but it wasn't assigned properly in pci drvdata.
> >> >> >
> >> >> > Below is the revised patch.  Just moved pci_set_drvdata() before
> >> >> > register_vga_switcheroo().  Could you retest with it?
> >> >>
> >> >> Superb; this addresses the oops.
> >> >
> >> > OK, I'll queue it to sound tree for 3.8 kernel with Cc to stable.
> >> >
> >> >> ~1 second after the DGPU is put into D3, I still often see "hda-intel:
> >> >> spurious response 0x0:0x0, last cmd=0x170500":
> >> >> http://quora.org/2012/hda-switch-spurious.txt
> >> >
> >> > Hm, it's not clear who triggers these messages.  I'll try to check the
> >> > code paths.
> >> >
> >> >> Presumably this implies the read of the ring-buffer pointer returned
> >> >> 0x, so the HDA driver understands the pointer to have wrapped
> >> >> and processes the 191 unwritten entries?
> >> >
> >> > Good point.  Actually there is one bug that looks obviously wrong
> >> > (writing 32bit value to CORBWP).  Maybe it has been working just
> >> > because writing CORBRP doesn't influence except for the reset bit.
> >> >
> >> > Reading CORBWP as a byte is OK, but this could be better in a word so
> >> > that we can check 0x as invalid.
> >> >
> >> > A test patch is below.  Hopefully this improves the situation...
> >>
> >> I'll check this out tomorrow and also instrument the code to get a
> >> backtrace, since there may still be an underlying race with the
> >> previous patches:
> 
> [...]
> 
> > That's odd.  The Cirrus one (:00:1b.0) must be independent from
> > the vga switcheroo things for Nvidia...
> >
> > The patch below is the revised patch of the first one.  Now the vga
> > switcheroo registration is done before the video controller D3 check,
> > so the race should be smaller.
> 
> This patch improves things further of course; a major improvement over
> without. ~15% of the time, I still get the 'spurious response'
> messages in this callpath:

A possible scenario is that the graphics went in D3 before probing
hd-audio, and the hd-audio continues to initialize the hardware
without knowing the graphics counterpart is disabled.

There is a code check_hdmi_disabled() in hda_intel.c that checks the 
availability of the video driver, and it might be that this doesn't
work as expected...

> 
> Pid: 

Re: switcheroo registration vs switching race...

2012-12-04 Thread Daniel J Blueman
On 4 December 2012 01:10, Takashi Iwai  wrote:
> At Tue, 4 Dec 2012 00:50:56 +0800,
> Daniel J Blueman wrote:
>>
>> On 4 December 2012 00:23, Takashi Iwai  wrote:
>> > At Mon, 3 Dec 2012 23:08:28 +0800,
>> > Daniel J Blueman wrote:
>> >>
>> >> On 3 December 2012 22:40, Takashi Iwai  wrote:
>> >> > At Mon, 3 Dec 2012 22:25:52 +0800,
>> >> > Daniel J Blueman wrote:
>> >> >>
>> >> >> On 3 December 2012 19:17, Takashi Iwai  wrote:
>> >> >> > At Wed, 28 Nov 2012 09:45:39 +0100,
>> >> >> > Takashi Iwai wrote:
>> >> >> >>
>> >> >> >> At Wed, 28 Nov 2012 11:45:07 +0800,
>> >> >> >> Daniel J Blueman wrote:
>> >> >> >> >
>> >> >> >> > Hi Seth, Dave, Takashi,
>> >> >> >> >
>> >> >> >> > If I power down the unused discrete GPU before lightdm starts by
>> >> >> >> > fiddling with the sysfs file [1] in the upstart script, I see a 
>> >> >> >> > race
>> >> >> >> > manifesting as the discrete GPU's HDA controller timing out to
>> >> >> >> > commands [2].
>> >> >> >> >
>> >> >> >> > Adding some debug, I see that the registered audio devices are put
>> >> >> >> > into D3 before the GPU is, but it turns out that the discrete (and
>> >> >> >> > internal) GPU's HDA controller gets registered a bit later, so the
>> >> >> >> > list is empty. The symptom is since the HDA driver it's talking to
>> >> >> >> > hardware which is now in D3.
>> >> >> >> >
>> >> >> >> > We could add a mutex to nouveau to allow us to wait for the DGPU 
>> >> >> >> > HDA
>> >> >> >> > controller, but perhaps this should be solved at a higher level 
>> >> >> >> > in the
>> >> >> >> > vgaswitcheroo code; what do you think?
>> >> >> >>
>> >> >> >> Maybe it's a side effect for the recent effort to fix another race 
>> >> >> >> in
>> >> >> >> the probe.  A part of them problem is that the registration is done 
>> >> >> >> at
>> >> >> >> the very last of probing.
>> >> >> >>
>> >> >> >> Instead of delaying the registration, how about the patch below?
>> >> >> >
>> >> >> > Ping.  If this really works, I'd like to queue it for 3.8 merge, at
>> >> >> > least...
>> >> >>
>> >> >> Ping ack; I was trying to find time to understand another race that
>> >> >> occurs with GPU probing after switching, but is separate from the
>> >> >> situation before switching, here.
>> >> >>
>> >> >> In the context of writing the switch, it looks like struct azx isn't
>> >> >> allocated by the time azx_vs_set_state accesses it [1,2]; racing with
>> >> >> azx_codec_create?
>> >> >
>> >> > It was allocated, but it wasn't assigned properly in pci drvdata.
>> >> >
>> >> > Below is the revised patch.  Just moved pci_set_drvdata() before
>> >> > register_vga_switcheroo().  Could you retest with it?
>> >>
>> >> Superb; this addresses the oops.
>> >
>> > OK, I'll queue it to sound tree for 3.8 kernel with Cc to stable.
>> >
>> >> ~1 second after the DGPU is put into D3, I still often see "hda-intel:
>> >> spurious response 0x0:0x0, last cmd=0x170500":
>> >> http://quora.org/2012/hda-switch-spurious.txt
>> >
>> > Hm, it's not clear who triggers these messages.  I'll try to check the
>> > code paths.
>> >
>> >> Presumably this implies the read of the ring-buffer pointer returned
>> >> 0x, so the HDA driver understands the pointer to have wrapped
>> >> and processes the 191 unwritten entries?
>> >
>> > Good point.  Actually there is one bug that looks obviously wrong
>> > (writing 32bit value to CORBWP).  Maybe it has been working just
>> > because writing CORBRP doesn't influence except for the reset bit.
>> >
>> > Reading CORBWP as a byte is OK, but this could be better in a word so
>> > that we can check 0x as invalid.
>> >
>> > A test patch is below.  Hopefully this improves the situation...
>>
>> I'll check this out tomorrow and also instrument the code to get a
>> backtrace, since there may still be an underlying race with the
>> previous patches:

[...]

> That's odd.  The Cirrus one (:00:1b.0) must be independent from
> the vga switcheroo things for Nvidia...
>
> The patch below is the revised patch of the first one.  Now the vga
> switcheroo registration is done before the video controller D3 check,
> so the race should be smaller.

This patch improves things further of course; a major improvement over
without. ~15% of the time, I still get the 'spurious response'
messages in this callpath:

Pid: 473, comm: modprobe Not tainted 3.7.0-rc8-expert+ #15
Call Trace:
 [] azx_update_rirb+0xe6/0xf0 [snd_hda_intel]
 [] azx_get_response+0xc0/0x270 [snd_hda_intel]
 [] codec_exec_verb+0xc1/0x100 [snd_hda_codec]
 [] snd_hda_codec_read+0x5b/0x90 [snd_hda_codec]
 [] ? try_to_grab_pending+0xb9/0x130
 [] hda_set_power_state+0xd5/0x110 [snd_hda_codec]
 [] hda_call_codec_resume+0x20/0x120 [snd_hda_codec]
 [] snd_hda_power_save+0x129/0x1e0 [snd_hda_codec]
 [] codec_exec_verb+0x4f/0x100 [snd_hda_codec]
 [] snd_hda_codec_write+0x6b/0xa0 [snd_hda_codec]
 [] snd_hda_codec_write_cache+0x31/0xa0 [snd_hda_codec]
 [] snd_hda_jack_detect_enable_callback+0x75/0xa0
[snd_hda_codec]
 [] 

Re: switcheroo registration vs switching race...

2012-12-04 Thread Daniel J Blueman
On 4 December 2012 01:10, Takashi Iwai ti...@suse.de wrote:
 At Tue, 4 Dec 2012 00:50:56 +0800,
 Daniel J Blueman wrote:

 On 4 December 2012 00:23, Takashi Iwai ti...@suse.de wrote:
  At Mon, 3 Dec 2012 23:08:28 +0800,
  Daniel J Blueman wrote:
 
  On 3 December 2012 22:40, Takashi Iwai ti...@suse.de wrote:
   At Mon, 3 Dec 2012 22:25:52 +0800,
   Daniel J Blueman wrote:
  
   On 3 December 2012 19:17, Takashi Iwai ti...@suse.de wrote:
At Wed, 28 Nov 2012 09:45:39 +0100,
Takashi Iwai wrote:
   
At Wed, 28 Nov 2012 11:45:07 +0800,
Daniel J Blueman wrote:

 Hi Seth, Dave, Takashi,

 If I power down the unused discrete GPU before lightdm starts by
 fiddling with the sysfs file [1] in the upstart script, I see a 
 race
 manifesting as the discrete GPU's HDA controller timing out to
 commands [2].

 Adding some debug, I see that the registered audio devices are put
 into D3 before the GPU is, but it turns out that the discrete (and
 internal) GPU's HDA controller gets registered a bit later, so the
 list is empty. The symptom is since the HDA driver it's talking to
 hardware which is now in D3.

 We could add a mutex to nouveau to allow us to wait for the DGPU 
 HDA
 controller, but perhaps this should be solved at a higher level 
 in the
 vgaswitcheroo code; what do you think?
   
Maybe it's a side effect for the recent effort to fix another race 
in
the probe.  A part of them problem is that the registration is done 
at
the very last of probing.
   
Instead of delaying the registration, how about the patch below?
   
Ping.  If this really works, I'd like to queue it for 3.8 merge, at
least...
  
   Ping ack; I was trying to find time to understand another race that
   occurs with GPU probing after switching, but is separate from the
   situation before switching, here.
  
   In the context of writing the switch, it looks like struct azx isn't
   allocated by the time azx_vs_set_state accesses it [1,2]; racing with
   azx_codec_create?
  
   It was allocated, but it wasn't assigned properly in pci drvdata.
  
   Below is the revised patch.  Just moved pci_set_drvdata() before
   register_vga_switcheroo().  Could you retest with it?
 
  Superb; this addresses the oops.
 
  OK, I'll queue it to sound tree for 3.8 kernel with Cc to stable.
 
  ~1 second after the DGPU is put into D3, I still often see hda-intel:
  spurious response 0x0:0x0, last cmd=0x170500:
  http://quora.org/2012/hda-switch-spurious.txt
 
  Hm, it's not clear who triggers these messages.  I'll try to check the
  code paths.
 
  Presumably this implies the read of the ring-buffer pointer returned
  0x, so the HDA driver understands the pointer to have wrapped
  and processes the 191 unwritten entries?
 
  Good point.  Actually there is one bug that looks obviously wrong
  (writing 32bit value to CORBWP).  Maybe it has been working just
  because writing CORBRP doesn't influence except for the reset bit.
 
  Reading CORBWP as a byte is OK, but this could be better in a word so
  that we can check 0x as invalid.
 
  A test patch is below.  Hopefully this improves the situation...

 I'll check this out tomorrow and also instrument the code to get a
 backtrace, since there may still be an underlying race with the
 previous patches:

[...]

 That's odd.  The Cirrus one (:00:1b.0) must be independent from
 the vga switcheroo things for Nvidia...

 The patch below is the revised patch of the first one.  Now the vga
 switcheroo registration is done before the video controller D3 check,
 so the race should be smaller.

This patch improves things further of course; a major improvement over
without. ~15% of the time, I still get the 'spurious response'
messages in this callpath:

Pid: 473, comm: modprobe Not tainted 3.7.0-rc8-expert+ #15
Call Trace:
 [a0025776] azx_update_rirb+0xe6/0xf0 [snd_hda_intel]
 [a0026d00] azx_get_response+0xc0/0x270 [snd_hda_intel]
 [a0226801] codec_exec_verb+0xc1/0x100 [snd_hda_codec]
 [a022799b] snd_hda_codec_read+0x5b/0x90 [snd_hda_codec]
 [810545e9] ? try_to_grab_pending+0xb9/0x130
 [a0227d35] hda_set_power_state+0xd5/0x110 [snd_hda_codec]
 [a0226420] hda_call_codec_resume+0x20/0x120 [snd_hda_codec]
 [a0226689] snd_hda_power_save+0x129/0x1e0 [snd_hda_codec]
 [a022678f] codec_exec_verb+0x4f/0x100 [snd_hda_codec]
 [a02268ab] snd_hda_codec_write+0x6b/0xa0 [snd_hda_codec]
 [a0227281] snd_hda_codec_write_cache+0x31/0xa0 [snd_hda_codec]
 [a022b965] snd_hda_jack_detect_enable_callback+0x75/0xa0
[snd_hda_codec]
 [a022b9a1] snd_hda_jack_detect_enable+0x11/0x20 [snd_hda_codec]
 [a00cf5a6] generic_hdmi_init+0xb6/0xe0 [snd_hda_codec_hdmi]
 [a022ad67] snd_hda_codec_build_controls+0x37/0x130 [snd_hda_codec]
 [a022b080] ? snd_hda_codec_build_pcms+0x1a0/0x2e0 

Re: switcheroo registration vs switching race...

2012-12-04 Thread Takashi Iwai
At Tue, 4 Dec 2012 20:58:55 +0800,
Daniel J Blueman wrote:
 
 On 4 December 2012 01:10, Takashi Iwai ti...@suse.de wrote:
  At Tue, 4 Dec 2012 00:50:56 +0800,
  Daniel J Blueman wrote:
 
  On 4 December 2012 00:23, Takashi Iwai ti...@suse.de wrote:
   At Mon, 3 Dec 2012 23:08:28 +0800,
   Daniel J Blueman wrote:
  
   On 3 December 2012 22:40, Takashi Iwai ti...@suse.de wrote:
At Mon, 3 Dec 2012 22:25:52 +0800,
Daniel J Blueman wrote:
   
On 3 December 2012 19:17, Takashi Iwai ti...@suse.de wrote:
 At Wed, 28 Nov 2012 09:45:39 +0100,
 Takashi Iwai wrote:

 At Wed, 28 Nov 2012 11:45:07 +0800,
 Daniel J Blueman wrote:
 
  Hi Seth, Dave, Takashi,
 
  If I power down the unused discrete GPU before lightdm starts by
  fiddling with the sysfs file [1] in the upstart script, I see a 
  race
  manifesting as the discrete GPU's HDA controller timing out to
  commands [2].
 
  Adding some debug, I see that the registered audio devices are 
  put
  into D3 before the GPU is, but it turns out that the discrete 
  (and
  internal) GPU's HDA controller gets registered a bit later, so 
  the
  list is empty. The symptom is since the HDA driver it's talking 
  to
  hardware which is now in D3.
 
  We could add a mutex to nouveau to allow us to wait for the 
  DGPU HDA
  controller, but perhaps this should be solved at a higher level 
  in the
  vgaswitcheroo code; what do you think?

 Maybe it's a side effect for the recent effort to fix another 
 race in
 the probe.  A part of them problem is that the registration is 
 done at
 the very last of probing.

 Instead of delaying the registration, how about the patch below?

 Ping.  If this really works, I'd like to queue it for 3.8 merge, at
 least...
   
Ping ack; I was trying to find time to understand another race that
occurs with GPU probing after switching, but is separate from the
situation before switching, here.
   
In the context of writing the switch, it looks like struct azx isn't
allocated by the time azx_vs_set_state accesses it [1,2]; racing with
azx_codec_create?
   
It was allocated, but it wasn't assigned properly in pci drvdata.
   
Below is the revised patch.  Just moved pci_set_drvdata() before
register_vga_switcheroo().  Could you retest with it?
  
   Superb; this addresses the oops.
  
   OK, I'll queue it to sound tree for 3.8 kernel with Cc to stable.
  
   ~1 second after the DGPU is put into D3, I still often see hda-intel:
   spurious response 0x0:0x0, last cmd=0x170500:
   http://quora.org/2012/hda-switch-spurious.txt
  
   Hm, it's not clear who triggers these messages.  I'll try to check the
   code paths.
  
   Presumably this implies the read of the ring-buffer pointer returned
   0x, so the HDA driver understands the pointer to have wrapped
   and processes the 191 unwritten entries?
  
   Good point.  Actually there is one bug that looks obviously wrong
   (writing 32bit value to CORBWP).  Maybe it has been working just
   because writing CORBRP doesn't influence except for the reset bit.
  
   Reading CORBWP as a byte is OK, but this could be better in a word so
   that we can check 0x as invalid.
  
   A test patch is below.  Hopefully this improves the situation...
 
  I'll check this out tomorrow and also instrument the code to get a
  backtrace, since there may still be an underlying race with the
  previous patches:
 
 [...]
 
  That's odd.  The Cirrus one (:00:1b.0) must be independent from
  the vga switcheroo things for Nvidia...
 
  The patch below is the revised patch of the first one.  Now the vga
  switcheroo registration is done before the video controller D3 check,
  so the race should be smaller.
 
 This patch improves things further of course; a major improvement over
 without. ~15% of the time, I still get the 'spurious response'
 messages in this callpath:

A possible scenario is that the graphics went in D3 before probing
hd-audio, and the hd-audio continues to initialize the hardware
without knowing the graphics counterpart is disabled.

There is a code check_hdmi_disabled() in hda_intel.c that checks the 
availability of the video driver, and it might be that this doesn't
work as expected...

 
 Pid: 473, comm: modprobe Not tainted 3.7.0-rc8-expert+ #15
 Call Trace:
  [a0025776] azx_update_rirb+0xe6/0xf0 [snd_hda_intel]
  [a0026d00] azx_get_response+0xc0/0x270 [snd_hda_intel]
  [a0226801] codec_exec_verb+0xc1/0x100 [snd_hda_codec]
  [a022799b] snd_hda_codec_read+0x5b/0x90 [snd_hda_codec]
  [810545e9] ? try_to_grab_pending+0xb9/0x130
  [a0227d35] hda_set_power_state+0xd5/0x110 [snd_hda_codec]
  [a0226420] hda_call_codec_resume+0x20/0x120 [snd_hda_codec]
  [a0226689] snd_hda_power_save+0x129/0x1e0 [snd_hda_codec]
  [a022678f] 

Re: switcheroo registration vs switching race...

2012-12-04 Thread Takashi Iwai
At Tue, 04 Dec 2012 14:23:05 +0100,
Takashi Iwai wrote:
 
 At Tue, 4 Dec 2012 20:58:55 +0800,
 Daniel J Blueman wrote:
  
  On 4 December 2012 01:10, Takashi Iwai ti...@suse.de wrote:
   At Tue, 4 Dec 2012 00:50:56 +0800,
   Daniel J Blueman wrote:
  
   On 4 December 2012 00:23, Takashi Iwai ti...@suse.de wrote:
At Mon, 3 Dec 2012 23:08:28 +0800,
Daniel J Blueman wrote:
   
On 3 December 2012 22:40, Takashi Iwai ti...@suse.de wrote:
 At Mon, 3 Dec 2012 22:25:52 +0800,
 Daniel J Blueman wrote:

 On 3 December 2012 19:17, Takashi Iwai ti...@suse.de wrote:
  At Wed, 28 Nov 2012 09:45:39 +0100,
  Takashi Iwai wrote:
 
  At Wed, 28 Nov 2012 11:45:07 +0800,
  Daniel J Blueman wrote:
  
   Hi Seth, Dave, Takashi,
  
   If I power down the unused discrete GPU before lightdm starts 
   by
   fiddling with the sysfs file [1] in the upstart script, I see 
   a race
   manifesting as the discrete GPU's HDA controller timing out to
   commands [2].
  
   Adding some debug, I see that the registered audio devices 
   are put
   into D3 before the GPU is, but it turns out that the discrete 
   (and
   internal) GPU's HDA controller gets registered a bit later, 
   so the
   list is empty. The symptom is since the HDA driver it's 
   talking to
   hardware which is now in D3.
  
   We could add a mutex to nouveau to allow us to wait for the 
   DGPU HDA
   controller, but perhaps this should be solved at a higher 
   level in the
   vgaswitcheroo code; what do you think?
 
  Maybe it's a side effect for the recent effort to fix another 
  race in
  the probe.  A part of them problem is that the registration is 
  done at
  the very last of probing.
 
  Instead of delaying the registration, how about the patch below?
 
  Ping.  If this really works, I'd like to queue it for 3.8 merge, 
  at
  least...

 Ping ack; I was trying to find time to understand another race that
 occurs with GPU probing after switching, but is separate from the
 situation before switching, here.

 In the context of writing the switch, it looks like struct azx 
 isn't
 allocated by the time azx_vs_set_state accesses it [1,2]; racing 
 with
 azx_codec_create?

 It was allocated, but it wasn't assigned properly in pci drvdata.

 Below is the revised patch.  Just moved pci_set_drvdata() before
 register_vga_switcheroo().  Could you retest with it?
   
Superb; this addresses the oops.
   
OK, I'll queue it to sound tree for 3.8 kernel with Cc to stable.
   
~1 second after the DGPU is put into D3, I still often see hda-intel:
spurious response 0x0:0x0, last cmd=0x170500:
http://quora.org/2012/hda-switch-spurious.txt
   
Hm, it's not clear who triggers these messages.  I'll try to check the
code paths.
   
Presumably this implies the read of the ring-buffer pointer returned
0x, so the HDA driver understands the pointer to have wrapped
and processes the 191 unwritten entries?
   
Good point.  Actually there is one bug that looks obviously wrong
(writing 32bit value to CORBWP).  Maybe it has been working just
because writing CORBRP doesn't influence except for the reset bit.
   
Reading CORBWP as a byte is OK, but this could be better in a word so
that we can check 0x as invalid.
   
A test patch is below.  Hopefully this improves the situation...
  
   I'll check this out tomorrow and also instrument the code to get a
   backtrace, since there may still be an underlying race with the
   previous patches:
  
  [...]
  
   That's odd.  The Cirrus one (:00:1b.0) must be independent from
   the vga switcheroo things for Nvidia...
  
   The patch below is the revised patch of the first one.  Now the vga
   switcheroo registration is done before the video controller D3 check,
   so the race should be smaller.
  
  This patch improves things further of course; a major improvement over
  without. ~15% of the time, I still get the 'spurious response'
  messages in this callpath:
 
 A possible scenario is that the graphics went in D3 before probing
 hd-audio, and the hd-audio continues to initialize the hardware
 without knowing the graphics counterpart is disabled.
 
 There is a code check_hdmi_disabled() in hda_intel.c that checks the 
 availability of the video driver, and it might be that this doesn't
 work as expected...

I think I understand this path.  You are setting OFF, right?
This will set the audio client OFF before can_switch() is called.
Thus it can be called even before the probing process finished.

In short, wait_for_completion() must be put at the beginning of
azx_vs_set_state() in addition to azx_vs_can_switch().

The revised patch is attached below.  Hopefully this sorts out all
races.


Takashi

---
diff --git 

Re: switcheroo registration vs switching race...

2012-12-04 Thread Daniel J Blueman
On 4 December 2012 21:55, Takashi Iwai ti...@suse.de wrote:
 At Tue, 04 Dec 2012 14:23:05 +0100,
 Takashi Iwai wrote:

 At Tue, 4 Dec 2012 20:58:55 +0800,
 Daniel J Blueman wrote:
 
  On 4 December 2012 01:10, Takashi Iwai ti...@suse.de wrote:
   At Tue, 4 Dec 2012 00:50:56 +0800,
   Daniel J Blueman wrote:
  
   On 4 December 2012 00:23, Takashi Iwai ti...@suse.de wrote:
At Mon, 3 Dec 2012 23:08:28 +0800,
Daniel J Blueman wrote:
   
On 3 December 2012 22:40, Takashi Iwai ti...@suse.de wrote:
 At Mon, 3 Dec 2012 22:25:52 +0800,
 Daniel J Blueman wrote:

 On 3 December 2012 19:17, Takashi Iwai ti...@suse.de wrote:
  At Wed, 28 Nov 2012 09:45:39 +0100,
  Takashi Iwai wrote:
 
  At Wed, 28 Nov 2012 11:45:07 +0800,
  Daniel J Blueman wrote:
  
   Hi Seth, Dave, Takashi,
  
   If I power down the unused discrete GPU before lightdm 
   starts by
   fiddling with the sysfs file [1] in the upstart script, I 
   see a race
   manifesting as the discrete GPU's HDA controller timing out 
   to
   commands [2].
  
   Adding some debug, I see that the registered audio devices 
   are put
   into D3 before the GPU is, but it turns out that the 
   discrete (and
   internal) GPU's HDA controller gets registered a bit later, 
   so the
   list is empty. The symptom is since the HDA driver it's 
   talking to
   hardware which is now in D3.
  
   We could add a mutex to nouveau to allow us to wait for the 
   DGPU HDA
   controller, but perhaps this should be solved at a higher 
   level in the
   vgaswitcheroo code; what do you think?
 
  Maybe it's a side effect for the recent effort to fix another 
  race in
  the probe.  A part of them problem is that the registration is 
  done at
  the very last of probing.
 
  Instead of delaying the registration, how about the patch 
  below?
 
  Ping.  If this really works, I'd like to queue it for 3.8 
  merge, at
  least...

 Ping ack; I was trying to find time to understand another race 
 that
 occurs with GPU probing after switching, but is separate from the
 situation before switching, here.

 In the context of writing the switch, it looks like struct azx 
 isn't
 allocated by the time azx_vs_set_state accesses it [1,2]; racing 
 with
 azx_codec_create?

 It was allocated, but it wasn't assigned properly in pci drvdata.

 Below is the revised patch.  Just moved pci_set_drvdata() before
 register_vga_switcheroo().  Could you retest with it?
   
Superb; this addresses the oops.
   
OK, I'll queue it to sound tree for 3.8 kernel with Cc to stable.
   
~1 second after the DGPU is put into D3, I still often see 
hda-intel:
spurious response 0x0:0x0, last cmd=0x170500:
http://quora.org/2012/hda-switch-spurious.txt
   
Hm, it's not clear who triggers these messages.  I'll try to check the
code paths.
   
Presumably this implies the read of the ring-buffer pointer returned
0x, so the HDA driver understands the pointer to have wrapped
and processes the 191 unwritten entries?
   
Good point.  Actually there is one bug that looks obviously wrong
(writing 32bit value to CORBWP).  Maybe it has been working just
because writing CORBRP doesn't influence except for the reset bit.
   
Reading CORBWP as a byte is OK, but this could be better in a word so
that we can check 0x as invalid.
   
A test patch is below.  Hopefully this improves the situation...
  
   I'll check this out tomorrow and also instrument the code to get a
   backtrace, since there may still be an underlying race with the
   previous patches:
 
  [...]
 
   That's odd.  The Cirrus one (:00:1b.0) must be independent from
   the vga switcheroo things for Nvidia...
  
   The patch below is the revised patch of the first one.  Now the vga
   switcheroo registration is done before the video controller D3 check,
   so the race should be smaller.
 
  This patch improves things further of course; a major improvement over
  without. ~15% of the time, I still get the 'spurious response'
  messages in this callpath:

 A possible scenario is that the graphics went in D3 before probing
 hd-audio, and the hd-audio continues to initialize the hardware
 without knowing the graphics counterpart is disabled.

 There is a code check_hdmi_disabled() in hda_intel.c that checks the
 availability of the video driver, and it might be that this doesn't
 work as expected...

 I think I understand this path.  You are setting OFF, right?
 This will set the audio client OFF before can_switch() is called.
 Thus it can be called even before the probing process finished.

 In short, wait_for_completion() must be put at the beginning of
 azx_vs_set_state() in addition to azx_vs_can_switch().

 The revised 

Re: switcheroo registration vs switching race...

2012-12-04 Thread Takashi Iwai
At Tue, 4 Dec 2012 22:46:47 +0800,
Daniel J Blueman wrote:
 
 On 4 December 2012 21:55, Takashi Iwai ti...@suse.de wrote:
  At Tue, 04 Dec 2012 14:23:05 +0100,
  Takashi Iwai wrote:
 
  At Tue, 4 Dec 2012 20:58:55 +0800,
  Daniel J Blueman wrote:
  
   On 4 December 2012 01:10, Takashi Iwai ti...@suse.de wrote:
At Tue, 4 Dec 2012 00:50:56 +0800,
Daniel J Blueman wrote:
   
On 4 December 2012 00:23, Takashi Iwai ti...@suse.de wrote:
 At Mon, 3 Dec 2012 23:08:28 +0800,
 Daniel J Blueman wrote:

 On 3 December 2012 22:40, Takashi Iwai ti...@suse.de wrote:
  At Mon, 3 Dec 2012 22:25:52 +0800,
  Daniel J Blueman wrote:
 
  On 3 December 2012 19:17, Takashi Iwai ti...@suse.de wrote:
   At Wed, 28 Nov 2012 09:45:39 +0100,
   Takashi Iwai wrote:
  
   At Wed, 28 Nov 2012 11:45:07 +0800,
   Daniel J Blueman wrote:
   
Hi Seth, Dave, Takashi,
   
If I power down the unused discrete GPU before lightdm 
starts by
fiddling with the sysfs file [1] in the upstart script, I 
see a race
manifesting as the discrete GPU's HDA controller timing 
out to
commands [2].
   
Adding some debug, I see that the registered audio devices 
are put
into D3 before the GPU is, but it turns out that the 
discrete (and
internal) GPU's HDA controller gets registered a bit 
later, so the
list is empty. The symptom is since the HDA driver it's 
talking to
hardware which is now in D3.
   
We could add a mutex to nouveau to allow us to wait for 
the DGPU HDA
controller, but perhaps this should be solved at a higher 
level in the
vgaswitcheroo code; what do you think?
  
   Maybe it's a side effect for the recent effort to fix 
   another race in
   the probe.  A part of them problem is that the registration 
   is done at
   the very last of probing.
  
   Instead of delaying the registration, how about the patch 
   below?
  
   Ping.  If this really works, I'd like to queue it for 3.8 
   merge, at
   least...
 
  Ping ack; I was trying to find time to understand another race 
  that
  occurs with GPU probing after switching, but is separate from 
  the
  situation before switching, here.
 
  In the context of writing the switch, it looks like struct azx 
  isn't
  allocated by the time azx_vs_set_state accesses it [1,2]; 
  racing with
  azx_codec_create?
 
  It was allocated, but it wasn't assigned properly in pci drvdata.
 
  Below is the revised patch.  Just moved pci_set_drvdata() before
  register_vga_switcheroo().  Could you retest with it?

 Superb; this addresses the oops.

 OK, I'll queue it to sound tree for 3.8 kernel with Cc to stable.

 ~1 second after the DGPU is put into D3, I still often see 
 hda-intel:
 spurious response 0x0:0x0, last cmd=0x170500:
 http://quora.org/2012/hda-switch-spurious.txt

 Hm, it's not clear who triggers these messages.  I'll try to check 
 the
 code paths.

 Presumably this implies the read of the ring-buffer pointer 
 returned
 0x, so the HDA driver understands the pointer to have 
 wrapped
 and processes the 191 unwritten entries?

 Good point.  Actually there is one bug that looks obviously wrong
 (writing 32bit value to CORBWP).  Maybe it has been working just
 because writing CORBRP doesn't influence except for the reset bit.

 Reading CORBWP as a byte is OK, but this could be better in a word 
 so
 that we can check 0x as invalid.

 A test patch is below.  Hopefully this improves the situation...
   
I'll check this out tomorrow and also instrument the code to get a
backtrace, since there may still be an underlying race with the
previous patches:
  
   [...]
  
That's odd.  The Cirrus one (:00:1b.0) must be independent from
the vga switcheroo things for Nvidia...
   
The patch below is the revised patch of the first one.  Now the vga
switcheroo registration is done before the video controller D3 check,
so the race should be smaller.
  
   This patch improves things further of course; a major improvement over
   without. ~15% of the time, I still get the 'spurious response'
   messages in this callpath:
 
  A possible scenario is that the graphics went in D3 before probing
  hd-audio, and the hd-audio continues to initialize the hardware
  without knowing the graphics counterpart is disabled.
 
  There is a code check_hdmi_disabled() in hda_intel.c that checks the
  availability of the video driver, and it might be that this doesn't
  work as expected...
 
  I think I understand this path.  You are setting OFF, right?
  This will set the audio client OFF before 

Re: switcheroo registration vs switching race...

2012-12-04 Thread Daniel J Blueman
On 4 December 2012 23:03, Takashi Iwai ti...@suse.de wrote:
 At Tue, 4 Dec 2012 22:46:47 +0800,
 Daniel J Blueman wrote:

 On 4 December 2012 21:55, Takashi Iwai ti...@suse.de wrote:
  At Tue, 04 Dec 2012 14:23:05 +0100,
  Takashi Iwai wrote:
 
  At Tue, 4 Dec 2012 20:58:55 +0800,
  Daniel J Blueman wrote:
  
   On 4 December 2012 01:10, Takashi Iwai ti...@suse.de wrote:
At Tue, 4 Dec 2012 00:50:56 +0800,
Daniel J Blueman wrote:
   
On 4 December 2012 00:23, Takashi Iwai ti...@suse.de wrote:
 At Mon, 3 Dec 2012 23:08:28 +0800,
 Daniel J Blueman wrote:

 On 3 December 2012 22:40, Takashi Iwai ti...@suse.de wrote:
  At Mon, 3 Dec 2012 22:25:52 +0800,
  Daniel J Blueman wrote:
 
  On 3 December 2012 19:17, Takashi Iwai ti...@suse.de wrote:
   At Wed, 28 Nov 2012 09:45:39 +0100,
   Takashi Iwai wrote:
  
   At Wed, 28 Nov 2012 11:45:07 +0800,
   Daniel J Blueman wrote:
   
Hi Seth, Dave, Takashi,
   
If I power down the unused discrete GPU before lightdm 
starts by
fiddling with the sysfs file [1] in the upstart script, I 
see a race
manifesting as the discrete GPU's HDA controller timing 
out to
commands [2].
   
Adding some debug, I see that the registered audio 
devices are put
into D3 before the GPU is, but it turns out that the 
discrete (and
internal) GPU's HDA controller gets registered a bit 
later, so the
list is empty. The symptom is since the HDA driver it's 
talking to
hardware which is now in D3.
   
We could add a mutex to nouveau to allow us to wait for 
the DGPU HDA
controller, but perhaps this should be solved at a higher 
level in the
vgaswitcheroo code; what do you think?
  
   Maybe it's a side effect for the recent effort to fix 
   another race in
   the probe.  A part of them problem is that the registration 
   is done at
   the very last of probing.
  
   Instead of delaying the registration, how about the patch 
   below?
  
   Ping.  If this really works, I'd like to queue it for 3.8 
   merge, at
   least...
 
  Ping ack; I was trying to find time to understand another race 
  that
  occurs with GPU probing after switching, but is separate from 
  the
  situation before switching, here.
 
  In the context of writing the switch, it looks like struct azx 
  isn't
  allocated by the time azx_vs_set_state accesses it [1,2]; 
  racing with
  azx_codec_create?
 
  It was allocated, but it wasn't assigned properly in pci 
  drvdata.
 
  Below is the revised patch.  Just moved pci_set_drvdata() before
  register_vga_switcheroo().  Could you retest with it?

 Superb; this addresses the oops.

 OK, I'll queue it to sound tree for 3.8 kernel with Cc to stable.

 ~1 second after the DGPU is put into D3, I still often see 
 hda-intel:
 spurious response 0x0:0x0, last cmd=0x170500:
 http://quora.org/2012/hda-switch-spurious.txt

 Hm, it's not clear who triggers these messages.  I'll try to check 
 the
 code paths.

 Presumably this implies the read of the ring-buffer pointer 
 returned
 0x, so the HDA driver understands the pointer to have 
 wrapped
 and processes the 191 unwritten entries?

 Good point.  Actually there is one bug that looks obviously wrong
 (writing 32bit value to CORBWP).  Maybe it has been working just
 because writing CORBRP doesn't influence except for the reset bit.

 Reading CORBWP as a byte is OK, but this could be better in a word 
 so
 that we can check 0x as invalid.

 A test patch is below.  Hopefully this improves the situation...
   
I'll check this out tomorrow and also instrument the code to get a
backtrace, since there may still be an underlying race with the
previous patches:
  
   [...]
  
That's odd.  The Cirrus one (:00:1b.0) must be independent from
the vga switcheroo things for Nvidia...
   
The patch below is the revised patch of the first one.  Now the vga
switcheroo registration is done before the video controller D3 check,
so the race should be smaller.
  
   This patch improves things further of course; a major improvement over
   without. ~15% of the time, I still get the 'spurious response'
   messages in this callpath:
 
  A possible scenario is that the graphics went in D3 before probing
  hd-audio, and the hd-audio continues to initialize the hardware
  without knowing the graphics counterpart is disabled.
 
  There is a code check_hdmi_disabled() in hda_intel.c that checks the
  availability of the video driver, and it might be that this doesn't
  work as expected...
 
  I think I understand this path.  You are 

Re: switcheroo registration vs switching race...

2012-12-04 Thread Takashi Iwai
At Tue, 4 Dec 2012 23:54:39 +0800,
Daniel J Blueman wrote:
 
 On 4 December 2012 23:03, Takashi Iwai ti...@suse.de wrote:
  At Tue, 4 Dec 2012 22:46:47 +0800,
  Daniel J Blueman wrote:
 
  On 4 December 2012 21:55, Takashi Iwai ti...@suse.de wrote:
   At Tue, 04 Dec 2012 14:23:05 +0100,
   Takashi Iwai wrote:
  
   At Tue, 4 Dec 2012 20:58:55 +0800,
   Daniel J Blueman wrote:
   
On 4 December 2012 01:10, Takashi Iwai ti...@suse.de wrote:
 At Tue, 4 Dec 2012 00:50:56 +0800,
 Daniel J Blueman wrote:

 On 4 December 2012 00:23, Takashi Iwai ti...@suse.de wrote:
  At Mon, 3 Dec 2012 23:08:28 +0800,
  Daniel J Blueman wrote:
 
  On 3 December 2012 22:40, Takashi Iwai ti...@suse.de wrote:
   At Mon, 3 Dec 2012 22:25:52 +0800,
   Daniel J Blueman wrote:
  
   On 3 December 2012 19:17, Takashi Iwai ti...@suse.de wrote:
At Wed, 28 Nov 2012 09:45:39 +0100,
Takashi Iwai wrote:
   
At Wed, 28 Nov 2012 11:45:07 +0800,
Daniel J Blueman wrote:

 Hi Seth, Dave, Takashi,

 If I power down the unused discrete GPU before lightdm 
 starts by
 fiddling with the sysfs file [1] in the upstart script, 
 I see a race
 manifesting as the discrete GPU's HDA controller timing 
 out to
 commands [2].

 Adding some debug, I see that the registered audio 
 devices are put
 into D3 before the GPU is, but it turns out that the 
 discrete (and
 internal) GPU's HDA controller gets registered a bit 
 later, so the
 list is empty. The symptom is since the HDA driver it's 
 talking to
 hardware which is now in D3.

 We could add a mutex to nouveau to allow us to wait for 
 the DGPU HDA
 controller, but perhaps this should be solved at a 
 higher level in the
 vgaswitcheroo code; what do you think?
   
Maybe it's a side effect for the recent effort to fix 
another race in
the probe.  A part of them problem is that the 
registration is done at
the very last of probing.
   
Instead of delaying the registration, how about the patch 
below?
   
Ping.  If this really works, I'd like to queue it for 3.8 
merge, at
least...
  
   Ping ack; I was trying to find time to understand another 
   race that
   occurs with GPU probing after switching, but is separate 
   from the
   situation before switching, here.
  
   In the context of writing the switch, it looks like struct 
   azx isn't
   allocated by the time azx_vs_set_state accesses it [1,2]; 
   racing with
   azx_codec_create?
  
   It was allocated, but it wasn't assigned properly in pci 
   drvdata.
  
   Below is the revised patch.  Just moved pci_set_drvdata() 
   before
   register_vga_switcheroo().  Could you retest with it?
 
  Superb; this addresses the oops.
 
  OK, I'll queue it to sound tree for 3.8 kernel with Cc to stable.
 
  ~1 second after the DGPU is put into D3, I still often see 
  hda-intel:
  spurious response 0x0:0x0, last cmd=0x170500:
  http://quora.org/2012/hda-switch-spurious.txt
 
  Hm, it's not clear who triggers these messages.  I'll try to 
  check the
  code paths.
 
  Presumably this implies the read of the ring-buffer pointer 
  returned
  0x, so the HDA driver understands the pointer to have 
  wrapped
  and processes the 191 unwritten entries?
 
  Good point.  Actually there is one bug that looks obviously wrong
  (writing 32bit value to CORBWP).  Maybe it has been working just
  because writing CORBRP doesn't influence except for the reset 
  bit.
 
  Reading CORBWP as a byte is OK, but this could be better in a 
  word so
  that we can check 0x as invalid.
 
  A test patch is below.  Hopefully this improves the situation...

 I'll check this out tomorrow and also instrument the code to get a
 backtrace, since there may still be an underlying race with the
 previous patches:
   
[...]
   
 That's odd.  The Cirrus one (:00:1b.0) must be independent from
 the vga switcheroo things for Nvidia...

 The patch below is the revised patch of the first one.  Now the vga
 switcheroo registration is done before the video controller D3 
 check,
 so the race should be smaller.
   
This patch improves things further of course; a major improvement over
without. ~15% of the time, I still get the 'spurious response'
messages in this callpath:
  
   A possible scenario is that the graphics went in D3 before probing
   hd-audio, and the hd-audio continues to initialize the hardware
   without knowing the graphics counterpart is 

Re: switcheroo registration vs switching race...

2012-12-03 Thread Takashi Iwai
At Tue, 4 Dec 2012 00:50:56 +0800,
Daniel J Blueman wrote:
> 
> On 4 December 2012 00:23, Takashi Iwai  wrote:
> > At Mon, 3 Dec 2012 23:08:28 +0800,
> > Daniel J Blueman wrote:
> >>
> >> On 3 December 2012 22:40, Takashi Iwai  wrote:
> >> > At Mon, 3 Dec 2012 22:25:52 +0800,
> >> > Daniel J Blueman wrote:
> >> >>
> >> >> On 3 December 2012 19:17, Takashi Iwai  wrote:
> >> >> > At Wed, 28 Nov 2012 09:45:39 +0100,
> >> >> > Takashi Iwai wrote:
> >> >> >>
> >> >> >> At Wed, 28 Nov 2012 11:45:07 +0800,
> >> >> >> Daniel J Blueman wrote:
> >> >> >> >
> >> >> >> > Hi Seth, Dave, Takashi,
> >> >> >> >
> >> >> >> > If I power down the unused discrete GPU before lightdm starts by
> >> >> >> > fiddling with the sysfs file [1] in the upstart script, I see a 
> >> >> >> > race
> >> >> >> > manifesting as the discrete GPU's HDA controller timing out to
> >> >> >> > commands [2].
> >> >> >> >
> >> >> >> > Adding some debug, I see that the registered audio devices are put
> >> >> >> > into D3 before the GPU is, but it turns out that the discrete (and
> >> >> >> > internal) GPU's HDA controller gets registered a bit later, so the
> >> >> >> > list is empty. The symptom is since the HDA driver it's talking to
> >> >> >> > hardware which is now in D3.
> >> >> >> >
> >> >> >> > We could add a mutex to nouveau to allow us to wait for the DGPU 
> >> >> >> > HDA
> >> >> >> > controller, but perhaps this should be solved at a higher level in 
> >> >> >> > the
> >> >> >> > vgaswitcheroo code; what do you think?
> >> >> >>
> >> >> >> Maybe it's a side effect for the recent effort to fix another race in
> >> >> >> the probe.  A part of them problem is that the registration is done 
> >> >> >> at
> >> >> >> the very last of probing.
> >> >> >>
> >> >> >> Instead of delaying the registration, how about the patch below?
> >> >> >
> >> >> > Ping.  If this really works, I'd like to queue it for 3.8 merge, at
> >> >> > least...
> >> >>
> >> >> Ping ack; I was trying to find time to understand another race that
> >> >> occurs with GPU probing after switching, but is separate from the
> >> >> situation before switching, here.
> >> >>
> >> >> In the context of writing the switch, it looks like struct azx isn't
> >> >> allocated by the time azx_vs_set_state accesses it [1,2]; racing with
> >> >> azx_codec_create?
> >> >
> >> > It was allocated, but it wasn't assigned properly in pci drvdata.
> >> >
> >> > Below is the revised patch.  Just moved pci_set_drvdata() before
> >> > register_vga_switcheroo().  Could you retest with it?
> >>
> >> Superb; this addresses the oops.
> >
> > OK, I'll queue it to sound tree for 3.8 kernel with Cc to stable.
> >
> >> ~1 second after the DGPU is put into D3, I still often see "hda-intel:
> >> spurious response 0x0:0x0, last cmd=0x170500":
> >> http://quora.org/2012/hda-switch-spurious.txt
> >
> > Hm, it's not clear who triggers these messages.  I'll try to check the
> > code paths.
> >
> >> Presumably this implies the read of the ring-buffer pointer returned
> >> 0x, so the HDA driver understands the pointer to have wrapped
> >> and processes the 191 unwritten entries?
> >
> > Good point.  Actually there is one bug that looks obviously wrong
> > (writing 32bit value to CORBWP).  Maybe it has been working just
> > because writing CORBRP doesn't influence except for the reset bit.
> >
> > Reading CORBWP as a byte is OK, but this could be better in a word so
> > that we can check 0x as invalid.
> >
> > A test patch is below.  Hopefully this improves the situation...
> 
> I'll check this out tomorrow and also instrument the code to get a
> backtrace, since there may still be an underlying race with the
> previous patches:
> 
> [8.203827] snd_hda_intel :00:1b.0: enabling device ( -> 0002)
> [8.203936] snd_hda_intel :00:1b.0: irq 51 for MSI/MSI-X
> [   10.981297] VGA switcheroo: switched nouveau off
> [   10.981383] nouveau  [ DRM] suspending fbcon...
> [   10.981388] nouveau  [ DRM] suspending display...
> [   10.981687] nouveau  [ DRM] unpinning framebuffer(s)...
> [   10.981825] nouveau  [ DRM] evicting buffers...
> [   10.992948] nouveau  [ DRM] suspending client object trees...
> [   11.310697] hda-intel: azx_get_response timeout, switching to
> polling mode: last cmd=0x000f
> [   12.320236] hda-intel: No response from codec, disabling MSI: last
> cmd=0x000f
> [   13.329721] hda-intel: Codec #0 probe error; disabling it...
> [   14.419163] hda_intel: azx_get_response timeout, switching to
> single_cmd mode: last cmd=0x000f
> [   14.419855] hda-intel: no codecs initialized
> [   14.459150] snd_hda_intel :01:00.1: Refused to change power
> state, currently in D3
> [   14.459176] hda-intel: :01:00.1: Handle VGA-switcheroo audio client
> [   14.459183] hda-intel: VGA controller for :01:00.1 is disabled
> [   14.459184] hda-intel: Delaying initialization
> 
> Full log at: http://quora.org/2012/hda-switch-spurious2.txt
> 
> The first 

Re: switcheroo registration vs switching race...

2012-12-03 Thread Daniel J Blueman
On 4 December 2012 00:23, Takashi Iwai  wrote:
> At Mon, 3 Dec 2012 23:08:28 +0800,
> Daniel J Blueman wrote:
>>
>> On 3 December 2012 22:40, Takashi Iwai  wrote:
>> > At Mon, 3 Dec 2012 22:25:52 +0800,
>> > Daniel J Blueman wrote:
>> >>
>> >> On 3 December 2012 19:17, Takashi Iwai  wrote:
>> >> > At Wed, 28 Nov 2012 09:45:39 +0100,
>> >> > Takashi Iwai wrote:
>> >> >>
>> >> >> At Wed, 28 Nov 2012 11:45:07 +0800,
>> >> >> Daniel J Blueman wrote:
>> >> >> >
>> >> >> > Hi Seth, Dave, Takashi,
>> >> >> >
>> >> >> > If I power down the unused discrete GPU before lightdm starts by
>> >> >> > fiddling with the sysfs file [1] in the upstart script, I see a race
>> >> >> > manifesting as the discrete GPU's HDA controller timing out to
>> >> >> > commands [2].
>> >> >> >
>> >> >> > Adding some debug, I see that the registered audio devices are put
>> >> >> > into D3 before the GPU is, but it turns out that the discrete (and
>> >> >> > internal) GPU's HDA controller gets registered a bit later, so the
>> >> >> > list is empty. The symptom is since the HDA driver it's talking to
>> >> >> > hardware which is now in D3.
>> >> >> >
>> >> >> > We could add a mutex to nouveau to allow us to wait for the DGPU HDA
>> >> >> > controller, but perhaps this should be solved at a higher level in 
>> >> >> > the
>> >> >> > vgaswitcheroo code; what do you think?
>> >> >>
>> >> >> Maybe it's a side effect for the recent effort to fix another race in
>> >> >> the probe.  A part of them problem is that the registration is done at
>> >> >> the very last of probing.
>> >> >>
>> >> >> Instead of delaying the registration, how about the patch below?
>> >> >
>> >> > Ping.  If this really works, I'd like to queue it for 3.8 merge, at
>> >> > least...
>> >>
>> >> Ping ack; I was trying to find time to understand another race that
>> >> occurs with GPU probing after switching, but is separate from the
>> >> situation before switching, here.
>> >>
>> >> In the context of writing the switch, it looks like struct azx isn't
>> >> allocated by the time azx_vs_set_state accesses it [1,2]; racing with
>> >> azx_codec_create?
>> >
>> > It was allocated, but it wasn't assigned properly in pci drvdata.
>> >
>> > Below is the revised patch.  Just moved pci_set_drvdata() before
>> > register_vga_switcheroo().  Could you retest with it?
>>
>> Superb; this addresses the oops.
>
> OK, I'll queue it to sound tree for 3.8 kernel with Cc to stable.
>
>> ~1 second after the DGPU is put into D3, I still often see "hda-intel:
>> spurious response 0x0:0x0, last cmd=0x170500":
>> http://quora.org/2012/hda-switch-spurious.txt
>
> Hm, it's not clear who triggers these messages.  I'll try to check the
> code paths.
>
>> Presumably this implies the read of the ring-buffer pointer returned
>> 0x, so the HDA driver understands the pointer to have wrapped
>> and processes the 191 unwritten entries?
>
> Good point.  Actually there is one bug that looks obviously wrong
> (writing 32bit value to CORBWP).  Maybe it has been working just
> because writing CORBRP doesn't influence except for the reset bit.
>
> Reading CORBWP as a byte is OK, but this could be better in a word so
> that we can check 0x as invalid.
>
> A test patch is below.  Hopefully this improves the situation...

I'll check this out tomorrow and also instrument the code to get a
backtrace, since there may still be an underlying race with the
previous patches:

[8.203827] snd_hda_intel :00:1b.0: enabling device ( -> 0002)
[8.203936] snd_hda_intel :00:1b.0: irq 51 for MSI/MSI-X
[   10.981297] VGA switcheroo: switched nouveau off
[   10.981383] nouveau  [ DRM] suspending fbcon...
[   10.981388] nouveau  [ DRM] suspending display...
[   10.981687] nouveau  [ DRM] unpinning framebuffer(s)...
[   10.981825] nouveau  [ DRM] evicting buffers...
[   10.992948] nouveau  [ DRM] suspending client object trees...
[   11.310697] hda-intel: azx_get_response timeout, switching to
polling mode: last cmd=0x000f
[   12.320236] hda-intel: No response from codec, disabling MSI: last
cmd=0x000f
[   13.329721] hda-intel: Codec #0 probe error; disabling it...
[   14.419163] hda_intel: azx_get_response timeout, switching to
single_cmd mode: last cmd=0x000f
[   14.419855] hda-intel: no codecs initialized
[   14.459150] snd_hda_intel :01:00.1: Refused to change power
state, currently in D3
[   14.459176] hda-intel: :01:00.1: Handle VGA-switcheroo audio client
[   14.459183] hda-intel: VGA controller for :01:00.1 is disabled
[   14.459184] hda-intel: Delaying initialization

Full log at: http://quora.org/2012/hda-switch-spurious2.txt

The first response timeout message plausibly could be from the CS4206
(0:1b.0) or Nvidia DGPU (1:0.1); either way, there are no audio
devices after. Also, we see that checking for master abort in one
place won't be sufficient.

Stay tuned for backtraces tomorrow.

Thanks again so far!
  Daniel
--
Daniel J Blueman
--
To 

Re: switcheroo registration vs switching race...

2012-12-03 Thread Takashi Iwai
At Mon, 3 Dec 2012 23:08:28 +0800,
Daniel J Blueman wrote:
> 
> On 3 December 2012 22:40, Takashi Iwai  wrote:
> > At Mon, 3 Dec 2012 22:25:52 +0800,
> > Daniel J Blueman wrote:
> >>
> >> On 3 December 2012 19:17, Takashi Iwai  wrote:
> >> > At Wed, 28 Nov 2012 09:45:39 +0100,
> >> > Takashi Iwai wrote:
> >> >>
> >> >> At Wed, 28 Nov 2012 11:45:07 +0800,
> >> >> Daniel J Blueman wrote:
> >> >> >
> >> >> > Hi Seth, Dave, Takashi,
> >> >> >
> >> >> > If I power down the unused discrete GPU before lightdm starts by
> >> >> > fiddling with the sysfs file [1] in the upstart script, I see a race
> >> >> > manifesting as the discrete GPU's HDA controller timing out to
> >> >> > commands [2].
> >> >> >
> >> >> > Adding some debug, I see that the registered audio devices are put
> >> >> > into D3 before the GPU is, but it turns out that the discrete (and
> >> >> > internal) GPU's HDA controller gets registered a bit later, so the
> >> >> > list is empty. The symptom is since the HDA driver it's talking to
> >> >> > hardware which is now in D3.
> >> >> >
> >> >> > We could add a mutex to nouveau to allow us to wait for the DGPU HDA
> >> >> > controller, but perhaps this should be solved at a higher level in the
> >> >> > vgaswitcheroo code; what do you think?
> >> >>
> >> >> Maybe it's a side effect for the recent effort to fix another race in
> >> >> the probe.  A part of them problem is that the registration is done at
> >> >> the very last of probing.
> >> >>
> >> >> Instead of delaying the registration, how about the patch below?
> >> >
> >> > Ping.  If this really works, I'd like to queue it for 3.8 merge, at
> >> > least...
> >>
> >> Ping ack; I was trying to find time to understand another race that
> >> occurs with GPU probing after switching, but is separate from the
> >> situation before switching, here.
> >>
> >> In the context of writing the switch, it looks like struct azx isn't
> >> allocated by the time azx_vs_set_state accesses it [1,2]; racing with
> >> azx_codec_create?
> >
> > It was allocated, but it wasn't assigned properly in pci drvdata.
> >
> > Below is the revised patch.  Just moved pci_set_drvdata() before
> > register_vga_switcheroo().  Could you retest with it?
> 
> Superb; this addresses the oops.

OK, I'll queue it to sound tree for 3.8 kernel with Cc to stable.

> ~1 second after the DGPU is put into D3, I still often see "hda-intel:
> spurious response 0x0:0x0, last cmd=0x170500":
> http://quora.org/2012/hda-switch-spurious.txt

Hm, it's not clear who triggers these messages.  I'll try to check the
code paths.

> Presumably this implies the read of the ring-buffer pointer returned
> 0x, so the HDA driver understands the pointer to have wrapped
> and processes the 191 unwritten entries?

Good point.  Actually there is one bug that looks obviously wrong
(writing 32bit value to CORBWP).  Maybe it has been working just
because writing CORBRP doesn't influence except for the reset bit.

Reading CORBWP as a byte is OK, but this could be better in a word so
that we can check 0x as invalid.

A test patch is below.  Hopefully this improves the situation...


thanks,

Takashi

---
diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
index 4bb52da..ce990db 100644
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -805,13 +805,18 @@ static int azx_corb_send_cmd(struct hda_bus *bus, u32 val)
spin_lock_irq(>reg_lock);
 
/* add command to corb */
-   wp = azx_readb(chip, CORBWP);
+   wp = azx_readw(chip, CORBWP);
+   if (wp == 0x) {
+   /* something wrong, controller likely turned to D3 */
+   spin_unlock_irq(>reg_lock);
+   return -1;
+   }
wp++;
wp %= ICH6_MAX_CORB_ENTRIES;
 
chip->rirb.cmds[addr]++;
chip->corb.buf[wp] = cpu_to_le32(val);
-   azx_writel(chip, CORBWP, wp);
+   azx_writew(chip, CORBWP, wp);
 
spin_unlock_irq(>reg_lock);
 
@@ -827,7 +832,12 @@ static void azx_update_rirb(struct azx *chip)
unsigned int addr;
u32 res, res_ex;
 
-   wp = azx_readb(chip, RIRBWP);
+   wp = azx_readw(chip, RIRBWP);
+   if (wp == 0x) {
+   /* something wrong, controller likely turned to D3 */
+   return;
+   }
+
if (wp == chip->rirb.wp)
return;
chip->rirb.wp = wp;
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: switcheroo registration vs switching race...

2012-12-03 Thread Daniel J Blueman
On 3 December 2012 22:40, Takashi Iwai  wrote:
> At Mon, 3 Dec 2012 22:25:52 +0800,
> Daniel J Blueman wrote:
>>
>> On 3 December 2012 19:17, Takashi Iwai  wrote:
>> > At Wed, 28 Nov 2012 09:45:39 +0100,
>> > Takashi Iwai wrote:
>> >>
>> >> At Wed, 28 Nov 2012 11:45:07 +0800,
>> >> Daniel J Blueman wrote:
>> >> >
>> >> > Hi Seth, Dave, Takashi,
>> >> >
>> >> > If I power down the unused discrete GPU before lightdm starts by
>> >> > fiddling with the sysfs file [1] in the upstart script, I see a race
>> >> > manifesting as the discrete GPU's HDA controller timing out to
>> >> > commands [2].
>> >> >
>> >> > Adding some debug, I see that the registered audio devices are put
>> >> > into D3 before the GPU is, but it turns out that the discrete (and
>> >> > internal) GPU's HDA controller gets registered a bit later, so the
>> >> > list is empty. The symptom is since the HDA driver it's talking to
>> >> > hardware which is now in D3.
>> >> >
>> >> > We could add a mutex to nouveau to allow us to wait for the DGPU HDA
>> >> > controller, but perhaps this should be solved at a higher level in the
>> >> > vgaswitcheroo code; what do you think?
>> >>
>> >> Maybe it's a side effect for the recent effort to fix another race in
>> >> the probe.  A part of them problem is that the registration is done at
>> >> the very last of probing.
>> >>
>> >> Instead of delaying the registration, how about the patch below?
>> >
>> > Ping.  If this really works, I'd like to queue it for 3.8 merge, at
>> > least...
>>
>> Ping ack; I was trying to find time to understand another race that
>> occurs with GPU probing after switching, but is separate from the
>> situation before switching, here.
>>
>> In the context of writing the switch, it looks like struct azx isn't
>> allocated by the time azx_vs_set_state accesses it [1,2]; racing with
>> azx_codec_create?
>
> It was allocated, but it wasn't assigned properly in pci drvdata.
>
> Below is the revised patch.  Just moved pci_set_drvdata() before
> register_vga_switcheroo().  Could you retest with it?

Superb; this addresses the oops.

~1 second after the DGPU is put into D3, I still often see "hda-intel:
spurious response 0x0:0x0, last cmd=0x170500":
http://quora.org/2012/hda-switch-spurious.txt

Presumably this implies the read of the ring-buffer pointer returned
0x, so the HDA driver understands the pointer to have wrapped
and processes the 191 unwritten entries?

Daniel
--
Daniel J Blueman
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: switcheroo registration vs switching race...

2012-12-03 Thread Takashi Iwai
At Mon, 3 Dec 2012 22:25:52 +0800,
Daniel J Blueman wrote:
> 
> On 3 December 2012 19:17, Takashi Iwai  wrote:
> > At Wed, 28 Nov 2012 09:45:39 +0100,
> > Takashi Iwai wrote:
> >>
> >> At Wed, 28 Nov 2012 11:45:07 +0800,
> >> Daniel J Blueman wrote:
> >> >
> >> > Hi Seth, Dave, Takashi,
> >> >
> >> > If I power down the unused discrete GPU before lightdm starts by
> >> > fiddling with the sysfs file [1] in the upstart script, I see a race
> >> > manifesting as the discrete GPU's HDA controller timing out to
> >> > commands [2].
> >> >
> >> > Adding some debug, I see that the registered audio devices are put
> >> > into D3 before the GPU is, but it turns out that the discrete (and
> >> > internal) GPU's HDA controller gets registered a bit later, so the
> >> > list is empty. The symptom is since the HDA driver it's talking to
> >> > hardware which is now in D3.
> >> >
> >> > We could add a mutex to nouveau to allow us to wait for the DGPU HDA
> >> > controller, but perhaps this should be solved at a higher level in the
> >> > vgaswitcheroo code; what do you think?
> >>
> >> Maybe it's a side effect for the recent effort to fix another race in
> >> the probe.  A part of them problem is that the registration is done at
> >> the very last of probing.
> >>
> >> Instead of delaying the registration, how about the patch below?
> >
> > Ping.  If this really works, I'd like to queue it for 3.8 merge, at
> > least...
> 
> Ping ack; I was trying to find time to understand another race that
> occurs with GPU probing after switching, but is separate from the
> situation before switching, here.
> 
> In the context of writing the switch, it looks like struct azx isn't
> allocated by the time azx_vs_set_state accesses it [1,2]; racing with
> azx_codec_create?

It was allocated, but it wasn't assigned properly in pci drvdata.

Below is the revised patch.  Just moved pci_set_drvdata() before
register_vga_switcheroo().  Could you retest with it?


thanks,

Takashi

---
diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
index 4bb52da..b7c482a 100644
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -49,6 +49,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifdef CONFIG_X86
 /* for snoop control */
@@ -469,6 +470,7 @@ struct azx {
/* locks */
spinlock_t reg_lock;
struct mutex open_mutex;
+   struct completion probe_wait;
 
/* streams (x num_streams) */
struct azx_dev *azx_dev;
@@ -2790,6 +2792,7 @@ static bool azx_vs_can_switch(struct pci_dev *pci)
struct snd_card *card = pci_get_drvdata(pci);
struct azx *chip = card->private_data;
 
+   wait_for_completion(>probe_wait);
if (chip->init_failed)
return false;
if (chip->disabled || !chip->bus)
@@ -2851,6 +2854,9 @@ static int azx_free(struct azx *chip)
 
azx_notifier_unregister(chip);
 
+   chip->init_failed = 1; /* to be sure */
+   complete(>probe_wait);
+
if (use_vga_switcheroo(chip)) {
if (chip->disabled && chip->bus)
snd_hda_unlock_devices(chip->bus);
@@ -3156,6 +3162,7 @@ static int __devinit azx_create(struct snd_card *card, 
struct pci_dev *pci,
INIT_LIST_HEAD(>pcm_list);
INIT_LIST_HEAD(>list);
init_vga_switcheroo(chip);
+   init_completion(>probe_wait);
 
chip->position_fix[0] = chip->position_fix[1] =
check_position_fix(chip, position_fix[dev]);
@@ -3462,17 +3469,8 @@ static int __devinit azx_probe(struct pci_dev *pci,
}
 #endif /* CONFIG_SND_HDA_PATCH_LOADER */
 
-   if (probe_now) {
-   err = azx_probe_continue(chip);
-   if (err < 0)
-   goto out_free;
-   }
-
pci_set_drvdata(pci, card);
 
-   if (pci_dev_run_wake(pci))
-   pm_runtime_put_noidle(>dev);
-
err = register_vga_switcheroo(chip);
if (err < 0) {
snd_printk(KERN_ERR SFX
@@ -3480,11 +3478,22 @@ static int __devinit azx_probe(struct pci_dev *pci,
goto out_free;
}
 
+   if (probe_now) {
+   err = azx_probe_continue(chip);
+   if (err < 0)
+   goto out_free;
+   }
+
+   if (pci_dev_run_wake(pci))
+   pm_runtime_put_noidle(>dev);
+
dev++;
+   complete(>probe_wait);
return 0;
 
 out_free:
snd_card_free(card);
+   pci_set_drvdata(pci, NULL);
return err;
 }
 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: switcheroo registration vs switching race...

2012-12-03 Thread Daniel J Blueman
On 3 December 2012 19:17, Takashi Iwai  wrote:
> At Wed, 28 Nov 2012 09:45:39 +0100,
> Takashi Iwai wrote:
>>
>> At Wed, 28 Nov 2012 11:45:07 +0800,
>> Daniel J Blueman wrote:
>> >
>> > Hi Seth, Dave, Takashi,
>> >
>> > If I power down the unused discrete GPU before lightdm starts by
>> > fiddling with the sysfs file [1] in the upstart script, I see a race
>> > manifesting as the discrete GPU's HDA controller timing out to
>> > commands [2].
>> >
>> > Adding some debug, I see that the registered audio devices are put
>> > into D3 before the GPU is, but it turns out that the discrete (and
>> > internal) GPU's HDA controller gets registered a bit later, so the
>> > list is empty. The symptom is since the HDA driver it's talking to
>> > hardware which is now in D3.
>> >
>> > We could add a mutex to nouveau to allow us to wait for the DGPU HDA
>> > controller, but perhaps this should be solved at a higher level in the
>> > vgaswitcheroo code; what do you think?
>>
>> Maybe it's a side effect for the recent effort to fix another race in
>> the probe.  A part of them problem is that the registration is done at
>> the very last of probing.
>>
>> Instead of delaying the registration, how about the patch below?
>
> Ping.  If this really works, I'd like to queue it for 3.8 merge, at
> least...

Ping ack; I was trying to find time to understand another race that
occurs with GPU probing after switching, but is separate from the
situation before switching, here.

In the context of writing the switch, it looks like struct azx isn't
allocated by the time azx_vs_set_state accesses it [1,2]; racing with
azx_codec_create?

The full dmesg output is at: http://quora.org/2012/hda-switch-oops.txt

Thanks,
  Daniel

--- [1]

BUG: unable to handle kernel NULL pointer dereference at 0170
IP: [] azx_vs_set_state+0x26/0x1a0 [snd_hda_intel]
PGD 26323d067 PUD 264f58067 PMD 0
Oops:  [#1] SMP
Modules linked in: snd_hda_codec_hdmi snd_hda_codec_cirrus rfcomm bnep
nls_iso8859_1 joydev hid_apple bcm5974 nouveau coretemp kvm_intel b43
kvm uvcvideo videobuf2_core videobuf2_vmalloc videobuf2_memops
ghash_clmulni_intel smsc75xx usbnet mii ttm snd_hda_intel(+)
snd_hda_codec snd_hwdep ssb i915 snd_pcm mxm_wmi snd_timer apple_gmux
applesmc mei lpc_ich microcode hwmon mfd_core input_polldev bcma snd
drm_kms_helper snd_page_alloc video apple_bl sdhci_pci sdhci mmc_core
CPU 1
Pid: 967, comm: sh Not tainted 3.7.0-rc7-expert+ #8 Apple Inc.
MacBookPro10,1/Mac-C3EC7CD22292981F
RIP: 0010:[] []
azx_vs_set_state+0x26/0x1a0 [snd_hda_intel]
RSP: 0018:88025198de48 EFLAGS: 00010286
RAX:  RBX: 880251960a00 RCX: 
RDX:  RSI:  RDI: 880265b41098
RBP: 88025198de68 R08: 0003 R09: 1000
R10: 7fffe481b730 R11: 0246 R12: 880265b41098
R13:  R14: 88025198df50 R15: 
FS: 7f4961480700() GS:88026f24() knlGS:
CS: 0010 DS:  ES:  CR0: 80050033
CR2: 0170 CR3: 000263cd3000 CR4: 001407e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process sh (pid: 967, threadinfo 88025198c000, task 88025d635820)
Stack:
 88025d635820 880251960a00  88025198de98
 88025198de88 812b8e77 880263ef1740 0004
 88025198def8 812b947c 88020a46464f 81107982
Call Trace:
 [] set_audio_state+0x67/0x70
 [] vga_switcheroo_debugfs_write+0xbc/0x380
 [] ? __alloc_fd+0x42/0x110
 [] ? __fd_install+0x29/0x60
 [] vfs_write+0xa3/0x160
 [] sys_write+0x4d/0xa0
 [] ? do_page_fault+0x9/0x10
 [] system_call_fastpath+0x1a/0x1f
Code: 00 00 00 00 00 55 48 89 e5 48 83 ec 20 4c 89 65 f0 4c 8d a7 98
00 00 00 4c 89 e7 48 89 5d e8 4c 89 6d f8 41 89 f5 e8 fa a4 0d e1 <48>
8b 98 70 01 00 00 0f b6 83 dd 01 00 00 a8 10 75 34 45 85 ed
RIP [] azx_vs_set_state+0x26/0x1a0 [snd_hda_intel]
 RSP 
CR2: 0170

--- [2]

$ gdb ./sound/pci/hda/snd-hda-intel.ko
(gdb) list *(azx_vs_set_state+0x26)
0x3036 is in azx_vs_set_state (sound/pci/hda/hda_intel.c:2628).
2623
2624static void azx_vs_set_state(struct pci_dev *pci,
2625 enum vga_switcheroo_state state)
2626{
2627struct snd_card *card = pci_get_drvdata(pci);
2628struct azx *chip = card->private_data;
2629bool disabled;
2630
2631if (chip->init_failed)
2632return;
--
Daniel J Blueman
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: switcheroo registration vs switching race...

2012-12-03 Thread Takashi Iwai
At Wed, 28 Nov 2012 09:45:39 +0100,
Takashi Iwai wrote:
> 
> At Wed, 28 Nov 2012 11:45:07 +0800,
> Daniel J Blueman wrote:
> > 
> > Hi Seth, Dave, Takashi,
> > 
> > If I power down the unused discrete GPU before lightdm starts by
> > fiddling with the sysfs file [1] in the upstart script, I see a race
> > manifesting as the discrete GPU's HDA controller timing out to
> > commands [2].
> > 
> > Adding some debug, I see that the registered audio devices are put
> > into D3 before the GPU is, but it turns out that the discrete (and
> > internal) GPU's HDA controller gets registered a bit later, so the
> > list is empty. The symptom is since the HDA driver it's talking to
> > hardware which is now in D3.
> > 
> > We could add a mutex to nouveau to allow us to wait for the DGPU HDA
> > controller, but perhaps this should be solved at a higher level in the
> > vgaswitcheroo code; what do you think?
> 
> Maybe it's a side effect for the recent effort to fix another race in
> the probe.  A part of them problem is that the registration is done at
> the very last of probing.
> 
> Instead of delaying the registration, how about the patch below?

Ping.  If this really works, I'd like to queue it for 3.8 merge, at
least...


thanks,

Takashi

> 
> 
> Takashi
> 
> ---
> diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
> index 4bb52da..17fbd68 100644
> --- a/sound/pci/hda/hda_intel.c
> +++ b/sound/pci/hda/hda_intel.c
> @@ -49,6 +49,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #ifdef CONFIG_X86
>  /* for snoop control */
> @@ -469,6 +470,7 @@ struct azx {
>   /* locks */
>   spinlock_t reg_lock;
>   struct mutex open_mutex;
> + struct completion probe_wait;
>  
>   /* streams (x num_streams) */
>   struct azx_dev *azx_dev;
> @@ -2790,6 +2792,7 @@ static bool azx_vs_can_switch(struct pci_dev *pci)
>   struct snd_card *card = pci_get_drvdata(pci);
>   struct azx *chip = card->private_data;
>  
> + wait_for_completion(>probe_wait);
>   if (chip->init_failed)
>   return false;
>   if (chip->disabled || !chip->bus)
> @@ -2851,6 +2854,9 @@ static int azx_free(struct azx *chip)
>  
>   azx_notifier_unregister(chip);
>  
> + chip->init_failed = 1; /* to be sure */
> + complete(>probe_wait);
> +
>   if (use_vga_switcheroo(chip)) {
>   if (chip->disabled && chip->bus)
>   snd_hda_unlock_devices(chip->bus);
> @@ -3156,6 +3162,7 @@ static int __devinit azx_create(struct snd_card *card, 
> struct pci_dev *pci,
>   INIT_LIST_HEAD(>pcm_list);
>   INIT_LIST_HEAD(>list);
>   init_vga_switcheroo(chip);
> + init_completion(>probe_wait);
>  
>   chip->position_fix[0] = chip->position_fix[1] =
>   check_position_fix(chip, position_fix[dev]);
> @@ -3462,6 +3469,13 @@ static int __devinit azx_probe(struct pci_dev *pci,
>   }
>  #endif /* CONFIG_SND_HDA_PATCH_LOADER */
>  
> + err = register_vga_switcheroo(chip);
> + if (err < 0) {
> + snd_printk(KERN_ERR SFX
> +"Error registering VGA-switcheroo client\n");
> + goto out_free;
> + }
> +
>   if (probe_now) {
>   err = azx_probe_continue(chip);
>   if (err < 0)
> @@ -3473,14 +3487,8 @@ static int __devinit azx_probe(struct pci_dev *pci,
>   if (pci_dev_run_wake(pci))
>   pm_runtime_put_noidle(>dev);
>  
> - err = register_vga_switcheroo(chip);
> - if (err < 0) {
> - snd_printk(KERN_ERR SFX
> -"Error registering VGA-switcheroo client\n");
> - goto out_free;
> - }
> -
>   dev++;
> + complete(>probe_wait);
>   return 0;
>  
>  out_free:
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: switcheroo registration vs switching race...

2012-12-03 Thread Takashi Iwai
At Wed, 28 Nov 2012 09:45:39 +0100,
Takashi Iwai wrote:
 
 At Wed, 28 Nov 2012 11:45:07 +0800,
 Daniel J Blueman wrote:
  
  Hi Seth, Dave, Takashi,
  
  If I power down the unused discrete GPU before lightdm starts by
  fiddling with the sysfs file [1] in the upstart script, I see a race
  manifesting as the discrete GPU's HDA controller timing out to
  commands [2].
  
  Adding some debug, I see that the registered audio devices are put
  into D3 before the GPU is, but it turns out that the discrete (and
  internal) GPU's HDA controller gets registered a bit later, so the
  list is empty. The symptom is since the HDA driver it's talking to
  hardware which is now in D3.
  
  We could add a mutex to nouveau to allow us to wait for the DGPU HDA
  controller, but perhaps this should be solved at a higher level in the
  vgaswitcheroo code; what do you think?
 
 Maybe it's a side effect for the recent effort to fix another race in
 the probe.  A part of them problem is that the registration is done at
 the very last of probing.
 
 Instead of delaying the registration, how about the patch below?

Ping.  If this really works, I'd like to queue it for 3.8 merge, at
least...


thanks,

Takashi

 
 
 Takashi
 
 ---
 diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
 index 4bb52da..17fbd68 100644
 --- a/sound/pci/hda/hda_intel.c
 +++ b/sound/pci/hda/hda_intel.c
 @@ -49,6 +49,7 @@
  #include linux/pm_runtime.h
  #include linux/clocksource.h
  #include linux/time.h
 +#include linux/completion.h
  
  #ifdef CONFIG_X86
  /* for snoop control */
 @@ -469,6 +470,7 @@ struct azx {
   /* locks */
   spinlock_t reg_lock;
   struct mutex open_mutex;
 + struct completion probe_wait;
  
   /* streams (x num_streams) */
   struct azx_dev *azx_dev;
 @@ -2790,6 +2792,7 @@ static bool azx_vs_can_switch(struct pci_dev *pci)
   struct snd_card *card = pci_get_drvdata(pci);
   struct azx *chip = card-private_data;
  
 + wait_for_completion(chip-probe_wait);
   if (chip-init_failed)
   return false;
   if (chip-disabled || !chip-bus)
 @@ -2851,6 +2854,9 @@ static int azx_free(struct azx *chip)
  
   azx_notifier_unregister(chip);
  
 + chip-init_failed = 1; /* to be sure */
 + complete(chip-probe_wait);
 +
   if (use_vga_switcheroo(chip)) {
   if (chip-disabled  chip-bus)
   snd_hda_unlock_devices(chip-bus);
 @@ -3156,6 +3162,7 @@ static int __devinit azx_create(struct snd_card *card, 
 struct pci_dev *pci,
   INIT_LIST_HEAD(chip-pcm_list);
   INIT_LIST_HEAD(chip-list);
   init_vga_switcheroo(chip);
 + init_completion(chip-probe_wait);
  
   chip-position_fix[0] = chip-position_fix[1] =
   check_position_fix(chip, position_fix[dev]);
 @@ -3462,6 +3469,13 @@ static int __devinit azx_probe(struct pci_dev *pci,
   }
  #endif /* CONFIG_SND_HDA_PATCH_LOADER */
  
 + err = register_vga_switcheroo(chip);
 + if (err  0) {
 + snd_printk(KERN_ERR SFX
 +Error registering VGA-switcheroo client\n);
 + goto out_free;
 + }
 +
   if (probe_now) {
   err = azx_probe_continue(chip);
   if (err  0)
 @@ -3473,14 +3487,8 @@ static int __devinit azx_probe(struct pci_dev *pci,
   if (pci_dev_run_wake(pci))
   pm_runtime_put_noidle(pci-dev);
  
 - err = register_vga_switcheroo(chip);
 - if (err  0) {
 - snd_printk(KERN_ERR SFX
 -Error registering VGA-switcheroo client\n);
 - goto out_free;
 - }
 -
   dev++;
 + complete(chip-probe_wait);
   return 0;
  
  out_free:
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: switcheroo registration vs switching race...

2012-12-03 Thread Daniel J Blueman
On 3 December 2012 19:17, Takashi Iwai ti...@suse.de wrote:
 At Wed, 28 Nov 2012 09:45:39 +0100,
 Takashi Iwai wrote:

 At Wed, 28 Nov 2012 11:45:07 +0800,
 Daniel J Blueman wrote:
 
  Hi Seth, Dave, Takashi,
 
  If I power down the unused discrete GPU before lightdm starts by
  fiddling with the sysfs file [1] in the upstart script, I see a race
  manifesting as the discrete GPU's HDA controller timing out to
  commands [2].
 
  Adding some debug, I see that the registered audio devices are put
  into D3 before the GPU is, but it turns out that the discrete (and
  internal) GPU's HDA controller gets registered a bit later, so the
  list is empty. The symptom is since the HDA driver it's talking to
  hardware which is now in D3.
 
  We could add a mutex to nouveau to allow us to wait for the DGPU HDA
  controller, but perhaps this should be solved at a higher level in the
  vgaswitcheroo code; what do you think?

 Maybe it's a side effect for the recent effort to fix another race in
 the probe.  A part of them problem is that the registration is done at
 the very last of probing.

 Instead of delaying the registration, how about the patch below?

 Ping.  If this really works, I'd like to queue it for 3.8 merge, at
 least...

Ping ack; I was trying to find time to understand another race that
occurs with GPU probing after switching, but is separate from the
situation before switching, here.

In the context of writing the switch, it looks like struct azx isn't
allocated by the time azx_vs_set_state accesses it [1,2]; racing with
azx_codec_create?

The full dmesg output is at: http://quora.org/2012/hda-switch-oops.txt

Thanks,
  Daniel

--- [1]

BUG: unable to handle kernel NULL pointer dereference at 0170
IP: [a01e4006] azx_vs_set_state+0x26/0x1a0 [snd_hda_intel]
PGD 26323d067 PUD 264f58067 PMD 0
Oops:  [#1] SMP
Modules linked in: snd_hda_codec_hdmi snd_hda_codec_cirrus rfcomm bnep
nls_iso8859_1 joydev hid_apple bcm5974 nouveau coretemp kvm_intel b43
kvm uvcvideo videobuf2_core videobuf2_vmalloc videobuf2_memops
ghash_clmulni_intel smsc75xx usbnet mii ttm snd_hda_intel(+)
snd_hda_codec snd_hwdep ssb i915 snd_pcm mxm_wmi snd_timer apple_gmux
applesmc mei lpc_ich microcode hwmon mfd_core input_polldev bcma snd
drm_kms_helper snd_page_alloc video apple_bl sdhci_pci sdhci mmc_core
CPU 1
Pid: 967, comm: sh Not tainted 3.7.0-rc7-expert+ #8 Apple Inc.
MacBookPro10,1/Mac-C3EC7CD22292981F
RIP: 0010:[a01e4006] [a01e4006]
azx_vs_set_state+0x26/0x1a0 [snd_hda_intel]
RSP: 0018:88025198de48 EFLAGS: 00010286
RAX:  RBX: 880251960a00 RCX: 
RDX:  RSI:  RDI: 880265b41098
RBP: 88025198de68 R08: 0003 R09: 1000
R10: 7fffe481b730 R11: 0246 R12: 880265b41098
R13:  R14: 88025198df50 R15: 
FS: 7f4961480700() GS:88026f24() knlGS:
CS: 0010 DS:  ES:  CR0: 80050033
CR2: 0170 CR3: 000263cd3000 CR4: 001407e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process sh (pid: 967, threadinfo 88025198c000, task 88025d635820)
Stack:
 88025d635820 880251960a00  88025198de98
 88025198de88 812b8e77 880263ef1740 0004
 88025198def8 812b947c 88020a46464f 81107982
Call Trace:
 [812b8e77] set_audio_state+0x67/0x70
 [812b947c] vga_switcheroo_debugfs_write+0xbc/0x380
 [81107982] ? __alloc_fd+0x42/0x110
 [81107aa9] ? __fd_install+0x29/0x60
 [810ed703] vfs_write+0xa3/0x160
 [810eda0d] sys_write+0x4d/0xa0
 [810283f9] ? do_page_fault+0x9/0x10
 [814b7ed6] system_call_fastpath+0x1a/0x1f
Code: 00 00 00 00 00 55 48 89 e5 48 83 ec 20 4c 89 65 f0 4c 8d a7 98
00 00 00 4c 89 e7 48 89 5d e8 4c 89 6d f8 41 89 f5 e8 fa a4 0d e1 48
8b 98 70 01 00 00 0f b6 83 dd 01 00 00 a8 10 75 34 45 85 ed
RIP [a01e4006] azx_vs_set_state+0x26/0x1a0 [snd_hda_intel]
 RSP 88025198de48
CR2: 0170

--- [2]

$ gdb ./sound/pci/hda/snd-hda-intel.ko
(gdb) list *(azx_vs_set_state+0x26)
0x3036 is in azx_vs_set_state (sound/pci/hda/hda_intel.c:2628).
2623
2624static void azx_vs_set_state(struct pci_dev *pci,
2625 enum vga_switcheroo_state state)
2626{
2627struct snd_card *card = pci_get_drvdata(pci);
2628struct azx *chip = card-private_data;
2629bool disabled;
2630
2631if (chip-init_failed)
2632return;
--
Daniel J Blueman
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: switcheroo registration vs switching race...

2012-12-03 Thread Takashi Iwai
At Mon, 3 Dec 2012 22:25:52 +0800,
Daniel J Blueman wrote:
 
 On 3 December 2012 19:17, Takashi Iwai ti...@suse.de wrote:
  At Wed, 28 Nov 2012 09:45:39 +0100,
  Takashi Iwai wrote:
 
  At Wed, 28 Nov 2012 11:45:07 +0800,
  Daniel J Blueman wrote:
  
   Hi Seth, Dave, Takashi,
  
   If I power down the unused discrete GPU before lightdm starts by
   fiddling with the sysfs file [1] in the upstart script, I see a race
   manifesting as the discrete GPU's HDA controller timing out to
   commands [2].
  
   Adding some debug, I see that the registered audio devices are put
   into D3 before the GPU is, but it turns out that the discrete (and
   internal) GPU's HDA controller gets registered a bit later, so the
   list is empty. The symptom is since the HDA driver it's talking to
   hardware which is now in D3.
  
   We could add a mutex to nouveau to allow us to wait for the DGPU HDA
   controller, but perhaps this should be solved at a higher level in the
   vgaswitcheroo code; what do you think?
 
  Maybe it's a side effect for the recent effort to fix another race in
  the probe.  A part of them problem is that the registration is done at
  the very last of probing.
 
  Instead of delaying the registration, how about the patch below?
 
  Ping.  If this really works, I'd like to queue it for 3.8 merge, at
  least...
 
 Ping ack; I was trying to find time to understand another race that
 occurs with GPU probing after switching, but is separate from the
 situation before switching, here.
 
 In the context of writing the switch, it looks like struct azx isn't
 allocated by the time azx_vs_set_state accesses it [1,2]; racing with
 azx_codec_create?

It was allocated, but it wasn't assigned properly in pci drvdata.

Below is the revised patch.  Just moved pci_set_drvdata() before
register_vga_switcheroo().  Could you retest with it?


thanks,

Takashi

---
diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
index 4bb52da..b7c482a 100644
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -49,6 +49,7 @@
 #include linux/pm_runtime.h
 #include linux/clocksource.h
 #include linux/time.h
+#include linux/completion.h
 
 #ifdef CONFIG_X86
 /* for snoop control */
@@ -469,6 +470,7 @@ struct azx {
/* locks */
spinlock_t reg_lock;
struct mutex open_mutex;
+   struct completion probe_wait;
 
/* streams (x num_streams) */
struct azx_dev *azx_dev;
@@ -2790,6 +2792,7 @@ static bool azx_vs_can_switch(struct pci_dev *pci)
struct snd_card *card = pci_get_drvdata(pci);
struct azx *chip = card-private_data;
 
+   wait_for_completion(chip-probe_wait);
if (chip-init_failed)
return false;
if (chip-disabled || !chip-bus)
@@ -2851,6 +2854,9 @@ static int azx_free(struct azx *chip)
 
azx_notifier_unregister(chip);
 
+   chip-init_failed = 1; /* to be sure */
+   complete(chip-probe_wait);
+
if (use_vga_switcheroo(chip)) {
if (chip-disabled  chip-bus)
snd_hda_unlock_devices(chip-bus);
@@ -3156,6 +3162,7 @@ static int __devinit azx_create(struct snd_card *card, 
struct pci_dev *pci,
INIT_LIST_HEAD(chip-pcm_list);
INIT_LIST_HEAD(chip-list);
init_vga_switcheroo(chip);
+   init_completion(chip-probe_wait);
 
chip-position_fix[0] = chip-position_fix[1] =
check_position_fix(chip, position_fix[dev]);
@@ -3462,17 +3469,8 @@ static int __devinit azx_probe(struct pci_dev *pci,
}
 #endif /* CONFIG_SND_HDA_PATCH_LOADER */
 
-   if (probe_now) {
-   err = azx_probe_continue(chip);
-   if (err  0)
-   goto out_free;
-   }
-
pci_set_drvdata(pci, card);
 
-   if (pci_dev_run_wake(pci))
-   pm_runtime_put_noidle(pci-dev);
-
err = register_vga_switcheroo(chip);
if (err  0) {
snd_printk(KERN_ERR SFX
@@ -3480,11 +3478,22 @@ static int __devinit azx_probe(struct pci_dev *pci,
goto out_free;
}
 
+   if (probe_now) {
+   err = azx_probe_continue(chip);
+   if (err  0)
+   goto out_free;
+   }
+
+   if (pci_dev_run_wake(pci))
+   pm_runtime_put_noidle(pci-dev);
+
dev++;
+   complete(chip-probe_wait);
return 0;
 
 out_free:
snd_card_free(card);
+   pci_set_drvdata(pci, NULL);
return err;
 }
 
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: switcheroo registration vs switching race...

2012-12-03 Thread Daniel J Blueman
On 3 December 2012 22:40, Takashi Iwai ti...@suse.de wrote:
 At Mon, 3 Dec 2012 22:25:52 +0800,
 Daniel J Blueman wrote:

 On 3 December 2012 19:17, Takashi Iwai ti...@suse.de wrote:
  At Wed, 28 Nov 2012 09:45:39 +0100,
  Takashi Iwai wrote:
 
  At Wed, 28 Nov 2012 11:45:07 +0800,
  Daniel J Blueman wrote:
  
   Hi Seth, Dave, Takashi,
  
   If I power down the unused discrete GPU before lightdm starts by
   fiddling with the sysfs file [1] in the upstart script, I see a race
   manifesting as the discrete GPU's HDA controller timing out to
   commands [2].
  
   Adding some debug, I see that the registered audio devices are put
   into D3 before the GPU is, but it turns out that the discrete (and
   internal) GPU's HDA controller gets registered a bit later, so the
   list is empty. The symptom is since the HDA driver it's talking to
   hardware which is now in D3.
  
   We could add a mutex to nouveau to allow us to wait for the DGPU HDA
   controller, but perhaps this should be solved at a higher level in the
   vgaswitcheroo code; what do you think?
 
  Maybe it's a side effect for the recent effort to fix another race in
  the probe.  A part of them problem is that the registration is done at
  the very last of probing.
 
  Instead of delaying the registration, how about the patch below?
 
  Ping.  If this really works, I'd like to queue it for 3.8 merge, at
  least...

 Ping ack; I was trying to find time to understand another race that
 occurs with GPU probing after switching, but is separate from the
 situation before switching, here.

 In the context of writing the switch, it looks like struct azx isn't
 allocated by the time azx_vs_set_state accesses it [1,2]; racing with
 azx_codec_create?

 It was allocated, but it wasn't assigned properly in pci drvdata.

 Below is the revised patch.  Just moved pci_set_drvdata() before
 register_vga_switcheroo().  Could you retest with it?

Superb; this addresses the oops.

~1 second after the DGPU is put into D3, I still often see hda-intel:
spurious response 0x0:0x0, last cmd=0x170500:
http://quora.org/2012/hda-switch-spurious.txt

Presumably this implies the read of the ring-buffer pointer returned
0x, so the HDA driver understands the pointer to have wrapped
and processes the 191 unwritten entries?

Daniel
--
Daniel J Blueman
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: switcheroo registration vs switching race...

2012-12-03 Thread Takashi Iwai
At Mon, 3 Dec 2012 23:08:28 +0800,
Daniel J Blueman wrote:
 
 On 3 December 2012 22:40, Takashi Iwai ti...@suse.de wrote:
  At Mon, 3 Dec 2012 22:25:52 +0800,
  Daniel J Blueman wrote:
 
  On 3 December 2012 19:17, Takashi Iwai ti...@suse.de wrote:
   At Wed, 28 Nov 2012 09:45:39 +0100,
   Takashi Iwai wrote:
  
   At Wed, 28 Nov 2012 11:45:07 +0800,
   Daniel J Blueman wrote:
   
Hi Seth, Dave, Takashi,
   
If I power down the unused discrete GPU before lightdm starts by
fiddling with the sysfs file [1] in the upstart script, I see a race
manifesting as the discrete GPU's HDA controller timing out to
commands [2].
   
Adding some debug, I see that the registered audio devices are put
into D3 before the GPU is, but it turns out that the discrete (and
internal) GPU's HDA controller gets registered a bit later, so the
list is empty. The symptom is since the HDA driver it's talking to
hardware which is now in D3.
   
We could add a mutex to nouveau to allow us to wait for the DGPU HDA
controller, but perhaps this should be solved at a higher level in the
vgaswitcheroo code; what do you think?
  
   Maybe it's a side effect for the recent effort to fix another race in
   the probe.  A part of them problem is that the registration is done at
   the very last of probing.
  
   Instead of delaying the registration, how about the patch below?
  
   Ping.  If this really works, I'd like to queue it for 3.8 merge, at
   least...
 
  Ping ack; I was trying to find time to understand another race that
  occurs with GPU probing after switching, but is separate from the
  situation before switching, here.
 
  In the context of writing the switch, it looks like struct azx isn't
  allocated by the time azx_vs_set_state accesses it [1,2]; racing with
  azx_codec_create?
 
  It was allocated, but it wasn't assigned properly in pci drvdata.
 
  Below is the revised patch.  Just moved pci_set_drvdata() before
  register_vga_switcheroo().  Could you retest with it?
 
 Superb; this addresses the oops.

OK, I'll queue it to sound tree for 3.8 kernel with Cc to stable.

 ~1 second after the DGPU is put into D3, I still often see hda-intel:
 spurious response 0x0:0x0, last cmd=0x170500:
 http://quora.org/2012/hda-switch-spurious.txt

Hm, it's not clear who triggers these messages.  I'll try to check the
code paths.

 Presumably this implies the read of the ring-buffer pointer returned
 0x, so the HDA driver understands the pointer to have wrapped
 and processes the 191 unwritten entries?

Good point.  Actually there is one bug that looks obviously wrong
(writing 32bit value to CORBWP).  Maybe it has been working just
because writing CORBRP doesn't influence except for the reset bit.

Reading CORBWP as a byte is OK, but this could be better in a word so
that we can check 0x as invalid.

A test patch is below.  Hopefully this improves the situation...


thanks,

Takashi

---
diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
index 4bb52da..ce990db 100644
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -805,13 +805,18 @@ static int azx_corb_send_cmd(struct hda_bus *bus, u32 val)
spin_lock_irq(chip-reg_lock);
 
/* add command to corb */
-   wp = azx_readb(chip, CORBWP);
+   wp = azx_readw(chip, CORBWP);
+   if (wp == 0x) {
+   /* something wrong, controller likely turned to D3 */
+   spin_unlock_irq(chip-reg_lock);
+   return -1;
+   }
wp++;
wp %= ICH6_MAX_CORB_ENTRIES;
 
chip-rirb.cmds[addr]++;
chip-corb.buf[wp] = cpu_to_le32(val);
-   azx_writel(chip, CORBWP, wp);
+   azx_writew(chip, CORBWP, wp);
 
spin_unlock_irq(chip-reg_lock);
 
@@ -827,7 +832,12 @@ static void azx_update_rirb(struct azx *chip)
unsigned int addr;
u32 res, res_ex;
 
-   wp = azx_readb(chip, RIRBWP);
+   wp = azx_readw(chip, RIRBWP);
+   if (wp == 0x) {
+   /* something wrong, controller likely turned to D3 */
+   return;
+   }
+
if (wp == chip-rirb.wp)
return;
chip-rirb.wp = wp;
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: switcheroo registration vs switching race...

2012-12-03 Thread Daniel J Blueman
On 4 December 2012 00:23, Takashi Iwai ti...@suse.de wrote:
 At Mon, 3 Dec 2012 23:08:28 +0800,
 Daniel J Blueman wrote:

 On 3 December 2012 22:40, Takashi Iwai ti...@suse.de wrote:
  At Mon, 3 Dec 2012 22:25:52 +0800,
  Daniel J Blueman wrote:
 
  On 3 December 2012 19:17, Takashi Iwai ti...@suse.de wrote:
   At Wed, 28 Nov 2012 09:45:39 +0100,
   Takashi Iwai wrote:
  
   At Wed, 28 Nov 2012 11:45:07 +0800,
   Daniel J Blueman wrote:
   
Hi Seth, Dave, Takashi,
   
If I power down the unused discrete GPU before lightdm starts by
fiddling with the sysfs file [1] in the upstart script, I see a race
manifesting as the discrete GPU's HDA controller timing out to
commands [2].
   
Adding some debug, I see that the registered audio devices are put
into D3 before the GPU is, but it turns out that the discrete (and
internal) GPU's HDA controller gets registered a bit later, so the
list is empty. The symptom is since the HDA driver it's talking to
hardware which is now in D3.
   
We could add a mutex to nouveau to allow us to wait for the DGPU HDA
controller, but perhaps this should be solved at a higher level in 
the
vgaswitcheroo code; what do you think?
  
   Maybe it's a side effect for the recent effort to fix another race in
   the probe.  A part of them problem is that the registration is done at
   the very last of probing.
  
   Instead of delaying the registration, how about the patch below?
  
   Ping.  If this really works, I'd like to queue it for 3.8 merge, at
   least...
 
  Ping ack; I was trying to find time to understand another race that
  occurs with GPU probing after switching, but is separate from the
  situation before switching, here.
 
  In the context of writing the switch, it looks like struct azx isn't
  allocated by the time azx_vs_set_state accesses it [1,2]; racing with
  azx_codec_create?
 
  It was allocated, but it wasn't assigned properly in pci drvdata.
 
  Below is the revised patch.  Just moved pci_set_drvdata() before
  register_vga_switcheroo().  Could you retest with it?

 Superb; this addresses the oops.

 OK, I'll queue it to sound tree for 3.8 kernel with Cc to stable.

 ~1 second after the DGPU is put into D3, I still often see hda-intel:
 spurious response 0x0:0x0, last cmd=0x170500:
 http://quora.org/2012/hda-switch-spurious.txt

 Hm, it's not clear who triggers these messages.  I'll try to check the
 code paths.

 Presumably this implies the read of the ring-buffer pointer returned
 0x, so the HDA driver understands the pointer to have wrapped
 and processes the 191 unwritten entries?

 Good point.  Actually there is one bug that looks obviously wrong
 (writing 32bit value to CORBWP).  Maybe it has been working just
 because writing CORBRP doesn't influence except for the reset bit.

 Reading CORBWP as a byte is OK, but this could be better in a word so
 that we can check 0x as invalid.

 A test patch is below.  Hopefully this improves the situation...

I'll check this out tomorrow and also instrument the code to get a
backtrace, since there may still be an underlying race with the
previous patches:

[8.203827] snd_hda_intel :00:1b.0: enabling device ( - 0002)
[8.203936] snd_hda_intel :00:1b.0: irq 51 for MSI/MSI-X
[   10.981297] VGA switcheroo: switched nouveau off
[   10.981383] nouveau  [ DRM] suspending fbcon...
[   10.981388] nouveau  [ DRM] suspending display...
[   10.981687] nouveau  [ DRM] unpinning framebuffer(s)...
[   10.981825] nouveau  [ DRM] evicting buffers...
[   10.992948] nouveau  [ DRM] suspending client object trees...
[   11.310697] hda-intel: azx_get_response timeout, switching to
polling mode: last cmd=0x000f
[   12.320236] hda-intel: No response from codec, disabling MSI: last
cmd=0x000f
[   13.329721] hda-intel: Codec #0 probe error; disabling it...
[   14.419163] hda_intel: azx_get_response timeout, switching to
single_cmd mode: last cmd=0x000f
[   14.419855] hda-intel: no codecs initialized
[   14.459150] snd_hda_intel :01:00.1: Refused to change power
state, currently in D3
[   14.459176] hda-intel: :01:00.1: Handle VGA-switcheroo audio client
[   14.459183] hda-intel: VGA controller for :01:00.1 is disabled
[   14.459184] hda-intel: Delaying initialization

Full log at: http://quora.org/2012/hda-switch-spurious2.txt

The first response timeout message plausibly could be from the CS4206
(0:1b.0) or Nvidia DGPU (1:0.1); either way, there are no audio
devices after. Also, we see that checking for master abort in one
place won't be sufficient.

Stay tuned for backtraces tomorrow.

Thanks again so far!
  Daniel
--
Daniel J Blueman
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: switcheroo registration vs switching race...

2012-12-03 Thread Takashi Iwai
At Tue, 4 Dec 2012 00:50:56 +0800,
Daniel J Blueman wrote:
 
 On 4 December 2012 00:23, Takashi Iwai ti...@suse.de wrote:
  At Mon, 3 Dec 2012 23:08:28 +0800,
  Daniel J Blueman wrote:
 
  On 3 December 2012 22:40, Takashi Iwai ti...@suse.de wrote:
   At Mon, 3 Dec 2012 22:25:52 +0800,
   Daniel J Blueman wrote:
  
   On 3 December 2012 19:17, Takashi Iwai ti...@suse.de wrote:
At Wed, 28 Nov 2012 09:45:39 +0100,
Takashi Iwai wrote:
   
At Wed, 28 Nov 2012 11:45:07 +0800,
Daniel J Blueman wrote:

 Hi Seth, Dave, Takashi,

 If I power down the unused discrete GPU before lightdm starts by
 fiddling with the sysfs file [1] in the upstart script, I see a 
 race
 manifesting as the discrete GPU's HDA controller timing out to
 commands [2].

 Adding some debug, I see that the registered audio devices are put
 into D3 before the GPU is, but it turns out that the discrete (and
 internal) GPU's HDA controller gets registered a bit later, so the
 list is empty. The symptom is since the HDA driver it's talking to
 hardware which is now in D3.

 We could add a mutex to nouveau to allow us to wait for the DGPU 
 HDA
 controller, but perhaps this should be solved at a higher level in 
 the
 vgaswitcheroo code; what do you think?
   
Maybe it's a side effect for the recent effort to fix another race in
the probe.  A part of them problem is that the registration is done 
at
the very last of probing.
   
Instead of delaying the registration, how about the patch below?
   
Ping.  If this really works, I'd like to queue it for 3.8 merge, at
least...
  
   Ping ack; I was trying to find time to understand another race that
   occurs with GPU probing after switching, but is separate from the
   situation before switching, here.
  
   In the context of writing the switch, it looks like struct azx isn't
   allocated by the time azx_vs_set_state accesses it [1,2]; racing with
   azx_codec_create?
  
   It was allocated, but it wasn't assigned properly in pci drvdata.
  
   Below is the revised patch.  Just moved pci_set_drvdata() before
   register_vga_switcheroo().  Could you retest with it?
 
  Superb; this addresses the oops.
 
  OK, I'll queue it to sound tree for 3.8 kernel with Cc to stable.
 
  ~1 second after the DGPU is put into D3, I still often see hda-intel:
  spurious response 0x0:0x0, last cmd=0x170500:
  http://quora.org/2012/hda-switch-spurious.txt
 
  Hm, it's not clear who triggers these messages.  I'll try to check the
  code paths.
 
  Presumably this implies the read of the ring-buffer pointer returned
  0x, so the HDA driver understands the pointer to have wrapped
  and processes the 191 unwritten entries?
 
  Good point.  Actually there is one bug that looks obviously wrong
  (writing 32bit value to CORBWP).  Maybe it has been working just
  because writing CORBRP doesn't influence except for the reset bit.
 
  Reading CORBWP as a byte is OK, but this could be better in a word so
  that we can check 0x as invalid.
 
  A test patch is below.  Hopefully this improves the situation...
 
 I'll check this out tomorrow and also instrument the code to get a
 backtrace, since there may still be an underlying race with the
 previous patches:
 
 [8.203827] snd_hda_intel :00:1b.0: enabling device ( - 0002)
 [8.203936] snd_hda_intel :00:1b.0: irq 51 for MSI/MSI-X
 [   10.981297] VGA switcheroo: switched nouveau off
 [   10.981383] nouveau  [ DRM] suspending fbcon...
 [   10.981388] nouveau  [ DRM] suspending display...
 [   10.981687] nouveau  [ DRM] unpinning framebuffer(s)...
 [   10.981825] nouveau  [ DRM] evicting buffers...
 [   10.992948] nouveau  [ DRM] suspending client object trees...
 [   11.310697] hda-intel: azx_get_response timeout, switching to
 polling mode: last cmd=0x000f
 [   12.320236] hda-intel: No response from codec, disabling MSI: last
 cmd=0x000f
 [   13.329721] hda-intel: Codec #0 probe error; disabling it...
 [   14.419163] hda_intel: azx_get_response timeout, switching to
 single_cmd mode: last cmd=0x000f
 [   14.419855] hda-intel: no codecs initialized
 [   14.459150] snd_hda_intel :01:00.1: Refused to change power
 state, currently in D3
 [   14.459176] hda-intel: :01:00.1: Handle VGA-switcheroo audio client
 [   14.459183] hda-intel: VGA controller for :01:00.1 is disabled
 [   14.459184] hda-intel: Delaying initialization
 
 Full log at: http://quora.org/2012/hda-switch-spurious2.txt
 
 The first response timeout message plausibly could be from the CS4206
 (0:1b.0) or Nvidia DGPU (1:0.1); either way, there are no audio
 devices after.

That's odd.  The Cirrus one (:00:1b.0) must be independent from
the vga switcheroo things for Nvidia...

The patch below is the revised patch of the first one.  Now the vga
switcheroo registration is done before the video controller D3 check,
so the race 

Re: switcheroo registration vs switching race...

2012-11-28 Thread Takashi Iwai
At Wed, 28 Nov 2012 11:45:07 +0800,
Daniel J Blueman wrote:
> 
> Hi Seth, Dave, Takashi,
> 
> If I power down the unused discrete GPU before lightdm starts by
> fiddling with the sysfs file [1] in the upstart script, I see a race
> manifesting as the discrete GPU's HDA controller timing out to
> commands [2].
> 
> Adding some debug, I see that the registered audio devices are put
> into D3 before the GPU is, but it turns out that the discrete (and
> internal) GPU's HDA controller gets registered a bit later, so the
> list is empty. The symptom is since the HDA driver it's talking to
> hardware which is now in D3.
> 
> We could add a mutex to nouveau to allow us to wait for the DGPU HDA
> controller, but perhaps this should be solved at a higher level in the
> vgaswitcheroo code; what do you think?

Maybe it's a side effect for the recent effort to fix another race in
the probe.  A part of them problem is that the registration is done at
the very last of probing.

Instead of delaying the registration, how about the patch below?


Takashi

---
diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
index 4bb52da..17fbd68 100644
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -49,6 +49,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifdef CONFIG_X86
 /* for snoop control */
@@ -469,6 +470,7 @@ struct azx {
/* locks */
spinlock_t reg_lock;
struct mutex open_mutex;
+   struct completion probe_wait;
 
/* streams (x num_streams) */
struct azx_dev *azx_dev;
@@ -2790,6 +2792,7 @@ static bool azx_vs_can_switch(struct pci_dev *pci)
struct snd_card *card = pci_get_drvdata(pci);
struct azx *chip = card->private_data;
 
+   wait_for_completion(>probe_wait);
if (chip->init_failed)
return false;
if (chip->disabled || !chip->bus)
@@ -2851,6 +2854,9 @@ static int azx_free(struct azx *chip)
 
azx_notifier_unregister(chip);
 
+   chip->init_failed = 1; /* to be sure */
+   complete(>probe_wait);
+
if (use_vga_switcheroo(chip)) {
if (chip->disabled && chip->bus)
snd_hda_unlock_devices(chip->bus);
@@ -3156,6 +3162,7 @@ static int __devinit azx_create(struct snd_card *card, 
struct pci_dev *pci,
INIT_LIST_HEAD(>pcm_list);
INIT_LIST_HEAD(>list);
init_vga_switcheroo(chip);
+   init_completion(>probe_wait);
 
chip->position_fix[0] = chip->position_fix[1] =
check_position_fix(chip, position_fix[dev]);
@@ -3462,6 +3469,13 @@ static int __devinit azx_probe(struct pci_dev *pci,
}
 #endif /* CONFIG_SND_HDA_PATCH_LOADER */
 
+   err = register_vga_switcheroo(chip);
+   if (err < 0) {
+   snd_printk(KERN_ERR SFX
+  "Error registering VGA-switcheroo client\n");
+   goto out_free;
+   }
+
if (probe_now) {
err = azx_probe_continue(chip);
if (err < 0)
@@ -3473,14 +3487,8 @@ static int __devinit azx_probe(struct pci_dev *pci,
if (pci_dev_run_wake(pci))
pm_runtime_put_noidle(>dev);
 
-   err = register_vga_switcheroo(chip);
-   if (err < 0) {
-   snd_printk(KERN_ERR SFX
-  "Error registering VGA-switcheroo client\n");
-   goto out_free;
-   }
-
dev++;
+   complete(>probe_wait);
return 0;
 
 out_free:
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: switcheroo registration vs switching race...

2012-11-28 Thread Takashi Iwai
At Wed, 28 Nov 2012 11:45:07 +0800,
Daniel J Blueman wrote:
 
 Hi Seth, Dave, Takashi,
 
 If I power down the unused discrete GPU before lightdm starts by
 fiddling with the sysfs file [1] in the upstart script, I see a race
 manifesting as the discrete GPU's HDA controller timing out to
 commands [2].
 
 Adding some debug, I see that the registered audio devices are put
 into D3 before the GPU is, but it turns out that the discrete (and
 internal) GPU's HDA controller gets registered a bit later, so the
 list is empty. The symptom is since the HDA driver it's talking to
 hardware which is now in D3.
 
 We could add a mutex to nouveau to allow us to wait for the DGPU HDA
 controller, but perhaps this should be solved at a higher level in the
 vgaswitcheroo code; what do you think?

Maybe it's a side effect for the recent effort to fix another race in
the probe.  A part of them problem is that the registration is done at
the very last of probing.

Instead of delaying the registration, how about the patch below?


Takashi

---
diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
index 4bb52da..17fbd68 100644
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -49,6 +49,7 @@
 #include linux/pm_runtime.h
 #include linux/clocksource.h
 #include linux/time.h
+#include linux/completion.h
 
 #ifdef CONFIG_X86
 /* for snoop control */
@@ -469,6 +470,7 @@ struct azx {
/* locks */
spinlock_t reg_lock;
struct mutex open_mutex;
+   struct completion probe_wait;
 
/* streams (x num_streams) */
struct azx_dev *azx_dev;
@@ -2790,6 +2792,7 @@ static bool azx_vs_can_switch(struct pci_dev *pci)
struct snd_card *card = pci_get_drvdata(pci);
struct azx *chip = card-private_data;
 
+   wait_for_completion(chip-probe_wait);
if (chip-init_failed)
return false;
if (chip-disabled || !chip-bus)
@@ -2851,6 +2854,9 @@ static int azx_free(struct azx *chip)
 
azx_notifier_unregister(chip);
 
+   chip-init_failed = 1; /* to be sure */
+   complete(chip-probe_wait);
+
if (use_vga_switcheroo(chip)) {
if (chip-disabled  chip-bus)
snd_hda_unlock_devices(chip-bus);
@@ -3156,6 +3162,7 @@ static int __devinit azx_create(struct snd_card *card, 
struct pci_dev *pci,
INIT_LIST_HEAD(chip-pcm_list);
INIT_LIST_HEAD(chip-list);
init_vga_switcheroo(chip);
+   init_completion(chip-probe_wait);
 
chip-position_fix[0] = chip-position_fix[1] =
check_position_fix(chip, position_fix[dev]);
@@ -3462,6 +3469,13 @@ static int __devinit azx_probe(struct pci_dev *pci,
}
 #endif /* CONFIG_SND_HDA_PATCH_LOADER */
 
+   err = register_vga_switcheroo(chip);
+   if (err  0) {
+   snd_printk(KERN_ERR SFX
+  Error registering VGA-switcheroo client\n);
+   goto out_free;
+   }
+
if (probe_now) {
err = azx_probe_continue(chip);
if (err  0)
@@ -3473,14 +3487,8 @@ static int __devinit azx_probe(struct pci_dev *pci,
if (pci_dev_run_wake(pci))
pm_runtime_put_noidle(pci-dev);
 
-   err = register_vga_switcheroo(chip);
-   if (err  0) {
-   snd_printk(KERN_ERR SFX
-  Error registering VGA-switcheroo client\n);
-   goto out_free;
-   }
-
dev++;
+   complete(chip-probe_wait);
return 0;
 
 out_free:
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


switcheroo registration vs switching race...

2012-11-27 Thread Daniel J Blueman
Hi Seth, Dave, Takashi,

If I power down the unused discrete GPU before lightdm starts by
fiddling with the sysfs file [1] in the upstart script, I see a race
manifesting as the discrete GPU's HDA controller timing out to
commands [2].

Adding some debug, I see that the registered audio devices are put
into D3 before the GPU is, but it turns out that the discrete (and
internal) GPU's HDA controller gets registered a bit later, so the
list is empty. The symptom is since the HDA driver it's talking to
hardware which is now in D3.

We could add a mutex to nouveau to allow us to wait for the DGPU HDA
controller, but perhaps this should be solved at a higher level in the
vgaswitcheroo code; what do you think?

Thanks,
  Daniel

--- [1]

echo OFF >/sys/kernel/debug/vgaswitcheroo/switch

--- [2]

snd_hda_intel :00:1b.0: enabling device ( -> 0002)
snd_hda_intel :00:1b.0: irq 51 for MSI/MSI-X
input: HDA Intel PCH Mic as /devices/pci:00/:00:1b.0/sound/card0/input10
input: HDA Intel PCH Headphone as
/devices/pci:00/:00:1b.0/sound/card0/input11
nouveau [  VBIOS][:01:00.0] ... appears to be valid
nouveau [  VBIOS][:01:00.0] using image from PRAMIN
nouveau [  VBIOS][:01:00.0] BIT signature found
nouveau [  VBIOS][:01:00.0] version 80.07.26.04
nouveau [   MXM][:01:00.0] no VBIOS data, nothing to do
nouveau [   PFB][:01:00.0] RAM type: GDDR5
nouveau [   PFB][:01:00.0] RAM size: 1024 MiB
nouveau W[ PGRAPH][:01:00.0] disabled, PGRAPH=1 to enable
vga_switcheroo: enabled
[TTM] Zone kernel: Available graphics memory: 4076308 kiB
[TTM] Zone  dma32: Available graphics memory: 2097152 kiB
[TTM] Initializing pool allocator
[TTM] Initializing DMA pool allocator
nouveau [   DRM] VRAM: 1024 MiB
nouveau [   DRM] GART: 512 MiB
nouveau [   DRM] BIT BIOS found
nouveau [   DRM] Bios version 80.07.26.04
nouveau [   DRM] TMDS table version 2.0
nouveau [   DRM] DCB version 4.0
nouveau [   DRM] DCB outp 00: 048101b6 0f230010
nouveau [   DRM] DCB outp 01: 018212d6 0f220020
nouveau [   DRM] DCB outp 02: 01021212 00020020
nouveau [   DRM] DCB outp 03: 088324c6 0f220010
nouveau [   DRM] DCB outp 04: 08032402 00020010
nouveau [   DRM] DCB outp 05: 02843862 00020010
nouveau [   DRM] DCB conn 00: 00020047
nouveau [   DRM] DCB conn 01: 02208146
nouveau [   DRM] DCB conn 02: 01104246
nouveau [   DRM] DCB conn 03: 00410361
[drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
[drm] No driver support for vblank timestamp query.
nouveau W[   DRM] voltage table 0x50 unknown
nouveau [   DRM] 4 available performance level(s)
nouveau [   DRM] 1: core 209MHz shader 419MHz memory 405MHz voltage 520mV
nouveau [   DRM] 2: core 390MHz shader 780MHz memory 1080MHz voltage 610mV
nouveau [   DRM] 3: core 1000MHz shader 2000MHz memory 1080MHz voltage 630mV
nouveau [   DRM] 4: core 1254MHz shader 2508MHz memory 1080MHz voltage 630mV
nouveau [   DRM] c:
nouveau E[   DRM] failed to create kernel channel, -22
No connectors reported connected with modes
[drm] Cannot find any crtc or sizes - going 1024x768
nouveau [   DRM] allocated 1024x768 fb: 0x6, bo 880264974400
fb1: nouveaufb frame buffer device
[drm] Initialized nouveau 1.1.0 20120801 for :01:00.0 on minor 1
snd_hda_intel :01:00.1: enabling device ( -> 0002)
hda-intel: :01:00.1: Handle VGA-switcheroo audio client
snd_hda_intel :01:00.1: irq 52 for MSI/MSI-X
VGA switcheroo: switched nouveau off
nouveau [   DRM] suspending fbcon...
nouveau [   DRM] suspending display...
nouveau [   DRM] unpinning framebuffer(s)...
nouveau [   DRM] evicting buffers...
nouveau [   DRM] suspending client object trees...
input: HDA NVidia HDMI/DP,pcm=8 as
/devices/pci:00/:00:01.0/:01:00.1/sound/card1/input12
input: HDA NVidia HDMI/DP,pcm=7 as
/devices/pci:00/:00:01.0/:01:00.1/sound/card1/input13
input: HDA NVidia HDMI/DP,pcm=3 as
/devices/pci:00/:00:01.0/:01:00.1/sound/card1/input14
nouveau E[   I2C][:01:00.0] AUXCH(3): begin idle timeout 0x
nouveau E[   I2C][:01:00.0] AUXCH(2): begin idle timeout 0x
hda-intel: spurious response 0x0:0x0, last cmd=0x170503
hda-intel: spurious response 0x0:0x0, last cmd=0x170503
hda-intel: spurious response 0x0:0x0, last cmd=0x170503
hda-intel: spurious response 0x0:0x0, last cmd=0x170503
hda-intel: spurious response 0x0:0x0, last cmd=0x170503
hda-intel: spurious response 0x0:0x0, last cmd=0x170503
hda-intel: spurious response 0x0:0x0, last cmd=0x170503
hda-intel: spurious response 0x0:0x0, last cmd=0x170503
hda-intel: spurious response 0x0:0x0, last cmd=0x170503
hda-intel: spurious response 0x0:0x0, last cmd=0x170503
hda-intel: spurious response 0x0:0x0, last cmd=0x170503
hda-intel: spurious response 0x0:0x0, last cmd=0x170503
hda-intel: spurious response 0x0:0x0, last cmd=0x170503
hda-intel: spurious response 0x0:0x0, last cmd=0x170503
hda-intel: spurious response 0x0:0x0, last cmd=0x170503
hda-intel: spurious response 0x0:0x0, last cmd=0x170503

switcheroo registration vs switching race...

2012-11-27 Thread Daniel J Blueman
Hi Seth, Dave, Takashi,

If I power down the unused discrete GPU before lightdm starts by
fiddling with the sysfs file [1] in the upstart script, I see a race
manifesting as the discrete GPU's HDA controller timing out to
commands [2].

Adding some debug, I see that the registered audio devices are put
into D3 before the GPU is, but it turns out that the discrete (and
internal) GPU's HDA controller gets registered a bit later, so the
list is empty. The symptom is since the HDA driver it's talking to
hardware which is now in D3.

We could add a mutex to nouveau to allow us to wait for the DGPU HDA
controller, but perhaps this should be solved at a higher level in the
vgaswitcheroo code; what do you think?

Thanks,
  Daniel

--- [1]

echo OFF /sys/kernel/debug/vgaswitcheroo/switch

--- [2]

snd_hda_intel :00:1b.0: enabling device ( - 0002)
snd_hda_intel :00:1b.0: irq 51 for MSI/MSI-X
input: HDA Intel PCH Mic as /devices/pci:00/:00:1b.0/sound/card0/input10
input: HDA Intel PCH Headphone as
/devices/pci:00/:00:1b.0/sound/card0/input11
nouveau [  VBIOS][:01:00.0] ... appears to be valid
nouveau [  VBIOS][:01:00.0] using image from PRAMIN
nouveau [  VBIOS][:01:00.0] BIT signature found
nouveau [  VBIOS][:01:00.0] version 80.07.26.04
nouveau [   MXM][:01:00.0] no VBIOS data, nothing to do
nouveau [   PFB][:01:00.0] RAM type: GDDR5
nouveau [   PFB][:01:00.0] RAM size: 1024 MiB
nouveau W[ PGRAPH][:01:00.0] disabled, PGRAPH=1 to enable
vga_switcheroo: enabled
[TTM] Zone kernel: Available graphics memory: 4076308 kiB
[TTM] Zone  dma32: Available graphics memory: 2097152 kiB
[TTM] Initializing pool allocator
[TTM] Initializing DMA pool allocator
nouveau [   DRM] VRAM: 1024 MiB
nouveau [   DRM] GART: 512 MiB
nouveau [   DRM] BIT BIOS found
nouveau [   DRM] Bios version 80.07.26.04
nouveau [   DRM] TMDS table version 2.0
nouveau [   DRM] DCB version 4.0
nouveau [   DRM] DCB outp 00: 048101b6 0f230010
nouveau [   DRM] DCB outp 01: 018212d6 0f220020
nouveau [   DRM] DCB outp 02: 01021212 00020020
nouveau [   DRM] DCB outp 03: 088324c6 0f220010
nouveau [   DRM] DCB outp 04: 08032402 00020010
nouveau [   DRM] DCB outp 05: 02843862 00020010
nouveau [   DRM] DCB conn 00: 00020047
nouveau [   DRM] DCB conn 01: 02208146
nouveau [   DRM] DCB conn 02: 01104246
nouveau [   DRM] DCB conn 03: 00410361
[drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
[drm] No driver support for vblank timestamp query.
nouveau W[   DRM] voltage table 0x50 unknown
nouveau [   DRM] 4 available performance level(s)
nouveau [   DRM] 1: core 209MHz shader 419MHz memory 405MHz voltage 520mV
nouveau [   DRM] 2: core 390MHz shader 780MHz memory 1080MHz voltage 610mV
nouveau [   DRM] 3: core 1000MHz shader 2000MHz memory 1080MHz voltage 630mV
nouveau [   DRM] 4: core 1254MHz shader 2508MHz memory 1080MHz voltage 630mV
nouveau [   DRM] c:
nouveau E[   DRM] failed to create kernel channel, -22
No connectors reported connected with modes
[drm] Cannot find any crtc or sizes - going 1024x768
nouveau [   DRM] allocated 1024x768 fb: 0x6, bo 880264974400
fb1: nouveaufb frame buffer device
[drm] Initialized nouveau 1.1.0 20120801 for :01:00.0 on minor 1
snd_hda_intel :01:00.1: enabling device ( - 0002)
hda-intel: :01:00.1: Handle VGA-switcheroo audio client
snd_hda_intel :01:00.1: irq 52 for MSI/MSI-X
VGA switcheroo: switched nouveau off
nouveau [   DRM] suspending fbcon...
nouveau [   DRM] suspending display...
nouveau [   DRM] unpinning framebuffer(s)...
nouveau [   DRM] evicting buffers...
nouveau [   DRM] suspending client object trees...
input: HDA NVidia HDMI/DP,pcm=8 as
/devices/pci:00/:00:01.0/:01:00.1/sound/card1/input12
input: HDA NVidia HDMI/DP,pcm=7 as
/devices/pci:00/:00:01.0/:01:00.1/sound/card1/input13
input: HDA NVidia HDMI/DP,pcm=3 as
/devices/pci:00/:00:01.0/:01:00.1/sound/card1/input14
nouveau E[   I2C][:01:00.0] AUXCH(3): begin idle timeout 0x
nouveau E[   I2C][:01:00.0] AUXCH(2): begin idle timeout 0x
hda-intel: spurious response 0x0:0x0, last cmd=0x170503
hda-intel: spurious response 0x0:0x0, last cmd=0x170503
hda-intel: spurious response 0x0:0x0, last cmd=0x170503
hda-intel: spurious response 0x0:0x0, last cmd=0x170503
hda-intel: spurious response 0x0:0x0, last cmd=0x170503
hda-intel: spurious response 0x0:0x0, last cmd=0x170503
hda-intel: spurious response 0x0:0x0, last cmd=0x170503
hda-intel: spurious response 0x0:0x0, last cmd=0x170503
hda-intel: spurious response 0x0:0x0, last cmd=0x170503
hda-intel: spurious response 0x0:0x0, last cmd=0x170503
hda-intel: spurious response 0x0:0x0, last cmd=0x170503
hda-intel: spurious response 0x0:0x0, last cmd=0x170503
hda-intel: spurious response 0x0:0x0, last cmd=0x170503
hda-intel: spurious response 0x0:0x0, last cmd=0x170503
hda-intel: spurious response 0x0:0x0, last cmd=0x170503
hda-intel: spurious response 0x0:0x0, last cmd=0x170503
hda-intel: