Re: switcheroo registration vs switching race...
On 5 December 2012 00:04, Takashi Iwai wrote: > At Tue, 4 Dec 2012 23:54:39 +0800, > Daniel J Blueman wrote: >> >> On 4 December 2012 23:03, Takashi Iwai wrote: >> > At Tue, 4 Dec 2012 22:46:47 +0800, >> > Daniel J Blueman wrote: >> >> >> >> On 4 December 2012 21:55, Takashi Iwai wrote: >> >> > At Tue, 04 Dec 2012 14:23:05 +0100, >> >> > Takashi Iwai wrote: >> >> >> >> >> >> At Tue, 4 Dec 2012 20:58:55 +0800, >> >> >> Daniel J Blueman wrote: >> >> >> > >> >> >> > On 4 December 2012 01:10, Takashi Iwai wrote: >> >> >> > > At Tue, 4 Dec 2012 00:50:56 +0800, >> >> >> > > Daniel J Blueman wrote: >> >> >> > >> >> >> >> > >> On 4 December 2012 00:23, Takashi Iwai wrote: >> >> >> > >> > At Mon, 3 Dec 2012 23:08:28 +0800, >> >> >> > >> > Daniel J Blueman wrote: >> >> >> > >> >> >> >> >> > >> >> On 3 December 2012 22:40, Takashi Iwai wrote: >> >> >> > >> >> > At Mon, 3 Dec 2012 22:25:52 +0800, >> >> >> > >> >> > Daniel J Blueman wrote: >> >> >> > >> >> >> >> >> >> > >> >> >> On 3 December 2012 19:17, Takashi Iwai >> >> >> > >> >> >> wrote: >> >> >> > >> >> >> > At Wed, 28 Nov 2012 09:45:39 +0100, >> >> >> > >> >> >> > Takashi Iwai wrote: >> >> >> > >> >> >> >> >> >> >> > >> >> >> >> At Wed, 28 Nov 2012 11:45:07 +0800, >> >> >> > >> >> >> >> Daniel J Blueman wrote: >> >> >> > >> >> >> >> > >> >> >> > >> >> >> >> > Hi Seth, Dave, Takashi, >> >> >> > >> >> >> >> > >> >> >> > >> >> >> >> > If I power down the unused discrete GPU before lightdm >> >> >> > >> >> >> >> > starts by >> >> >> > >> >> >> >> > fiddling with the sysfs file [1] in the upstart >> >> >> > >> >> >> >> > script, I see a race >> >> >> > >> >> >> >> > manifesting as the discrete GPU's HDA controller >> >> >> > >> >> >> >> > timing out to >> >> >> > >> >> >> >> > commands [2]. >> >> >> > >> >> >> >> > >> >> >> > >> >> >> >> > Adding some debug, I see that the registered audio >> >> >> > >> >> >> >> > devices are put >> >> >> > >> >> >> >> > into D3 before the GPU is, but it turns out that the >> >> >> > >> >> >> >> > discrete (and >> >> >> > >> >> >> >> > internal) GPU's HDA controller gets registered a bit >> >> >> > >> >> >> >> > later, so the >> >> >> > >> >> >> >> > list is empty. The symptom is since the HDA driver >> >> >> > >> >> >> >> > it's talking to >> >> >> > >> >> >> >> > hardware which is now in D3. >> >> >> > >> >> >> >> > >> >> >> > >> >> >> >> > We could add a mutex to nouveau to allow us to wait >> >> >> > >> >> >> >> > for the DGPU HDA >> >> >> > >> >> >> >> > controller, but perhaps this should be solved at a >> >> >> > >> >> >> >> > higher level in the >> >> >> > >> >> >> >> > vgaswitcheroo code; what do you think? >> >> >> > >> >> >> >> >> >> >> > >> >> >> >> Maybe it's a side effect for the recent effort to fix >> >> >> > >> >> >> >> another race in >> >> >> > >> >> >> >> the probe. A part of them problem is that the >> >> >> > >> >> >> >> registration is done at >> >> >> > >> >> >> >> the very last of probing. >> >> >> > >> >> >> >> >> >> >> > >> >> >> >> Instead of delaying the registration, how about the >> >> >> > >> >> >> >> patch below? >> >> >> > >> >> >> > >> >> >> > >> >> >> > Ping. If this really works, I'd like to queue it for 3.8 >> >> >> > >> >> >> > merge, at >> >> >> > >> >> >> > least... >> >> >> > >> >> >> >> >> >> > >> >> >> Ping ack; I was trying to find time to understand another >> >> >> > >> >> >> race that >> >> >> > >> >> >> occurs with GPU probing after switching, but is separate >> >> >> > >> >> >> from the >> >> >> > >> >> >> situation before switching, here. >> >> >> > >> >> >> >> >> >> > >> >> >> In the context of writing the switch, it looks like struct >> >> >> > >> >> >> azx isn't >> >> >> > >> >> >> allocated by the time azx_vs_set_state accesses it [1,2]; >> >> >> > >> >> >> racing with >> >> >> > >> >> >> azx_codec_create? >> >> >> > >> >> > >> >> >> > >> >> > It was allocated, but it wasn't assigned properly in pci >> >> >> > >> >> > drvdata. >> >> >> > >> >> > >> >> >> > >> >> > Below is the revised patch. Just moved pci_set_drvdata() >> >> >> > >> >> > before >> >> >> > >> >> > register_vga_switcheroo(). Could you retest with it? >> >> >> > >> >> >> >> >> > >> >> Superb; this addresses the oops. >> >> >> > >> > >> >> >> > >> > OK, I'll queue it to sound tree for 3.8 kernel with Cc to >> >> >> > >> > stable. >> >> >> > >> > >> >> >> > >> >> ~1 second after the DGPU is put into D3, I still often see >> >> >> > >> >> "hda-intel: >> >> >> > >> >> spurious response 0x0:0x0, last cmd=0x170500": >> >> >> > >> >> http://quora.org/2012/hda-switch-spurious.txt >> >> >> > >> > >> >> >> > >> > Hm, it's not clear who triggers these messages. I'll try to >> >> >> > >> > check the >> >> >> > >> > code paths. >> >> >> > >> > >> >> >> > >> >> Presumably this implies the read of the ring-buffer pointer >> >> >> > >> >> returned >> >> >> > >> >> 0x, so the HDA driver understands the pointer to have >> >> >> > >> >> wrapped >> >> >> > >> >> and processes the 191
Re: switcheroo registration vs switching race...
On 5 December 2012 00:04, Takashi Iwai ti...@suse.de wrote: At Tue, 4 Dec 2012 23:54:39 +0800, Daniel J Blueman wrote: On 4 December 2012 23:03, Takashi Iwai ti...@suse.de wrote: At Tue, 4 Dec 2012 22:46:47 +0800, Daniel J Blueman wrote: On 4 December 2012 21:55, Takashi Iwai ti...@suse.de wrote: At Tue, 04 Dec 2012 14:23:05 +0100, Takashi Iwai wrote: At Tue, 4 Dec 2012 20:58:55 +0800, Daniel J Blueman wrote: On 4 December 2012 01:10, Takashi Iwai ti...@suse.de wrote: At Tue, 4 Dec 2012 00:50:56 +0800, Daniel J Blueman wrote: On 4 December 2012 00:23, Takashi Iwai ti...@suse.de wrote: At Mon, 3 Dec 2012 23:08:28 +0800, Daniel J Blueman wrote: On 3 December 2012 22:40, Takashi Iwai ti...@suse.de wrote: At Mon, 3 Dec 2012 22:25:52 +0800, Daniel J Blueman wrote: On 3 December 2012 19:17, Takashi Iwai ti...@suse.de wrote: At Wed, 28 Nov 2012 09:45:39 +0100, Takashi Iwai wrote: At Wed, 28 Nov 2012 11:45:07 +0800, Daniel J Blueman wrote: Hi Seth, Dave, Takashi, If I power down the unused discrete GPU before lightdm starts by fiddling with the sysfs file [1] in the upstart script, I see a race manifesting as the discrete GPU's HDA controller timing out to commands [2]. Adding some debug, I see that the registered audio devices are put into D3 before the GPU is, but it turns out that the discrete (and internal) GPU's HDA controller gets registered a bit later, so the list is empty. The symptom is since the HDA driver it's talking to hardware which is now in D3. We could add a mutex to nouveau to allow us to wait for the DGPU HDA controller, but perhaps this should be solved at a higher level in the vgaswitcheroo code; what do you think? Maybe it's a side effect for the recent effort to fix another race in the probe. A part of them problem is that the registration is done at the very last of probing. Instead of delaying the registration, how about the patch below? Ping. If this really works, I'd like to queue it for 3.8 merge, at least... Ping ack; I was trying to find time to understand another race that occurs with GPU probing after switching, but is separate from the situation before switching, here. In the context of writing the switch, it looks like struct azx isn't allocated by the time azx_vs_set_state accesses it [1,2]; racing with azx_codec_create? It was allocated, but it wasn't assigned properly in pci drvdata. Below is the revised patch. Just moved pci_set_drvdata() before register_vga_switcheroo(). Could you retest with it? Superb; this addresses the oops. OK, I'll queue it to sound tree for 3.8 kernel with Cc to stable. ~1 second after the DGPU is put into D3, I still often see hda-intel: spurious response 0x0:0x0, last cmd=0x170500: http://quora.org/2012/hda-switch-spurious.txt Hm, it's not clear who triggers these messages. I'll try to check the code paths. Presumably this implies the read of the ring-buffer pointer returned 0x, so the HDA driver understands the pointer to have wrapped and processes the 191 unwritten entries? Good point. Actually there is one bug that looks obviously wrong (writing 32bit value to CORBWP). Maybe it has been working just because writing CORBRP doesn't influence except for the reset bit. Reading CORBWP as a byte is OK, but this could be better in a word so that we can check 0x as invalid. A test patch is below. Hopefully this improves the situation... I'll check this out tomorrow and also instrument the code to get a backtrace, since there may still be an underlying race with the previous patches: [...] That's odd. The Cirrus one (:00:1b.0) must be independent from the vga switcheroo things for Nvidia... The patch below is the revised patch of the first one. Now the vga switcheroo registration is done before the video controller D3 check, so the race should be smaller. This patch improves things further of course; a major improvement over without. ~15% of the time, I still get the 'spurious response' messages in this callpath: A possible scenario is that the graphics went in D3 before probing hd-audio, and the
Re: switcheroo registration vs switching race...
At Tue, 4 Dec 2012 23:54:39 +0800, Daniel J Blueman wrote: > > On 4 December 2012 23:03, Takashi Iwai wrote: > > At Tue, 4 Dec 2012 22:46:47 +0800, > > Daniel J Blueman wrote: > >> > >> On 4 December 2012 21:55, Takashi Iwai wrote: > >> > At Tue, 04 Dec 2012 14:23:05 +0100, > >> > Takashi Iwai wrote: > >> >> > >> >> At Tue, 4 Dec 2012 20:58:55 +0800, > >> >> Daniel J Blueman wrote: > >> >> > > >> >> > On 4 December 2012 01:10, Takashi Iwai wrote: > >> >> > > At Tue, 4 Dec 2012 00:50:56 +0800, > >> >> > > Daniel J Blueman wrote: > >> >> > >> > >> >> > >> On 4 December 2012 00:23, Takashi Iwai wrote: > >> >> > >> > At Mon, 3 Dec 2012 23:08:28 +0800, > >> >> > >> > Daniel J Blueman wrote: > >> >> > >> >> > >> >> > >> >> On 3 December 2012 22:40, Takashi Iwai wrote: > >> >> > >> >> > At Mon, 3 Dec 2012 22:25:52 +0800, > >> >> > >> >> > Daniel J Blueman wrote: > >> >> > >> >> >> > >> >> > >> >> >> On 3 December 2012 19:17, Takashi Iwai wrote: > >> >> > >> >> >> > At Wed, 28 Nov 2012 09:45:39 +0100, > >> >> > >> >> >> > Takashi Iwai wrote: > >> >> > >> >> >> >> > >> >> > >> >> >> >> At Wed, 28 Nov 2012 11:45:07 +0800, > >> >> > >> >> >> >> Daniel J Blueman wrote: > >> >> > >> >> >> >> > > >> >> > >> >> >> >> > Hi Seth, Dave, Takashi, > >> >> > >> >> >> >> > > >> >> > >> >> >> >> > If I power down the unused discrete GPU before lightdm > >> >> > >> >> >> >> > starts by > >> >> > >> >> >> >> > fiddling with the sysfs file [1] in the upstart script, > >> >> > >> >> >> >> > I see a race > >> >> > >> >> >> >> > manifesting as the discrete GPU's HDA controller timing > >> >> > >> >> >> >> > out to > >> >> > >> >> >> >> > commands [2]. > >> >> > >> >> >> >> > > >> >> > >> >> >> >> > Adding some debug, I see that the registered audio > >> >> > >> >> >> >> > devices are put > >> >> > >> >> >> >> > into D3 before the GPU is, but it turns out that the > >> >> > >> >> >> >> > discrete (and > >> >> > >> >> >> >> > internal) GPU's HDA controller gets registered a bit > >> >> > >> >> >> >> > later, so the > >> >> > >> >> >> >> > list is empty. The symptom is since the HDA driver it's > >> >> > >> >> >> >> > talking to > >> >> > >> >> >> >> > hardware which is now in D3. > >> >> > >> >> >> >> > > >> >> > >> >> >> >> > We could add a mutex to nouveau to allow us to wait for > >> >> > >> >> >> >> > the DGPU HDA > >> >> > >> >> >> >> > controller, but perhaps this should be solved at a > >> >> > >> >> >> >> > higher level in the > >> >> > >> >> >> >> > vgaswitcheroo code; what do you think? > >> >> > >> >> >> >> > >> >> > >> >> >> >> Maybe it's a side effect for the recent effort to fix > >> >> > >> >> >> >> another race in > >> >> > >> >> >> >> the probe. A part of them problem is that the > >> >> > >> >> >> >> registration is done at > >> >> > >> >> >> >> the very last of probing. > >> >> > >> >> >> >> > >> >> > >> >> >> >> Instead of delaying the registration, how about the patch > >> >> > >> >> >> >> below? > >> >> > >> >> >> > > >> >> > >> >> >> > Ping. If this really works, I'd like to queue it for 3.8 > >> >> > >> >> >> > merge, at > >> >> > >> >> >> > least... > >> >> > >> >> >> > >> >> > >> >> >> Ping ack; I was trying to find time to understand another > >> >> > >> >> >> race that > >> >> > >> >> >> occurs with GPU probing after switching, but is separate > >> >> > >> >> >> from the > >> >> > >> >> >> situation before switching, here. > >> >> > >> >> >> > >> >> > >> >> >> In the context of writing the switch, it looks like struct > >> >> > >> >> >> azx isn't > >> >> > >> >> >> allocated by the time azx_vs_set_state accesses it [1,2]; > >> >> > >> >> >> racing with > >> >> > >> >> >> azx_codec_create? > >> >> > >> >> > > >> >> > >> >> > It was allocated, but it wasn't assigned properly in pci > >> >> > >> >> > drvdata. > >> >> > >> >> > > >> >> > >> >> > Below is the revised patch. Just moved pci_set_drvdata() > >> >> > >> >> > before > >> >> > >> >> > register_vga_switcheroo(). Could you retest with it? > >> >> > >> >> > >> >> > >> >> Superb; this addresses the oops. > >> >> > >> > > >> >> > >> > OK, I'll queue it to sound tree for 3.8 kernel with Cc to stable. > >> >> > >> > > >> >> > >> >> ~1 second after the DGPU is put into D3, I still often see > >> >> > >> >> "hda-intel: > >> >> > >> >> spurious response 0x0:0x0, last cmd=0x170500": > >> >> > >> >> http://quora.org/2012/hda-switch-spurious.txt > >> >> > >> > > >> >> > >> > Hm, it's not clear who triggers these messages. I'll try to > >> >> > >> > check the > >> >> > >> > code paths. > >> >> > >> > > >> >> > >> >> Presumably this implies the read of the ring-buffer pointer > >> >> > >> >> returned > >> >> > >> >> 0x, so the HDA driver understands the pointer to have > >> >> > >> >> wrapped > >> >> > >> >> and processes the 191 unwritten entries? > >> >> > >> > > >> >> > >> > Good point. Actually there is one bug that looks obviously wrong > >> >> > >> > (writing 32bit value to CORBWP). Maybe it has been working just > >>
Re: switcheroo registration vs switching race...
On 4 December 2012 23:03, Takashi Iwai wrote: > At Tue, 4 Dec 2012 22:46:47 +0800, > Daniel J Blueman wrote: >> >> On 4 December 2012 21:55, Takashi Iwai wrote: >> > At Tue, 04 Dec 2012 14:23:05 +0100, >> > Takashi Iwai wrote: >> >> >> >> At Tue, 4 Dec 2012 20:58:55 +0800, >> >> Daniel J Blueman wrote: >> >> > >> >> > On 4 December 2012 01:10, Takashi Iwai wrote: >> >> > > At Tue, 4 Dec 2012 00:50:56 +0800, >> >> > > Daniel J Blueman wrote: >> >> > >> >> >> > >> On 4 December 2012 00:23, Takashi Iwai wrote: >> >> > >> > At Mon, 3 Dec 2012 23:08:28 +0800, >> >> > >> > Daniel J Blueman wrote: >> >> > >> >> >> >> > >> >> On 3 December 2012 22:40, Takashi Iwai wrote: >> >> > >> >> > At Mon, 3 Dec 2012 22:25:52 +0800, >> >> > >> >> > Daniel J Blueman wrote: >> >> > >> >> >> >> >> > >> >> >> On 3 December 2012 19:17, Takashi Iwai wrote: >> >> > >> >> >> > At Wed, 28 Nov 2012 09:45:39 +0100, >> >> > >> >> >> > Takashi Iwai wrote: >> >> > >> >> >> >> >> >> > >> >> >> >> At Wed, 28 Nov 2012 11:45:07 +0800, >> >> > >> >> >> >> Daniel J Blueman wrote: >> >> > >> >> >> >> > >> >> > >> >> >> >> > Hi Seth, Dave, Takashi, >> >> > >> >> >> >> > >> >> > >> >> >> >> > If I power down the unused discrete GPU before lightdm >> >> > >> >> >> >> > starts by >> >> > >> >> >> >> > fiddling with the sysfs file [1] in the upstart script, I >> >> > >> >> >> >> > see a race >> >> > >> >> >> >> > manifesting as the discrete GPU's HDA controller timing >> >> > >> >> >> >> > out to >> >> > >> >> >> >> > commands [2]. >> >> > >> >> >> >> > >> >> > >> >> >> >> > Adding some debug, I see that the registered audio >> >> > >> >> >> >> > devices are put >> >> > >> >> >> >> > into D3 before the GPU is, but it turns out that the >> >> > >> >> >> >> > discrete (and >> >> > >> >> >> >> > internal) GPU's HDA controller gets registered a bit >> >> > >> >> >> >> > later, so the >> >> > >> >> >> >> > list is empty. The symptom is since the HDA driver it's >> >> > >> >> >> >> > talking to >> >> > >> >> >> >> > hardware which is now in D3. >> >> > >> >> >> >> > >> >> > >> >> >> >> > We could add a mutex to nouveau to allow us to wait for >> >> > >> >> >> >> > the DGPU HDA >> >> > >> >> >> >> > controller, but perhaps this should be solved at a higher >> >> > >> >> >> >> > level in the >> >> > >> >> >> >> > vgaswitcheroo code; what do you think? >> >> > >> >> >> >> >> >> > >> >> >> >> Maybe it's a side effect for the recent effort to fix >> >> > >> >> >> >> another race in >> >> > >> >> >> >> the probe. A part of them problem is that the registration >> >> > >> >> >> >> is done at >> >> > >> >> >> >> the very last of probing. >> >> > >> >> >> >> >> >> > >> >> >> >> Instead of delaying the registration, how about the patch >> >> > >> >> >> >> below? >> >> > >> >> >> > >> >> > >> >> >> > Ping. If this really works, I'd like to queue it for 3.8 >> >> > >> >> >> > merge, at >> >> > >> >> >> > least... >> >> > >> >> >> >> >> > >> >> >> Ping ack; I was trying to find time to understand another race >> >> > >> >> >> that >> >> > >> >> >> occurs with GPU probing after switching, but is separate from >> >> > >> >> >> the >> >> > >> >> >> situation before switching, here. >> >> > >> >> >> >> >> > >> >> >> In the context of writing the switch, it looks like struct azx >> >> > >> >> >> isn't >> >> > >> >> >> allocated by the time azx_vs_set_state accesses it [1,2]; >> >> > >> >> >> racing with >> >> > >> >> >> azx_codec_create? >> >> > >> >> > >> >> > >> >> > It was allocated, but it wasn't assigned properly in pci >> >> > >> >> > drvdata. >> >> > >> >> > >> >> > >> >> > Below is the revised patch. Just moved pci_set_drvdata() before >> >> > >> >> > register_vga_switcheroo(). Could you retest with it? >> >> > >> >> >> >> > >> >> Superb; this addresses the oops. >> >> > >> > >> >> > >> > OK, I'll queue it to sound tree for 3.8 kernel with Cc to stable. >> >> > >> > >> >> > >> >> ~1 second after the DGPU is put into D3, I still often see >> >> > >> >> "hda-intel: >> >> > >> >> spurious response 0x0:0x0, last cmd=0x170500": >> >> > >> >> http://quora.org/2012/hda-switch-spurious.txt >> >> > >> > >> >> > >> > Hm, it's not clear who triggers these messages. I'll try to check >> >> > >> > the >> >> > >> > code paths. >> >> > >> > >> >> > >> >> Presumably this implies the read of the ring-buffer pointer >> >> > >> >> returned >> >> > >> >> 0x, so the HDA driver understands the pointer to have >> >> > >> >> wrapped >> >> > >> >> and processes the 191 unwritten entries? >> >> > >> > >> >> > >> > Good point. Actually there is one bug that looks obviously wrong >> >> > >> > (writing 32bit value to CORBWP). Maybe it has been working just >> >> > >> > because writing CORBRP doesn't influence except for the reset bit. >> >> > >> > >> >> > >> > Reading CORBWP as a byte is OK, but this could be better in a word >> >> > >> > so >> >> > >> > that we can check 0x as invalid. >> >> > >> > >> >> > >> > A test patch is below. Hopefully
Re: switcheroo registration vs switching race...
At Tue, 4 Dec 2012 22:46:47 +0800, Daniel J Blueman wrote: > > On 4 December 2012 21:55, Takashi Iwai wrote: > > At Tue, 04 Dec 2012 14:23:05 +0100, > > Takashi Iwai wrote: > >> > >> At Tue, 4 Dec 2012 20:58:55 +0800, > >> Daniel J Blueman wrote: > >> > > >> > On 4 December 2012 01:10, Takashi Iwai wrote: > >> > > At Tue, 4 Dec 2012 00:50:56 +0800, > >> > > Daniel J Blueman wrote: > >> > >> > >> > >> On 4 December 2012 00:23, Takashi Iwai wrote: > >> > >> > At Mon, 3 Dec 2012 23:08:28 +0800, > >> > >> > Daniel J Blueman wrote: > >> > >> >> > >> > >> >> On 3 December 2012 22:40, Takashi Iwai wrote: > >> > >> >> > At Mon, 3 Dec 2012 22:25:52 +0800, > >> > >> >> > Daniel J Blueman wrote: > >> > >> >> >> > >> > >> >> >> On 3 December 2012 19:17, Takashi Iwai wrote: > >> > >> >> >> > At Wed, 28 Nov 2012 09:45:39 +0100, > >> > >> >> >> > Takashi Iwai wrote: > >> > >> >> >> >> > >> > >> >> >> >> At Wed, 28 Nov 2012 11:45:07 +0800, > >> > >> >> >> >> Daniel J Blueman wrote: > >> > >> >> >> >> > > >> > >> >> >> >> > Hi Seth, Dave, Takashi, > >> > >> >> >> >> > > >> > >> >> >> >> > If I power down the unused discrete GPU before lightdm > >> > >> >> >> >> > starts by > >> > >> >> >> >> > fiddling with the sysfs file [1] in the upstart script, I > >> > >> >> >> >> > see a race > >> > >> >> >> >> > manifesting as the discrete GPU's HDA controller timing > >> > >> >> >> >> > out to > >> > >> >> >> >> > commands [2]. > >> > >> >> >> >> > > >> > >> >> >> >> > Adding some debug, I see that the registered audio devices > >> > >> >> >> >> > are put > >> > >> >> >> >> > into D3 before the GPU is, but it turns out that the > >> > >> >> >> >> > discrete (and > >> > >> >> >> >> > internal) GPU's HDA controller gets registered a bit > >> > >> >> >> >> > later, so the > >> > >> >> >> >> > list is empty. The symptom is since the HDA driver it's > >> > >> >> >> >> > talking to > >> > >> >> >> >> > hardware which is now in D3. > >> > >> >> >> >> > > >> > >> >> >> >> > We could add a mutex to nouveau to allow us to wait for > >> > >> >> >> >> > the DGPU HDA > >> > >> >> >> >> > controller, but perhaps this should be solved at a higher > >> > >> >> >> >> > level in the > >> > >> >> >> >> > vgaswitcheroo code; what do you think? > >> > >> >> >> >> > >> > >> >> >> >> Maybe it's a side effect for the recent effort to fix > >> > >> >> >> >> another race in > >> > >> >> >> >> the probe. A part of them problem is that the registration > >> > >> >> >> >> is done at > >> > >> >> >> >> the very last of probing. > >> > >> >> >> >> > >> > >> >> >> >> Instead of delaying the registration, how about the patch > >> > >> >> >> >> below? > >> > >> >> >> > > >> > >> >> >> > Ping. If this really works, I'd like to queue it for 3.8 > >> > >> >> >> > merge, at > >> > >> >> >> > least... > >> > >> >> >> > >> > >> >> >> Ping ack; I was trying to find time to understand another race > >> > >> >> >> that > >> > >> >> >> occurs with GPU probing after switching, but is separate from > >> > >> >> >> the > >> > >> >> >> situation before switching, here. > >> > >> >> >> > >> > >> >> >> In the context of writing the switch, it looks like struct azx > >> > >> >> >> isn't > >> > >> >> >> allocated by the time azx_vs_set_state accesses it [1,2]; > >> > >> >> >> racing with > >> > >> >> >> azx_codec_create? > >> > >> >> > > >> > >> >> > It was allocated, but it wasn't assigned properly in pci drvdata. > >> > >> >> > > >> > >> >> > Below is the revised patch. Just moved pci_set_drvdata() before > >> > >> >> > register_vga_switcheroo(). Could you retest with it? > >> > >> >> > >> > >> >> Superb; this addresses the oops. > >> > >> > > >> > >> > OK, I'll queue it to sound tree for 3.8 kernel with Cc to stable. > >> > >> > > >> > >> >> ~1 second after the DGPU is put into D3, I still often see > >> > >> >> "hda-intel: > >> > >> >> spurious response 0x0:0x0, last cmd=0x170500": > >> > >> >> http://quora.org/2012/hda-switch-spurious.txt > >> > >> > > >> > >> > Hm, it's not clear who triggers these messages. I'll try to check > >> > >> > the > >> > >> > code paths. > >> > >> > > >> > >> >> Presumably this implies the read of the ring-buffer pointer > >> > >> >> returned > >> > >> >> 0x, so the HDA driver understands the pointer to have > >> > >> >> wrapped > >> > >> >> and processes the 191 unwritten entries? > >> > >> > > >> > >> > Good point. Actually there is one bug that looks obviously wrong > >> > >> > (writing 32bit value to CORBWP). Maybe it has been working just > >> > >> > because writing CORBRP doesn't influence except for the reset bit. > >> > >> > > >> > >> > Reading CORBWP as a byte is OK, but this could be better in a word > >> > >> > so > >> > >> > that we can check 0x as invalid. > >> > >> > > >> > >> > A test patch is below. Hopefully this improves the situation... > >> > >> > >> > >> I'll check this out tomorrow and also instrument the code to get a > >> > >> backtrace, since there may still be an underlying
Re: switcheroo registration vs switching race...
On 4 December 2012 21:55, Takashi Iwai wrote: > At Tue, 04 Dec 2012 14:23:05 +0100, > Takashi Iwai wrote: >> >> At Tue, 4 Dec 2012 20:58:55 +0800, >> Daniel J Blueman wrote: >> > >> > On 4 December 2012 01:10, Takashi Iwai wrote: >> > > At Tue, 4 Dec 2012 00:50:56 +0800, >> > > Daniel J Blueman wrote: >> > >> >> > >> On 4 December 2012 00:23, Takashi Iwai wrote: >> > >> > At Mon, 3 Dec 2012 23:08:28 +0800, >> > >> > Daniel J Blueman wrote: >> > >> >> >> > >> >> On 3 December 2012 22:40, Takashi Iwai wrote: >> > >> >> > At Mon, 3 Dec 2012 22:25:52 +0800, >> > >> >> > Daniel J Blueman wrote: >> > >> >> >> >> > >> >> >> On 3 December 2012 19:17, Takashi Iwai wrote: >> > >> >> >> > At Wed, 28 Nov 2012 09:45:39 +0100, >> > >> >> >> > Takashi Iwai wrote: >> > >> >> >> >> >> > >> >> >> >> At Wed, 28 Nov 2012 11:45:07 +0800, >> > >> >> >> >> Daniel J Blueman wrote: >> > >> >> >> >> > >> > >> >> >> >> > Hi Seth, Dave, Takashi, >> > >> >> >> >> > >> > >> >> >> >> > If I power down the unused discrete GPU before lightdm >> > >> >> >> >> > starts by >> > >> >> >> >> > fiddling with the sysfs file [1] in the upstart script, I >> > >> >> >> >> > see a race >> > >> >> >> >> > manifesting as the discrete GPU's HDA controller timing out >> > >> >> >> >> > to >> > >> >> >> >> > commands [2]. >> > >> >> >> >> > >> > >> >> >> >> > Adding some debug, I see that the registered audio devices >> > >> >> >> >> > are put >> > >> >> >> >> > into D3 before the GPU is, but it turns out that the >> > >> >> >> >> > discrete (and >> > >> >> >> >> > internal) GPU's HDA controller gets registered a bit later, >> > >> >> >> >> > so the >> > >> >> >> >> > list is empty. The symptom is since the HDA driver it's >> > >> >> >> >> > talking to >> > >> >> >> >> > hardware which is now in D3. >> > >> >> >> >> > >> > >> >> >> >> > We could add a mutex to nouveau to allow us to wait for the >> > >> >> >> >> > DGPU HDA >> > >> >> >> >> > controller, but perhaps this should be solved at a higher >> > >> >> >> >> > level in the >> > >> >> >> >> > vgaswitcheroo code; what do you think? >> > >> >> >> >> >> > >> >> >> >> Maybe it's a side effect for the recent effort to fix another >> > >> >> >> >> race in >> > >> >> >> >> the probe. A part of them problem is that the registration is >> > >> >> >> >> done at >> > >> >> >> >> the very last of probing. >> > >> >> >> >> >> > >> >> >> >> Instead of delaying the registration, how about the patch >> > >> >> >> >> below? >> > >> >> >> > >> > >> >> >> > Ping. If this really works, I'd like to queue it for 3.8 >> > >> >> >> > merge, at >> > >> >> >> > least... >> > >> >> >> >> > >> >> >> Ping ack; I was trying to find time to understand another race >> > >> >> >> that >> > >> >> >> occurs with GPU probing after switching, but is separate from the >> > >> >> >> situation before switching, here. >> > >> >> >> >> > >> >> >> In the context of writing the switch, it looks like struct azx >> > >> >> >> isn't >> > >> >> >> allocated by the time azx_vs_set_state accesses it [1,2]; racing >> > >> >> >> with >> > >> >> >> azx_codec_create? >> > >> >> > >> > >> >> > It was allocated, but it wasn't assigned properly in pci drvdata. >> > >> >> > >> > >> >> > Below is the revised patch. Just moved pci_set_drvdata() before >> > >> >> > register_vga_switcheroo(). Could you retest with it? >> > >> >> >> > >> >> Superb; this addresses the oops. >> > >> > >> > >> > OK, I'll queue it to sound tree for 3.8 kernel with Cc to stable. >> > >> > >> > >> >> ~1 second after the DGPU is put into D3, I still often see >> > >> >> "hda-intel: >> > >> >> spurious response 0x0:0x0, last cmd=0x170500": >> > >> >> http://quora.org/2012/hda-switch-spurious.txt >> > >> > >> > >> > Hm, it's not clear who triggers these messages. I'll try to check the >> > >> > code paths. >> > >> > >> > >> >> Presumably this implies the read of the ring-buffer pointer returned >> > >> >> 0x, so the HDA driver understands the pointer to have wrapped >> > >> >> and processes the 191 unwritten entries? >> > >> > >> > >> > Good point. Actually there is one bug that looks obviously wrong >> > >> > (writing 32bit value to CORBWP). Maybe it has been working just >> > >> > because writing CORBRP doesn't influence except for the reset bit. >> > >> > >> > >> > Reading CORBWP as a byte is OK, but this could be better in a word so >> > >> > that we can check 0x as invalid. >> > >> > >> > >> > A test patch is below. Hopefully this improves the situation... >> > >> >> > >> I'll check this out tomorrow and also instrument the code to get a >> > >> backtrace, since there may still be an underlying race with the >> > >> previous patches: >> > >> > [...] >> > >> > > That's odd. The Cirrus one (:00:1b.0) must be independent from >> > > the vga switcheroo things for Nvidia... >> > > >> > > The patch below is the revised patch of the first one. Now the vga >> > > switcheroo registration is done before the video controller D3 check, >> > >
Re: switcheroo registration vs switching race...
At Tue, 04 Dec 2012 14:23:05 +0100, Takashi Iwai wrote: > > At Tue, 4 Dec 2012 20:58:55 +0800, > Daniel J Blueman wrote: > > > > On 4 December 2012 01:10, Takashi Iwai wrote: > > > At Tue, 4 Dec 2012 00:50:56 +0800, > > > Daniel J Blueman wrote: > > >> > > >> On 4 December 2012 00:23, Takashi Iwai wrote: > > >> > At Mon, 3 Dec 2012 23:08:28 +0800, > > >> > Daniel J Blueman wrote: > > >> >> > > >> >> On 3 December 2012 22:40, Takashi Iwai wrote: > > >> >> > At Mon, 3 Dec 2012 22:25:52 +0800, > > >> >> > Daniel J Blueman wrote: > > >> >> >> > > >> >> >> On 3 December 2012 19:17, Takashi Iwai wrote: > > >> >> >> > At Wed, 28 Nov 2012 09:45:39 +0100, > > >> >> >> > Takashi Iwai wrote: > > >> >> >> >> > > >> >> >> >> At Wed, 28 Nov 2012 11:45:07 +0800, > > >> >> >> >> Daniel J Blueman wrote: > > >> >> >> >> > > > >> >> >> >> > Hi Seth, Dave, Takashi, > > >> >> >> >> > > > >> >> >> >> > If I power down the unused discrete GPU before lightdm starts > > >> >> >> >> > by > > >> >> >> >> > fiddling with the sysfs file [1] in the upstart script, I see > > >> >> >> >> > a race > > >> >> >> >> > manifesting as the discrete GPU's HDA controller timing out to > > >> >> >> >> > commands [2]. > > >> >> >> >> > > > >> >> >> >> > Adding some debug, I see that the registered audio devices > > >> >> >> >> > are put > > >> >> >> >> > into D3 before the GPU is, but it turns out that the discrete > > >> >> >> >> > (and > > >> >> >> >> > internal) GPU's HDA controller gets registered a bit later, > > >> >> >> >> > so the > > >> >> >> >> > list is empty. The symptom is since the HDA driver it's > > >> >> >> >> > talking to > > >> >> >> >> > hardware which is now in D3. > > >> >> >> >> > > > >> >> >> >> > We could add a mutex to nouveau to allow us to wait for the > > >> >> >> >> > DGPU HDA > > >> >> >> >> > controller, but perhaps this should be solved at a higher > > >> >> >> >> > level in the > > >> >> >> >> > vgaswitcheroo code; what do you think? > > >> >> >> >> > > >> >> >> >> Maybe it's a side effect for the recent effort to fix another > > >> >> >> >> race in > > >> >> >> >> the probe. A part of them problem is that the registration is > > >> >> >> >> done at > > >> >> >> >> the very last of probing. > > >> >> >> >> > > >> >> >> >> Instead of delaying the registration, how about the patch below? > > >> >> >> > > > >> >> >> > Ping. If this really works, I'd like to queue it for 3.8 merge, > > >> >> >> > at > > >> >> >> > least... > > >> >> >> > > >> >> >> Ping ack; I was trying to find time to understand another race that > > >> >> >> occurs with GPU probing after switching, but is separate from the > > >> >> >> situation before switching, here. > > >> >> >> > > >> >> >> In the context of writing the switch, it looks like struct azx > > >> >> >> isn't > > >> >> >> allocated by the time azx_vs_set_state accesses it [1,2]; racing > > >> >> >> with > > >> >> >> azx_codec_create? > > >> >> > > > >> >> > It was allocated, but it wasn't assigned properly in pci drvdata. > > >> >> > > > >> >> > Below is the revised patch. Just moved pci_set_drvdata() before > > >> >> > register_vga_switcheroo(). Could you retest with it? > > >> >> > > >> >> Superb; this addresses the oops. > > >> > > > >> > OK, I'll queue it to sound tree for 3.8 kernel with Cc to stable. > > >> > > > >> >> ~1 second after the DGPU is put into D3, I still often see "hda-intel: > > >> >> spurious response 0x0:0x0, last cmd=0x170500": > > >> >> http://quora.org/2012/hda-switch-spurious.txt > > >> > > > >> > Hm, it's not clear who triggers these messages. I'll try to check the > > >> > code paths. > > >> > > > >> >> Presumably this implies the read of the ring-buffer pointer returned > > >> >> 0x, so the HDA driver understands the pointer to have wrapped > > >> >> and processes the 191 unwritten entries? > > >> > > > >> > Good point. Actually there is one bug that looks obviously wrong > > >> > (writing 32bit value to CORBWP). Maybe it has been working just > > >> > because writing CORBRP doesn't influence except for the reset bit. > > >> > > > >> > Reading CORBWP as a byte is OK, but this could be better in a word so > > >> > that we can check 0x as invalid. > > >> > > > >> > A test patch is below. Hopefully this improves the situation... > > >> > > >> I'll check this out tomorrow and also instrument the code to get a > > >> backtrace, since there may still be an underlying race with the > > >> previous patches: > > > > [...] > > > > > That's odd. The Cirrus one (:00:1b.0) must be independent from > > > the vga switcheroo things for Nvidia... > > > > > > The patch below is the revised patch of the first one. Now the vga > > > switcheroo registration is done before the video controller D3 check, > > > so the race should be smaller. > > > > This patch improves things further of course; a major improvement over > > without. ~15% of the time, I still get the 'spurious response' > > messages in this callpath: > > A possible
Re: switcheroo registration vs switching race...
At Tue, 4 Dec 2012 20:58:55 +0800, Daniel J Blueman wrote: > > On 4 December 2012 01:10, Takashi Iwai wrote: > > At Tue, 4 Dec 2012 00:50:56 +0800, > > Daniel J Blueman wrote: > >> > >> On 4 December 2012 00:23, Takashi Iwai wrote: > >> > At Mon, 3 Dec 2012 23:08:28 +0800, > >> > Daniel J Blueman wrote: > >> >> > >> >> On 3 December 2012 22:40, Takashi Iwai wrote: > >> >> > At Mon, 3 Dec 2012 22:25:52 +0800, > >> >> > Daniel J Blueman wrote: > >> >> >> > >> >> >> On 3 December 2012 19:17, Takashi Iwai wrote: > >> >> >> > At Wed, 28 Nov 2012 09:45:39 +0100, > >> >> >> > Takashi Iwai wrote: > >> >> >> >> > >> >> >> >> At Wed, 28 Nov 2012 11:45:07 +0800, > >> >> >> >> Daniel J Blueman wrote: > >> >> >> >> > > >> >> >> >> > Hi Seth, Dave, Takashi, > >> >> >> >> > > >> >> >> >> > If I power down the unused discrete GPU before lightdm starts by > >> >> >> >> > fiddling with the sysfs file [1] in the upstart script, I see a > >> >> >> >> > race > >> >> >> >> > manifesting as the discrete GPU's HDA controller timing out to > >> >> >> >> > commands [2]. > >> >> >> >> > > >> >> >> >> > Adding some debug, I see that the registered audio devices are > >> >> >> >> > put > >> >> >> >> > into D3 before the GPU is, but it turns out that the discrete > >> >> >> >> > (and > >> >> >> >> > internal) GPU's HDA controller gets registered a bit later, so > >> >> >> >> > the > >> >> >> >> > list is empty. The symptom is since the HDA driver it's talking > >> >> >> >> > to > >> >> >> >> > hardware which is now in D3. > >> >> >> >> > > >> >> >> >> > We could add a mutex to nouveau to allow us to wait for the > >> >> >> >> > DGPU HDA > >> >> >> >> > controller, but perhaps this should be solved at a higher level > >> >> >> >> > in the > >> >> >> >> > vgaswitcheroo code; what do you think? > >> >> >> >> > >> >> >> >> Maybe it's a side effect for the recent effort to fix another > >> >> >> >> race in > >> >> >> >> the probe. A part of them problem is that the registration is > >> >> >> >> done at > >> >> >> >> the very last of probing. > >> >> >> >> > >> >> >> >> Instead of delaying the registration, how about the patch below? > >> >> >> > > >> >> >> > Ping. If this really works, I'd like to queue it for 3.8 merge, at > >> >> >> > least... > >> >> >> > >> >> >> Ping ack; I was trying to find time to understand another race that > >> >> >> occurs with GPU probing after switching, but is separate from the > >> >> >> situation before switching, here. > >> >> >> > >> >> >> In the context of writing the switch, it looks like struct azx isn't > >> >> >> allocated by the time azx_vs_set_state accesses it [1,2]; racing with > >> >> >> azx_codec_create? > >> >> > > >> >> > It was allocated, but it wasn't assigned properly in pci drvdata. > >> >> > > >> >> > Below is the revised patch. Just moved pci_set_drvdata() before > >> >> > register_vga_switcheroo(). Could you retest with it? > >> >> > >> >> Superb; this addresses the oops. > >> > > >> > OK, I'll queue it to sound tree for 3.8 kernel with Cc to stable. > >> > > >> >> ~1 second after the DGPU is put into D3, I still often see "hda-intel: > >> >> spurious response 0x0:0x0, last cmd=0x170500": > >> >> http://quora.org/2012/hda-switch-spurious.txt > >> > > >> > Hm, it's not clear who triggers these messages. I'll try to check the > >> > code paths. > >> > > >> >> Presumably this implies the read of the ring-buffer pointer returned > >> >> 0x, so the HDA driver understands the pointer to have wrapped > >> >> and processes the 191 unwritten entries? > >> > > >> > Good point. Actually there is one bug that looks obviously wrong > >> > (writing 32bit value to CORBWP). Maybe it has been working just > >> > because writing CORBRP doesn't influence except for the reset bit. > >> > > >> > Reading CORBWP as a byte is OK, but this could be better in a word so > >> > that we can check 0x as invalid. > >> > > >> > A test patch is below. Hopefully this improves the situation... > >> > >> I'll check this out tomorrow and also instrument the code to get a > >> backtrace, since there may still be an underlying race with the > >> previous patches: > > [...] > > > That's odd. The Cirrus one (:00:1b.0) must be independent from > > the vga switcheroo things for Nvidia... > > > > The patch below is the revised patch of the first one. Now the vga > > switcheroo registration is done before the video controller D3 check, > > so the race should be smaller. > > This patch improves things further of course; a major improvement over > without. ~15% of the time, I still get the 'spurious response' > messages in this callpath: A possible scenario is that the graphics went in D3 before probing hd-audio, and the hd-audio continues to initialize the hardware without knowing the graphics counterpart is disabled. There is a code check_hdmi_disabled() in hda_intel.c that checks the availability of the video driver, and it might be that this doesn't work as expected... > > Pid:
Re: switcheroo registration vs switching race...
On 4 December 2012 01:10, Takashi Iwai wrote: > At Tue, 4 Dec 2012 00:50:56 +0800, > Daniel J Blueman wrote: >> >> On 4 December 2012 00:23, Takashi Iwai wrote: >> > At Mon, 3 Dec 2012 23:08:28 +0800, >> > Daniel J Blueman wrote: >> >> >> >> On 3 December 2012 22:40, Takashi Iwai wrote: >> >> > At Mon, 3 Dec 2012 22:25:52 +0800, >> >> > Daniel J Blueman wrote: >> >> >> >> >> >> On 3 December 2012 19:17, Takashi Iwai wrote: >> >> >> > At Wed, 28 Nov 2012 09:45:39 +0100, >> >> >> > Takashi Iwai wrote: >> >> >> >> >> >> >> >> At Wed, 28 Nov 2012 11:45:07 +0800, >> >> >> >> Daniel J Blueman wrote: >> >> >> >> > >> >> >> >> > Hi Seth, Dave, Takashi, >> >> >> >> > >> >> >> >> > If I power down the unused discrete GPU before lightdm starts by >> >> >> >> > fiddling with the sysfs file [1] in the upstart script, I see a >> >> >> >> > race >> >> >> >> > manifesting as the discrete GPU's HDA controller timing out to >> >> >> >> > commands [2]. >> >> >> >> > >> >> >> >> > Adding some debug, I see that the registered audio devices are put >> >> >> >> > into D3 before the GPU is, but it turns out that the discrete (and >> >> >> >> > internal) GPU's HDA controller gets registered a bit later, so the >> >> >> >> > list is empty. The symptom is since the HDA driver it's talking to >> >> >> >> > hardware which is now in D3. >> >> >> >> > >> >> >> >> > We could add a mutex to nouveau to allow us to wait for the DGPU >> >> >> >> > HDA >> >> >> >> > controller, but perhaps this should be solved at a higher level >> >> >> >> > in the >> >> >> >> > vgaswitcheroo code; what do you think? >> >> >> >> >> >> >> >> Maybe it's a side effect for the recent effort to fix another race >> >> >> >> in >> >> >> >> the probe. A part of them problem is that the registration is done >> >> >> >> at >> >> >> >> the very last of probing. >> >> >> >> >> >> >> >> Instead of delaying the registration, how about the patch below? >> >> >> > >> >> >> > Ping. If this really works, I'd like to queue it for 3.8 merge, at >> >> >> > least... >> >> >> >> >> >> Ping ack; I was trying to find time to understand another race that >> >> >> occurs with GPU probing after switching, but is separate from the >> >> >> situation before switching, here. >> >> >> >> >> >> In the context of writing the switch, it looks like struct azx isn't >> >> >> allocated by the time azx_vs_set_state accesses it [1,2]; racing with >> >> >> azx_codec_create? >> >> > >> >> > It was allocated, but it wasn't assigned properly in pci drvdata. >> >> > >> >> > Below is the revised patch. Just moved pci_set_drvdata() before >> >> > register_vga_switcheroo(). Could you retest with it? >> >> >> >> Superb; this addresses the oops. >> > >> > OK, I'll queue it to sound tree for 3.8 kernel with Cc to stable. >> > >> >> ~1 second after the DGPU is put into D3, I still often see "hda-intel: >> >> spurious response 0x0:0x0, last cmd=0x170500": >> >> http://quora.org/2012/hda-switch-spurious.txt >> > >> > Hm, it's not clear who triggers these messages. I'll try to check the >> > code paths. >> > >> >> Presumably this implies the read of the ring-buffer pointer returned >> >> 0x, so the HDA driver understands the pointer to have wrapped >> >> and processes the 191 unwritten entries? >> > >> > Good point. Actually there is one bug that looks obviously wrong >> > (writing 32bit value to CORBWP). Maybe it has been working just >> > because writing CORBRP doesn't influence except for the reset bit. >> > >> > Reading CORBWP as a byte is OK, but this could be better in a word so >> > that we can check 0x as invalid. >> > >> > A test patch is below. Hopefully this improves the situation... >> >> I'll check this out tomorrow and also instrument the code to get a >> backtrace, since there may still be an underlying race with the >> previous patches: [...] > That's odd. The Cirrus one (:00:1b.0) must be independent from > the vga switcheroo things for Nvidia... > > The patch below is the revised patch of the first one. Now the vga > switcheroo registration is done before the video controller D3 check, > so the race should be smaller. This patch improves things further of course; a major improvement over without. ~15% of the time, I still get the 'spurious response' messages in this callpath: Pid: 473, comm: modprobe Not tainted 3.7.0-rc8-expert+ #15 Call Trace: [] azx_update_rirb+0xe6/0xf0 [snd_hda_intel] [] azx_get_response+0xc0/0x270 [snd_hda_intel] [] codec_exec_verb+0xc1/0x100 [snd_hda_codec] [] snd_hda_codec_read+0x5b/0x90 [snd_hda_codec] [] ? try_to_grab_pending+0xb9/0x130 [] hda_set_power_state+0xd5/0x110 [snd_hda_codec] [] hda_call_codec_resume+0x20/0x120 [snd_hda_codec] [] snd_hda_power_save+0x129/0x1e0 [snd_hda_codec] [] codec_exec_verb+0x4f/0x100 [snd_hda_codec] [] snd_hda_codec_write+0x6b/0xa0 [snd_hda_codec] [] snd_hda_codec_write_cache+0x31/0xa0 [snd_hda_codec] [] snd_hda_jack_detect_enable_callback+0x75/0xa0 [snd_hda_codec] []
Re: switcheroo registration vs switching race...
On 4 December 2012 01:10, Takashi Iwai ti...@suse.de wrote: At Tue, 4 Dec 2012 00:50:56 +0800, Daniel J Blueman wrote: On 4 December 2012 00:23, Takashi Iwai ti...@suse.de wrote: At Mon, 3 Dec 2012 23:08:28 +0800, Daniel J Blueman wrote: On 3 December 2012 22:40, Takashi Iwai ti...@suse.de wrote: At Mon, 3 Dec 2012 22:25:52 +0800, Daniel J Blueman wrote: On 3 December 2012 19:17, Takashi Iwai ti...@suse.de wrote: At Wed, 28 Nov 2012 09:45:39 +0100, Takashi Iwai wrote: At Wed, 28 Nov 2012 11:45:07 +0800, Daniel J Blueman wrote: Hi Seth, Dave, Takashi, If I power down the unused discrete GPU before lightdm starts by fiddling with the sysfs file [1] in the upstart script, I see a race manifesting as the discrete GPU's HDA controller timing out to commands [2]. Adding some debug, I see that the registered audio devices are put into D3 before the GPU is, but it turns out that the discrete (and internal) GPU's HDA controller gets registered a bit later, so the list is empty. The symptom is since the HDA driver it's talking to hardware which is now in D3. We could add a mutex to nouveau to allow us to wait for the DGPU HDA controller, but perhaps this should be solved at a higher level in the vgaswitcheroo code; what do you think? Maybe it's a side effect for the recent effort to fix another race in the probe. A part of them problem is that the registration is done at the very last of probing. Instead of delaying the registration, how about the patch below? Ping. If this really works, I'd like to queue it for 3.8 merge, at least... Ping ack; I was trying to find time to understand another race that occurs with GPU probing after switching, but is separate from the situation before switching, here. In the context of writing the switch, it looks like struct azx isn't allocated by the time azx_vs_set_state accesses it [1,2]; racing with azx_codec_create? It was allocated, but it wasn't assigned properly in pci drvdata. Below is the revised patch. Just moved pci_set_drvdata() before register_vga_switcheroo(). Could you retest with it? Superb; this addresses the oops. OK, I'll queue it to sound tree for 3.8 kernel with Cc to stable. ~1 second after the DGPU is put into D3, I still often see hda-intel: spurious response 0x0:0x0, last cmd=0x170500: http://quora.org/2012/hda-switch-spurious.txt Hm, it's not clear who triggers these messages. I'll try to check the code paths. Presumably this implies the read of the ring-buffer pointer returned 0x, so the HDA driver understands the pointer to have wrapped and processes the 191 unwritten entries? Good point. Actually there is one bug that looks obviously wrong (writing 32bit value to CORBWP). Maybe it has been working just because writing CORBRP doesn't influence except for the reset bit. Reading CORBWP as a byte is OK, but this could be better in a word so that we can check 0x as invalid. A test patch is below. Hopefully this improves the situation... I'll check this out tomorrow and also instrument the code to get a backtrace, since there may still be an underlying race with the previous patches: [...] That's odd. The Cirrus one (:00:1b.0) must be independent from the vga switcheroo things for Nvidia... The patch below is the revised patch of the first one. Now the vga switcheroo registration is done before the video controller D3 check, so the race should be smaller. This patch improves things further of course; a major improvement over without. ~15% of the time, I still get the 'spurious response' messages in this callpath: Pid: 473, comm: modprobe Not tainted 3.7.0-rc8-expert+ #15 Call Trace: [a0025776] azx_update_rirb+0xe6/0xf0 [snd_hda_intel] [a0026d00] azx_get_response+0xc0/0x270 [snd_hda_intel] [a0226801] codec_exec_verb+0xc1/0x100 [snd_hda_codec] [a022799b] snd_hda_codec_read+0x5b/0x90 [snd_hda_codec] [810545e9] ? try_to_grab_pending+0xb9/0x130 [a0227d35] hda_set_power_state+0xd5/0x110 [snd_hda_codec] [a0226420] hda_call_codec_resume+0x20/0x120 [snd_hda_codec] [a0226689] snd_hda_power_save+0x129/0x1e0 [snd_hda_codec] [a022678f] codec_exec_verb+0x4f/0x100 [snd_hda_codec] [a02268ab] snd_hda_codec_write+0x6b/0xa0 [snd_hda_codec] [a0227281] snd_hda_codec_write_cache+0x31/0xa0 [snd_hda_codec] [a022b965] snd_hda_jack_detect_enable_callback+0x75/0xa0 [snd_hda_codec] [a022b9a1] snd_hda_jack_detect_enable+0x11/0x20 [snd_hda_codec] [a00cf5a6] generic_hdmi_init+0xb6/0xe0 [snd_hda_codec_hdmi] [a022ad67] snd_hda_codec_build_controls+0x37/0x130 [snd_hda_codec] [a022b080] ? snd_hda_codec_build_pcms+0x1a0/0x2e0
Re: switcheroo registration vs switching race...
At Tue, 4 Dec 2012 20:58:55 +0800, Daniel J Blueman wrote: On 4 December 2012 01:10, Takashi Iwai ti...@suse.de wrote: At Tue, 4 Dec 2012 00:50:56 +0800, Daniel J Blueman wrote: On 4 December 2012 00:23, Takashi Iwai ti...@suse.de wrote: At Mon, 3 Dec 2012 23:08:28 +0800, Daniel J Blueman wrote: On 3 December 2012 22:40, Takashi Iwai ti...@suse.de wrote: At Mon, 3 Dec 2012 22:25:52 +0800, Daniel J Blueman wrote: On 3 December 2012 19:17, Takashi Iwai ti...@suse.de wrote: At Wed, 28 Nov 2012 09:45:39 +0100, Takashi Iwai wrote: At Wed, 28 Nov 2012 11:45:07 +0800, Daniel J Blueman wrote: Hi Seth, Dave, Takashi, If I power down the unused discrete GPU before lightdm starts by fiddling with the sysfs file [1] in the upstart script, I see a race manifesting as the discrete GPU's HDA controller timing out to commands [2]. Adding some debug, I see that the registered audio devices are put into D3 before the GPU is, but it turns out that the discrete (and internal) GPU's HDA controller gets registered a bit later, so the list is empty. The symptom is since the HDA driver it's talking to hardware which is now in D3. We could add a mutex to nouveau to allow us to wait for the DGPU HDA controller, but perhaps this should be solved at a higher level in the vgaswitcheroo code; what do you think? Maybe it's a side effect for the recent effort to fix another race in the probe. A part of them problem is that the registration is done at the very last of probing. Instead of delaying the registration, how about the patch below? Ping. If this really works, I'd like to queue it for 3.8 merge, at least... Ping ack; I was trying to find time to understand another race that occurs with GPU probing after switching, but is separate from the situation before switching, here. In the context of writing the switch, it looks like struct azx isn't allocated by the time azx_vs_set_state accesses it [1,2]; racing with azx_codec_create? It was allocated, but it wasn't assigned properly in pci drvdata. Below is the revised patch. Just moved pci_set_drvdata() before register_vga_switcheroo(). Could you retest with it? Superb; this addresses the oops. OK, I'll queue it to sound tree for 3.8 kernel with Cc to stable. ~1 second after the DGPU is put into D3, I still often see hda-intel: spurious response 0x0:0x0, last cmd=0x170500: http://quora.org/2012/hda-switch-spurious.txt Hm, it's not clear who triggers these messages. I'll try to check the code paths. Presumably this implies the read of the ring-buffer pointer returned 0x, so the HDA driver understands the pointer to have wrapped and processes the 191 unwritten entries? Good point. Actually there is one bug that looks obviously wrong (writing 32bit value to CORBWP). Maybe it has been working just because writing CORBRP doesn't influence except for the reset bit. Reading CORBWP as a byte is OK, but this could be better in a word so that we can check 0x as invalid. A test patch is below. Hopefully this improves the situation... I'll check this out tomorrow and also instrument the code to get a backtrace, since there may still be an underlying race with the previous patches: [...] That's odd. The Cirrus one (:00:1b.0) must be independent from the vga switcheroo things for Nvidia... The patch below is the revised patch of the first one. Now the vga switcheroo registration is done before the video controller D3 check, so the race should be smaller. This patch improves things further of course; a major improvement over without. ~15% of the time, I still get the 'spurious response' messages in this callpath: A possible scenario is that the graphics went in D3 before probing hd-audio, and the hd-audio continues to initialize the hardware without knowing the graphics counterpart is disabled. There is a code check_hdmi_disabled() in hda_intel.c that checks the availability of the video driver, and it might be that this doesn't work as expected... Pid: 473, comm: modprobe Not tainted 3.7.0-rc8-expert+ #15 Call Trace: [a0025776] azx_update_rirb+0xe6/0xf0 [snd_hda_intel] [a0026d00] azx_get_response+0xc0/0x270 [snd_hda_intel] [a0226801] codec_exec_verb+0xc1/0x100 [snd_hda_codec] [a022799b] snd_hda_codec_read+0x5b/0x90 [snd_hda_codec] [810545e9] ? try_to_grab_pending+0xb9/0x130 [a0227d35] hda_set_power_state+0xd5/0x110 [snd_hda_codec] [a0226420] hda_call_codec_resume+0x20/0x120 [snd_hda_codec] [a0226689] snd_hda_power_save+0x129/0x1e0 [snd_hda_codec] [a022678f]
Re: switcheroo registration vs switching race...
At Tue, 04 Dec 2012 14:23:05 +0100, Takashi Iwai wrote: At Tue, 4 Dec 2012 20:58:55 +0800, Daniel J Blueman wrote: On 4 December 2012 01:10, Takashi Iwai ti...@suse.de wrote: At Tue, 4 Dec 2012 00:50:56 +0800, Daniel J Blueman wrote: On 4 December 2012 00:23, Takashi Iwai ti...@suse.de wrote: At Mon, 3 Dec 2012 23:08:28 +0800, Daniel J Blueman wrote: On 3 December 2012 22:40, Takashi Iwai ti...@suse.de wrote: At Mon, 3 Dec 2012 22:25:52 +0800, Daniel J Blueman wrote: On 3 December 2012 19:17, Takashi Iwai ti...@suse.de wrote: At Wed, 28 Nov 2012 09:45:39 +0100, Takashi Iwai wrote: At Wed, 28 Nov 2012 11:45:07 +0800, Daniel J Blueman wrote: Hi Seth, Dave, Takashi, If I power down the unused discrete GPU before lightdm starts by fiddling with the sysfs file [1] in the upstart script, I see a race manifesting as the discrete GPU's HDA controller timing out to commands [2]. Adding some debug, I see that the registered audio devices are put into D3 before the GPU is, but it turns out that the discrete (and internal) GPU's HDA controller gets registered a bit later, so the list is empty. The symptom is since the HDA driver it's talking to hardware which is now in D3. We could add a mutex to nouveau to allow us to wait for the DGPU HDA controller, but perhaps this should be solved at a higher level in the vgaswitcheroo code; what do you think? Maybe it's a side effect for the recent effort to fix another race in the probe. A part of them problem is that the registration is done at the very last of probing. Instead of delaying the registration, how about the patch below? Ping. If this really works, I'd like to queue it for 3.8 merge, at least... Ping ack; I was trying to find time to understand another race that occurs with GPU probing after switching, but is separate from the situation before switching, here. In the context of writing the switch, it looks like struct azx isn't allocated by the time azx_vs_set_state accesses it [1,2]; racing with azx_codec_create? It was allocated, but it wasn't assigned properly in pci drvdata. Below is the revised patch. Just moved pci_set_drvdata() before register_vga_switcheroo(). Could you retest with it? Superb; this addresses the oops. OK, I'll queue it to sound tree for 3.8 kernel with Cc to stable. ~1 second after the DGPU is put into D3, I still often see hda-intel: spurious response 0x0:0x0, last cmd=0x170500: http://quora.org/2012/hda-switch-spurious.txt Hm, it's not clear who triggers these messages. I'll try to check the code paths. Presumably this implies the read of the ring-buffer pointer returned 0x, so the HDA driver understands the pointer to have wrapped and processes the 191 unwritten entries? Good point. Actually there is one bug that looks obviously wrong (writing 32bit value to CORBWP). Maybe it has been working just because writing CORBRP doesn't influence except for the reset bit. Reading CORBWP as a byte is OK, but this could be better in a word so that we can check 0x as invalid. A test patch is below. Hopefully this improves the situation... I'll check this out tomorrow and also instrument the code to get a backtrace, since there may still be an underlying race with the previous patches: [...] That's odd. The Cirrus one (:00:1b.0) must be independent from the vga switcheroo things for Nvidia... The patch below is the revised patch of the first one. Now the vga switcheroo registration is done before the video controller D3 check, so the race should be smaller. This patch improves things further of course; a major improvement over without. ~15% of the time, I still get the 'spurious response' messages in this callpath: A possible scenario is that the graphics went in D3 before probing hd-audio, and the hd-audio continues to initialize the hardware without knowing the graphics counterpart is disabled. There is a code check_hdmi_disabled() in hda_intel.c that checks the availability of the video driver, and it might be that this doesn't work as expected... I think I understand this path. You are setting OFF, right? This will set the audio client OFF before can_switch() is called. Thus it can be called even before the probing process finished. In short, wait_for_completion() must be put at the beginning of azx_vs_set_state() in addition to azx_vs_can_switch(). The revised patch is attached below. Hopefully this sorts out all races. Takashi --- diff --git
Re: switcheroo registration vs switching race...
On 4 December 2012 21:55, Takashi Iwai ti...@suse.de wrote: At Tue, 04 Dec 2012 14:23:05 +0100, Takashi Iwai wrote: At Tue, 4 Dec 2012 20:58:55 +0800, Daniel J Blueman wrote: On 4 December 2012 01:10, Takashi Iwai ti...@suse.de wrote: At Tue, 4 Dec 2012 00:50:56 +0800, Daniel J Blueman wrote: On 4 December 2012 00:23, Takashi Iwai ti...@suse.de wrote: At Mon, 3 Dec 2012 23:08:28 +0800, Daniel J Blueman wrote: On 3 December 2012 22:40, Takashi Iwai ti...@suse.de wrote: At Mon, 3 Dec 2012 22:25:52 +0800, Daniel J Blueman wrote: On 3 December 2012 19:17, Takashi Iwai ti...@suse.de wrote: At Wed, 28 Nov 2012 09:45:39 +0100, Takashi Iwai wrote: At Wed, 28 Nov 2012 11:45:07 +0800, Daniel J Blueman wrote: Hi Seth, Dave, Takashi, If I power down the unused discrete GPU before lightdm starts by fiddling with the sysfs file [1] in the upstart script, I see a race manifesting as the discrete GPU's HDA controller timing out to commands [2]. Adding some debug, I see that the registered audio devices are put into D3 before the GPU is, but it turns out that the discrete (and internal) GPU's HDA controller gets registered a bit later, so the list is empty. The symptom is since the HDA driver it's talking to hardware which is now in D3. We could add a mutex to nouveau to allow us to wait for the DGPU HDA controller, but perhaps this should be solved at a higher level in the vgaswitcheroo code; what do you think? Maybe it's a side effect for the recent effort to fix another race in the probe. A part of them problem is that the registration is done at the very last of probing. Instead of delaying the registration, how about the patch below? Ping. If this really works, I'd like to queue it for 3.8 merge, at least... Ping ack; I was trying to find time to understand another race that occurs with GPU probing after switching, but is separate from the situation before switching, here. In the context of writing the switch, it looks like struct azx isn't allocated by the time azx_vs_set_state accesses it [1,2]; racing with azx_codec_create? It was allocated, but it wasn't assigned properly in pci drvdata. Below is the revised patch. Just moved pci_set_drvdata() before register_vga_switcheroo(). Could you retest with it? Superb; this addresses the oops. OK, I'll queue it to sound tree for 3.8 kernel with Cc to stable. ~1 second after the DGPU is put into D3, I still often see hda-intel: spurious response 0x0:0x0, last cmd=0x170500: http://quora.org/2012/hda-switch-spurious.txt Hm, it's not clear who triggers these messages. I'll try to check the code paths. Presumably this implies the read of the ring-buffer pointer returned 0x, so the HDA driver understands the pointer to have wrapped and processes the 191 unwritten entries? Good point. Actually there is one bug that looks obviously wrong (writing 32bit value to CORBWP). Maybe it has been working just because writing CORBRP doesn't influence except for the reset bit. Reading CORBWP as a byte is OK, but this could be better in a word so that we can check 0x as invalid. A test patch is below. Hopefully this improves the situation... I'll check this out tomorrow and also instrument the code to get a backtrace, since there may still be an underlying race with the previous patches: [...] That's odd. The Cirrus one (:00:1b.0) must be independent from the vga switcheroo things for Nvidia... The patch below is the revised patch of the first one. Now the vga switcheroo registration is done before the video controller D3 check, so the race should be smaller. This patch improves things further of course; a major improvement over without. ~15% of the time, I still get the 'spurious response' messages in this callpath: A possible scenario is that the graphics went in D3 before probing hd-audio, and the hd-audio continues to initialize the hardware without knowing the graphics counterpart is disabled. There is a code check_hdmi_disabled() in hda_intel.c that checks the availability of the video driver, and it might be that this doesn't work as expected... I think I understand this path. You are setting OFF, right? This will set the audio client OFF before can_switch() is called. Thus it can be called even before the probing process finished. In short, wait_for_completion() must be put at the beginning of azx_vs_set_state() in addition to azx_vs_can_switch(). The revised
Re: switcheroo registration vs switching race...
At Tue, 4 Dec 2012 22:46:47 +0800, Daniel J Blueman wrote: On 4 December 2012 21:55, Takashi Iwai ti...@suse.de wrote: At Tue, 04 Dec 2012 14:23:05 +0100, Takashi Iwai wrote: At Tue, 4 Dec 2012 20:58:55 +0800, Daniel J Blueman wrote: On 4 December 2012 01:10, Takashi Iwai ti...@suse.de wrote: At Tue, 4 Dec 2012 00:50:56 +0800, Daniel J Blueman wrote: On 4 December 2012 00:23, Takashi Iwai ti...@suse.de wrote: At Mon, 3 Dec 2012 23:08:28 +0800, Daniel J Blueman wrote: On 3 December 2012 22:40, Takashi Iwai ti...@suse.de wrote: At Mon, 3 Dec 2012 22:25:52 +0800, Daniel J Blueman wrote: On 3 December 2012 19:17, Takashi Iwai ti...@suse.de wrote: At Wed, 28 Nov 2012 09:45:39 +0100, Takashi Iwai wrote: At Wed, 28 Nov 2012 11:45:07 +0800, Daniel J Blueman wrote: Hi Seth, Dave, Takashi, If I power down the unused discrete GPU before lightdm starts by fiddling with the sysfs file [1] in the upstart script, I see a race manifesting as the discrete GPU's HDA controller timing out to commands [2]. Adding some debug, I see that the registered audio devices are put into D3 before the GPU is, but it turns out that the discrete (and internal) GPU's HDA controller gets registered a bit later, so the list is empty. The symptom is since the HDA driver it's talking to hardware which is now in D3. We could add a mutex to nouveau to allow us to wait for the DGPU HDA controller, but perhaps this should be solved at a higher level in the vgaswitcheroo code; what do you think? Maybe it's a side effect for the recent effort to fix another race in the probe. A part of them problem is that the registration is done at the very last of probing. Instead of delaying the registration, how about the patch below? Ping. If this really works, I'd like to queue it for 3.8 merge, at least... Ping ack; I was trying to find time to understand another race that occurs with GPU probing after switching, but is separate from the situation before switching, here. In the context of writing the switch, it looks like struct azx isn't allocated by the time azx_vs_set_state accesses it [1,2]; racing with azx_codec_create? It was allocated, but it wasn't assigned properly in pci drvdata. Below is the revised patch. Just moved pci_set_drvdata() before register_vga_switcheroo(). Could you retest with it? Superb; this addresses the oops. OK, I'll queue it to sound tree for 3.8 kernel with Cc to stable. ~1 second after the DGPU is put into D3, I still often see hda-intel: spurious response 0x0:0x0, last cmd=0x170500: http://quora.org/2012/hda-switch-spurious.txt Hm, it's not clear who triggers these messages. I'll try to check the code paths. Presumably this implies the read of the ring-buffer pointer returned 0x, so the HDA driver understands the pointer to have wrapped and processes the 191 unwritten entries? Good point. Actually there is one bug that looks obviously wrong (writing 32bit value to CORBWP). Maybe it has been working just because writing CORBRP doesn't influence except for the reset bit. Reading CORBWP as a byte is OK, but this could be better in a word so that we can check 0x as invalid. A test patch is below. Hopefully this improves the situation... I'll check this out tomorrow and also instrument the code to get a backtrace, since there may still be an underlying race with the previous patches: [...] That's odd. The Cirrus one (:00:1b.0) must be independent from the vga switcheroo things for Nvidia... The patch below is the revised patch of the first one. Now the vga switcheroo registration is done before the video controller D3 check, so the race should be smaller. This patch improves things further of course; a major improvement over without. ~15% of the time, I still get the 'spurious response' messages in this callpath: A possible scenario is that the graphics went in D3 before probing hd-audio, and the hd-audio continues to initialize the hardware without knowing the graphics counterpart is disabled. There is a code check_hdmi_disabled() in hda_intel.c that checks the availability of the video driver, and it might be that this doesn't work as expected... I think I understand this path. You are setting OFF, right? This will set the audio client OFF before
Re: switcheroo registration vs switching race...
On 4 December 2012 23:03, Takashi Iwai ti...@suse.de wrote: At Tue, 4 Dec 2012 22:46:47 +0800, Daniel J Blueman wrote: On 4 December 2012 21:55, Takashi Iwai ti...@suse.de wrote: At Tue, 04 Dec 2012 14:23:05 +0100, Takashi Iwai wrote: At Tue, 4 Dec 2012 20:58:55 +0800, Daniel J Blueman wrote: On 4 December 2012 01:10, Takashi Iwai ti...@suse.de wrote: At Tue, 4 Dec 2012 00:50:56 +0800, Daniel J Blueman wrote: On 4 December 2012 00:23, Takashi Iwai ti...@suse.de wrote: At Mon, 3 Dec 2012 23:08:28 +0800, Daniel J Blueman wrote: On 3 December 2012 22:40, Takashi Iwai ti...@suse.de wrote: At Mon, 3 Dec 2012 22:25:52 +0800, Daniel J Blueman wrote: On 3 December 2012 19:17, Takashi Iwai ti...@suse.de wrote: At Wed, 28 Nov 2012 09:45:39 +0100, Takashi Iwai wrote: At Wed, 28 Nov 2012 11:45:07 +0800, Daniel J Blueman wrote: Hi Seth, Dave, Takashi, If I power down the unused discrete GPU before lightdm starts by fiddling with the sysfs file [1] in the upstart script, I see a race manifesting as the discrete GPU's HDA controller timing out to commands [2]. Adding some debug, I see that the registered audio devices are put into D3 before the GPU is, but it turns out that the discrete (and internal) GPU's HDA controller gets registered a bit later, so the list is empty. The symptom is since the HDA driver it's talking to hardware which is now in D3. We could add a mutex to nouveau to allow us to wait for the DGPU HDA controller, but perhaps this should be solved at a higher level in the vgaswitcheroo code; what do you think? Maybe it's a side effect for the recent effort to fix another race in the probe. A part of them problem is that the registration is done at the very last of probing. Instead of delaying the registration, how about the patch below? Ping. If this really works, I'd like to queue it for 3.8 merge, at least... Ping ack; I was trying to find time to understand another race that occurs with GPU probing after switching, but is separate from the situation before switching, here. In the context of writing the switch, it looks like struct azx isn't allocated by the time azx_vs_set_state accesses it [1,2]; racing with azx_codec_create? It was allocated, but it wasn't assigned properly in pci drvdata. Below is the revised patch. Just moved pci_set_drvdata() before register_vga_switcheroo(). Could you retest with it? Superb; this addresses the oops. OK, I'll queue it to sound tree for 3.8 kernel with Cc to stable. ~1 second after the DGPU is put into D3, I still often see hda-intel: spurious response 0x0:0x0, last cmd=0x170500: http://quora.org/2012/hda-switch-spurious.txt Hm, it's not clear who triggers these messages. I'll try to check the code paths. Presumably this implies the read of the ring-buffer pointer returned 0x, so the HDA driver understands the pointer to have wrapped and processes the 191 unwritten entries? Good point. Actually there is one bug that looks obviously wrong (writing 32bit value to CORBWP). Maybe it has been working just because writing CORBRP doesn't influence except for the reset bit. Reading CORBWP as a byte is OK, but this could be better in a word so that we can check 0x as invalid. A test patch is below. Hopefully this improves the situation... I'll check this out tomorrow and also instrument the code to get a backtrace, since there may still be an underlying race with the previous patches: [...] That's odd. The Cirrus one (:00:1b.0) must be independent from the vga switcheroo things for Nvidia... The patch below is the revised patch of the first one. Now the vga switcheroo registration is done before the video controller D3 check, so the race should be smaller. This patch improves things further of course; a major improvement over without. ~15% of the time, I still get the 'spurious response' messages in this callpath: A possible scenario is that the graphics went in D3 before probing hd-audio, and the hd-audio continues to initialize the hardware without knowing the graphics counterpart is disabled. There is a code check_hdmi_disabled() in hda_intel.c that checks the availability of the video driver, and it might be that this doesn't work as expected... I think I understand this path. You are
Re: switcheroo registration vs switching race...
At Tue, 4 Dec 2012 23:54:39 +0800, Daniel J Blueman wrote: On 4 December 2012 23:03, Takashi Iwai ti...@suse.de wrote: At Tue, 4 Dec 2012 22:46:47 +0800, Daniel J Blueman wrote: On 4 December 2012 21:55, Takashi Iwai ti...@suse.de wrote: At Tue, 04 Dec 2012 14:23:05 +0100, Takashi Iwai wrote: At Tue, 4 Dec 2012 20:58:55 +0800, Daniel J Blueman wrote: On 4 December 2012 01:10, Takashi Iwai ti...@suse.de wrote: At Tue, 4 Dec 2012 00:50:56 +0800, Daniel J Blueman wrote: On 4 December 2012 00:23, Takashi Iwai ti...@suse.de wrote: At Mon, 3 Dec 2012 23:08:28 +0800, Daniel J Blueman wrote: On 3 December 2012 22:40, Takashi Iwai ti...@suse.de wrote: At Mon, 3 Dec 2012 22:25:52 +0800, Daniel J Blueman wrote: On 3 December 2012 19:17, Takashi Iwai ti...@suse.de wrote: At Wed, 28 Nov 2012 09:45:39 +0100, Takashi Iwai wrote: At Wed, 28 Nov 2012 11:45:07 +0800, Daniel J Blueman wrote: Hi Seth, Dave, Takashi, If I power down the unused discrete GPU before lightdm starts by fiddling with the sysfs file [1] in the upstart script, I see a race manifesting as the discrete GPU's HDA controller timing out to commands [2]. Adding some debug, I see that the registered audio devices are put into D3 before the GPU is, but it turns out that the discrete (and internal) GPU's HDA controller gets registered a bit later, so the list is empty. The symptom is since the HDA driver it's talking to hardware which is now in D3. We could add a mutex to nouveau to allow us to wait for the DGPU HDA controller, but perhaps this should be solved at a higher level in the vgaswitcheroo code; what do you think? Maybe it's a side effect for the recent effort to fix another race in the probe. A part of them problem is that the registration is done at the very last of probing. Instead of delaying the registration, how about the patch below? Ping. If this really works, I'd like to queue it for 3.8 merge, at least... Ping ack; I was trying to find time to understand another race that occurs with GPU probing after switching, but is separate from the situation before switching, here. In the context of writing the switch, it looks like struct azx isn't allocated by the time azx_vs_set_state accesses it [1,2]; racing with azx_codec_create? It was allocated, but it wasn't assigned properly in pci drvdata. Below is the revised patch. Just moved pci_set_drvdata() before register_vga_switcheroo(). Could you retest with it? Superb; this addresses the oops. OK, I'll queue it to sound tree for 3.8 kernel with Cc to stable. ~1 second after the DGPU is put into D3, I still often see hda-intel: spurious response 0x0:0x0, last cmd=0x170500: http://quora.org/2012/hda-switch-spurious.txt Hm, it's not clear who triggers these messages. I'll try to check the code paths. Presumably this implies the read of the ring-buffer pointer returned 0x, so the HDA driver understands the pointer to have wrapped and processes the 191 unwritten entries? Good point. Actually there is one bug that looks obviously wrong (writing 32bit value to CORBWP). Maybe it has been working just because writing CORBRP doesn't influence except for the reset bit. Reading CORBWP as a byte is OK, but this could be better in a word so that we can check 0x as invalid. A test patch is below. Hopefully this improves the situation... I'll check this out tomorrow and also instrument the code to get a backtrace, since there may still be an underlying race with the previous patches: [...] That's odd. The Cirrus one (:00:1b.0) must be independent from the vga switcheroo things for Nvidia... The patch below is the revised patch of the first one. Now the vga switcheroo registration is done before the video controller D3 check, so the race should be smaller. This patch improves things further of course; a major improvement over without. ~15% of the time, I still get the 'spurious response' messages in this callpath: A possible scenario is that the graphics went in D3 before probing hd-audio, and the hd-audio continues to initialize the hardware without knowing the graphics counterpart is
Re: switcheroo registration vs switching race...
At Tue, 4 Dec 2012 00:50:56 +0800, Daniel J Blueman wrote: > > On 4 December 2012 00:23, Takashi Iwai wrote: > > At Mon, 3 Dec 2012 23:08:28 +0800, > > Daniel J Blueman wrote: > >> > >> On 3 December 2012 22:40, Takashi Iwai wrote: > >> > At Mon, 3 Dec 2012 22:25:52 +0800, > >> > Daniel J Blueman wrote: > >> >> > >> >> On 3 December 2012 19:17, Takashi Iwai wrote: > >> >> > At Wed, 28 Nov 2012 09:45:39 +0100, > >> >> > Takashi Iwai wrote: > >> >> >> > >> >> >> At Wed, 28 Nov 2012 11:45:07 +0800, > >> >> >> Daniel J Blueman wrote: > >> >> >> > > >> >> >> > Hi Seth, Dave, Takashi, > >> >> >> > > >> >> >> > If I power down the unused discrete GPU before lightdm starts by > >> >> >> > fiddling with the sysfs file [1] in the upstart script, I see a > >> >> >> > race > >> >> >> > manifesting as the discrete GPU's HDA controller timing out to > >> >> >> > commands [2]. > >> >> >> > > >> >> >> > Adding some debug, I see that the registered audio devices are put > >> >> >> > into D3 before the GPU is, but it turns out that the discrete (and > >> >> >> > internal) GPU's HDA controller gets registered a bit later, so the > >> >> >> > list is empty. The symptom is since the HDA driver it's talking to > >> >> >> > hardware which is now in D3. > >> >> >> > > >> >> >> > We could add a mutex to nouveau to allow us to wait for the DGPU > >> >> >> > HDA > >> >> >> > controller, but perhaps this should be solved at a higher level in > >> >> >> > the > >> >> >> > vgaswitcheroo code; what do you think? > >> >> >> > >> >> >> Maybe it's a side effect for the recent effort to fix another race in > >> >> >> the probe. A part of them problem is that the registration is done > >> >> >> at > >> >> >> the very last of probing. > >> >> >> > >> >> >> Instead of delaying the registration, how about the patch below? > >> >> > > >> >> > Ping. If this really works, I'd like to queue it for 3.8 merge, at > >> >> > least... > >> >> > >> >> Ping ack; I was trying to find time to understand another race that > >> >> occurs with GPU probing after switching, but is separate from the > >> >> situation before switching, here. > >> >> > >> >> In the context of writing the switch, it looks like struct azx isn't > >> >> allocated by the time azx_vs_set_state accesses it [1,2]; racing with > >> >> azx_codec_create? > >> > > >> > It was allocated, but it wasn't assigned properly in pci drvdata. > >> > > >> > Below is the revised patch. Just moved pci_set_drvdata() before > >> > register_vga_switcheroo(). Could you retest with it? > >> > >> Superb; this addresses the oops. > > > > OK, I'll queue it to sound tree for 3.8 kernel with Cc to stable. > > > >> ~1 second after the DGPU is put into D3, I still often see "hda-intel: > >> spurious response 0x0:0x0, last cmd=0x170500": > >> http://quora.org/2012/hda-switch-spurious.txt > > > > Hm, it's not clear who triggers these messages. I'll try to check the > > code paths. > > > >> Presumably this implies the read of the ring-buffer pointer returned > >> 0x, so the HDA driver understands the pointer to have wrapped > >> and processes the 191 unwritten entries? > > > > Good point. Actually there is one bug that looks obviously wrong > > (writing 32bit value to CORBWP). Maybe it has been working just > > because writing CORBRP doesn't influence except for the reset bit. > > > > Reading CORBWP as a byte is OK, but this could be better in a word so > > that we can check 0x as invalid. > > > > A test patch is below. Hopefully this improves the situation... > > I'll check this out tomorrow and also instrument the code to get a > backtrace, since there may still be an underlying race with the > previous patches: > > [8.203827] snd_hda_intel :00:1b.0: enabling device ( -> 0002) > [8.203936] snd_hda_intel :00:1b.0: irq 51 for MSI/MSI-X > [ 10.981297] VGA switcheroo: switched nouveau off > [ 10.981383] nouveau [ DRM] suspending fbcon... > [ 10.981388] nouveau [ DRM] suspending display... > [ 10.981687] nouveau [ DRM] unpinning framebuffer(s)... > [ 10.981825] nouveau [ DRM] evicting buffers... > [ 10.992948] nouveau [ DRM] suspending client object trees... > [ 11.310697] hda-intel: azx_get_response timeout, switching to > polling mode: last cmd=0x000f > [ 12.320236] hda-intel: No response from codec, disabling MSI: last > cmd=0x000f > [ 13.329721] hda-intel: Codec #0 probe error; disabling it... > [ 14.419163] hda_intel: azx_get_response timeout, switching to > single_cmd mode: last cmd=0x000f > [ 14.419855] hda-intel: no codecs initialized > [ 14.459150] snd_hda_intel :01:00.1: Refused to change power > state, currently in D3 > [ 14.459176] hda-intel: :01:00.1: Handle VGA-switcheroo audio client > [ 14.459183] hda-intel: VGA controller for :01:00.1 is disabled > [ 14.459184] hda-intel: Delaying initialization > > Full log at: http://quora.org/2012/hda-switch-spurious2.txt > > The first
Re: switcheroo registration vs switching race...
On 4 December 2012 00:23, Takashi Iwai wrote: > At Mon, 3 Dec 2012 23:08:28 +0800, > Daniel J Blueman wrote: >> >> On 3 December 2012 22:40, Takashi Iwai wrote: >> > At Mon, 3 Dec 2012 22:25:52 +0800, >> > Daniel J Blueman wrote: >> >> >> >> On 3 December 2012 19:17, Takashi Iwai wrote: >> >> > At Wed, 28 Nov 2012 09:45:39 +0100, >> >> > Takashi Iwai wrote: >> >> >> >> >> >> At Wed, 28 Nov 2012 11:45:07 +0800, >> >> >> Daniel J Blueman wrote: >> >> >> > >> >> >> > Hi Seth, Dave, Takashi, >> >> >> > >> >> >> > If I power down the unused discrete GPU before lightdm starts by >> >> >> > fiddling with the sysfs file [1] in the upstart script, I see a race >> >> >> > manifesting as the discrete GPU's HDA controller timing out to >> >> >> > commands [2]. >> >> >> > >> >> >> > Adding some debug, I see that the registered audio devices are put >> >> >> > into D3 before the GPU is, but it turns out that the discrete (and >> >> >> > internal) GPU's HDA controller gets registered a bit later, so the >> >> >> > list is empty. The symptom is since the HDA driver it's talking to >> >> >> > hardware which is now in D3. >> >> >> > >> >> >> > We could add a mutex to nouveau to allow us to wait for the DGPU HDA >> >> >> > controller, but perhaps this should be solved at a higher level in >> >> >> > the >> >> >> > vgaswitcheroo code; what do you think? >> >> >> >> >> >> Maybe it's a side effect for the recent effort to fix another race in >> >> >> the probe. A part of them problem is that the registration is done at >> >> >> the very last of probing. >> >> >> >> >> >> Instead of delaying the registration, how about the patch below? >> >> > >> >> > Ping. If this really works, I'd like to queue it for 3.8 merge, at >> >> > least... >> >> >> >> Ping ack; I was trying to find time to understand another race that >> >> occurs with GPU probing after switching, but is separate from the >> >> situation before switching, here. >> >> >> >> In the context of writing the switch, it looks like struct azx isn't >> >> allocated by the time azx_vs_set_state accesses it [1,2]; racing with >> >> azx_codec_create? >> > >> > It was allocated, but it wasn't assigned properly in pci drvdata. >> > >> > Below is the revised patch. Just moved pci_set_drvdata() before >> > register_vga_switcheroo(). Could you retest with it? >> >> Superb; this addresses the oops. > > OK, I'll queue it to sound tree for 3.8 kernel with Cc to stable. > >> ~1 second after the DGPU is put into D3, I still often see "hda-intel: >> spurious response 0x0:0x0, last cmd=0x170500": >> http://quora.org/2012/hda-switch-spurious.txt > > Hm, it's not clear who triggers these messages. I'll try to check the > code paths. > >> Presumably this implies the read of the ring-buffer pointer returned >> 0x, so the HDA driver understands the pointer to have wrapped >> and processes the 191 unwritten entries? > > Good point. Actually there is one bug that looks obviously wrong > (writing 32bit value to CORBWP). Maybe it has been working just > because writing CORBRP doesn't influence except for the reset bit. > > Reading CORBWP as a byte is OK, but this could be better in a word so > that we can check 0x as invalid. > > A test patch is below. Hopefully this improves the situation... I'll check this out tomorrow and also instrument the code to get a backtrace, since there may still be an underlying race with the previous patches: [8.203827] snd_hda_intel :00:1b.0: enabling device ( -> 0002) [8.203936] snd_hda_intel :00:1b.0: irq 51 for MSI/MSI-X [ 10.981297] VGA switcheroo: switched nouveau off [ 10.981383] nouveau [ DRM] suspending fbcon... [ 10.981388] nouveau [ DRM] suspending display... [ 10.981687] nouveau [ DRM] unpinning framebuffer(s)... [ 10.981825] nouveau [ DRM] evicting buffers... [ 10.992948] nouveau [ DRM] suspending client object trees... [ 11.310697] hda-intel: azx_get_response timeout, switching to polling mode: last cmd=0x000f [ 12.320236] hda-intel: No response from codec, disabling MSI: last cmd=0x000f [ 13.329721] hda-intel: Codec #0 probe error; disabling it... [ 14.419163] hda_intel: azx_get_response timeout, switching to single_cmd mode: last cmd=0x000f [ 14.419855] hda-intel: no codecs initialized [ 14.459150] snd_hda_intel :01:00.1: Refused to change power state, currently in D3 [ 14.459176] hda-intel: :01:00.1: Handle VGA-switcheroo audio client [ 14.459183] hda-intel: VGA controller for :01:00.1 is disabled [ 14.459184] hda-intel: Delaying initialization Full log at: http://quora.org/2012/hda-switch-spurious2.txt The first response timeout message plausibly could be from the CS4206 (0:1b.0) or Nvidia DGPU (1:0.1); either way, there are no audio devices after. Also, we see that checking for master abort in one place won't be sufficient. Stay tuned for backtraces tomorrow. Thanks again so far! Daniel -- Daniel J Blueman -- To
Re: switcheroo registration vs switching race...
At Mon, 3 Dec 2012 23:08:28 +0800, Daniel J Blueman wrote: > > On 3 December 2012 22:40, Takashi Iwai wrote: > > At Mon, 3 Dec 2012 22:25:52 +0800, > > Daniel J Blueman wrote: > >> > >> On 3 December 2012 19:17, Takashi Iwai wrote: > >> > At Wed, 28 Nov 2012 09:45:39 +0100, > >> > Takashi Iwai wrote: > >> >> > >> >> At Wed, 28 Nov 2012 11:45:07 +0800, > >> >> Daniel J Blueman wrote: > >> >> > > >> >> > Hi Seth, Dave, Takashi, > >> >> > > >> >> > If I power down the unused discrete GPU before lightdm starts by > >> >> > fiddling with the sysfs file [1] in the upstart script, I see a race > >> >> > manifesting as the discrete GPU's HDA controller timing out to > >> >> > commands [2]. > >> >> > > >> >> > Adding some debug, I see that the registered audio devices are put > >> >> > into D3 before the GPU is, but it turns out that the discrete (and > >> >> > internal) GPU's HDA controller gets registered a bit later, so the > >> >> > list is empty. The symptom is since the HDA driver it's talking to > >> >> > hardware which is now in D3. > >> >> > > >> >> > We could add a mutex to nouveau to allow us to wait for the DGPU HDA > >> >> > controller, but perhaps this should be solved at a higher level in the > >> >> > vgaswitcheroo code; what do you think? > >> >> > >> >> Maybe it's a side effect for the recent effort to fix another race in > >> >> the probe. A part of them problem is that the registration is done at > >> >> the very last of probing. > >> >> > >> >> Instead of delaying the registration, how about the patch below? > >> > > >> > Ping. If this really works, I'd like to queue it for 3.8 merge, at > >> > least... > >> > >> Ping ack; I was trying to find time to understand another race that > >> occurs with GPU probing after switching, but is separate from the > >> situation before switching, here. > >> > >> In the context of writing the switch, it looks like struct azx isn't > >> allocated by the time azx_vs_set_state accesses it [1,2]; racing with > >> azx_codec_create? > > > > It was allocated, but it wasn't assigned properly in pci drvdata. > > > > Below is the revised patch. Just moved pci_set_drvdata() before > > register_vga_switcheroo(). Could you retest with it? > > Superb; this addresses the oops. OK, I'll queue it to sound tree for 3.8 kernel with Cc to stable. > ~1 second after the DGPU is put into D3, I still often see "hda-intel: > spurious response 0x0:0x0, last cmd=0x170500": > http://quora.org/2012/hda-switch-spurious.txt Hm, it's not clear who triggers these messages. I'll try to check the code paths. > Presumably this implies the read of the ring-buffer pointer returned > 0x, so the HDA driver understands the pointer to have wrapped > and processes the 191 unwritten entries? Good point. Actually there is one bug that looks obviously wrong (writing 32bit value to CORBWP). Maybe it has been working just because writing CORBRP doesn't influence except for the reset bit. Reading CORBWP as a byte is OK, but this could be better in a word so that we can check 0x as invalid. A test patch is below. Hopefully this improves the situation... thanks, Takashi --- diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c index 4bb52da..ce990db 100644 --- a/sound/pci/hda/hda_intel.c +++ b/sound/pci/hda/hda_intel.c @@ -805,13 +805,18 @@ static int azx_corb_send_cmd(struct hda_bus *bus, u32 val) spin_lock_irq(>reg_lock); /* add command to corb */ - wp = azx_readb(chip, CORBWP); + wp = azx_readw(chip, CORBWP); + if (wp == 0x) { + /* something wrong, controller likely turned to D3 */ + spin_unlock_irq(>reg_lock); + return -1; + } wp++; wp %= ICH6_MAX_CORB_ENTRIES; chip->rirb.cmds[addr]++; chip->corb.buf[wp] = cpu_to_le32(val); - azx_writel(chip, CORBWP, wp); + azx_writew(chip, CORBWP, wp); spin_unlock_irq(>reg_lock); @@ -827,7 +832,12 @@ static void azx_update_rirb(struct azx *chip) unsigned int addr; u32 res, res_ex; - wp = azx_readb(chip, RIRBWP); + wp = azx_readw(chip, RIRBWP); + if (wp == 0x) { + /* something wrong, controller likely turned to D3 */ + return; + } + if (wp == chip->rirb.wp) return; chip->rirb.wp = wp; -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: switcheroo registration vs switching race...
On 3 December 2012 22:40, Takashi Iwai wrote: > At Mon, 3 Dec 2012 22:25:52 +0800, > Daniel J Blueman wrote: >> >> On 3 December 2012 19:17, Takashi Iwai wrote: >> > At Wed, 28 Nov 2012 09:45:39 +0100, >> > Takashi Iwai wrote: >> >> >> >> At Wed, 28 Nov 2012 11:45:07 +0800, >> >> Daniel J Blueman wrote: >> >> > >> >> > Hi Seth, Dave, Takashi, >> >> > >> >> > If I power down the unused discrete GPU before lightdm starts by >> >> > fiddling with the sysfs file [1] in the upstart script, I see a race >> >> > manifesting as the discrete GPU's HDA controller timing out to >> >> > commands [2]. >> >> > >> >> > Adding some debug, I see that the registered audio devices are put >> >> > into D3 before the GPU is, but it turns out that the discrete (and >> >> > internal) GPU's HDA controller gets registered a bit later, so the >> >> > list is empty. The symptom is since the HDA driver it's talking to >> >> > hardware which is now in D3. >> >> > >> >> > We could add a mutex to nouveau to allow us to wait for the DGPU HDA >> >> > controller, but perhaps this should be solved at a higher level in the >> >> > vgaswitcheroo code; what do you think? >> >> >> >> Maybe it's a side effect for the recent effort to fix another race in >> >> the probe. A part of them problem is that the registration is done at >> >> the very last of probing. >> >> >> >> Instead of delaying the registration, how about the patch below? >> > >> > Ping. If this really works, I'd like to queue it for 3.8 merge, at >> > least... >> >> Ping ack; I was trying to find time to understand another race that >> occurs with GPU probing after switching, but is separate from the >> situation before switching, here. >> >> In the context of writing the switch, it looks like struct azx isn't >> allocated by the time azx_vs_set_state accesses it [1,2]; racing with >> azx_codec_create? > > It was allocated, but it wasn't assigned properly in pci drvdata. > > Below is the revised patch. Just moved pci_set_drvdata() before > register_vga_switcheroo(). Could you retest with it? Superb; this addresses the oops. ~1 second after the DGPU is put into D3, I still often see "hda-intel: spurious response 0x0:0x0, last cmd=0x170500": http://quora.org/2012/hda-switch-spurious.txt Presumably this implies the read of the ring-buffer pointer returned 0x, so the HDA driver understands the pointer to have wrapped and processes the 191 unwritten entries? Daniel -- Daniel J Blueman -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: switcheroo registration vs switching race...
At Mon, 3 Dec 2012 22:25:52 +0800, Daniel J Blueman wrote: > > On 3 December 2012 19:17, Takashi Iwai wrote: > > At Wed, 28 Nov 2012 09:45:39 +0100, > > Takashi Iwai wrote: > >> > >> At Wed, 28 Nov 2012 11:45:07 +0800, > >> Daniel J Blueman wrote: > >> > > >> > Hi Seth, Dave, Takashi, > >> > > >> > If I power down the unused discrete GPU before lightdm starts by > >> > fiddling with the sysfs file [1] in the upstart script, I see a race > >> > manifesting as the discrete GPU's HDA controller timing out to > >> > commands [2]. > >> > > >> > Adding some debug, I see that the registered audio devices are put > >> > into D3 before the GPU is, but it turns out that the discrete (and > >> > internal) GPU's HDA controller gets registered a bit later, so the > >> > list is empty. The symptom is since the HDA driver it's talking to > >> > hardware which is now in D3. > >> > > >> > We could add a mutex to nouveau to allow us to wait for the DGPU HDA > >> > controller, but perhaps this should be solved at a higher level in the > >> > vgaswitcheroo code; what do you think? > >> > >> Maybe it's a side effect for the recent effort to fix another race in > >> the probe. A part of them problem is that the registration is done at > >> the very last of probing. > >> > >> Instead of delaying the registration, how about the patch below? > > > > Ping. If this really works, I'd like to queue it for 3.8 merge, at > > least... > > Ping ack; I was trying to find time to understand another race that > occurs with GPU probing after switching, but is separate from the > situation before switching, here. > > In the context of writing the switch, it looks like struct azx isn't > allocated by the time azx_vs_set_state accesses it [1,2]; racing with > azx_codec_create? It was allocated, but it wasn't assigned properly in pci drvdata. Below is the revised patch. Just moved pci_set_drvdata() before register_vga_switcheroo(). Could you retest with it? thanks, Takashi --- diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c index 4bb52da..b7c482a 100644 --- a/sound/pci/hda/hda_intel.c +++ b/sound/pci/hda/hda_intel.c @@ -49,6 +49,7 @@ #include #include #include +#include #ifdef CONFIG_X86 /* for snoop control */ @@ -469,6 +470,7 @@ struct azx { /* locks */ spinlock_t reg_lock; struct mutex open_mutex; + struct completion probe_wait; /* streams (x num_streams) */ struct azx_dev *azx_dev; @@ -2790,6 +2792,7 @@ static bool azx_vs_can_switch(struct pci_dev *pci) struct snd_card *card = pci_get_drvdata(pci); struct azx *chip = card->private_data; + wait_for_completion(>probe_wait); if (chip->init_failed) return false; if (chip->disabled || !chip->bus) @@ -2851,6 +2854,9 @@ static int azx_free(struct azx *chip) azx_notifier_unregister(chip); + chip->init_failed = 1; /* to be sure */ + complete(>probe_wait); + if (use_vga_switcheroo(chip)) { if (chip->disabled && chip->bus) snd_hda_unlock_devices(chip->bus); @@ -3156,6 +3162,7 @@ static int __devinit azx_create(struct snd_card *card, struct pci_dev *pci, INIT_LIST_HEAD(>pcm_list); INIT_LIST_HEAD(>list); init_vga_switcheroo(chip); + init_completion(>probe_wait); chip->position_fix[0] = chip->position_fix[1] = check_position_fix(chip, position_fix[dev]); @@ -3462,17 +3469,8 @@ static int __devinit azx_probe(struct pci_dev *pci, } #endif /* CONFIG_SND_HDA_PATCH_LOADER */ - if (probe_now) { - err = azx_probe_continue(chip); - if (err < 0) - goto out_free; - } - pci_set_drvdata(pci, card); - if (pci_dev_run_wake(pci)) - pm_runtime_put_noidle(>dev); - err = register_vga_switcheroo(chip); if (err < 0) { snd_printk(KERN_ERR SFX @@ -3480,11 +3478,22 @@ static int __devinit azx_probe(struct pci_dev *pci, goto out_free; } + if (probe_now) { + err = azx_probe_continue(chip); + if (err < 0) + goto out_free; + } + + if (pci_dev_run_wake(pci)) + pm_runtime_put_noidle(>dev); + dev++; + complete(>probe_wait); return 0; out_free: snd_card_free(card); + pci_set_drvdata(pci, NULL); return err; } -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: switcheroo registration vs switching race...
On 3 December 2012 19:17, Takashi Iwai wrote: > At Wed, 28 Nov 2012 09:45:39 +0100, > Takashi Iwai wrote: >> >> At Wed, 28 Nov 2012 11:45:07 +0800, >> Daniel J Blueman wrote: >> > >> > Hi Seth, Dave, Takashi, >> > >> > If I power down the unused discrete GPU before lightdm starts by >> > fiddling with the sysfs file [1] in the upstart script, I see a race >> > manifesting as the discrete GPU's HDA controller timing out to >> > commands [2]. >> > >> > Adding some debug, I see that the registered audio devices are put >> > into D3 before the GPU is, but it turns out that the discrete (and >> > internal) GPU's HDA controller gets registered a bit later, so the >> > list is empty. The symptom is since the HDA driver it's talking to >> > hardware which is now in D3. >> > >> > We could add a mutex to nouveau to allow us to wait for the DGPU HDA >> > controller, but perhaps this should be solved at a higher level in the >> > vgaswitcheroo code; what do you think? >> >> Maybe it's a side effect for the recent effort to fix another race in >> the probe. A part of them problem is that the registration is done at >> the very last of probing. >> >> Instead of delaying the registration, how about the patch below? > > Ping. If this really works, I'd like to queue it for 3.8 merge, at > least... Ping ack; I was trying to find time to understand another race that occurs with GPU probing after switching, but is separate from the situation before switching, here. In the context of writing the switch, it looks like struct azx isn't allocated by the time azx_vs_set_state accesses it [1,2]; racing with azx_codec_create? The full dmesg output is at: http://quora.org/2012/hda-switch-oops.txt Thanks, Daniel --- [1] BUG: unable to handle kernel NULL pointer dereference at 0170 IP: [] azx_vs_set_state+0x26/0x1a0 [snd_hda_intel] PGD 26323d067 PUD 264f58067 PMD 0 Oops: [#1] SMP Modules linked in: snd_hda_codec_hdmi snd_hda_codec_cirrus rfcomm bnep nls_iso8859_1 joydev hid_apple bcm5974 nouveau coretemp kvm_intel b43 kvm uvcvideo videobuf2_core videobuf2_vmalloc videobuf2_memops ghash_clmulni_intel smsc75xx usbnet mii ttm snd_hda_intel(+) snd_hda_codec snd_hwdep ssb i915 snd_pcm mxm_wmi snd_timer apple_gmux applesmc mei lpc_ich microcode hwmon mfd_core input_polldev bcma snd drm_kms_helper snd_page_alloc video apple_bl sdhci_pci sdhci mmc_core CPU 1 Pid: 967, comm: sh Not tainted 3.7.0-rc7-expert+ #8 Apple Inc. MacBookPro10,1/Mac-C3EC7CD22292981F RIP: 0010:[] [] azx_vs_set_state+0x26/0x1a0 [snd_hda_intel] RSP: 0018:88025198de48 EFLAGS: 00010286 RAX: RBX: 880251960a00 RCX: RDX: RSI: RDI: 880265b41098 RBP: 88025198de68 R08: 0003 R09: 1000 R10: 7fffe481b730 R11: 0246 R12: 880265b41098 R13: R14: 88025198df50 R15: FS: 7f4961480700() GS:88026f24() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 0170 CR3: 000263cd3000 CR4: 001407e0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 Process sh (pid: 967, threadinfo 88025198c000, task 88025d635820) Stack: 88025d635820 880251960a00 88025198de98 88025198de88 812b8e77 880263ef1740 0004 88025198def8 812b947c 88020a46464f 81107982 Call Trace: [] set_audio_state+0x67/0x70 [] vga_switcheroo_debugfs_write+0xbc/0x380 [] ? __alloc_fd+0x42/0x110 [] ? __fd_install+0x29/0x60 [] vfs_write+0xa3/0x160 [] sys_write+0x4d/0xa0 [] ? do_page_fault+0x9/0x10 [] system_call_fastpath+0x1a/0x1f Code: 00 00 00 00 00 55 48 89 e5 48 83 ec 20 4c 89 65 f0 4c 8d a7 98 00 00 00 4c 89 e7 48 89 5d e8 4c 89 6d f8 41 89 f5 e8 fa a4 0d e1 <48> 8b 98 70 01 00 00 0f b6 83 dd 01 00 00 a8 10 75 34 45 85 ed RIP [] azx_vs_set_state+0x26/0x1a0 [snd_hda_intel] RSP CR2: 0170 --- [2] $ gdb ./sound/pci/hda/snd-hda-intel.ko (gdb) list *(azx_vs_set_state+0x26) 0x3036 is in azx_vs_set_state (sound/pci/hda/hda_intel.c:2628). 2623 2624static void azx_vs_set_state(struct pci_dev *pci, 2625 enum vga_switcheroo_state state) 2626{ 2627struct snd_card *card = pci_get_drvdata(pci); 2628struct azx *chip = card->private_data; 2629bool disabled; 2630 2631if (chip->init_failed) 2632return; -- Daniel J Blueman -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: switcheroo registration vs switching race...
At Wed, 28 Nov 2012 09:45:39 +0100, Takashi Iwai wrote: > > At Wed, 28 Nov 2012 11:45:07 +0800, > Daniel J Blueman wrote: > > > > Hi Seth, Dave, Takashi, > > > > If I power down the unused discrete GPU before lightdm starts by > > fiddling with the sysfs file [1] in the upstart script, I see a race > > manifesting as the discrete GPU's HDA controller timing out to > > commands [2]. > > > > Adding some debug, I see that the registered audio devices are put > > into D3 before the GPU is, but it turns out that the discrete (and > > internal) GPU's HDA controller gets registered a bit later, so the > > list is empty. The symptom is since the HDA driver it's talking to > > hardware which is now in D3. > > > > We could add a mutex to nouveau to allow us to wait for the DGPU HDA > > controller, but perhaps this should be solved at a higher level in the > > vgaswitcheroo code; what do you think? > > Maybe it's a side effect for the recent effort to fix another race in > the probe. A part of them problem is that the registration is done at > the very last of probing. > > Instead of delaying the registration, how about the patch below? Ping. If this really works, I'd like to queue it for 3.8 merge, at least... thanks, Takashi > > > Takashi > > --- > diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c > index 4bb52da..17fbd68 100644 > --- a/sound/pci/hda/hda_intel.c > +++ b/sound/pci/hda/hda_intel.c > @@ -49,6 +49,7 @@ > #include > #include > #include > +#include > > #ifdef CONFIG_X86 > /* for snoop control */ > @@ -469,6 +470,7 @@ struct azx { > /* locks */ > spinlock_t reg_lock; > struct mutex open_mutex; > + struct completion probe_wait; > > /* streams (x num_streams) */ > struct azx_dev *azx_dev; > @@ -2790,6 +2792,7 @@ static bool azx_vs_can_switch(struct pci_dev *pci) > struct snd_card *card = pci_get_drvdata(pci); > struct azx *chip = card->private_data; > > + wait_for_completion(>probe_wait); > if (chip->init_failed) > return false; > if (chip->disabled || !chip->bus) > @@ -2851,6 +2854,9 @@ static int azx_free(struct azx *chip) > > azx_notifier_unregister(chip); > > + chip->init_failed = 1; /* to be sure */ > + complete(>probe_wait); > + > if (use_vga_switcheroo(chip)) { > if (chip->disabled && chip->bus) > snd_hda_unlock_devices(chip->bus); > @@ -3156,6 +3162,7 @@ static int __devinit azx_create(struct snd_card *card, > struct pci_dev *pci, > INIT_LIST_HEAD(>pcm_list); > INIT_LIST_HEAD(>list); > init_vga_switcheroo(chip); > + init_completion(>probe_wait); > > chip->position_fix[0] = chip->position_fix[1] = > check_position_fix(chip, position_fix[dev]); > @@ -3462,6 +3469,13 @@ static int __devinit azx_probe(struct pci_dev *pci, > } > #endif /* CONFIG_SND_HDA_PATCH_LOADER */ > > + err = register_vga_switcheroo(chip); > + if (err < 0) { > + snd_printk(KERN_ERR SFX > +"Error registering VGA-switcheroo client\n"); > + goto out_free; > + } > + > if (probe_now) { > err = azx_probe_continue(chip); > if (err < 0) > @@ -3473,14 +3487,8 @@ static int __devinit azx_probe(struct pci_dev *pci, > if (pci_dev_run_wake(pci)) > pm_runtime_put_noidle(>dev); > > - err = register_vga_switcheroo(chip); > - if (err < 0) { > - snd_printk(KERN_ERR SFX > -"Error registering VGA-switcheroo client\n"); > - goto out_free; > - } > - > dev++; > + complete(>probe_wait); > return 0; > > out_free: -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: switcheroo registration vs switching race...
At Wed, 28 Nov 2012 09:45:39 +0100, Takashi Iwai wrote: At Wed, 28 Nov 2012 11:45:07 +0800, Daniel J Blueman wrote: Hi Seth, Dave, Takashi, If I power down the unused discrete GPU before lightdm starts by fiddling with the sysfs file [1] in the upstart script, I see a race manifesting as the discrete GPU's HDA controller timing out to commands [2]. Adding some debug, I see that the registered audio devices are put into D3 before the GPU is, but it turns out that the discrete (and internal) GPU's HDA controller gets registered a bit later, so the list is empty. The symptom is since the HDA driver it's talking to hardware which is now in D3. We could add a mutex to nouveau to allow us to wait for the DGPU HDA controller, but perhaps this should be solved at a higher level in the vgaswitcheroo code; what do you think? Maybe it's a side effect for the recent effort to fix another race in the probe. A part of them problem is that the registration is done at the very last of probing. Instead of delaying the registration, how about the patch below? Ping. If this really works, I'd like to queue it for 3.8 merge, at least... thanks, Takashi Takashi --- diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c index 4bb52da..17fbd68 100644 --- a/sound/pci/hda/hda_intel.c +++ b/sound/pci/hda/hda_intel.c @@ -49,6 +49,7 @@ #include linux/pm_runtime.h #include linux/clocksource.h #include linux/time.h +#include linux/completion.h #ifdef CONFIG_X86 /* for snoop control */ @@ -469,6 +470,7 @@ struct azx { /* locks */ spinlock_t reg_lock; struct mutex open_mutex; + struct completion probe_wait; /* streams (x num_streams) */ struct azx_dev *azx_dev; @@ -2790,6 +2792,7 @@ static bool azx_vs_can_switch(struct pci_dev *pci) struct snd_card *card = pci_get_drvdata(pci); struct azx *chip = card-private_data; + wait_for_completion(chip-probe_wait); if (chip-init_failed) return false; if (chip-disabled || !chip-bus) @@ -2851,6 +2854,9 @@ static int azx_free(struct azx *chip) azx_notifier_unregister(chip); + chip-init_failed = 1; /* to be sure */ + complete(chip-probe_wait); + if (use_vga_switcheroo(chip)) { if (chip-disabled chip-bus) snd_hda_unlock_devices(chip-bus); @@ -3156,6 +3162,7 @@ static int __devinit azx_create(struct snd_card *card, struct pci_dev *pci, INIT_LIST_HEAD(chip-pcm_list); INIT_LIST_HEAD(chip-list); init_vga_switcheroo(chip); + init_completion(chip-probe_wait); chip-position_fix[0] = chip-position_fix[1] = check_position_fix(chip, position_fix[dev]); @@ -3462,6 +3469,13 @@ static int __devinit azx_probe(struct pci_dev *pci, } #endif /* CONFIG_SND_HDA_PATCH_LOADER */ + err = register_vga_switcheroo(chip); + if (err 0) { + snd_printk(KERN_ERR SFX +Error registering VGA-switcheroo client\n); + goto out_free; + } + if (probe_now) { err = azx_probe_continue(chip); if (err 0) @@ -3473,14 +3487,8 @@ static int __devinit azx_probe(struct pci_dev *pci, if (pci_dev_run_wake(pci)) pm_runtime_put_noidle(pci-dev); - err = register_vga_switcheroo(chip); - if (err 0) { - snd_printk(KERN_ERR SFX -Error registering VGA-switcheroo client\n); - goto out_free; - } - dev++; + complete(chip-probe_wait); return 0; out_free: -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: switcheroo registration vs switching race...
On 3 December 2012 19:17, Takashi Iwai ti...@suse.de wrote: At Wed, 28 Nov 2012 09:45:39 +0100, Takashi Iwai wrote: At Wed, 28 Nov 2012 11:45:07 +0800, Daniel J Blueman wrote: Hi Seth, Dave, Takashi, If I power down the unused discrete GPU before lightdm starts by fiddling with the sysfs file [1] in the upstart script, I see a race manifesting as the discrete GPU's HDA controller timing out to commands [2]. Adding some debug, I see that the registered audio devices are put into D3 before the GPU is, but it turns out that the discrete (and internal) GPU's HDA controller gets registered a bit later, so the list is empty. The symptom is since the HDA driver it's talking to hardware which is now in D3. We could add a mutex to nouveau to allow us to wait for the DGPU HDA controller, but perhaps this should be solved at a higher level in the vgaswitcheroo code; what do you think? Maybe it's a side effect for the recent effort to fix another race in the probe. A part of them problem is that the registration is done at the very last of probing. Instead of delaying the registration, how about the patch below? Ping. If this really works, I'd like to queue it for 3.8 merge, at least... Ping ack; I was trying to find time to understand another race that occurs with GPU probing after switching, but is separate from the situation before switching, here. In the context of writing the switch, it looks like struct azx isn't allocated by the time azx_vs_set_state accesses it [1,2]; racing with azx_codec_create? The full dmesg output is at: http://quora.org/2012/hda-switch-oops.txt Thanks, Daniel --- [1] BUG: unable to handle kernel NULL pointer dereference at 0170 IP: [a01e4006] azx_vs_set_state+0x26/0x1a0 [snd_hda_intel] PGD 26323d067 PUD 264f58067 PMD 0 Oops: [#1] SMP Modules linked in: snd_hda_codec_hdmi snd_hda_codec_cirrus rfcomm bnep nls_iso8859_1 joydev hid_apple bcm5974 nouveau coretemp kvm_intel b43 kvm uvcvideo videobuf2_core videobuf2_vmalloc videobuf2_memops ghash_clmulni_intel smsc75xx usbnet mii ttm snd_hda_intel(+) snd_hda_codec snd_hwdep ssb i915 snd_pcm mxm_wmi snd_timer apple_gmux applesmc mei lpc_ich microcode hwmon mfd_core input_polldev bcma snd drm_kms_helper snd_page_alloc video apple_bl sdhci_pci sdhci mmc_core CPU 1 Pid: 967, comm: sh Not tainted 3.7.0-rc7-expert+ #8 Apple Inc. MacBookPro10,1/Mac-C3EC7CD22292981F RIP: 0010:[a01e4006] [a01e4006] azx_vs_set_state+0x26/0x1a0 [snd_hda_intel] RSP: 0018:88025198de48 EFLAGS: 00010286 RAX: RBX: 880251960a00 RCX: RDX: RSI: RDI: 880265b41098 RBP: 88025198de68 R08: 0003 R09: 1000 R10: 7fffe481b730 R11: 0246 R12: 880265b41098 R13: R14: 88025198df50 R15: FS: 7f4961480700() GS:88026f24() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 0170 CR3: 000263cd3000 CR4: 001407e0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 Process sh (pid: 967, threadinfo 88025198c000, task 88025d635820) Stack: 88025d635820 880251960a00 88025198de98 88025198de88 812b8e77 880263ef1740 0004 88025198def8 812b947c 88020a46464f 81107982 Call Trace: [812b8e77] set_audio_state+0x67/0x70 [812b947c] vga_switcheroo_debugfs_write+0xbc/0x380 [81107982] ? __alloc_fd+0x42/0x110 [81107aa9] ? __fd_install+0x29/0x60 [810ed703] vfs_write+0xa3/0x160 [810eda0d] sys_write+0x4d/0xa0 [810283f9] ? do_page_fault+0x9/0x10 [814b7ed6] system_call_fastpath+0x1a/0x1f Code: 00 00 00 00 00 55 48 89 e5 48 83 ec 20 4c 89 65 f0 4c 8d a7 98 00 00 00 4c 89 e7 48 89 5d e8 4c 89 6d f8 41 89 f5 e8 fa a4 0d e1 48 8b 98 70 01 00 00 0f b6 83 dd 01 00 00 a8 10 75 34 45 85 ed RIP [a01e4006] azx_vs_set_state+0x26/0x1a0 [snd_hda_intel] RSP 88025198de48 CR2: 0170 --- [2] $ gdb ./sound/pci/hda/snd-hda-intel.ko (gdb) list *(azx_vs_set_state+0x26) 0x3036 is in azx_vs_set_state (sound/pci/hda/hda_intel.c:2628). 2623 2624static void azx_vs_set_state(struct pci_dev *pci, 2625 enum vga_switcheroo_state state) 2626{ 2627struct snd_card *card = pci_get_drvdata(pci); 2628struct azx *chip = card-private_data; 2629bool disabled; 2630 2631if (chip-init_failed) 2632return; -- Daniel J Blueman -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: switcheroo registration vs switching race...
At Mon, 3 Dec 2012 22:25:52 +0800, Daniel J Blueman wrote: On 3 December 2012 19:17, Takashi Iwai ti...@suse.de wrote: At Wed, 28 Nov 2012 09:45:39 +0100, Takashi Iwai wrote: At Wed, 28 Nov 2012 11:45:07 +0800, Daniel J Blueman wrote: Hi Seth, Dave, Takashi, If I power down the unused discrete GPU before lightdm starts by fiddling with the sysfs file [1] in the upstart script, I see a race manifesting as the discrete GPU's HDA controller timing out to commands [2]. Adding some debug, I see that the registered audio devices are put into D3 before the GPU is, but it turns out that the discrete (and internal) GPU's HDA controller gets registered a bit later, so the list is empty. The symptom is since the HDA driver it's talking to hardware which is now in D3. We could add a mutex to nouveau to allow us to wait for the DGPU HDA controller, but perhaps this should be solved at a higher level in the vgaswitcheroo code; what do you think? Maybe it's a side effect for the recent effort to fix another race in the probe. A part of them problem is that the registration is done at the very last of probing. Instead of delaying the registration, how about the patch below? Ping. If this really works, I'd like to queue it for 3.8 merge, at least... Ping ack; I was trying to find time to understand another race that occurs with GPU probing after switching, but is separate from the situation before switching, here. In the context of writing the switch, it looks like struct azx isn't allocated by the time azx_vs_set_state accesses it [1,2]; racing with azx_codec_create? It was allocated, but it wasn't assigned properly in pci drvdata. Below is the revised patch. Just moved pci_set_drvdata() before register_vga_switcheroo(). Could you retest with it? thanks, Takashi --- diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c index 4bb52da..b7c482a 100644 --- a/sound/pci/hda/hda_intel.c +++ b/sound/pci/hda/hda_intel.c @@ -49,6 +49,7 @@ #include linux/pm_runtime.h #include linux/clocksource.h #include linux/time.h +#include linux/completion.h #ifdef CONFIG_X86 /* for snoop control */ @@ -469,6 +470,7 @@ struct azx { /* locks */ spinlock_t reg_lock; struct mutex open_mutex; + struct completion probe_wait; /* streams (x num_streams) */ struct azx_dev *azx_dev; @@ -2790,6 +2792,7 @@ static bool azx_vs_can_switch(struct pci_dev *pci) struct snd_card *card = pci_get_drvdata(pci); struct azx *chip = card-private_data; + wait_for_completion(chip-probe_wait); if (chip-init_failed) return false; if (chip-disabled || !chip-bus) @@ -2851,6 +2854,9 @@ static int azx_free(struct azx *chip) azx_notifier_unregister(chip); + chip-init_failed = 1; /* to be sure */ + complete(chip-probe_wait); + if (use_vga_switcheroo(chip)) { if (chip-disabled chip-bus) snd_hda_unlock_devices(chip-bus); @@ -3156,6 +3162,7 @@ static int __devinit azx_create(struct snd_card *card, struct pci_dev *pci, INIT_LIST_HEAD(chip-pcm_list); INIT_LIST_HEAD(chip-list); init_vga_switcheroo(chip); + init_completion(chip-probe_wait); chip-position_fix[0] = chip-position_fix[1] = check_position_fix(chip, position_fix[dev]); @@ -3462,17 +3469,8 @@ static int __devinit azx_probe(struct pci_dev *pci, } #endif /* CONFIG_SND_HDA_PATCH_LOADER */ - if (probe_now) { - err = azx_probe_continue(chip); - if (err 0) - goto out_free; - } - pci_set_drvdata(pci, card); - if (pci_dev_run_wake(pci)) - pm_runtime_put_noidle(pci-dev); - err = register_vga_switcheroo(chip); if (err 0) { snd_printk(KERN_ERR SFX @@ -3480,11 +3478,22 @@ static int __devinit azx_probe(struct pci_dev *pci, goto out_free; } + if (probe_now) { + err = azx_probe_continue(chip); + if (err 0) + goto out_free; + } + + if (pci_dev_run_wake(pci)) + pm_runtime_put_noidle(pci-dev); + dev++; + complete(chip-probe_wait); return 0; out_free: snd_card_free(card); + pci_set_drvdata(pci, NULL); return err; } -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: switcheroo registration vs switching race...
On 3 December 2012 22:40, Takashi Iwai ti...@suse.de wrote: At Mon, 3 Dec 2012 22:25:52 +0800, Daniel J Blueman wrote: On 3 December 2012 19:17, Takashi Iwai ti...@suse.de wrote: At Wed, 28 Nov 2012 09:45:39 +0100, Takashi Iwai wrote: At Wed, 28 Nov 2012 11:45:07 +0800, Daniel J Blueman wrote: Hi Seth, Dave, Takashi, If I power down the unused discrete GPU before lightdm starts by fiddling with the sysfs file [1] in the upstart script, I see a race manifesting as the discrete GPU's HDA controller timing out to commands [2]. Adding some debug, I see that the registered audio devices are put into D3 before the GPU is, but it turns out that the discrete (and internal) GPU's HDA controller gets registered a bit later, so the list is empty. The symptom is since the HDA driver it's talking to hardware which is now in D3. We could add a mutex to nouveau to allow us to wait for the DGPU HDA controller, but perhaps this should be solved at a higher level in the vgaswitcheroo code; what do you think? Maybe it's a side effect for the recent effort to fix another race in the probe. A part of them problem is that the registration is done at the very last of probing. Instead of delaying the registration, how about the patch below? Ping. If this really works, I'd like to queue it for 3.8 merge, at least... Ping ack; I was trying to find time to understand another race that occurs with GPU probing after switching, but is separate from the situation before switching, here. In the context of writing the switch, it looks like struct azx isn't allocated by the time azx_vs_set_state accesses it [1,2]; racing with azx_codec_create? It was allocated, but it wasn't assigned properly in pci drvdata. Below is the revised patch. Just moved pci_set_drvdata() before register_vga_switcheroo(). Could you retest with it? Superb; this addresses the oops. ~1 second after the DGPU is put into D3, I still often see hda-intel: spurious response 0x0:0x0, last cmd=0x170500: http://quora.org/2012/hda-switch-spurious.txt Presumably this implies the read of the ring-buffer pointer returned 0x, so the HDA driver understands the pointer to have wrapped and processes the 191 unwritten entries? Daniel -- Daniel J Blueman -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: switcheroo registration vs switching race...
At Mon, 3 Dec 2012 23:08:28 +0800, Daniel J Blueman wrote: On 3 December 2012 22:40, Takashi Iwai ti...@suse.de wrote: At Mon, 3 Dec 2012 22:25:52 +0800, Daniel J Blueman wrote: On 3 December 2012 19:17, Takashi Iwai ti...@suse.de wrote: At Wed, 28 Nov 2012 09:45:39 +0100, Takashi Iwai wrote: At Wed, 28 Nov 2012 11:45:07 +0800, Daniel J Blueman wrote: Hi Seth, Dave, Takashi, If I power down the unused discrete GPU before lightdm starts by fiddling with the sysfs file [1] in the upstart script, I see a race manifesting as the discrete GPU's HDA controller timing out to commands [2]. Adding some debug, I see that the registered audio devices are put into D3 before the GPU is, but it turns out that the discrete (and internal) GPU's HDA controller gets registered a bit later, so the list is empty. The symptom is since the HDA driver it's talking to hardware which is now in D3. We could add a mutex to nouveau to allow us to wait for the DGPU HDA controller, but perhaps this should be solved at a higher level in the vgaswitcheroo code; what do you think? Maybe it's a side effect for the recent effort to fix another race in the probe. A part of them problem is that the registration is done at the very last of probing. Instead of delaying the registration, how about the patch below? Ping. If this really works, I'd like to queue it for 3.8 merge, at least... Ping ack; I was trying to find time to understand another race that occurs with GPU probing after switching, but is separate from the situation before switching, here. In the context of writing the switch, it looks like struct azx isn't allocated by the time azx_vs_set_state accesses it [1,2]; racing with azx_codec_create? It was allocated, but it wasn't assigned properly in pci drvdata. Below is the revised patch. Just moved pci_set_drvdata() before register_vga_switcheroo(). Could you retest with it? Superb; this addresses the oops. OK, I'll queue it to sound tree for 3.8 kernel with Cc to stable. ~1 second after the DGPU is put into D3, I still often see hda-intel: spurious response 0x0:0x0, last cmd=0x170500: http://quora.org/2012/hda-switch-spurious.txt Hm, it's not clear who triggers these messages. I'll try to check the code paths. Presumably this implies the read of the ring-buffer pointer returned 0x, so the HDA driver understands the pointer to have wrapped and processes the 191 unwritten entries? Good point. Actually there is one bug that looks obviously wrong (writing 32bit value to CORBWP). Maybe it has been working just because writing CORBRP doesn't influence except for the reset bit. Reading CORBWP as a byte is OK, but this could be better in a word so that we can check 0x as invalid. A test patch is below. Hopefully this improves the situation... thanks, Takashi --- diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c index 4bb52da..ce990db 100644 --- a/sound/pci/hda/hda_intel.c +++ b/sound/pci/hda/hda_intel.c @@ -805,13 +805,18 @@ static int azx_corb_send_cmd(struct hda_bus *bus, u32 val) spin_lock_irq(chip-reg_lock); /* add command to corb */ - wp = azx_readb(chip, CORBWP); + wp = azx_readw(chip, CORBWP); + if (wp == 0x) { + /* something wrong, controller likely turned to D3 */ + spin_unlock_irq(chip-reg_lock); + return -1; + } wp++; wp %= ICH6_MAX_CORB_ENTRIES; chip-rirb.cmds[addr]++; chip-corb.buf[wp] = cpu_to_le32(val); - azx_writel(chip, CORBWP, wp); + azx_writew(chip, CORBWP, wp); spin_unlock_irq(chip-reg_lock); @@ -827,7 +832,12 @@ static void azx_update_rirb(struct azx *chip) unsigned int addr; u32 res, res_ex; - wp = azx_readb(chip, RIRBWP); + wp = azx_readw(chip, RIRBWP); + if (wp == 0x) { + /* something wrong, controller likely turned to D3 */ + return; + } + if (wp == chip-rirb.wp) return; chip-rirb.wp = wp; -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: switcheroo registration vs switching race...
On 4 December 2012 00:23, Takashi Iwai ti...@suse.de wrote: At Mon, 3 Dec 2012 23:08:28 +0800, Daniel J Blueman wrote: On 3 December 2012 22:40, Takashi Iwai ti...@suse.de wrote: At Mon, 3 Dec 2012 22:25:52 +0800, Daniel J Blueman wrote: On 3 December 2012 19:17, Takashi Iwai ti...@suse.de wrote: At Wed, 28 Nov 2012 09:45:39 +0100, Takashi Iwai wrote: At Wed, 28 Nov 2012 11:45:07 +0800, Daniel J Blueman wrote: Hi Seth, Dave, Takashi, If I power down the unused discrete GPU before lightdm starts by fiddling with the sysfs file [1] in the upstart script, I see a race manifesting as the discrete GPU's HDA controller timing out to commands [2]. Adding some debug, I see that the registered audio devices are put into D3 before the GPU is, but it turns out that the discrete (and internal) GPU's HDA controller gets registered a bit later, so the list is empty. The symptom is since the HDA driver it's talking to hardware which is now in D3. We could add a mutex to nouveau to allow us to wait for the DGPU HDA controller, but perhaps this should be solved at a higher level in the vgaswitcheroo code; what do you think? Maybe it's a side effect for the recent effort to fix another race in the probe. A part of them problem is that the registration is done at the very last of probing. Instead of delaying the registration, how about the patch below? Ping. If this really works, I'd like to queue it for 3.8 merge, at least... Ping ack; I was trying to find time to understand another race that occurs with GPU probing after switching, but is separate from the situation before switching, here. In the context of writing the switch, it looks like struct azx isn't allocated by the time azx_vs_set_state accesses it [1,2]; racing with azx_codec_create? It was allocated, but it wasn't assigned properly in pci drvdata. Below is the revised patch. Just moved pci_set_drvdata() before register_vga_switcheroo(). Could you retest with it? Superb; this addresses the oops. OK, I'll queue it to sound tree for 3.8 kernel with Cc to stable. ~1 second after the DGPU is put into D3, I still often see hda-intel: spurious response 0x0:0x0, last cmd=0x170500: http://quora.org/2012/hda-switch-spurious.txt Hm, it's not clear who triggers these messages. I'll try to check the code paths. Presumably this implies the read of the ring-buffer pointer returned 0x, so the HDA driver understands the pointer to have wrapped and processes the 191 unwritten entries? Good point. Actually there is one bug that looks obviously wrong (writing 32bit value to CORBWP). Maybe it has been working just because writing CORBRP doesn't influence except for the reset bit. Reading CORBWP as a byte is OK, but this could be better in a word so that we can check 0x as invalid. A test patch is below. Hopefully this improves the situation... I'll check this out tomorrow and also instrument the code to get a backtrace, since there may still be an underlying race with the previous patches: [8.203827] snd_hda_intel :00:1b.0: enabling device ( - 0002) [8.203936] snd_hda_intel :00:1b.0: irq 51 for MSI/MSI-X [ 10.981297] VGA switcheroo: switched nouveau off [ 10.981383] nouveau [ DRM] suspending fbcon... [ 10.981388] nouveau [ DRM] suspending display... [ 10.981687] nouveau [ DRM] unpinning framebuffer(s)... [ 10.981825] nouveau [ DRM] evicting buffers... [ 10.992948] nouveau [ DRM] suspending client object trees... [ 11.310697] hda-intel: azx_get_response timeout, switching to polling mode: last cmd=0x000f [ 12.320236] hda-intel: No response from codec, disabling MSI: last cmd=0x000f [ 13.329721] hda-intel: Codec #0 probe error; disabling it... [ 14.419163] hda_intel: azx_get_response timeout, switching to single_cmd mode: last cmd=0x000f [ 14.419855] hda-intel: no codecs initialized [ 14.459150] snd_hda_intel :01:00.1: Refused to change power state, currently in D3 [ 14.459176] hda-intel: :01:00.1: Handle VGA-switcheroo audio client [ 14.459183] hda-intel: VGA controller for :01:00.1 is disabled [ 14.459184] hda-intel: Delaying initialization Full log at: http://quora.org/2012/hda-switch-spurious2.txt The first response timeout message plausibly could be from the CS4206 (0:1b.0) or Nvidia DGPU (1:0.1); either way, there are no audio devices after. Also, we see that checking for master abort in one place won't be sufficient. Stay tuned for backtraces tomorrow. Thanks again so far! Daniel -- Daniel J Blueman -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: switcheroo registration vs switching race...
At Tue, 4 Dec 2012 00:50:56 +0800, Daniel J Blueman wrote: On 4 December 2012 00:23, Takashi Iwai ti...@suse.de wrote: At Mon, 3 Dec 2012 23:08:28 +0800, Daniel J Blueman wrote: On 3 December 2012 22:40, Takashi Iwai ti...@suse.de wrote: At Mon, 3 Dec 2012 22:25:52 +0800, Daniel J Blueman wrote: On 3 December 2012 19:17, Takashi Iwai ti...@suse.de wrote: At Wed, 28 Nov 2012 09:45:39 +0100, Takashi Iwai wrote: At Wed, 28 Nov 2012 11:45:07 +0800, Daniel J Blueman wrote: Hi Seth, Dave, Takashi, If I power down the unused discrete GPU before lightdm starts by fiddling with the sysfs file [1] in the upstart script, I see a race manifesting as the discrete GPU's HDA controller timing out to commands [2]. Adding some debug, I see that the registered audio devices are put into D3 before the GPU is, but it turns out that the discrete (and internal) GPU's HDA controller gets registered a bit later, so the list is empty. The symptom is since the HDA driver it's talking to hardware which is now in D3. We could add a mutex to nouveau to allow us to wait for the DGPU HDA controller, but perhaps this should be solved at a higher level in the vgaswitcheroo code; what do you think? Maybe it's a side effect for the recent effort to fix another race in the probe. A part of them problem is that the registration is done at the very last of probing. Instead of delaying the registration, how about the patch below? Ping. If this really works, I'd like to queue it for 3.8 merge, at least... Ping ack; I was trying to find time to understand another race that occurs with GPU probing after switching, but is separate from the situation before switching, here. In the context of writing the switch, it looks like struct azx isn't allocated by the time azx_vs_set_state accesses it [1,2]; racing with azx_codec_create? It was allocated, but it wasn't assigned properly in pci drvdata. Below is the revised patch. Just moved pci_set_drvdata() before register_vga_switcheroo(). Could you retest with it? Superb; this addresses the oops. OK, I'll queue it to sound tree for 3.8 kernel with Cc to stable. ~1 second after the DGPU is put into D3, I still often see hda-intel: spurious response 0x0:0x0, last cmd=0x170500: http://quora.org/2012/hda-switch-spurious.txt Hm, it's not clear who triggers these messages. I'll try to check the code paths. Presumably this implies the read of the ring-buffer pointer returned 0x, so the HDA driver understands the pointer to have wrapped and processes the 191 unwritten entries? Good point. Actually there is one bug that looks obviously wrong (writing 32bit value to CORBWP). Maybe it has been working just because writing CORBRP doesn't influence except for the reset bit. Reading CORBWP as a byte is OK, but this could be better in a word so that we can check 0x as invalid. A test patch is below. Hopefully this improves the situation... I'll check this out tomorrow and also instrument the code to get a backtrace, since there may still be an underlying race with the previous patches: [8.203827] snd_hda_intel :00:1b.0: enabling device ( - 0002) [8.203936] snd_hda_intel :00:1b.0: irq 51 for MSI/MSI-X [ 10.981297] VGA switcheroo: switched nouveau off [ 10.981383] nouveau [ DRM] suspending fbcon... [ 10.981388] nouveau [ DRM] suspending display... [ 10.981687] nouveau [ DRM] unpinning framebuffer(s)... [ 10.981825] nouveau [ DRM] evicting buffers... [ 10.992948] nouveau [ DRM] suspending client object trees... [ 11.310697] hda-intel: azx_get_response timeout, switching to polling mode: last cmd=0x000f [ 12.320236] hda-intel: No response from codec, disabling MSI: last cmd=0x000f [ 13.329721] hda-intel: Codec #0 probe error; disabling it... [ 14.419163] hda_intel: azx_get_response timeout, switching to single_cmd mode: last cmd=0x000f [ 14.419855] hda-intel: no codecs initialized [ 14.459150] snd_hda_intel :01:00.1: Refused to change power state, currently in D3 [ 14.459176] hda-intel: :01:00.1: Handle VGA-switcheroo audio client [ 14.459183] hda-intel: VGA controller for :01:00.1 is disabled [ 14.459184] hda-intel: Delaying initialization Full log at: http://quora.org/2012/hda-switch-spurious2.txt The first response timeout message plausibly could be from the CS4206 (0:1b.0) or Nvidia DGPU (1:0.1); either way, there are no audio devices after. That's odd. The Cirrus one (:00:1b.0) must be independent from the vga switcheroo things for Nvidia... The patch below is the revised patch of the first one. Now the vga switcheroo registration is done before the video controller D3 check, so the race
Re: switcheroo registration vs switching race...
At Wed, 28 Nov 2012 11:45:07 +0800, Daniel J Blueman wrote: > > Hi Seth, Dave, Takashi, > > If I power down the unused discrete GPU before lightdm starts by > fiddling with the sysfs file [1] in the upstart script, I see a race > manifesting as the discrete GPU's HDA controller timing out to > commands [2]. > > Adding some debug, I see that the registered audio devices are put > into D3 before the GPU is, but it turns out that the discrete (and > internal) GPU's HDA controller gets registered a bit later, so the > list is empty. The symptom is since the HDA driver it's talking to > hardware which is now in D3. > > We could add a mutex to nouveau to allow us to wait for the DGPU HDA > controller, but perhaps this should be solved at a higher level in the > vgaswitcheroo code; what do you think? Maybe it's a side effect for the recent effort to fix another race in the probe. A part of them problem is that the registration is done at the very last of probing. Instead of delaying the registration, how about the patch below? Takashi --- diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c index 4bb52da..17fbd68 100644 --- a/sound/pci/hda/hda_intel.c +++ b/sound/pci/hda/hda_intel.c @@ -49,6 +49,7 @@ #include #include #include +#include #ifdef CONFIG_X86 /* for snoop control */ @@ -469,6 +470,7 @@ struct azx { /* locks */ spinlock_t reg_lock; struct mutex open_mutex; + struct completion probe_wait; /* streams (x num_streams) */ struct azx_dev *azx_dev; @@ -2790,6 +2792,7 @@ static bool azx_vs_can_switch(struct pci_dev *pci) struct snd_card *card = pci_get_drvdata(pci); struct azx *chip = card->private_data; + wait_for_completion(>probe_wait); if (chip->init_failed) return false; if (chip->disabled || !chip->bus) @@ -2851,6 +2854,9 @@ static int azx_free(struct azx *chip) azx_notifier_unregister(chip); + chip->init_failed = 1; /* to be sure */ + complete(>probe_wait); + if (use_vga_switcheroo(chip)) { if (chip->disabled && chip->bus) snd_hda_unlock_devices(chip->bus); @@ -3156,6 +3162,7 @@ static int __devinit azx_create(struct snd_card *card, struct pci_dev *pci, INIT_LIST_HEAD(>pcm_list); INIT_LIST_HEAD(>list); init_vga_switcheroo(chip); + init_completion(>probe_wait); chip->position_fix[0] = chip->position_fix[1] = check_position_fix(chip, position_fix[dev]); @@ -3462,6 +3469,13 @@ static int __devinit azx_probe(struct pci_dev *pci, } #endif /* CONFIG_SND_HDA_PATCH_LOADER */ + err = register_vga_switcheroo(chip); + if (err < 0) { + snd_printk(KERN_ERR SFX + "Error registering VGA-switcheroo client\n"); + goto out_free; + } + if (probe_now) { err = azx_probe_continue(chip); if (err < 0) @@ -3473,14 +3487,8 @@ static int __devinit azx_probe(struct pci_dev *pci, if (pci_dev_run_wake(pci)) pm_runtime_put_noidle(>dev); - err = register_vga_switcheroo(chip); - if (err < 0) { - snd_printk(KERN_ERR SFX - "Error registering VGA-switcheroo client\n"); - goto out_free; - } - dev++; + complete(>probe_wait); return 0; out_free: -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: switcheroo registration vs switching race...
At Wed, 28 Nov 2012 11:45:07 +0800, Daniel J Blueman wrote: Hi Seth, Dave, Takashi, If I power down the unused discrete GPU before lightdm starts by fiddling with the sysfs file [1] in the upstart script, I see a race manifesting as the discrete GPU's HDA controller timing out to commands [2]. Adding some debug, I see that the registered audio devices are put into D3 before the GPU is, but it turns out that the discrete (and internal) GPU's HDA controller gets registered a bit later, so the list is empty. The symptom is since the HDA driver it's talking to hardware which is now in D3. We could add a mutex to nouveau to allow us to wait for the DGPU HDA controller, but perhaps this should be solved at a higher level in the vgaswitcheroo code; what do you think? Maybe it's a side effect for the recent effort to fix another race in the probe. A part of them problem is that the registration is done at the very last of probing. Instead of delaying the registration, how about the patch below? Takashi --- diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c index 4bb52da..17fbd68 100644 --- a/sound/pci/hda/hda_intel.c +++ b/sound/pci/hda/hda_intel.c @@ -49,6 +49,7 @@ #include linux/pm_runtime.h #include linux/clocksource.h #include linux/time.h +#include linux/completion.h #ifdef CONFIG_X86 /* for snoop control */ @@ -469,6 +470,7 @@ struct azx { /* locks */ spinlock_t reg_lock; struct mutex open_mutex; + struct completion probe_wait; /* streams (x num_streams) */ struct azx_dev *azx_dev; @@ -2790,6 +2792,7 @@ static bool azx_vs_can_switch(struct pci_dev *pci) struct snd_card *card = pci_get_drvdata(pci); struct azx *chip = card-private_data; + wait_for_completion(chip-probe_wait); if (chip-init_failed) return false; if (chip-disabled || !chip-bus) @@ -2851,6 +2854,9 @@ static int azx_free(struct azx *chip) azx_notifier_unregister(chip); + chip-init_failed = 1; /* to be sure */ + complete(chip-probe_wait); + if (use_vga_switcheroo(chip)) { if (chip-disabled chip-bus) snd_hda_unlock_devices(chip-bus); @@ -3156,6 +3162,7 @@ static int __devinit azx_create(struct snd_card *card, struct pci_dev *pci, INIT_LIST_HEAD(chip-pcm_list); INIT_LIST_HEAD(chip-list); init_vga_switcheroo(chip); + init_completion(chip-probe_wait); chip-position_fix[0] = chip-position_fix[1] = check_position_fix(chip, position_fix[dev]); @@ -3462,6 +3469,13 @@ static int __devinit azx_probe(struct pci_dev *pci, } #endif /* CONFIG_SND_HDA_PATCH_LOADER */ + err = register_vga_switcheroo(chip); + if (err 0) { + snd_printk(KERN_ERR SFX + Error registering VGA-switcheroo client\n); + goto out_free; + } + if (probe_now) { err = azx_probe_continue(chip); if (err 0) @@ -3473,14 +3487,8 @@ static int __devinit azx_probe(struct pci_dev *pci, if (pci_dev_run_wake(pci)) pm_runtime_put_noidle(pci-dev); - err = register_vga_switcheroo(chip); - if (err 0) { - snd_printk(KERN_ERR SFX - Error registering VGA-switcheroo client\n); - goto out_free; - } - dev++; + complete(chip-probe_wait); return 0; out_free: -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
switcheroo registration vs switching race...
Hi Seth, Dave, Takashi, If I power down the unused discrete GPU before lightdm starts by fiddling with the sysfs file [1] in the upstart script, I see a race manifesting as the discrete GPU's HDA controller timing out to commands [2]. Adding some debug, I see that the registered audio devices are put into D3 before the GPU is, but it turns out that the discrete (and internal) GPU's HDA controller gets registered a bit later, so the list is empty. The symptom is since the HDA driver it's talking to hardware which is now in D3. We could add a mutex to nouveau to allow us to wait for the DGPU HDA controller, but perhaps this should be solved at a higher level in the vgaswitcheroo code; what do you think? Thanks, Daniel --- [1] echo OFF >/sys/kernel/debug/vgaswitcheroo/switch --- [2] snd_hda_intel :00:1b.0: enabling device ( -> 0002) snd_hda_intel :00:1b.0: irq 51 for MSI/MSI-X input: HDA Intel PCH Mic as /devices/pci:00/:00:1b.0/sound/card0/input10 input: HDA Intel PCH Headphone as /devices/pci:00/:00:1b.0/sound/card0/input11 nouveau [ VBIOS][:01:00.0] ... appears to be valid nouveau [ VBIOS][:01:00.0] using image from PRAMIN nouveau [ VBIOS][:01:00.0] BIT signature found nouveau [ VBIOS][:01:00.0] version 80.07.26.04 nouveau [ MXM][:01:00.0] no VBIOS data, nothing to do nouveau [ PFB][:01:00.0] RAM type: GDDR5 nouveau [ PFB][:01:00.0] RAM size: 1024 MiB nouveau W[ PGRAPH][:01:00.0] disabled, PGRAPH=1 to enable vga_switcheroo: enabled [TTM] Zone kernel: Available graphics memory: 4076308 kiB [TTM] Zone dma32: Available graphics memory: 2097152 kiB [TTM] Initializing pool allocator [TTM] Initializing DMA pool allocator nouveau [ DRM] VRAM: 1024 MiB nouveau [ DRM] GART: 512 MiB nouveau [ DRM] BIT BIOS found nouveau [ DRM] Bios version 80.07.26.04 nouveau [ DRM] TMDS table version 2.0 nouveau [ DRM] DCB version 4.0 nouveau [ DRM] DCB outp 00: 048101b6 0f230010 nouveau [ DRM] DCB outp 01: 018212d6 0f220020 nouveau [ DRM] DCB outp 02: 01021212 00020020 nouveau [ DRM] DCB outp 03: 088324c6 0f220010 nouveau [ DRM] DCB outp 04: 08032402 00020010 nouveau [ DRM] DCB outp 05: 02843862 00020010 nouveau [ DRM] DCB conn 00: 00020047 nouveau [ DRM] DCB conn 01: 02208146 nouveau [ DRM] DCB conn 02: 01104246 nouveau [ DRM] DCB conn 03: 00410361 [drm] Supports vblank timestamp caching Rev 1 (10.10.2010). [drm] No driver support for vblank timestamp query. nouveau W[ DRM] voltage table 0x50 unknown nouveau [ DRM] 4 available performance level(s) nouveau [ DRM] 1: core 209MHz shader 419MHz memory 405MHz voltage 520mV nouveau [ DRM] 2: core 390MHz shader 780MHz memory 1080MHz voltage 610mV nouveau [ DRM] 3: core 1000MHz shader 2000MHz memory 1080MHz voltage 630mV nouveau [ DRM] 4: core 1254MHz shader 2508MHz memory 1080MHz voltage 630mV nouveau [ DRM] c: nouveau E[ DRM] failed to create kernel channel, -22 No connectors reported connected with modes [drm] Cannot find any crtc or sizes - going 1024x768 nouveau [ DRM] allocated 1024x768 fb: 0x6, bo 880264974400 fb1: nouveaufb frame buffer device [drm] Initialized nouveau 1.1.0 20120801 for :01:00.0 on minor 1 snd_hda_intel :01:00.1: enabling device ( -> 0002) hda-intel: :01:00.1: Handle VGA-switcheroo audio client snd_hda_intel :01:00.1: irq 52 for MSI/MSI-X VGA switcheroo: switched nouveau off nouveau [ DRM] suspending fbcon... nouveau [ DRM] suspending display... nouveau [ DRM] unpinning framebuffer(s)... nouveau [ DRM] evicting buffers... nouveau [ DRM] suspending client object trees... input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci:00/:00:01.0/:01:00.1/sound/card1/input12 input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci:00/:00:01.0/:01:00.1/sound/card1/input13 input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci:00/:00:01.0/:01:00.1/sound/card1/input14 nouveau E[ I2C][:01:00.0] AUXCH(3): begin idle timeout 0x nouveau E[ I2C][:01:00.0] AUXCH(2): begin idle timeout 0x hda-intel: spurious response 0x0:0x0, last cmd=0x170503 hda-intel: spurious response 0x0:0x0, last cmd=0x170503 hda-intel: spurious response 0x0:0x0, last cmd=0x170503 hda-intel: spurious response 0x0:0x0, last cmd=0x170503 hda-intel: spurious response 0x0:0x0, last cmd=0x170503 hda-intel: spurious response 0x0:0x0, last cmd=0x170503 hda-intel: spurious response 0x0:0x0, last cmd=0x170503 hda-intel: spurious response 0x0:0x0, last cmd=0x170503 hda-intel: spurious response 0x0:0x0, last cmd=0x170503 hda-intel: spurious response 0x0:0x0, last cmd=0x170503 hda-intel: spurious response 0x0:0x0, last cmd=0x170503 hda-intel: spurious response 0x0:0x0, last cmd=0x170503 hda-intel: spurious response 0x0:0x0, last cmd=0x170503 hda-intel: spurious response 0x0:0x0, last cmd=0x170503 hda-intel: spurious response 0x0:0x0, last cmd=0x170503 hda-intel: spurious response 0x0:0x0, last cmd=0x170503
switcheroo registration vs switching race...
Hi Seth, Dave, Takashi, If I power down the unused discrete GPU before lightdm starts by fiddling with the sysfs file [1] in the upstart script, I see a race manifesting as the discrete GPU's HDA controller timing out to commands [2]. Adding some debug, I see that the registered audio devices are put into D3 before the GPU is, but it turns out that the discrete (and internal) GPU's HDA controller gets registered a bit later, so the list is empty. The symptom is since the HDA driver it's talking to hardware which is now in D3. We could add a mutex to nouveau to allow us to wait for the DGPU HDA controller, but perhaps this should be solved at a higher level in the vgaswitcheroo code; what do you think? Thanks, Daniel --- [1] echo OFF /sys/kernel/debug/vgaswitcheroo/switch --- [2] snd_hda_intel :00:1b.0: enabling device ( - 0002) snd_hda_intel :00:1b.0: irq 51 for MSI/MSI-X input: HDA Intel PCH Mic as /devices/pci:00/:00:1b.0/sound/card0/input10 input: HDA Intel PCH Headphone as /devices/pci:00/:00:1b.0/sound/card0/input11 nouveau [ VBIOS][:01:00.0] ... appears to be valid nouveau [ VBIOS][:01:00.0] using image from PRAMIN nouveau [ VBIOS][:01:00.0] BIT signature found nouveau [ VBIOS][:01:00.0] version 80.07.26.04 nouveau [ MXM][:01:00.0] no VBIOS data, nothing to do nouveau [ PFB][:01:00.0] RAM type: GDDR5 nouveau [ PFB][:01:00.0] RAM size: 1024 MiB nouveau W[ PGRAPH][:01:00.0] disabled, PGRAPH=1 to enable vga_switcheroo: enabled [TTM] Zone kernel: Available graphics memory: 4076308 kiB [TTM] Zone dma32: Available graphics memory: 2097152 kiB [TTM] Initializing pool allocator [TTM] Initializing DMA pool allocator nouveau [ DRM] VRAM: 1024 MiB nouveau [ DRM] GART: 512 MiB nouveau [ DRM] BIT BIOS found nouveau [ DRM] Bios version 80.07.26.04 nouveau [ DRM] TMDS table version 2.0 nouveau [ DRM] DCB version 4.0 nouveau [ DRM] DCB outp 00: 048101b6 0f230010 nouveau [ DRM] DCB outp 01: 018212d6 0f220020 nouveau [ DRM] DCB outp 02: 01021212 00020020 nouveau [ DRM] DCB outp 03: 088324c6 0f220010 nouveau [ DRM] DCB outp 04: 08032402 00020010 nouveau [ DRM] DCB outp 05: 02843862 00020010 nouveau [ DRM] DCB conn 00: 00020047 nouveau [ DRM] DCB conn 01: 02208146 nouveau [ DRM] DCB conn 02: 01104246 nouveau [ DRM] DCB conn 03: 00410361 [drm] Supports vblank timestamp caching Rev 1 (10.10.2010). [drm] No driver support for vblank timestamp query. nouveau W[ DRM] voltage table 0x50 unknown nouveau [ DRM] 4 available performance level(s) nouveau [ DRM] 1: core 209MHz shader 419MHz memory 405MHz voltage 520mV nouveau [ DRM] 2: core 390MHz shader 780MHz memory 1080MHz voltage 610mV nouveau [ DRM] 3: core 1000MHz shader 2000MHz memory 1080MHz voltage 630mV nouveau [ DRM] 4: core 1254MHz shader 2508MHz memory 1080MHz voltage 630mV nouveau [ DRM] c: nouveau E[ DRM] failed to create kernel channel, -22 No connectors reported connected with modes [drm] Cannot find any crtc or sizes - going 1024x768 nouveau [ DRM] allocated 1024x768 fb: 0x6, bo 880264974400 fb1: nouveaufb frame buffer device [drm] Initialized nouveau 1.1.0 20120801 for :01:00.0 on minor 1 snd_hda_intel :01:00.1: enabling device ( - 0002) hda-intel: :01:00.1: Handle VGA-switcheroo audio client snd_hda_intel :01:00.1: irq 52 for MSI/MSI-X VGA switcheroo: switched nouveau off nouveau [ DRM] suspending fbcon... nouveau [ DRM] suspending display... nouveau [ DRM] unpinning framebuffer(s)... nouveau [ DRM] evicting buffers... nouveau [ DRM] suspending client object trees... input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci:00/:00:01.0/:01:00.1/sound/card1/input12 input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci:00/:00:01.0/:01:00.1/sound/card1/input13 input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci:00/:00:01.0/:01:00.1/sound/card1/input14 nouveau E[ I2C][:01:00.0] AUXCH(3): begin idle timeout 0x nouveau E[ I2C][:01:00.0] AUXCH(2): begin idle timeout 0x hda-intel: spurious response 0x0:0x0, last cmd=0x170503 hda-intel: spurious response 0x0:0x0, last cmd=0x170503 hda-intel: spurious response 0x0:0x0, last cmd=0x170503 hda-intel: spurious response 0x0:0x0, last cmd=0x170503 hda-intel: spurious response 0x0:0x0, last cmd=0x170503 hda-intel: spurious response 0x0:0x0, last cmd=0x170503 hda-intel: spurious response 0x0:0x0, last cmd=0x170503 hda-intel: spurious response 0x0:0x0, last cmd=0x170503 hda-intel: spurious response 0x0:0x0, last cmd=0x170503 hda-intel: spurious response 0x0:0x0, last cmd=0x170503 hda-intel: spurious response 0x0:0x0, last cmd=0x170503 hda-intel: spurious response 0x0:0x0, last cmd=0x170503 hda-intel: spurious response 0x0:0x0, last cmd=0x170503 hda-intel: spurious response 0x0:0x0, last cmd=0x170503 hda-intel: spurious response 0x0:0x0, last cmd=0x170503 hda-intel: spurious response 0x0:0x0, last cmd=0x170503 hda-intel: