Re: [Nouveau] [REGRESSION] nouveau: Crash in gk104_fifo_intr_runlist()

2015-08-11 Thread Alexandre Courbot
Sending the revert patch to Dave after receiving his green light for
this, and will investigate the issue on my side. I should be able to find a
gk107 somewhere...

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [REGRESSION] nouveau: Crash in gk104_fifo_intr_runlist()

2015-08-11 Thread Eric Biggers
Hi,

I think I've done about 10 reboots with the commit reverted and I never
experienced the crash.  But with 4.2.0-rc6 I get the crash on about every
other reboot.

Probably relevant: the computer on which the crash occurs has two GPUs (one
Intel and one Nvidia).  The Intel one is actually being used, whereas I
presume the Nvidia one is being automatically disabled shortly after boot,
perhaps when the crash occurs...

Eric



Re: [Nouveau] [REGRESSION] nouveau: Crash in gk104_fifo_intr_runlist()

2015-08-11 Thread Ilia Mirkin
I'm guessing that Optimus is the operative difference, not the
specific chip. Basically something that can be put to sleep via
ACPI...



Re: [Nouveau] [REGRESSION] nouveau: Crash in gk104_fifo_intr_runlist()

2015-08-11 Thread Alexandre Courbot
Right, that 0xbad0da00 is indicative of something being offline that
should not be at that time. I have sent the revert patch. Thanks Eric
for reporting this!



Re: [Nouveau] [REGRESSION] nouveau: Crash in gk104_fifo_intr_runlist()

2015-08-11 Thread Alexandre Courbot
Mmm, in that case it is probably best to revert that commit for the
time being. It was targeting GM20B (and maybe other Maxwells too), so
reverting it should not hurt anyone at the moment. I think Ben is on
holiday for now; is there anyone else who can send a pull request to
Dave Airlie for this? We don't want 4.2 to ship with a crash every
other reboot...



Re: [Nouveau] [REGRESSION] nouveau: Crash in gk104_fifo_intr_runlist()

2015-08-10 Thread Alexandre Courbot
Indeed, and I am actually surprised to see one here. I will
double-check that patch.

Eric, would you be able to give an estimate of the repro rate for this
issue? More testing with and without the patch would be welcome; it'd
be good to know whether it is actually the culprit or not.



[Nouveau] [REGRESSION] nouveau: Crash in gk104_fifo_intr_runlist()

2015-08-09 Thread Eric Biggers
Hi,

I am testing Linux v4.2-rc5 and I am sporadically getting crashes shortly after
startup in gk104_fifo_intr_runlist().  What I've found is that the 'mask' value
read from offset 0x2a00 comes back as '0xbad0da00'.  This causes the 'engn'
variable to be assigned the value 9, which is invalid; then wake_up() is called
on an uninitialized waitqueue which causes the crash.
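
To illustrate the arithmetic, here is a minimal user-space sketch. It assumes
'engn' is taken from the lowest set bit of 'mask', which is consistent with
the value 9 reported above. NUM_ENGINES and both function names are
hypothetical and are not nouveau identifiers; the bounds check is only one
possible way to guard against a bogus read, not the fix that was applied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical engine count, chosen only so that index 9 is out of range. */
#define NUM_ENGINES 7

/* Return the index of the lowest set bit; mask must be non-zero. */
unsigned lowest_set_bit(uint32_t mask)
{
    unsigned bit = 0;

    assert(mask != 0);
    while (!(mask & 1u)) {
        mask >>= 1;
        bit++;
    }
    return bit;
}

/* True if the engine index derived from mask refers to an existing engine. */
bool runlist_engine_valid(uint32_t mask)
{
    return mask != 0 && lowest_set_bit(mask) < NUM_ENGINES;
}
```

With mask = 0xbad0da00 the low byte is 0x00 and the next byte is 0xda
(binary 1101 1010), so the lowest set bit is bit 9 and the check above
rejects it.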

Reverting commit 1addc12648521d (drm/nouveau/fifo/gk104: kick channels when
deactivating them) seemed to make the problem go away, although I can't be 100%
sure because the problem is sporadic.

Attached an example of the kernel log up to the crash.

Eric


dmesg.gz
Description: application/gzip


Re: [Nouveau] [REGRESSION] nouveau: Crash in gk104_fifo_intr_runlist()

2015-08-09 Thread Ilia Mirkin
Alexandre, could you take a look? 0xbad* generally comes from bad MMIO
reads.
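
As a rough illustration of that heuristic, the sketch below flags a 32-bit
register value whose top three hex digits are "bad". The function name is
hypothetical and not part of nouveau, and a legitimate register value could
in principle match the same pattern, so treat a hit as a hint rather than
proof.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the top 12 bits of a 32-bit register value are 0xbad. */
bool looks_like_bad_mmio_read(uint32_t val)
{
    return (val & 0xfff00000u) == 0xbad00000u;
}
```

The value from the report, 0xbad0da00, matches this pattern, while an
ordinary single-bit interrupt mask such as 0x00000200 does not.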