[pci] WARNING: CPU: 0 PID: 1 at drivers/gpu/drm/drm_crtc.c:94 drm_warn_on_modeset_not_all_locked()
On Sun, Mar 23, 2014 at 8:53 AM, Fengguang Wu wrote:
> Hi Bjorn,
>
> On Fri, Mar 21, 2014 at 12:42:33PM -0600, Bjorn Helgaas wrote:
>> On Thu, Mar 20, 2014 at 8:09 PM, Fengguang Wu wrote:
>> > // CC Stephane for RAPL related bug
>> >
>> > Bjorn, sorry this bug report is mis-titled. The only new bug that show
>> > up in aa11fc58dc is on rapl_pmu_init. And it shows up only 1 time, so
>> > it's hard to reproduce and the bisect is likely not accurate. I'll
>> > retry the bisect with more repeat count. Sorry for the disturbing!
>>
>> This testing is potentially very useful, but only if we don't have
>> many false positives. I spent a lot of time trying to figure this
>> out, and it turned out not to be a problem at all.
>
> I'm sorry for the false report! I'll be careful and improve the
> process. Currently there are many false positives in our internal
> boot error bisects. And we rely on human reviews to select good
> bisects out of the noises. In this case both the script and me made
> mistakes, which lead to the wrong report.
>
>> As a procedural question, can you help me figure out how to handle a
>> report like this? What I *hoped* for would be:
>>
>> - the config you used
>
> Yes.
>
>> - the dmesg log from the newest good commit
>
> I'll attach it if the first bad commit's parent commit(s) has some
> noise errors. In this case it may help decide whether the bisect is
> wrong: in some cases one bug will hide another one; or the bug message
> may change from one to the other.
>
>> - the dmesg log from the oldest bad commit (the one you bisected to)
>
> OK, I've fixed the script to attach it (rather than attaching the
> branch HEAD's dmesg).
>
>> - maybe a hint about how I can reproduce the problem, e.g., the qemu
>> config I need
>
> OK, fixed the reporting script to include the QEMU commands for
> reproducing the problem.
>
>> You did supply the config, which is good. But you only supplied one
>> dmesg log, and it doesn't seem to be from the oldest bad commit. In
>> fact, it seems to be from some commit that isn't actually in either
>> Linus' tree or in linux-next. So I don't know what the connection is
>> with the bad commit.
>
> Sorry the dmesg file is from the internal merge-and-testing branch's
> HEAD -- where the bisect starts. I'll attach the first bad commit's
> dmesg instead.
>
>> What should I do to try to debug a report like this? Where should I start?
>
> Thank you very much for the suggestions!

Excellent, thanks! I think these will make it much easier to figure
out where to start.

Bjorn
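
A rough sketch of the "more repeat count" point quoted above: if a bug fires only about once in 20 boots, a handful of clean boots says little about whether a commit is good. The 1-in-20 rate is taken from the result table quoted elsewhere in this thread (19 successes, 1 failure on aa11fc58dc). The function names `prob_all_pass` and `boots_needed`, the 5% threshold, and the assumption that boots fail independently are all illustrative and are not taken from the actual test scripts.

```c
#include <assert.h>

/*
 * Probability that `boots` consecutive boots all pass even though the bug is
 * present, assuming each boot fails independently with probability fail_rate.
 */
static double prob_all_pass(double fail_rate, int boots)
{
    double p = 1.0;

    assert(boots >= 0);
    for (int i = 0; i < boots; i++)
        p *= 1.0 - fail_rate;
    return p;
}

/*
 * Smallest number of clean boots needed so that the chance of a buggy commit
 * passing every one of them drops to miss_prob or below.
 * Returns -1 when the inputs make the question meaningless.
 */
static int boots_needed(double fail_rate, double miss_prob)
{
    double p = 1.0;
    int n = 0;

    if (fail_rate <= 0.0 || fail_rate > 1.0 || miss_prob <= 0.0)
        return -1;
    while (p > miss_prob) {
        p *= 1.0 - fail_rate;
        n++;
    }
    return n;
}
```

Under these assumptions, `boots_needed(0.05, 0.05)` comes to 59 boots, while 20 clean boots still leave roughly a 36% chance of missing the bug, so a repeat count of around 20 can easily mislabel a bad commit as good.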
[pci] WARNING: CPU: 0 PID: 1 at drivers/gpu/drm/drm_crtc.c:94 drm_warn_on_modeset_not_all_locked()
Hi Bjorn,

On Fri, Mar 21, 2014 at 12:42:33PM -0600, Bjorn Helgaas wrote:
> On Thu, Mar 20, 2014 at 8:09 PM, Fengguang Wu wrote:
> > // CC Stephane for RAPL related bug
> >
> > Bjorn, sorry this bug report is mis-titled. The only new bug that show
> > up in aa11fc58dc is on rapl_pmu_init. And it shows up only 1 time, so
> > it's hard to reproduce and the bisect is likely not accurate. I'll
> > retry the bisect with more repeat count. Sorry for the disturbing!
>
> This testing is potentially very useful, but only if we don't have
> many false positives. I spent a lot of time trying to figure this
> out, and it turned out not to be a problem at all.

I'm sorry for the false report! I'll be careful and improve the
process. Currently there are many false positives in our internal
boot error bisects. And we rely on human reviews to select good
bisects out of the noises. In this case both the script and me made
mistakes, which lead to the wrong report.

> As a procedural question, can you help me figure out how to handle a
> report like this? What I *hoped* for would be:
>
> - the config you used

Yes.

> - the dmesg log from the newest good commit

I'll attach it if the first bad commit's parent commit(s) has some
noise errors. In this case it may help decide whether the bisect is
wrong: in some cases one bug will hide another one; or the bug message
may change from one to the other.

> - the dmesg log from the oldest bad commit (the one you bisected to)

OK, I've fixed the script to attach it (rather than attaching the
branch HEAD's dmesg).

> - maybe a hint about how I can reproduce the problem, e.g., the qemu
> config I need

OK, fixed the reporting script to include the QEMU commands for
reproducing the problem.

> You did supply the config, which is good. But you only supplied one
> dmesg log, and it doesn't seem to be from the oldest bad commit. In
> fact, it seems to be from some commit that isn't actually in either
> Linus' tree or in linux-next. So I don't know what the connection is
> with the bad commit.

Sorry the dmesg file is from the internal merge-and-testing branch's
HEAD -- where the bisect starts. I'll attach the first bad commit's
dmesg instead.

> What should I do to try to debug a report like this? Where should I start?

Thank you very much for the suggestions!

Regards,
Fengguang

> Bjorn
>
> > [2.812392] Unpacking initramfs...
> > [2.812392] Unpacking initramfs...
> > [4.937582] Freeing initrd memory: 3276K (93cbd000 - 93ff)
> > [4.937582] Freeing initrd memory: 3276K (93cbd000 - 93ff)
> > [4.952113] BUG: unable to handle kernel
> > [4.952113] BUG: unable to handle kernel NULL pointer dereferenceNULL pointer dereference at 003c
> > at 003c
> > [4.952871] IP:
> > [4.952871] IP: [<81c439fb>] rapl_pmu_init+0xed/0x165
> > [<81c439fb>] rapl_pmu_init+0xed/0x165
> > [4.954190] *pde =
> > [4.954190] *pde =
> >
> > [4.954619] Oops: [#1]
> > [4.954619] Oops: [#1]
> >
> > [4.955440] CPU: 0 PID: 1 Comm: swapper Not tainted 3.14.0-rc1-00023-gaa11fc5 #1
> > [4.955440] CPU: 0 PID: 1 Comm: swapper Not tainted 3.14.0-rc1-00023-gaa11fc5 #1
> > [4.956050] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> > [4.956050] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> > [4.956672] task: 80030c20 ti: 80032000 task.ti: 80032000
> > [4.956672] task: 80030c20 ti: 80032000 task.ti: 80032000
> > [4.957295] EIP: 0060:[<81c439fb>] EFLAGS: 0246 CPU: 0
> > [4.957295] EIP: 0060:[<81c439fb>] EFLAGS: 0246 CPU: 0
> > [4.957831] EIP is at rapl_pmu_init+0xed/0x165
> > [4.957831] EIP is at rapl_pmu_init+0xed/0x165
> >
> > Full dmesg attached.
> >
> > Thanks,
> > Fengguang
> >
> > On Thu, Mar 20, 2014 at 04:50:08PM -0600, Bjorn Helgaas wrote:
> >> On Thu, Mar 20, 2014 at 6:41 AM, Fengguang Wu wrote:
> >> > Greetings,
> >> >
> >> > I got the below dmesg and the first bad commit is
> >> >
> >> > git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git pci/resource
> >> >
> >> > commit aa11fc58dc71c27701b1f9a529a36a38d4337722
> >> > Author: Bjorn Helgaas
> >> > AuthorDate: Fri Mar 7 13:39:01 2014 -0700
> >> > Commit: Bjorn Helgaas
> >> > CommitDate: Wed Mar 19 15:00:16 2014 -0600
> >> >
> >> > PCI: Check all IORESOURCE_TYPE_BITS in pci_bus_alloc_from_region()
> >> >
> >> > When allocating space from a bus resource, i.e., from apertures leading to
> >> > this bus, make sure the entire resource type matches. The previous code
> >> > assumed the IORESOURCE_TYPE_BITS field was a bitmask with only a single bit
> >> > set, but this is not true. IORESOURCE_TYPE_BITS is really an enumeration,
> >> > and we have to check all the bits.
> >> >
> >> > See 72dcb1197228 ("resources: Add register address resource type").
> >> >
> >> > No functional change. If we used
[pci] WARNING: CPU: 0 PID: 1 at drivers/gpu/drm/drm_crtc.c:94 drm_warn_on_modeset_not_all_locked()
On Thu, Mar 20, 2014 at 8:09 PM, Fengguang Wu wrote:
> // CC Stephane for RAPL related bug
>
> Bjorn, sorry this bug report is mis-titled. The only new bug that show
> up in aa11fc58dc is on rapl_pmu_init. And it shows up only 1 time, so
> it's hard to reproduce and the bisect is likely not accurate. I'll
> retry the bisect with more repeat count. Sorry for the disturbing!

This testing is potentially very useful, but only if we don't have
many false positives. I spent a lot of time trying to figure this
out, and it turned out not to be a problem at all.

As a procedural question, can you help me figure out how to handle a
report like this? What I *hoped* for would be:

- the config you used
- the dmesg log from the newest good commit
- the dmesg log from the oldest bad commit (the one you bisected to)
- maybe a hint about how I can reproduce the problem, e.g., the qemu
config I need

You did supply the config, which is good. But you only supplied one
dmesg log, and it doesn't seem to be from the oldest bad commit. In
fact, it seems to be from some commit that isn't actually in either
Linus' tree or in linux-next. So I don't know what the connection is
with the bad commit.

What should I do to try to debug a report like this? Where should I start?

Bjorn

> [2.812392] Unpacking initramfs...
> [2.812392] Unpacking initramfs...
> [4.937582] Freeing initrd memory: 3276K (93cbd000 - 93ff)
> [4.937582] Freeing initrd memory: 3276K (93cbd000 - 93ff)
> [4.952113] BUG: unable to handle kernel
> [4.952113] BUG: unable to handle kernel NULL pointer dereferenceNULL pointer dereference at 003c
> at 003c
> [4.952871] IP:
> [4.952871] IP: [<81c439fb>] rapl_pmu_init+0xed/0x165
> [<81c439fb>] rapl_pmu_init+0xed/0x165
> [4.954190] *pde =
> [4.954190] *pde =
>
> [4.954619] Oops: [#1]
> [4.954619] Oops: [#1]
>
> [4.955440] CPU: 0 PID: 1 Comm: swapper Not tainted 3.14.0-rc1-00023-gaa11fc5 #1
> [4.955440] CPU: 0 PID: 1 Comm: swapper Not tainted 3.14.0-rc1-00023-gaa11fc5 #1
> [4.956050] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [4.956050] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [4.956672] task: 80030c20 ti: 80032000 task.ti: 80032000
> [4.956672] task: 80030c20 ti: 80032000 task.ti: 80032000
> [4.957295] EIP: 0060:[<81c439fb>] EFLAGS: 0246 CPU: 0
> [4.957295] EIP: 0060:[<81c439fb>] EFLAGS: 0246 CPU: 0
> [4.957831] EIP is at rapl_pmu_init+0xed/0x165
> [4.957831] EIP is at rapl_pmu_init+0xed/0x165
>
> Full dmesg attached.
>
> Thanks,
> Fengguang
>
> On Thu, Mar 20, 2014 at 04:50:08PM -0600, Bjorn Helgaas wrote:
>> On Thu, Mar 20, 2014 at 6:41 AM, Fengguang Wu wrote:
>> > Greetings,
>> >
>> > I got the below dmesg and the first bad commit is
>> >
>> > git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git pci/resource
>> >
>> > commit aa11fc58dc71c27701b1f9a529a36a38d4337722
>> > Author: Bjorn Helgaas
>> > AuthorDate: Fri Mar 7 13:39:01 2014 -0700
>> > Commit: Bjorn Helgaas
>> > CommitDate: Wed Mar 19 15:00:16 2014 -0600
>> >
>> > PCI: Check all IORESOURCE_TYPE_BITS in pci_bus_alloc_from_region()
>> >
>> > When allocating space from a bus resource, i.e., from apertures leading to
>> > this bus, make sure the entire resource type matches. The previous code
>> > assumed the IORESOURCE_TYPE_BITS field was a bitmask with only a single bit
>> > set, but this is not true. IORESOURCE_TYPE_BITS is really an enumeration,
>> > and we have to check all the bits.
>> >
>> > See 72dcb1197228 ("resources: Add register address resource type").
>> >
>> > No functional change. If we used this path for allocating IRQs, DMA
>> > channels, or bus numbers, this would fix a bug because those types are
>> > indistinguishable when masked by IORESOURCE_IO | IORESOURCE_MEM. But we
>> > don't, so this shouldn't make any difference.
>> >
>> > Signed-off-by: Bjorn Helgaas
>>
>> Thanks (I think). I'm afraid I'm going to need some more help to
>> debug this. I built aa11fc58dc with the config you supplied and
>> booted it on qemu with no real issues (it didn't boot all the way
>> because the config doesn't include a driver for my root disk, but
>> that's to be expected).
>>
>> The dmesg you supplied is for some other commit 2d18516 that I don't
>> have, so I'm confused about why it's not from aa11fc58dc.
>>
>> I did reproduce what appears to be basically the same problem with
>> a654dc797f3e, which is the 20140320 linux-next tree. I backed up to
>> 93ecdc077282, which is where pci/next was merged (this includes
>> aa11fc58dc), but I could not reproduce the problem there.
>>
>> So bottom line, I'm confused because your bisection doesn't match what
>> I'm seeing, and I don't want to spend more time flailing around.
>>
>> Bjorn
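
The quoted commit message argues that the resource type field is an enumeration, so comparing only the IO and MEM bits cannot tell IRQ, DMA, and bus-number resources apart. A minimal sketch of that point follows. The constant values are assumed to match include/linux/ioport.h of that era and should be checked against the tree; the helper names are hypothetical and this is not the code in pci_bus_alloc_from_region().

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values, mirroring include/linux/ioport.h as of early 2014. */
#define IORESOURCE_TYPE_BITS 0x00001f00UL
#define IORESOURCE_IO        0x00000100UL
#define IORESOURCE_MEM       0x00000200UL
#define IORESOURCE_IRQ       0x00000400UL
#define IORESOURCE_DMA       0x00000800UL
#define IORESOURCE_BUS       0x00001000UL

/* Hypothetical helper: compare only the IO and MEM bits. */
static bool type_matches_io_mem_only(unsigned long a, unsigned long b)
{
    return ((a ^ b) & (IORESOURCE_IO | IORESOURCE_MEM)) == 0;
}

/* Hypothetical helper: compare the whole type field. */
static bool type_matches_all_bits(unsigned long a, unsigned long b)
{
    return ((a ^ b) & IORESOURCE_TYPE_BITS) == 0;
}

/* Self-check: IRQ and DMA look identical under the narrow mask only. */
static void check_type_matching(void)
{
    assert(type_matches_io_mem_only(IORESOURCE_IRQ, IORESOURCE_DMA));
    assert(!type_matches_all_bits(IORESOURCE_IRQ, IORESOURCE_DMA));
}
```

With these values, IRQ, DMA, and BUS all mask to zero under `IORESOURCE_IO | IORESOURCE_MEM`, which is the "indistinguishable" case the commit message describes. For IO and MEM resources both helpers agree, which fits the commit's "No functional change" claim for the paths that are actually used.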
[pci] WARNING: CPU: 0 PID: 1 at drivers/gpu/drm/drm_crtc.c:94 drm_warn_on_modeset_not_all_locked()
// CC Stephane for RAPL related bug

Bjorn, sorry this bug report is mis-titled. The only new bug that show
up in aa11fc58dc is on rapl_pmu_init. And it shows up only 1 time, so
it's hard to reproduce and the bisect is likely not accurate. I'll
retry the bisect with more repeat count. Sorry for the disturbing!

[2.812392] Unpacking initramfs...
[2.812392] Unpacking initramfs...
[4.937582] Freeing initrd memory: 3276K (93cbd000 - 93ff)
[4.937582] Freeing initrd memory: 3276K (93cbd000 - 93ff)
[4.952113] BUG: unable to handle kernel
[4.952113] BUG: unable to handle kernel NULL pointer dereferenceNULL pointer dereference at 003c
at 003c
[4.952871] IP:
[4.952871] IP: [<81c439fb>] rapl_pmu_init+0xed/0x165
[<81c439fb>] rapl_pmu_init+0xed/0x165
[4.954190] *pde =
[4.954190] *pde =

[4.954619] Oops: [#1]
[4.954619] Oops: [#1]

[4.955440] CPU: 0 PID: 1 Comm: swapper Not tainted 3.14.0-rc1-00023-gaa11fc5 #1
[4.955440] CPU: 0 PID: 1 Comm: swapper Not tainted 3.14.0-rc1-00023-gaa11fc5 #1
[4.956050] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[4.956050] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[4.956672] task: 80030c20 ti: 80032000 task.ti: 80032000
[4.956672] task: 80030c20 ti: 80032000 task.ti: 80032000
[4.957295] EIP: 0060:[<81c439fb>] EFLAGS: 0246 CPU: 0
[4.957295] EIP: 0060:[<81c439fb>] EFLAGS: 0246 CPU: 0
[4.957831] EIP is at rapl_pmu_init+0xed/0x165
[4.957831] EIP is at rapl_pmu_init+0xed/0x165

Full dmesg attached.

Thanks,
Fengguang

On Thu, Mar 20, 2014 at 04:50:08PM -0600, Bjorn Helgaas wrote:
> On Thu, Mar 20, 2014 at 6:41 AM, Fengguang Wu wrote:
> > Greetings,
> >
> > I got the below dmesg and the first bad commit is
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git pci/resource
> >
> > commit aa11fc58dc71c27701b1f9a529a36a38d4337722
> > Author: Bjorn Helgaas
> > AuthorDate: Fri Mar 7 13:39:01 2014 -0700
> > Commit: Bjorn Helgaas
> > CommitDate: Wed Mar 19 15:00:16 2014 -0600
> >
> > PCI: Check all IORESOURCE_TYPE_BITS in pci_bus_alloc_from_region()
> >
> > When allocating space from a bus resource, i.e., from apertures leading to
> > this bus, make sure the entire resource type matches. The previous code
> > assumed the IORESOURCE_TYPE_BITS field was a bitmask with only a single bit
> > set, but this is not true. IORESOURCE_TYPE_BITS is really an enumeration,
> > and we have to check all the bits.
> >
> > See 72dcb1197228 ("resources: Add register address resource type").
> >
> > No functional change. If we used this path for allocating IRQs, DMA
> > channels, or bus numbers, this would fix a bug because those types are
> > indistinguishable when masked by IORESOURCE_IO | IORESOURCE_MEM. But we
> > don't, so this shouldn't make any difference.
> >
> > Signed-off-by: Bjorn Helgaas
>
> Thanks (I think). I'm afraid I'm going to need some more help to
> debug this. I built aa11fc58dc with the config you supplied and
> booted it on qemu with no real issues (it didn't boot all the way
> because the config doesn't include a driver for my root disk, but
> that's to be expected).
>
> The dmesg you supplied is for some other commit 2d18516 that I don't
> have, so I'm confused about why it's not from aa11fc58dc.
>
> I did reproduce what appears to be basically the same problem with
> a654dc797f3e, which is the 20140320 linux-next tree. I backed up to
> 93ecdc077282, which is where pci/next was merged (this includes
> aa11fc58dc), but I could not reproduce the problem there.
>
> So bottom line, I'm confused because your bisection doesn't match what
> I'm seeing, and I don't want to spend more time flailing around.
>
> Bjorn
>
> > +---+------------+------------+
> > |   | aa11fc58dc | 2d18516523 |
> > +---+------------+------------+
> > | boot_successes | 19 | 0 |
> > | boot_failures | 1 | 19 |
> > | BUG:unable_to_handle_kernel_NULL_pointer_dereference | 1 | 1 |
> > | Oops | 1 | 1 |
> > | EIP_is_at_rapl_pmu_init | 1 | 1 |
> > |
[pci] WARNING: CPU: 0 PID: 1 at drivers/gpu/drm/drm_crtc.c:94 drm_warn_on_modeset_not_all_locked()
On Thu, Mar 20, 2014 at 6:41 AM, Fengguang Wu wrote:
> Greetings,
>
> I got the below dmesg and the first bad commit is
>
> git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git pci/resource
>
> commit aa11fc58dc71c27701b1f9a529a36a38d4337722
> Author: Bjorn Helgaas
> AuthorDate: Fri Mar 7 13:39:01 2014 -0700
> Commit: Bjorn Helgaas
> CommitDate: Wed Mar 19 15:00:16 2014 -0600
>
> PCI: Check all IORESOURCE_TYPE_BITS in pci_bus_alloc_from_region()
>
> When allocating space from a bus resource, i.e., from apertures leading to
> this bus, make sure the entire resource type matches. The previous code
> assumed the IORESOURCE_TYPE_BITS field was a bitmask with only a single bit
> set, but this is not true. IORESOURCE_TYPE_BITS is really an enumeration,
> and we have to check all the bits.
>
> See 72dcb1197228 ("resources: Add register address resource type").
>
> No functional change. If we used this path for allocating IRQs, DMA
> channels, or bus numbers, this would fix a bug because those types are
> indistinguishable when masked by IORESOURCE_IO | IORESOURCE_MEM. But we
> don't, so this shouldn't make any difference.
>
> Signed-off-by: Bjorn Helgaas

Thanks (I think). I'm afraid I'm going to need some more help to
debug this. I built aa11fc58dc with the config you supplied and
booted it on qemu with no real issues (it didn't boot all the way
because the config doesn't include a driver for my root disk, but
that's to be expected).

The dmesg you supplied is for some other commit 2d18516 that I don't
have, so I'm confused about why it's not from aa11fc58dc.

I did reproduce what appears to be basically the same problem with
a654dc797f3e, which is the 20140320 linux-next tree. I backed up to
93ecdc077282, which is where pci/next was merged (this includes
aa11fc58dc), but I could not reproduce the problem there.

So bottom line, I'm confused because your bisection doesn't match what
I'm seeing, and I don't want to spend more time flailing around.

Bjorn

> +---+------------+------------+
> |   | aa11fc58dc | 2d18516523 |
> +---+------------+------------+
> | boot_successes | 19 | 0 |
> | boot_failures | 1 | 19 |
> | BUG:unable_to_handle_kernel_NULL_pointer_dereference | 1 | 1 |
> | Oops | 1 | 1 |
> | EIP_is_at_rapl_pmu_init | 1 | 1 |
> | Kernel_panic-not_syncing:Attempted_to_kill_init_exitcode= | 1 | 1 |
> | backtrace:rapl_pmu_init | 1 | 1 |
> | backtrace:kernel_init_freeable | 1 | 19 |
> | WARNING:CPU:PID:at_drivers/gpu/drm/drm_crtc.c:drm_warn_on_modeset_not_all_locked() | 0 | 18 |
> | WARNING:CPU:PID:at_drivers/gpu/drm/drm_crtc_helper.c:drm_helper_encoder_in_use() | 0 | 18 |
> | WARNING:CPU:PID:at_drivers/gpu/drm/drm_crtc_helper.c:drm_helper_crtc_in_use() | 0 | 18 |
> | WARNING:CPU:PID:at_drivers/gpu/drm/drm_crtc_helper.c:drm_helper_probe_single_connector_modes() | 0 | 18 |
> | WARNING:CPU:PID:at_drivers/gpu/drm/drm_modes.c:drm_mode_probed_add() | 0 | 18 |
> | WARNING:CPU:PID:at_drivers/gpu/drm/drm_modes.c:drm_mode_connector_list_update() | 0 | 18 |
> | backtrace:drm_helper_disable_unused_functions | 0 | 18 |
> | backtrace:cirrus_fbdev_init | 0 | 18 |
> | backtrace:cirrus_modeset_init | 0 | 18 |
> | backtrace:__pci_register_driver | 0 | 18 |
> | backtrace:drm_pci_init | 0 | 18 |
> | backtrace:cirrus_init