Re: ohci1394_dma=early crash since 2.6.32 (was Re: [Bug #14487] PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0)
On 02/24/2010 06:37 AM, Jan Beulich wrote:
> >>> "Justin P. Mattock" 03.02.10 02:43 >>>
>
> Could you try this simple patch (against plain 2.6.33-rc8)?
>
> Thanks, Jan
>
> --- a/arch/x86/include/asm/fixmap.h
> +++ b/arch/x86/include/asm/fixmap.h
> @@ -82,6 +82,9 @@ enum fixed_addresses {
>  #endif
>  	FIX_DBGP_BASE,
>  	FIX_EARLYCON_MEM_BASE,
> +#ifdef CONFIG_PROVIDE_OHCI1394_DMA_INIT
> +	FIX_OHCI1394_BASE,
> +#endif
>  #ifdef CONFIG_X86_LOCAL_APIC
>  	FIX_APIC_BASE,	/* local (CPU) APIC) -- required for SMP or not */
>  #endif
> @@ -126,9 +129,6 @@ enum fixed_addresses {
>  	FIX_BTMAP_END = __end_of_permanent_fixed_addresses + 256 -
>  			(__end_of_permanent_fixed_addresses & 255),
>  	FIX_BTMAP_BEGIN = FIX_BTMAP_END + NR_FIX_BTMAPS*FIX_BTMAPS_SLOTS - 1,
> -#ifdef CONFIG_PROVIDE_OHCI1394_DMA_INIT
> -	FIX_OHCI1394_BASE,
> -#endif
>  #ifdef CONFIG_X86_32
>  	FIX_WP_TEST,
>  #endif

Here's the bug report on this:
http://bugzilla.kernel.org/show_bug.cgi?id=14487

Justin P. Mattock
--
To unsubscribe from this list: send the line "unsubscribe kernel-testers" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: ohci1394_dma=early crash since 2.6.32 (was Re: [Bug #14487] PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0)
>>> "Justin P. Mattock" 03.02.10 02:43 >>> Could you try this simple patch (against plain 2.6.33-rc8)? Thanks, Jan --- a/arch/x86/include/asm/fixmap.h +++ b/arch/x86/include/asm/fixmap.h @@ -82,6 +82,9 @@ enum fixed_addresses { #endif FIX_DBGP_BASE, FIX_EARLYCON_MEM_BASE, +#ifdef CONFIG_PROVIDE_OHCI1394_DMA_INIT + FIX_OHCI1394_BASE, +#endif #ifdef CONFIG_X86_LOCAL_APIC FIX_APIC_BASE, /* local (CPU) APIC) -- required for SMP or not */ #endif @@ -126,9 +129,6 @@ enum fixed_addresses { FIX_BTMAP_END = __end_of_permanent_fixed_addresses + 256 - (__end_of_permanent_fixed_addresses & 255), FIX_BTMAP_BEGIN = FIX_BTMAP_END + NR_FIX_BTMAPS*FIX_BTMAPS_SLOTS - 1, -#ifdef CONFIG_PROVIDE_OHCI1394_DMA_INIT - FIX_OHCI1394_BASE, -#endif #ifdef CONFIG_X86_32 FIX_WP_TEST, #endif -- To unsubscribe from this list: send the line "unsubscribe kernel-testers" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Bug #14487] PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0
On 02/21/2010 01:42 PM, Rafael J. Wysocki wrote:
> This message has been generated automatically as a part of a report
> of regressions introduced between 2.6.31 and 2.6.32.
>
> The following bug entry is on the current list of known regressions
> introduced between 2.6.31 and 2.6.32. Please verify if it still should
> be listed and let the tracking team know (either way).
>
> Bug-Entry   : http://bugzilla.kernel.org/show_bug.cgi?id=14487
> Subject     : PANIC: early exception 08 rip 246:10 error 810251b5 cr2 0
> Submitter   : Justin P. Mattock
> Date        : 2009-10-23 16:45 (122 days old)
> References  : http://lkml.org/lkml/2009/10/23/252
> Handled-By  : Jan Beulich
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

yeah still here.. worst is I'm able to see this with
suse11.2 as well as with my custom system.

so please leave open.

Justin P. Mattock
Re: [Bug #14487] PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0
On Sunday 21 February 2010, Justin P. Mattock wrote:
> On 02/21/2010 01:42 PM, Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of regressions introduced between 2.6.31 and 2.6.32.
> >
> > The following bug entry is on the current list of known regressions
> > introduced between 2.6.31 and 2.6.32. Please verify if it still should
> > be listed and let the tracking team know (either way).
> >
> > Bug-Entry   : http://bugzilla.kernel.org/show_bug.cgi?id=14487
> > Subject     : PANIC: early exception 08 rip 246:10 error 810251b5 cr2 0
> > Submitter   : Justin P. Mattock
> > Date        : 2009-10-23 16:45 (122 days old)
> > References  : http://lkml.org/lkml/2009/10/23/252
> > Handled-By  : Jan Beulich
>
> yeah still here.. worst is I'm able to see this with
> suse11.2 as well as with my custom system.
>
> so please leave open.

Thanks for the update.

Rafael
[Bug #14487] PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0
This message has been generated automatically as a part of a report
of regressions introduced between 2.6.31 and 2.6.32.

The following bug entry is on the current list of known regressions
introduced between 2.6.31 and 2.6.32. Please verify if it still should
be listed and let the tracking team know (either way).

Bug-Entry   : http://bugzilla.kernel.org/show_bug.cgi?id=14487
Subject     : PANIC: early exception 08 rip 246:10 error 810251b5 cr2 0
Submitter   : Justin P. Mattock
Date        : 2009-10-23 16:45 (122 days old)
References  : http://lkml.org/lkml/2009/10/23/252
Handled-By  : Jan Beulich
[Bug #14487] PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0
This message has been generated automatically as a part of a report
of regressions introduced between 2.6.31 and 2.6.32.

The following bug entry is on the current list of known regressions
introduced between 2.6.31 and 2.6.32. Please verify if it still should
be listed and let the tracking team know (either way).

Bug-Entry   : http://bugzilla.kernel.org/show_bug.cgi?id=14487
Subject     : PANIC: early exception 08 rip 246:10 error 810251b5 cr2 0
Submitter   : Justin P. Mattock
Date        : 2009-10-23 16:45 (115 days old)
References  : http://lkml.org/lkml/2009/10/23/252
Handled-By  : Jan Beulich
Re: [Bug #14487] PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0
On Monday 08 February 2010, Justin P. Mattock wrote:
> On 02/07/10 16:28, Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of regressions introduced between 2.6.31 and 2.6.32.
> >
> > The following bug entry is on the current list of known regressions
> > introduced between 2.6.31 and 2.6.32. Please verify if it still should
> > be listed and let the tracking team know (either way).
> >
> > Bug-Entry   : http://bugzilla.kernel.org/show_bug.cgi?id=14487
> > Subject     : PANIC: early exception 08 rip 246:10 error 810251b5 cr2 0
> > Submitter   : Justin P. Mattock
> > Date        : 2009-10-23 16:45 (108 days old)
> > References  : http://lkml.org/lkml/2009/10/23/252
> > Handled-By  : Jan Beulich
> > Patch       : http://patchwork.kernel.org/patch/68719/
>
> the patch attached to the bug report
> makes my machine boot up without a
> panic, and allows me to do remote debugging
> via ohci1394_dma.
>
> I did see a call trace as I was debugging
> which might be related to having one
> system using the patch, and the other not,
> but still need to look at that.
> (only saw this once out of numerous boots
> (could be a rarity)).

Thanks for the update.

Rafael
Re: [Bug #14487] PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0
On 02/07/10 16:28, Rafael J. Wysocki wrote:
> This message has been generated automatically as a part of a report
> of regressions introduced between 2.6.31 and 2.6.32.
>
> The following bug entry is on the current list of known regressions
> introduced between 2.6.31 and 2.6.32. Please verify if it still should
> be listed and let the tracking team know (either way).
>
> Bug-Entry   : http://bugzilla.kernel.org/show_bug.cgi?id=14487
> Subject     : PANIC: early exception 08 rip 246:10 error 810251b5 cr2 0
> Submitter   : Justin P. Mattock
> Date        : 2009-10-23 16:45 (108 days old)
> References  : http://lkml.org/lkml/2009/10/23/252
> Handled-By  : Jan Beulich
> Patch       : http://patchwork.kernel.org/patch/68719/

the patch attached to the bug report
makes my machine boot up without a
panic, and allows me to do remote debugging
via ohci1394_dma.

I did see a call trace as I was debugging
which might be related to having one
system using the patch, and the other not,
but still need to look at that.
(only saw this once out of numerous boots
(could be a rarity)).

Justin P. Mattock
[Bug #14487] PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0
This message has been generated automatically as a part of a report
of regressions introduced between 2.6.31 and 2.6.32.

The following bug entry is on the current list of known regressions
introduced between 2.6.31 and 2.6.32. Please verify if it still should
be listed and let the tracking team know (either way).

Bug-Entry   : http://bugzilla.kernel.org/show_bug.cgi?id=14487
Subject     : PANIC: early exception 08 rip 246:10 error 810251b5 cr2 0
Submitter   : Justin P. Mattock
Date        : 2009-10-23 16:45 (108 days old)
References  : http://lkml.org/lkml/2009/10/23/252
Handled-By  : Jan Beulich
Patch       : http://patchwork.kernel.org/patch/68719/
Re: ohci1394_dma=early crash since 2.6.32 (was Re: [Bug #14487] PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0)
On 02/04/10 01:57, Jan Beulich wrote:
> >>> "Justin P. Mattock" 04.02.10 10:48 >>>
> > I see:
> >
> > ohci.registers = (void *)fix_to_virt(FIX_OHCI1394_BASE);
> >
> > then I think it calls:
> >
> > set_fixmap_nocache(FIX_OHCI1394_BASE, ohci_base);
> >
> > I'm guessing somewhere with the fix_to_virt might be something
> > (but could be wrong);
>
> No, it ought to be that set_fixmap_nocache().
>
> Jan

Looking into fixmap.h, I started to look into:

#define NR_FIX_BTMAPS		64
#define FIX_BTMAPS_SLOTS	4
	FIX_BTMAP_END = __end_of_permanent_fixed_addresses + 256 -
			(__end_of_permanent_fixed_addresses & 255),
	FIX_BTMAP_BEGIN = FIX_BTMAP_END + NR_FIX_BTMAPS*FIX_BTMAPS_SLOTS - 1,

which led me to a patch you had submitted:
http://patchwork.kernel.org/patch/68719/

and another located here:
http://lists.openwall.net/linux-kernel/2008/08/29/211

Your patch works. I reapplied it to the latest HEAD, added a
bisected-and-tested-by to it, and sent it as an attachment
to the bug report.

With the patch from the other thread I was able to get the system
to boot as well, but with it only the size of page changed
(256 to 512 etc.).

Let me know what would be the best approach with this.

Justin P. Mattock
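For readers following the arithmetic, the two enum expressions quoted from fixmap.h in this message can be worked through in plain C. This is only a user-space sketch, not kernel code: the function names and the sample index 2100 used in the note below are hypothetical, and only NR_FIX_BTMAPS, FIX_BTMAPS_SLOTS and the two expressions themselves are taken from the message.

```c
#include <assert.h>

#define NR_FIX_BTMAPS     64
#define FIX_BTMAPS_SLOTS  4

/*
 * FIX_BTMAP_END: move up to the next multiple of 256 above
 * end_of_permanent (a full 256 further if it is already aligned).
 */
static unsigned int fix_btmap_end(unsigned int end_of_permanent)
{
    return end_of_permanent + 256 - (end_of_permanent & 255);
}

/* FIX_BTMAP_BEGIN: last index of the 64 * 4 = 256 boot-time slots. */
static unsigned int fix_btmap_begin(unsigned int end_of_permanent)
{
    return fix_btmap_end(end_of_permanent)
           + NR_FIX_BTMAPS * FIX_BTMAPS_SLOTS - 1;
}

/*
 * Index of an enum entry placed directly after FIX_BTMAP_BEGIN,
 * which is where FIX_OHCI1394_BASE sat before the patch moved it.
 */
static unsigned int slot_after_btmap(unsigned int end_of_permanent)
{
    unsigned int next = fix_btmap_begin(end_of_permanent) + 1;

    /* The BTMAP range is 256 entries starting on a 256 boundary. */
    assert(next % 256 == 0);
    return next;
}
```

With a made-up value of 2100 for __end_of_permanent_fixed_addresses, the range would run from index 2304 to 2559 and an entry declared after it would get index 2560. The point of the sketch is only that anything declared after the BTMAP range lands at least 256 slots beyond the permanent entries.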
Re: ohci1394_dma=early crash since 2.6.32 (was Re: [Bug #14487] PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0)
On 02/04/10 01:57, Jan Beulich wrote:
> >>> "Justin P. Mattock" 04.02.10 10:48 >>>
> > I see:
> >
> > ohci.registers = (void *)fix_to_virt(FIX_OHCI1394_BASE);
> >
> > then I think it calls:
> >
> > set_fixmap_nocache(FIX_OHCI1394_BASE, ohci_base);
> >
> > I'm guessing somewhere with the fix_to_virt might be something
> > (but could be wrong);
>
> No, it ought to be that set_fixmap_nocache().
>
> Jan

hmm.. as a quick test I did try:

set_fixmap(FIX_OHCI1394_BASE, ohci_base);
(maybe ohci_base)

which still hit. Maybe something else in the set of calls is hitting,
e.g. something address specific.
(I'll have to keep looking at this);

Justin P. Mattock
Re: ohci1394_dma=early crash since 2.6.32 (was Re: [Bug #14487] PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0)
>>> "Justin P. Mattock" 04.02.10 10:48 >>> >I see: > >ohci.registers = (void *)fix_to_virt(FIX_OHCI1394_BASE); > >then I think it calls: > >set_fixmap_nocache(FIX_OHCI1394_BASE, ohci_base); > >I'm guessing somewhere with the fix_to_virt might be something >(but could be wrong); No, it ought to be that set_fixmap_nocache(). Jan -- To unsubscribe from this list: send the line "unsubscribe kernel-testers" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: ohci1394_dma=early crash since 2.6.32 (was Re: [Bug #14487] PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0)
On 02/04/10 01:35, Jan Beulich wrote:
> >>> "Justin P. Mattock" 04.02.10 10:17 >>>
> > so something is using __native_set_fixmap
> > that's hitting some memory address then
> > set_fixmap_nocache(ohci1394_dma=early)
> > fires off hitting the same?
>
> No, afaict it is the ohci1394_dma=early code itself hitting that path.
>
> Jan

alright..
looking at init_ohci1394_dma.c

I see:

ohci.registers = (void *)fix_to_virt(FIX_OHCI1394_BASE);

then I think it calls:

set_fixmap_nocache(FIX_OHCI1394_BASE, ohci_base);

I'm guessing somewhere with the fix_to_virt might be something
(but could be wrong);

Justin P. Mattock
Re: ohci1394_dma=early crash since 2.6.32 (was Re: [Bug #14487] PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0)
>>> "Justin P. Mattock" 04.02.10 10:17 >>> >so something is using __native_set_fixmap >that's hitting some memory address then >set_fixmap_nocache(ohci1394_dma=early) >fires off hitting the same? No, afaict it is the ohci1394_dma=early code itself hitting that path. Jan -- To unsubscribe from this list: send the line "unsubscribe kernel-testers" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: ohci1394_dma=early crash since 2.6.32 (was Re: [Bug #14487] PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0)
On 02/04/10 01:11, Jan Beulich wrote:
> >>> "Justin P. Mattock" 04.02.10 10:04 >>>
> > a quick google on this showed somewhere
> > at bootmem.c any ideas on this or where
> > this might be caused besides fixmap?
> > (or is fixmap the main location?);
>
> __native_set_fixmap() -> set_pte_vaddr() -> set_pte_vaddr_pud() ->
> fill_pte() -> spp_getpage() -> alloc_bootmem_pages() -> panic().
>
> Jan

so something is using __native_set_fixmap
that's hitting some memory address then
set_fixmap_nocache(ohci1394_dma=early)
fires off hitting the same?

Justin P. Mattock
Re: ohci1394_dma=early crash since 2.6.32 (was Re: [Bug #14487] PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0)
>>> "Justin P. Mattock" 04.02.10 10:04 >>> >a quick google on this showed somewhere >at bootmem.c any ideas on this or where >this might be caused besides fixmap? >(or is fixmap the main location?); __native_set_fixmap() -> set_pte_vaddr() -> set_pte_vaddr_pud() -> fill_pte() -> spp_getpage() -> alloc_bootmem_pages() -> panic(). Jan -- To unsubscribe from this list: send the line "unsubscribe kernel-testers" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: ohci1394_dma=early crash since 2.6.32 (was Re: [Bug #14487] PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0)
On 02/04/10 00:54, Jan Beulich wrote:
> >>> "Justin P. Mattock" 04.02.10 00:05 >>>
> > [ 0.00] 01 - 014000 page 2M
> > [ 0.00] kernel direct mapping tables up to 14000 @ b000-11000
> > [ 0.00] init_ohci1394_dma: initializing OHCI-1394 at 05:00.0
> > [ 0.00] bootmem alloc of 4096 bytes failed!
> > [ 0.00] Kernel panic - not syncing: Out of memory
> > [ 0.00] Pid: 0, comm: swapper Not tainted 2.6.33-rc6-00072-gab65832 # 39
> > [ 0.00] Call Trace:
> >
> > then the rest shown on the picture on the bug report.
> >
> > Out of memory?
>
> bootmem allocation before bootmem was even initialized. And that's
> likely because the code tries to populate the pmd that (due to the
> issue explained yesterday) isn't statically initialized.
>
> Jan

I'll have a look at this in the morning (late over here),
but one thing I'm seeing is the device numbers:
the error shows 05:00.0, while on a good go of this
I saw the address at **3** something (can grab the info later
for you).

which probably goes to what you are saying:
tries to populate the pmd

a quick google on this showed somewhere
at bootmem.c any ideas on this or where
this might be caused besides fixmap?
(or is fixmap the main location?);

Justin P. Mattock
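Jan's diagnosis quoted in this message (a fixmap slot whose pmd entry is not statically initialized forces a page allocation before bootmem exists) can be illustrated with a toy model. Everything here is hypothetical scaffolding for illustration: the names toy_getpage, toy_set_fixmap and toy_init, the STATIC_SLOT value and the bootmem_ready flag are invented, and none of this is the kernel's real page-table code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PTRS_PER_PMD 512
#define STATIC_SLOT  3          /* arbitrary slot chosen for the demo */

static bool bootmem_ready;              /* false until "later in boot" */
static void *pmd[PTRS_PER_PMD];         /* NULL means no PTE page yet */
static char static_pte_page[4096];      /* stands in for a page table
                                           that was set up statically */

/* Hands out a page, or NULL where the real kernel would panic. */
static void *toy_getpage(void)
{
    static char page[4096];

    return bootmem_ready ? (void *)page : NULL;
}

/* Pre-populate exactly one pmd slot, as static boot tables would. */
static void toy_init(void)
{
    pmd[STATIC_SLOT] = static_pte_page;
}

/* Returns true if a mapping under pmd slot idx can be installed. */
static bool toy_set_fixmap(size_t idx)
{
    assert(idx < PTRS_PER_PMD);
    if (pmd[idx] == NULL)            /* slot not populated statically */
        pmd[idx] = toy_getpage();    /* so the allocator is needed */
    return pmd[idx] != NULL;
}
```

In this model a mapping under STATIC_SLOT succeeds at any time, while a mapping under a neighbouring slot fails until bootmem_ready is set. That is the same shape as the "bootmem alloc of 4096 bytes failed!" line in the log, under the assumption that FIX_OHCI1394_BASE ended up under a slot with no static page table behind it.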
Re: ohci1394_dma=early crash since 2.6.32 (was Re: [Bug #14487] PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0)
>>> "Justin P. Mattock" 04.02.10 00:05 >>> >[ 0.00] 01 - 014000 page 2M >[ 0.00] kernel direct mapping tables up to 14000 @ b000-11000 >[ 0.00] init_ohci1394_dma: initializing OHCI-1394 at 05:00.0 >[ 0.00] bootmem alloc of 4096 bytes failed! >[ 0.00] Kernel panic - not syncing: Out of memory >[ 0.00] Pid: 0, comm: swapper Not tainted >2.6.33-rc6-00072-gab65832 # 39 >[ 0.00] Call Trace: > >then the rest shown on the picture on the bug report. > >Out of memory? bootmem allocation before bootmem was even initialized. And that's likely because the code tries to populate the pmd that (due to the issue explained yesterday) isn't statically initialized. Jan -- To unsubscribe from this list: send the line "unsubscribe kernel-testers" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: ohci1394_dma=early crash since 2.6.32 (was Re: [Bug #14487] PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0)
o.k. while looking into grub2 I had noticed
during compile time and reading posts, that
-m32 is always being called (no matter how much
I tweaked the Makefile).
Seeing this made me think: well, if this thing is being
built with -m32 maybe that might be it, e.g.
32bit to 64bit might cause some issues, but unfortunately
that is not the case (building lilo you can achieve a pure64 bit
build).

So after all of that still no go, but the positive side is
lilo is able to show more up the line of the boot message
error:

[ 0.00] 01 - 014000 page 2M
[ 0.00] kernel direct mapping tables up to 14000 @ b000-11000
[ 0.00] init_ohci1394_dma: initializing OHCI-1394 at 05:00.0
[ 0.00] bootmem alloc of 4096 bytes failed!
[ 0.00] Kernel panic - not syncing: Out of memory
[ 0.00] Pid: 0, comm: swapper Not tainted 2.6.33-rc6-00072-gab65832 # 39
[ 0.00] Call Trace:

then the rest shown on the picture on the bug report.

Out of memory?

Justin P. Mattock
Re: ohci1394_dma=early crash since 2.6.32 (was Re: [Bug #14487] PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0)
Jan,

Thanks for that info. After looking at
arch/x86/kernel/head_64.S I'm thinking this
is a grub2 issue. From what I remember,
while building this system I used ubuntu as the host,
built grub2 from git, then once being able to
boot, figured it was all good.

Just to see, I'll go and leave the kernel as
it is and build grub2 again, just to make sure.
(maybe there's something happening with it
because grub2 is built pure64, anything 32bit
won't work (could be wrong though)),
e.g. if the kernel does 32bit something
then changes to 64bit and in the meantime grub2
can only see 64bit, then maybe this is what I'm hitting.

Justin P. Mattock
Re: ohci1394_dma=early crash since 2.6.32 (was Re: [Bug #14487] PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0)
On 02/03/10 01:18, Jan Beulich wrote:
> >>> "Justin P. Mattock" 03.02.10 02:43 >>>
> > The only thing I can think of at this point
> > is maybe the CFLAGS I used to build this system.
> > (as for the x86_32 working and x86_64 failing not sure);
> >
> > I'm curious to see if anybody else is hitting this?
>
> I think it is pretty clear how a page fault can happen here (but
> you're observing a double fault, which I cannot explain [nor can I
> explain why the fault apparently didn't get an error code pushed,
> which is why address and error code displayed are mixed up]): I
> would suspect that FIX_OHCI1394_BASE is now in a different
> (virtual) 2Mb range than what is covered by level{1,2}_fixmap_pgt,
> but this was a latent issue even before that patch (just waiting for
> sufficiently many fixmap entries getting inserted before
> __end_of_permanent_fixed_addresses).
>
> The thing is that head_64.S uses hard-coded numbers, but doesn't
> really make sure (at build time) that the fixmap page tables
> established indeed cover all the entries of importance (and
> honestly I even can't easily tell which of the candidates -
> FIX_DBGP_BASE, FIX_EARLYCON_MEM_BASE, and
> FIX_OHCI1394_BASE afaict - really matter).
>
> If either of the first does, the only reasonable solution imo is to
> move FIX_OHCI1394_BASE out of the boot time only range into
> the permanent range (unless the other two can be moved into the
> boot time only range).
>
> And obviously the hard coded numbers should be eliminated from
> head_64.S.
>
> Jan

Thanks for your info on this.
I can try today moving things around
just to see.

Looking more into this (keep in mind I have no idea
how these page, fix_to_virt calls etc. work), I was
thinking about what Stefan had mentioned,
___alloc_bootmem_node (still need to look into what
that function does). Maybe keeping fixmap.h as is
and looking somewhere else might be where the fix
might be (but could be wrong).

In any case I'll have another go at this today.

Justin P. Mattock
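Jan's suspicion quoted in this message, that FIX_OHCI1394_BASE may fall into a different (virtual) 2Mb range than the other early fixmap entries, can be phrased as a small check. The sketch below is user-space C, not kernel code. FIXADDR_TOP is an assumed, illustrative value rather than one taken from this thread, the address formula follows the usual fix_to_virt() convention (each higher index sits one page lower), and same_pmd() is a hypothetical helper name.

```c
#include <stdint.h>

#define PAGE_SHIFT 12   /* 4 KiB pages */
#define PMD_SHIFT  21   /* one pmd entry covers 2 MiB */

/* Assumption: illustrative top-of-fixmap address for x86-64. */
#define FIXADDR_TOP 0xffffffffffdff000ULL

/* Fixmap index to virtual address: higher index, lower address. */
static uint64_t fix_to_virt(unsigned int idx)
{
    return FIXADDR_TOP - ((uint64_t)idx << PAGE_SHIFT);
}

/* Nonzero if both fixmap slots are covered by the same pmd entry. */
static int same_pmd(unsigned int a, unsigned int b)
{
    return (fix_to_virt(a) >> PMD_SHIFT) == (fix_to_virt(b) >> PMD_SHIFT);
}
```

Under these assumptions, made-up indices 2100 and 2559 share one 2 MiB range while 2559 and 2560 do not, so a slot declared just past a 256-aligned block can cross into a range that a single statically built page table would not cover. A build-time assertion along these lines is one way the coverage Jan mentions could be verified, though the thread does not say how it was eventually done.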
Re: ohci1394_dma=early crash since 2.6.32 (was Re: [Bug #14487] PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0)
>>> "Justin P. Mattock" 03.02.10 02:43 >>> >The only thing I can think of at this point >is maybe the CFLAGS I used to build this system. >(as for the x86_32 working and x86_64 failing not sure); > >I'm curious to see if anybody else is hitting this? I think it is pretty clear how a page fault can happen here (but you're observing a double fault, which I cannot explain [nor can I explain why the fault apparently didn't get an error code pushed, which is why address and error code displayed are mixed up]): I would suspect that FIX_OHCI1394_BASE is now in a different (virtual) 2Mb range than what is covered by level{1,2}_fixmap_pgt, but this was a latent issue even before that patch (just waiting for sufficiently many fixmap entries getting inserted before __end_of_permanent_fixed_addresses). The thing is that head_64.S uses hard-coded numbers, but doesn't really make sure (at build time) that the fixmap page tables established indeed cover all the entries of importance (and honestly I even can't easily tell which of the candidates - FIX_DBGP_BASE, FIX_EARLYCON_MEM_BASE, and FIX_OHCI1394_BASE afaict - really matter). If either of the first does, the only reasonable solution imo is to move FIX_OHCI1394_BASE out of the boot time only range into the permanent range (unless the other two can be moved into the boot time only range). And obviously the hard coded numbers should be eliminated from head_64.S. Jan -- To unsubscribe from this list: send the line "unsubscribe kernel-testers" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: ohci1394_dma=early crash since 2.6.32 (was Re: [Bug #14487] PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0)
o.k. finally finished with the bisect:
reverting this gets things going on 2.6.33-rc5

789d03f584484af85dbdc64935270c8e45f36ef7 is the first bad commit
commit 789d03f584484af85dbdc64935270c8e45f36ef7
Author: Jan Beulich
Date:   Tue Jun 30 11:52:23 2009 +0100

    x86: Fix fixmap ordering

    The merge of the 32- and 64-bit fixmap headers made a latent
    bug on x86-64 a real one: with the right config settings
    it is possible for FIX_OHCI1394_BASE to overlap the FIX_BTMAP_*
    range.

    Signed-off-by: Jan Beulich
    Cc: # for 2.6.30.x
    LKML-Reference: <4a4a0a8702788...@vpn.id2.novell.com>
    Signed-off-by: Ingo Molnar

The only thing I can think of at this point
is maybe the CFLAGS I used to build this system.
(as for the x86_32 working and x86_64 failing not sure);

I'm curious to see if anybody else is hitting this?

Justin P. Mattock
Re: ohci1394_dma=early crash since 2.6.32 (was Re: [Bug #14487] PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0)
On 02/01/10 22:57, Stefan Richter wrote:
> Stefan Richter wrote:
> > Do I understand correctly that at this moment it is only known that the
> > bug could be
> >   - *either* a 2.6.31 -> 2.6.32 regression
> >   - *or* an x86-64 specific bug that does not occur on x86-32,
> > right?
>
> (OK, according to your other post it /is/ a regression, at least on
> x86-64 and definitely between 2.6.27 (good) and 2.6.32 (bad).)

I'll go with the bisect in the morning (late over here),
and then go from there. (just pissed at myself for not thinking
to do this at the beginning).

Justin P. Mattock
Re: ohci1394_dma=early crash since 2.6.32 (was Re: [Bug #14487] PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0)
On 02/01/10 22:55, Stefan Richter wrote:
> Justin P. Mattock wrote:
> > As for anything changed in the kernel
> > (2.6.31 - present), tough to say
> > from what I remember I had created a new fresh
> > lfs system using these CFLAGS:
> >
> > CFLAGS="-mtune=core2 -march=core2 -O2 -pipe -fomit-frame-pointer"
> > CXXFLAGS="${CFLAGS}" MAKEOPTS="{-j3}"
> > (without -m option gcc defaults(I think)to -m32).
> >
> > which booted with ohci1394_dma=early just fine.
> >
> > then decided to build another lfs system with the same CFLAGS except
> > added -m64 (pure64) to the build process.
> > (then this showed up).
> >
> > What I can try is do a git revert to 2.6.29/27 to see if this thing
> > fires off(before going any further). if the system boots then do a bisect.
>
> Do I understand correctly that at this moment it is only known that the
> bug could be
>   - *either* a 2.6.31 -> 2.6.32 regression
>   - *or* an x86-64 specific bug that does not occur on x86-32,
> right?

At first I was under the impression this was
an arch thing because of building an x86_32,
and then building x86_64 (and hitting this),
but now after reverting to 2.6.27 I'm thinking
otherwise. (my bad, should have done this at
first but didn't even think to);

> I have an Core 2 Duo based PC with x86-32 kernel and userland and an AMD
> based x86-64 PC and could give ohci1394_dma=early a try on both (never
> tested it myself before). I could furthermore attempt to build and
> install an x86-64 kernel on the Core 2 Duo PC but I am afraid I am far
> too short of spare time for that.

no.. I need to do a bisect from 2.6.27
to present to see (just need to crash for
a few hrs, then can start);
then I'll go from there.

Justin P. Mattock
Re: ohci1394_dma=early crash since 2.6.32 (was Re: [Bug #14487] PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0)
Stefan Richter wrote:
> Do I understand correctly that at this moment it is only known that the
> bug could be
>   - *either* a 2.6.31 -> 2.6.32 regression
>   - *or* an x86-64 specific bug that does not occur on x86-32,
> right?

(OK, according to your other post it /is/ a regression, at least on
x86-64 and definitely between 2.6.27 (good) and 2.6.32 (bad).)
--
Stefan Richter
-=-==-=- --=- ---=-
http://arcgraph.de/sr/
Re: ohci1394_dma=early crash since 2.6.32 (was Re: [Bug #14487] PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0)
Justin P. Mattock wrote:
> As for anything changed in the kernel
> (2.6.31 - present), tough to say
> from what I remember I had created a new fresh
> lfs system using these CFLAGS:
>
> CFLAGS="-mtune=core2 -march=core2 -O2 -pipe -fomit-frame-pointer"
> CXXFLAGS="${CFLAGS}" MAKEOPTS="{-j3}"
> (without -m option gcc defaults(I think)to -m32).
>
> which booted with ohci1394_dma=early just fine.
>
> then decided to build another lfs system with the same CFLAGS except
> added -m64 (pure64) to the build process.
> (then this showed up).
>
> What I can try is do a git revert to 2.6.29/27 to see if this thing
> fires off(before going any further). if the system boots then do a bisect.

Do I understand correctly that at this moment it is only known that the
bug could be
  - *either* a 2.6.31 -> 2.6.32 regression
  - *or* an x86-64 specific bug that does not occur on x86-32,
right?

I have a Core 2 Duo based PC with x86-32 kernel and userland and an AMD
based x86-64 PC and could give ohci1394_dma=early a try on both (never
tested it myself before). I could furthermore attempt to build and
install an x86-64 kernel on the Core 2 Duo PC but I am afraid I am far
too short of spare time for that.
--
Stefan Richter
-=-==-=- --=- ---=-
http://arcgraph.de/sr/
Re: ohci1394_dma=early crash since 2.6.32 (was Re: [Bug #14487] PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0)
o.k. I feel really stupid right now.
After staring at this for some time
I didn't even think to do a git revert
to test other kernel versions (duh!!).

So doing a git revert to v2.6.27,
ohci1394_dma boots up fine.

A bit late now to do a bisect, but in the
morning I'll start this and see what
I get from it, then go from there.
(man!! let this be a lesson for me);

Justin P. Mattock
Re: ohci1394_dma=early crash since 2.6.32 (was Re: [Bug #14487] PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0)
On 02/01/10 21:45, Stefan Richter wrote: Justin P. Mattock wrote: So(correct me if I'm wrong), I'm generating a 64 bit register and the kernel is looking for a 32 bit register causing the crash. No, the class = read_pci_config(); if (class == ...) ... parts of the code are entirely innocent as far as I can tell. This is just the FireWire--PCI chip detection. It is the subsequent driver setup for the chip that crashes somewhere. When you modified that chip detection code earlier, you only prevented crashes when your modifications ended up as "ignore all PCI devices, also FireWire ones" == "do nothing at all". Perhaps the bootup sequence of the x86(-64) platform was changed from 2.6.31 to .32 thus that some assumptions in init_ohci1394_dma about when are what resources available are not true anymore. According to your screenshot in http://lkml.org/lkml/2009/10/27/335 the issue is about memory allocation, not about PCI bus access. Alright.. I'll keep focus on that and see if I can figure this out. As for anything changed in the kernel (2.6.31 - present), tough to say from what I remember I had created a new fresh lfs system using these CFLAGS: CFLAGS="-mtune=core2 -march=core2 -O2 -pipe -fomit-frame-pointer" CXXFLAGS="${CFLAGS}" MAKEOPTS="{-j3}" (without -m option gcc defaults(I think)to -m32). which booted with ohci1394_dma=early just fine. then decided to build another lfs system with the same CFLAGS except added -m64 (pure64) to the build process. (then this showed up). What I can try is do a git revert to 2.6.29/27 to see if this thing fires off(before going any further). if the system boots then do a bisect. Justin P. Mattock -- To unsubscribe from this list: send the line "unsubscribe kernel-testers" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: ohci1394_dma=early crash since 2.6.32 (was Re: [Bug #14487] PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0)
Justin P. Mattock wrote:
> So(correct me if I'm wrong), I'm generating a 64 bit register
> and the kernel is looking for a 32 bit register causing the crash.

No, the

    class = read_pci_config();
    if (class == ...)
        ...

parts of the code are entirely innocent as far as I can tell. This is just the FireWire PCI chip detection. It is the subsequent driver setup for the chip that crashes somewhere. When you modified that chip detection code earlier, you only prevented crashes when your modifications ended up as "ignore all PCI devices, also FireWire ones" == "do nothing at all".

Perhaps the bootup sequence of the x86(-64) platform was changed from 2.6.31 to .32 such that some assumptions in init_ohci1394_dma about which resources are available at which point are no longer true. According to your screenshot in http://lkml.org/lkml/2009/10/27/335 the issue is about memory allocation, not about PCI bus access.
--
Stefan Richter
-=-==-=- --=- ---=-
http://arcgraph.de/sr/
Re: ohci1394_dma=early crash since 2.6.32 (was Re: [Bug #14487] PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0)
On 02/01/10 14:27, Stefan Richter wrote: Justin P. Mattock wrote: (as for yesterdays 0x(just experimenting)Google gives me no info on the differences between 8f's to 16f's, I was under the impression that it's x86_32 and x86_64 for the pci address). As Dan noted, (class == 0x || 0x) is always true because it is logically the same as (class == whatever) || true If you really meant class == 0x || class == 0x yeah that's what I was going for(just to see). then the latter half will never become true because class is declared as u32 and got its value from read_pci_config() which also returns u32. That's what I was afraid of. I'm guessing there probably would be a lot of things to change for(if this correct) u64. BTW, whether a PCI device is capable of accessing 32 bit bus addresses or also 64 bit bus addresses depends on the device, not on the CPU. Originally, PCI only had a 32 bit addressing model. OHCI 1394 1.0/1.1 implementations only deal with 32 bit local bus addresses. I haven't even looked at what the device was capable of doing. The 'class' however is not an address but merely a register value with 24 bits width. (Defined in the PCI Local Bus spec which is not freely available, cited in OHCI 1394 annex A.3.) This register is read as a 32 bits wide register from which the excess byte is later discarded. If all bits read are 1, the bus:slot:function is not actually populated. So(correct me if I'm wrong), I'm generating a 64 bit register and the kernel is looking for a 32 bit register causing the crash. Justin P. Mattock -- To unsubscribe from this list: send the line "unsubscribe kernel-testers" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: ohci1394_dma=early crash since 2.6.32 (was Re: [Bug #14487] PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0)
Justin P. Mattock wrote:
> (as for yesterdays 0xffffffffffffffff(just experimenting)Google gives me
> no info on the differences between 8f's to 16f's, I was under the
> impression that it's x86_32 and x86_64 for the pci address).

As Dan noted,

    (class == 0xffffffff || 0xffffffffffffffff)

is always true because it is logically the same as

    (class == whatever) || true

If you really meant

    class == 0xffffffff || class == 0xffffffffffffffff

then the latter half will never become true because class is declared as u32 and got its value from read_pci_config() which also returns u32.

BTW, whether a PCI device is capable of accessing 32 bit bus addresses or also 64 bit bus addresses depends on the device, not on the CPU. Originally, PCI only had a 32 bit addressing model. OHCI 1394 1.0/1.1 implementations only deal with 32 bit local bus addresses.

The 'class' however is not an address but merely a register value 24 bits wide. (Defined in the PCI Local Bus spec which is not freely available, cited in OHCI 1394 annex A.3.) This register is read as a 32 bit wide register from which the excess byte is later discarded. If all bits read are 1, the bus:slot:function is not actually populated.
--
Stefan Richter
-=-==-=- --=- =
http://arcgraph.de/sr/
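The two C pitfalls described in this message can be reproduced in user space. What follows is only a sketch, not code from the driver: the function names are made up for illustration, uint32_t stands in for the kernel's u32, and the 16-f constant is assumed to be the value used in the experiment. PCI_CLASS_SERIAL_FIREWIRE_OHCI is the value from the kernel's pci_ids.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value from the kernel's pci_ids.h: class 0x0c, subclass 0x00, prog-if 0x10. */
#define PCI_CLASS_SERIAL_FIREWIRE_OHCI 0x0c0010

/* Held in variables (not literals) only to keep compilers quiet about constants. */
static const uint32_t all_ones_32 = 0xffffffffu;
static const uint64_t all_ones_64 = 0xffffffffffffffffull;

/*
 * The experiment: parsed as (class_rev == all_ones_32) || (all_ones_64 != 0).
 * The right-hand operand is nonzero, so this returns true for every input,
 * which means every PCI function is skipped, FireWire controllers included.
 */
static bool experiment_skips(uint32_t class_rev)
{
    return class_rev == all_ones_32 || all_ones_64;
}

/*
 * What was probably intended: two comparisons. class_rev is widened to
 * 64 bits with zero upper bits, so the second comparison can never match
 * and the function behaves exactly like the original single test.
 */
static bool intended_skips(uint32_t class_rev)
{
    return class_rev == all_ones_32 || class_rev == all_ones_64;
}

/* The register holds revision in the low byte and the 24 bit class above it. */
static bool is_ohci1394(uint32_t class_rev)
{
    return (class_rev >> 8) == PCI_CLASS_SERIAL_FIREWIRE_OHCI;
}
```

Feeding 0x0c001000 (an OHCI-1394 class code with revision 0x00) to experiment_skips() shows that a real controller is skipped too, which matches the observation that the modified kernel boots only because the early driver setup never runs.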
Re: ohci1394_dma=early crash since 2.6.32 (was Re: [Bug #14487] PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0)
On 02/01/10 11:57, Stefan Richter wrote: Justin P. Mattock wrote: On 02/01/10 04:54, Dan Carpenter wrote: On Sun, Jan 31, 2010 at 05:39:22PM -0800, Justin P. Mattock wrote: On 01/31/10 16:43, Rafael J. Wysocki wrote: This message has been generated automatically as a part of a report of regressions introduced between 2.6.31 and 2.6.32. The following bug entry is on the current list of known regressions introduced between 2.6.31 and 2.6.32. Please verify if it still should be listed and let me know (either way). Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14487 Subject : PANIC: early exception 08 rip 246:10 error 810251b5 cr2 0 Submitter : Justin P. Mattock Date: 2009-10-23 16:45 (101 days old) References : http://lkml.org/lkml/2009/10/23/252 [...] yeah still hitting this. [...] I've added the linux1394-devel people to the CC list. Thanks. Alas the original author is MIA, and the bug seems to be tied to the early platform setup code (rather than OHCI 1394 device specific code) about which I for one am clueless. The listed MAINTAINERS contact of init_ohci1394_dma.c is linux1394-devel and me, but a good deal of this driver is very x86 platform specific. (There was some interest in making useful for other architectures, but this would merely mean that the respective architecture people need to keep an eye on their parts of this driver.) Justin has found an issue that when he boots with: ohci1394_dma=early his computer crashes. He can get it to boot by modifying drivers/ieee1394/init_ohci1394_dma.c: [...] This modification and some others in the LKML thread from October simply cause init_ohci1394_controller() to be skipped for all devices. 
init_ohci1394_controller() is simple enough:

static inline void __init init_ohci1394_controller(int num, int slot, int func)
{
        unsigned long ohci_base;
        struct ti_ohci ohci;

        printk(KERN_INFO "init_ohci1394_dma: initializing OHCI-1394"
                         " at %02x:%02x.%x\n", num, slot, func);

        ohci_base = read_pci_config(num, slot, func, PCI_BASE_ADDRESS_0+(0<<2))
                    & PCI_BASE_ADDRESS_MEM_MASK;

        set_fixmap_nocache(FIX_OHCI1394_BASE, ohci_base);

        ohci.registers = (void *)fix_to_virt(FIX_OHCI1394_BASE);

        init_ohci1394_reset_and_init_dma(&ohci);
}

Justin, you already established that read_pci_config is not the point where it crashes, right?

set_fixmap_nocache() and fix_to_virt() frighten me because I don't know what they do. :-)

The rest, init_ohci1394_reset_and_init_dma(), is something which I can easily follow. There is just a bunch of register reads and writes with occasional mdelays. This /could/ be a cause of the crash too if the controller is inspired to do something dangerous in there --- meaning, if the OHCI 1394 controller starts to write something via DMA into memory. However, we do not switch on any DMA context except for the so-called physical DMA unit which only springs into action if a remote FireWire-attached console instructs it to do so.

I am noticing one point where init_ohci1394_dma.c violates the OHCI 1394 specification: OHCI1394_HCControl_linkEnable is switched on while the OHCI1394_ConfigROMmap register is still invalid. This register needs to contain a physical address of a 1kB sized, 1kB aligned memory region which allows DMA_TO_DEVICE. So, since this is a read-only DMA, I am tempted to say that this potential issue should not be a cause for a kernel crash.

(Side note, the OHCI 1394 spec is freely available, see http://ieee1394.wiki.kernel.org/index.php/Specifications#OHCI_Release_1.1.2C_January_6.2C_2000 )

Justin Mattock wrote on 2009-10-27 in http://lkml.org/lkml/2009/10/27/335:
o.k.
you should be able to view this:(let me know if you can't and I can manually write out, and in time find a public photo sharing suite to make things easier). http://www.flickr.com/photos/44066...@n08/4050317695 When this happens I see lots of messages from the print during boot, then this happens. (Now that a bugzilla.kernel.org ticket exists for this you can also use bugzilla.kernel.org to publish screenshots if you have an account there.) This screenshot looks like ___alloc_bootmem_node is the issue here, or am I mistaken of what the order of functions in the backtrace means? cool, thanks for the assistance and info on this. (I'll have to read through the specification for ohci1394); as for __alloc_bootmem_node I have not looked into that yet. (I can read up on this today). what I was looking at was: set_fixmap_nocache(FIX_OHCI1394_BASE, ohci_base); which led me to arch/x86/include/asm/fixmap.h leading me to believe I was hitting something with FIXADDR_TOP because the system is a pure64. (reading through fixmap.h there is mention that vsyscall only covers 32bit making me think this might be it). and also: init_ohci1394_reset_and_init_dma(&ohci); (on the bugreport I have a temporary patch which gets me up a
ohci1394_dma=early crash since 2.6.32 (was Re: [Bug #14487] PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0)
Justin P. Mattock wrote: > On 02/01/10 04:54, Dan Carpenter wrote: >> On Sun, Jan 31, 2010 at 05:39:22PM -0800, Justin P. Mattock wrote: >>> On 01/31/10 16:43, Rafael J. Wysocki wrote: This message has been generated automatically as a part of a report of regressions introduced between 2.6.31 and 2.6.32. The following bug entry is on the current list of known regressions introduced between 2.6.31 and 2.6.32. Please verify if it still should be listed and let me know (either way). Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14487 Subject: PANIC: early exception 08 rip 246:10 error 810251b5 cr2 0 Submitter : Justin P. Mattock Date : 2009-10-23 16:45 (101 days old) References : http://lkml.org/lkml/2009/10/23/252 [...] >>> yeah still hitting this. [...] >> I've added the linux1394-devel people to the CC list. Thanks. Alas the original author is MIA, and the bug seems to be tied to the early platform setup code (rather than OHCI 1394 device specific code) about which I for one am clueless. The listed MAINTAINERS contact of init_ohci1394_dma.c is linux1394-devel and me, but a good deal of this driver is very x86 platform specific. (There was some interest in making useful for other architectures, but this would merely mean that the respective architecture people need to keep an eye on their parts of this driver.) >> Justin has found an issue that when he boots with: ohci1394_dma=early >> his computer >> crashes. >> >> He can get it to boot by modifying drivers/ieee1394/init_ohci1394_dma.c: [...] This modification and some others in the LKML thread from October simply cause init_ohci1394_controller() to be skipped for all devices. 
init_ohci1394_controller() is simple enough:

static inline void __init init_ohci1394_controller(int num, int slot, int func)
{
        unsigned long ohci_base;
        struct ti_ohci ohci;

        printk(KERN_INFO "init_ohci1394_dma: initializing OHCI-1394"
                         " at %02x:%02x.%x\n", num, slot, func);

        ohci_base = read_pci_config(num, slot, func, PCI_BASE_ADDRESS_0+(0<<2))
                    & PCI_BASE_ADDRESS_MEM_MASK;

        set_fixmap_nocache(FIX_OHCI1394_BASE, ohci_base);

        ohci.registers = (void *)fix_to_virt(FIX_OHCI1394_BASE);

        init_ohci1394_reset_and_init_dma(&ohci);
}

Justin, you already established that read_pci_config is not the point where it crashes, right?

set_fixmap_nocache() and fix_to_virt() frighten me because I don't know what they do. :-)

The rest, init_ohci1394_reset_and_init_dma(), is something which I can easily follow. There is just a bunch of register reads and writes with occasional mdelays. This /could/ be a cause of the crash too if the controller is inspired to do something dangerous in there --- meaning, if the OHCI 1394 controller starts to write something via DMA into memory. However, we do not switch on any DMA context except for the so-called physical DMA unit which only springs into action if a remote FireWire-attached console instructs it to do so.

I am noticing one point where init_ohci1394_dma.c violates the OHCI 1394 specification: OHCI1394_HCControl_linkEnable is switched on while the OHCI1394_ConfigROMmap register is still invalid. This register needs to contain a physical address of a 1kB sized, 1kB aligned memory region which allows DMA_TO_DEVICE. So, since this is a read-only DMA, I am tempted to say that this potential issue should not be a cause for a kernel crash.

(Side note, the OHCI 1394 spec is freely available, see http://ieee1394.wiki.kernel.org/index.php/Specifications#OHCI_Release_1.1.2C_January_6.2C_2000 )

Justin Mattock wrote on 2009-10-27 in http://lkml.org/lkml/2009/10/27/335:
> o.k.
you should be able to view > this:(let me know if you can't and I can > manually write out, and in time find a public > photo sharing suite to make things easier). > > http://www.flickr.com/photos/44066...@n08/4050317695 > > When this happens I see lots of messages from the print > during boot, then this happens. (Now that a bugzilla.kernel.org ticket exists for this you can also use bugzilla.kernel.org to publish screenshots if you have an account there.) This screenshot looks like ___alloc_bootmem_node is the issue here, or am I mistaken of what the order of functions in the backtrace means? -- Stefan Richter -=-==-=- --=- = http://arcgraph.de/sr/ -- To unsubscribe from this list: send the line "unsubscribe kernel-testers" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
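For readers who, like the author of the message above, are unsure what set_fixmap_nocache() and fix_to_virt() do: as far as I know, a fixmap slot is a compile-time index into a run of pages just below a fixed top address, fix_to_virt() is plain arithmetic on that index, and set_fixmap_nocache() installs an uncached page table entry that points the slot's page at the given physical address. The user-space sketch below only mirrors the arithmetic. Every SKETCH_ name, the top address, and the enum layout are hypothetical stand-ins, not the kernel's real values; the low-four-bits mask is what I understand PCI_BASE_ADDRESS_MEM_MASK to be.

```c
#include <stdint.h>

/* Hypothetical stand-ins for PAGE_SHIFT and FIXADDR_TOP (not the real values). */
#define SKETCH_PAGE_SHIFT  12
#define SKETCH_FIXADDR_TOP 0xffffffffff5ff000ull

/* Hypothetical slot list; the real one is enum fixed_addresses in fixmap.h. */
enum sketch_fixed_addresses {
    SKETCH_FIX_DBGP_BASE,
    SKETCH_FIX_EARLYCON_MEM_BASE,
    SKETCH_FIX_OHCI1394_BASE,
    SKETCH_END_OF_FIXED_ADDRESSES
};

/* Slot N lives N pages below the top; nothing is allocated at run time. */
static uint64_t sketch_fix_to_virt(unsigned int idx)
{
    return SKETCH_FIXADDR_TOP - ((uint64_t)idx << SKETCH_PAGE_SHIFT);
}

/*
 * A memory BAR keeps flag bits (type, prefetchable) in its low four bits.
 * Masking them off leaves the physical base address of the register window.
 */
static uint32_t sketch_bar_to_phys(uint32_t bar0)
{
    return bar0 & ~0x0fu;
}
```

Because the virtual address depends only on the enum position, where FIX_OHCI1394_BASE sits in enum fixed_addresses decides which page, and therefore which page table, the early mapping needs.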
Re: [Bug #14487] PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0
On 02/01/10 04:54, Dan Carpenter wrote: On Sun, Jan 31, 2010 at 05:39:22PM -0800, Justin P. Mattock wrote: On 01/31/10 16:43, Rafael J. Wysocki wrote: This message has been generated automatically as a part of a report of regressions introduced between 2.6.31 and 2.6.32. The following bug entry is on the current list of known regressions introduced between 2.6.31 and 2.6.32. Please verify if it still should be listed and let me know (either way). Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14487 Subject : PANIC: early exception 08 rip 246:10 error 810251b5 cr2 0 Submitter : Justin P. Mattock Date: 2009-10-23 16:45 (101 days old) References : http://lkml.org/lkml/2009/10/23/252 yeah still hitting this. looking at the issue if I change: @@ 260 if ((class == 0x)) continue; to if ((class == 0x || 0x)) continue; Uh... 0x is always true so basically that's the same as deleting the if condition. I've added the linux1394-devel people to the CC list. Justin has found an issue that when he boots with: ohci1394_dma=early his computer crashes. He can get it to boot by modifying drivers/ieee1394/init_ohci1394_dma.c: init_ohci1394_dma_on_all_controllers() 254 /* Poor man's PCI discovery, the only thing we can do at early boot */ 255 for (num = 0; num< 32; num++) { 256 for (slot = 0; slot< 32; slot++) { 257 for (func = 0; func< 8; func++) { 258 u32 class = read_pci_config(num,slot,func, 259 PCI_CLASS_REVISION); 260 if ((class == 0x)) 261 continue; /* No device at this func */ If he continues here then his system boots. 262 263 if (class>>8 != PCI_CLASS_SERIAL_FIREWIRE_OHCI) 264 continue; /* Not an OHCI-1394 device */ 265 266 init_ohci1394_controller(num, slot, func); 267 break; /* Assume one controller per device */ This comment is not terribly clear btw. The code assumes one controller per slot. 268 } 269 } 270 } regards, dan carpenter I'm able to boot, but don't have enough knowledge to know what is really happening(or how to execute this). 
will continue looking at this (hopefully I get somewhere on this); Justin P. Mattock -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ yeah I'll admit it, I don't know what I'm doing (but am willing to try). Thanks for the response, I'll try and give as much info on this as possible. Justin P. Mattock -- To unsubscribe from this list: send the line "unsubscribe kernel-testers" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Bug #14487] PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0
On Sun, Jan 31, 2010 at 05:39:22PM -0800, Justin P. Mattock wrote:
> On 01/31/10 16:43, Rafael J. Wysocki wrote:
>> This message has been generated automatically as a part of a report
>> of regressions introduced between 2.6.31 and 2.6.32.
>>
>> The following bug entry is on the current list of known regressions
>> introduced between 2.6.31 and 2.6.32. Please verify if it still should
>> be listed and let me know (either way).
>>
>>
>> Bug-Entry  : http://bugzilla.kernel.org/show_bug.cgi?id=14487
>> Subject    : PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0
>> Submitter  : Justin P. Mattock
>> Date       : 2009-10-23 16:45 (101 days old)
>> References : http://lkml.org/lkml/2009/10/23/252
>>
>
> yeah still hitting this.
> looking at the issue if I change:
>
> @@ 260
>
> if ((class == 0xffffffff))
>         continue;
> to
>
> if ((class == 0xffffffff || 0xffffffffffffffff))
>         continue;
>

Uh... 0xffffffffffffffff is always true so basically that's the same as deleting the if condition.

I've added the linux1394-devel people to the CC list.

Justin has found an issue that when he boots with ohci1394_dma=early his computer crashes.

He can get it to boot by modifying drivers/ieee1394/init_ohci1394_dma.c:
init_ohci1394_dma_on_all_controllers()

   254          /* Poor man's PCI discovery, the only thing we can do at early boot */
   255          for (num = 0; num < 32; num++) {
   256                  for (slot = 0; slot < 32; slot++) {
   257                          for (func = 0; func < 8; func++) {
   258                                  u32 class = read_pci_config(num,slot,func,
   259                                                          PCI_CLASS_REVISION);
   260                                  if ((class == 0xffffffff))
   261                                          continue; /* No device at this func */

If he continues here then his system boots.

   262
   263                                  if (class>>8 != PCI_CLASS_SERIAL_FIREWIRE_OHCI)
   264                                          continue; /* Not an OHCI-1394 device */
   265
   266                                  init_ohci1394_controller(num, slot, func);
   267                                  break; /* Assume one controller per device */

This comment is not terribly clear btw. The code assumes one controller per slot.

   268                          }
   269                  }
   270          }

regards,
dan carpenter

> I'm able to boot, but don't have enough knowledge to know
> what is really happening(or how to execute this).
> will continue looking at this
> (hopefully I get somewhere on this);
>
> Justin P. Mattock
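The remark about "one controller per slot" can be checked with a small user-space model of the scan loop. This is a sketch under stated assumptions: read_class_fn, count_initialized() and fake_read_class() are hypothetical names, the fake bus layout is invented, and a counter stands in for the call to init_ohci1394_controller().

```c
#include <stdint.h>

#define PCI_CLASS_SERIAL_FIREWIRE_OHCI 0x0c0010

/* Hypothetical stand-in for read_pci_config(..., PCI_CLASS_REVISION). */
typedef uint32_t (*read_class_fn)(int num, int slot, int func);

/* Same loop shape as the quoted listing; returns how many controllers get set up. */
static int count_initialized(read_class_fn read_class)
{
    int found = 0;

    for (int num = 0; num < 32; num++) {
        for (int slot = 0; slot < 32; slot++) {
            for (int func = 0; func < 8; func++) {
                uint32_t class_rev = read_class(num, slot, func);

                if (class_rev == 0xffffffffu)
                    continue; /* nothing answers at this function */
                if ((class_rev >> 8) != PCI_CLASS_SERIAL_FIREWIRE_OHCI)
                    continue; /* some other kind of device */

                found++;      /* stand-in for init_ohci1394_controller() */
                break;        /* leaves only the func loop: next slot */
            }
        }
    }
    return found;
}

/*
 * Invented bus: OHCI-1394 functions at 02:03.0 and 02:03.1 (same slot)
 * and at 05:00.0. Everything else reads back as all ones.
 */
static uint32_t fake_read_class(int num, int slot, int func)
{
    if (num == 2 && slot == 3 && (func == 0 || func == 1))
        return 0x0c001000u;
    if (num == 5 && slot == 0 && func == 0)
        return 0x0c001000u;
    return 0xffffffffu;
}
```

With this fake bus the function at 02:03.1 is never reached, because the break after 02:03.0 ends the scan of that slot; only the first OHCI-1394 function in each slot is set up.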
Re: [Bug #14487] PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0
On 01/31/10 16:43, Rafael J. Wysocki wrote:

This message has been generated automatically as a part of a report of regressions introduced between 2.6.31 and 2.6.32.

The following bug entry is on the current list of known regressions introduced between 2.6.31 and 2.6.32. Please verify if it still should be listed and let me know (either way).

Bug-Entry  : http://bugzilla.kernel.org/show_bug.cgi?id=14487
Subject    : PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0
Submitter  : Justin P. Mattock
Date       : 2009-10-23 16:45 (101 days old)
References : http://lkml.org/lkml/2009/10/23/252

yeah still hitting this. looking at the issue if I change:

@@ 260

if ((class == 0xffffffff))
        continue;

to

if ((class == 0xffffffff || 0xffffffffffffffff))
        continue;

I'm able to boot, but don't have enough knowledge to know what is really happening (or how to execute this). Will continue looking at this (hopefully I get somewhere on this).

Justin P. Mattock
[Bug #14487] PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0
This message has been generated automatically as a part of a report of regressions introduced between 2.6.31 and 2.6.32.

The following bug entry is on the current list of known regressions introduced between 2.6.31 and 2.6.32. Please verify if it still should be listed and let me know (either way).

Bug-Entry  : http://bugzilla.kernel.org/show_bug.cgi?id=14487
Subject    : PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0
Submitter  : Justin P. Mattock
Date       : 2009-10-23 16:45 (101 days old)
References : http://lkml.org/lkml/2009/10/23/252
Re: [Bug #14487] PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0
On 01/24/10 14:22, Rafael J. Wysocki wrote:

This message has been generated automatically as a part of a report of regressions introduced between 2.6.31 and 2.6.32.

The following bug entry is on the current list of known regressions introduced between 2.6.31 and 2.6.32. Please verify if it still should be listed and let me know (either way).

Bug-Entry  : http://bugzilla.kernel.org/show_bug.cgi?id=14487
Subject    : PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0
Submitter  : Justin P. Mattock
Date       : 2009-10-23 16:45 (94 days old)
References : http://lkml.org/lkml/2009/10/23/252

Yeah, I'm still seeing this during boot. As for looking at this, I've been tied up with another issue and totally forgot. Next week I'll be away for a week, and during that period I can try and look at this since I might be hanging around at times (and won't be sidetracked with the other issue I was looking at).

So yeah, please keep it open, and hopefully somebody sees what is happening and maybe has a solution, or by chance maybe I can figure something out.

Justin P. Mattock
[Bug #14487] PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0
This message has been generated automatically as a part of a report of regressions introduced between 2.6.31 and 2.6.32.

The following bug entry is on the current list of known regressions introduced between 2.6.31 and 2.6.32. Please verify if it still should be listed and let me know (either way).

Bug-Entry  : http://bugzilla.kernel.org/show_bug.cgi?id=14487
Subject    : PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0
Submitter  : Justin P. Mattock
Date       : 2009-10-23 16:45 (94 days old)
References : http://lkml.org/lkml/2009/10/23/252
Re: [Bug #14487] PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0
On Monday 11 January 2010, Justin P. Mattock wrote:
> On 01/10/10 14:56, Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of regressions introduced between 2.6.31 and 2.6.32.
> >
> > The following bug entry is on the current list of known regressions
> > introduced between 2.6.31 and 2.6.32. Please verify if it still should
> > be listed and let me know (either way).
> >
> >
> > Bug-Entry  : http://bugzilla.kernel.org/show_bug.cgi?id=14487
> > Subject    : PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0
> > Submitter  : Justin P. Mattock
> > Date       : 2009-10-23 16:45 (80 days old)
> > References : http://lkml.org/lkml/2009/10/23/252
> >
>
> I've played around with this, and
> am much confused about what needs to happen.
> (Please give feedback on what might be happening.)
> In any case I can have another try at finding a fix
> so please leave open.

I will, thanks for the update.

Rafael
Re: [Bug #14487] PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0
On 01/10/10 14:56, Rafael J. Wysocki wrote:

This message has been generated automatically as a part of a report of regressions introduced between 2.6.31 and 2.6.32.

The following bug entry is on the current list of known regressions introduced between 2.6.31 and 2.6.32. Please verify if it still should be listed and let me know (either way).

Bug-Entry  : http://bugzilla.kernel.org/show_bug.cgi?id=14487
Subject    : PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0
Submitter  : Justin P. Mattock
Date       : 2009-10-23 16:45 (80 days old)
References : http://lkml.org/lkml/2009/10/23/252

I've played around with this, and am much confused about what needs to happen. (Please give feedback on what might be happening.) In any case I can have another try at finding a fix, so please leave open.

Justin P. Mattock
[Bug #14487] PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0
This message has been generated automatically as a part of a report of regressions introduced between 2.6.31 and 2.6.32.

The following bug entry is on the current list of known regressions introduced between 2.6.31 and 2.6.32. Please verify if it still should be listed and let me know (either way).

Bug-Entry  : http://bugzilla.kernel.org/show_bug.cgi?id=14487
Subject    : PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0
Submitter  : Justin P. Mattock
Date       : 2009-10-23 16:45 (80 days old)
References : http://lkml.org/lkml/2009/10/23/252
[Bug #14487] PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0
This message has been generated automatically as a part of a report of regressions introduced between 2.6.31 and 2.6.32.

The following bug entry is on the current list of known regressions introduced between 2.6.31 and 2.6.32. Please verify if it still should be listed and let me know (either way).

Bug-Entry  : http://bugzilla.kernel.org/show_bug.cgi?id=14487
Subject    : PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0
Submitter  : Justin P. Mattock
Date       : 2009-10-23 16:45 (68 days old)
References : http://lkml.org/lkml/2009/10/23/252
Re: [Bug #14487] PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0
Rafael J. Wysocki wrote:

This message has been generated automatically as a part of a report of recent regressions.

The following bug entry is on the current list of known regressions from 2.6.31. Please verify if it still should be listed and let me know (either way).

Bug-Entry  : http://bugzilla.kernel.org/show_bug.cgi?id=14487
Subject    : PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0
Submitter  : Justin P. Mattock
Date       : 2009-10-23 16:45 (30 days old)
References : http://lkml.org/lkml/2009/10/23/252

Yes, it's still there. I'm wondering, because there's not much feedback on this, whether I should take a hint! Or maybe it's something people just don't understand, and don't have time for. In any case I'm still looking at it (although I had to look at other reports as well).

Justin P. Mattock
[Bug #14487] PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0
This message has been generated automatically as a part of a report of recent regressions.

The following bug entry is on the current list of known regressions from 2.6.31. Please verify if it still should be listed and let me know (either way).

Bug-Entry  : http://bugzilla.kernel.org/show_bug.cgi?id=14487
Subject    : PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0
Submitter  : Justin P. Mattock
Date       : 2009-10-23 16:45 (30 days old)
References : http://lkml.org/lkml/2009/10/23/252
Re: [Bug #14487] PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0
On Tuesday 17 November 2009, Justin P. Mattock wrote:
> Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.31. Please verify if it still should be listed and let me know
> > (either way).
> >
> >
> > Bug-Entry  : http://bugzilla.kernel.org/show_bug.cgi?id=14487
> > Subject    : PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0
> > Submitter  : Justin P. Mattock
> > Date       : 2009-10-23 16:45 (25 days old)
> > References : http://lkml.org/lkml/2009/10/23/252
> >
>
> This one has me a bit dazed i.g. after looking into the issue
> I did find a workaround(keep in mind it's not pretty),
> by commenting out set_fixmap_nocache and
> init_ohci1394_reset_and_init_dma.
> (by doing so I was able to load both machines and
> execute early debugging in case a problem occurs).
>
> Now as to what might be happening, after going through as
> much as I can comprehend the only thing in mind was
> reading fixmap.h the comments are stating that vsyscalls
> only covers 32bit, and that there needs to be another set
> for 64, leading me to believe that this is what I might be hitting.
> (my system is pure64, taking in no 32bit at all).
>
> At this point I think I need somebody to give me some info on this,
> and if the 64bit issue mentioned above is the case, then we can probably
> close this and leave it up to the x86_64 builders to create a 64bit
> call for this whenever they get to it.(main thing is I'm able to
> run dma early in case of an emergency).

Thanks for the update.

Rafael
Re: [Bug #14487] PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0
Rafael J. Wysocki wrote:
> This message has been generated automatically as a part of a report
> of recent regressions.
>
> The following bug entry is on the current list of known regressions
> from 2.6.31. Please verify if it still should be listed and let me know
> (either way).
>
> Bug-Entry  : http://bugzilla.kernel.org/show_bug.cgi?id=14487
> Subject    : PANIC: early exception 08 rip 246:10 error 810251b5 cr2 0
> Submitter  : Justin P. Mattock
> Date       : 2009-10-23 16:45 (25 days old)
> References : http://lkml.org/lkml/2009/10/23/252

This one has me a bit dazed i.g. after looking into the issue
I did find a workaround (keep in mind it's not pretty),
by commenting out set_fixmap_nocache and
init_ohci1394_reset_and_init_dma.
(by doing so I was able to load both machines and
execute early debugging in case a problem occurs).

Now as to what might be happening, after going through as
much as I can comprehend the only thing in mind was
reading fixmap.h the comments are stating that vsyscalls
only covers 32bit, and that there needs to be another set
for 64, leading me to believe that this is what I might be hitting.
(my system is pure64, taking in no 32bit at all).

At this point I think I need somebody to give me some info on this,
and if the 64bit issue mentioned above is the case, then we can probably
close this and leave it up to the x86_64 builders to create a 64bit
call for this whenever they get to it. (main thing is I'm able to
run dma early in case of an emergency).

Justin P. Mattock
[Bug #14487] PANIC: early exception 08 rip 246:10 error ffffffff810251b5 cr2 0
This message has been generated automatically as a part of a report of recent regressions.

The following bug entry is on the current list of known regressions from 2.6.31. Please verify if it still should be listed and let me know (either way).

Bug-Entry  : http://bugzilla.kernel.org/show_bug.cgi?id=14487
Subject    : PANIC: early exception 08 rip 246:10 error 810251b5 cr2 0
Submitter  : Justin P. Mattock
Date       : 2009-10-23 16:45 (25 days old)
References : http://lkml.org/lkml/2009/10/23/252