Re: FW: ccid2/ccid3 oopses
| > So maybe the cause triggering this oops is somewhere else.
|
| yes, probably. sorry - i didn`t tell or maybe i didn`t know when writing
| my first mail to module authors and forget to add that before forwarding here.
|
| for me , the problem does not happen with suse kernel of the day
| (2.6.24-rc6-git7-20080102160500-default, .config attached) but it happens
| with vanilla 2.6.24-rc6 (mostly allmodconfig, also attached)
|
There are 256 differences between the two .config files. I think there are
other people on the list who will be able to give more information regarding
the .config files. The differences that struck me in the one which doesn't
work are that
 -- CONFIG_DEBUG_KERNEL and
 -- CONFIG_DEBUG_BUGVERBOSE
were not set. Both are very useful for bug-hunting; the latter is much better
for decoding oopses.

I can't say anything about the SUSE kernel. We use the plain kernel from
www.kernel.org, specifically the netdev tree:

    git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6

If you can't get further here, try a kernel.org kernel or check the SUSE
forums.

1. The tests yesterday were done on the DCCP test tree, which is based on the
   above netdev-2.6 2.6.24-rc7 tree:
   git://eden-feed.erg.abdn.ac.uk/dccp_exp (dccp subtree)
   I ran your for-loop for 60 seconds each for CCID3/4: no oops.

2. I also repeated the tests on an unmodified 2.6.24-rc7 tree from netdev-2.6
   (today), running the for-loop for 120 seconds each: no oops.

As I said, if the above does not help, try a www.kernel.org kernel (or one of
the above trees) first.

|
| > | > >> the easiest way to reproduce is:
| > | > >>
| > | > >> while true;do modprobe dccp_ccid2/3;modprobe -r dccp_ccid2/3;done
| > | > >> after short time, the kernel oopses (messages below)
| > | > >>

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: FW: ccid2/ccid3 oopses
| > >> the easiest way to reproduce is:
| > >>
| > >> while true;do modprobe dccp_ccid2/3;modprobe -r dccp_ccid2/3;done
| > >> after short time, the kernel oopses (messages below)
| > >>
|
| Gerrit, the control socket isn't attached to any CCID module, so the
| CCID modules should be safe to remove, and IIRC they were safe to
| unload.
|
Ah, right. I misread the email. I can confirm the above: running the
for-loop at the top of the message (60 seconds uninterrupted for CCID2 and
CCID3 each) produced no oopses.

So maybe the cause triggering this oops is somewhere else.
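The one-liner quoted in this thread uses "dccp_ccid2/3" as shorthand, so it has to be run separately for dccp_ccid2 and dccp_ccid3. A minimal sketch of a time-bounded variant, in the spirit of the fixed-duration runs described above, could look like this. The function name stress_ccid and the MODPROBE override are hypothetical additions (not from the thread) so the loop can be dry-run without root; it also assumes a Linux date that supports +%s.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): load and unload one CCID
# module repeatedly for a fixed number of seconds, then report how many
# load/unload cycles completed. Assumes "date +%s" works (GNU/BusyBox).
# Setting MODPROBE (e.g. MODPROBE=true) replaces modprobe for a dry run.
stress_ccid() {
    mod=$1
    secs=$2
    mp=${MODPROBE:-modprobe}
    end=$(( $(date +%s) + $secs ))
    cycles=0
    while [ "$(date +%s)" -lt "$end" ]; do
        "$mp" "$mod" || return 1      # load the module
        "$mp" -r "$mod" || return 1   # unload it again
        cycles=$((cycles + 1))
    done
    echo "$mod: $cycles load/unload cycles"
}

# Example (as root), one module at a time:
#   stress_ccid dccp_ccid2 60
#   stress_ccid dccp_ccid3 60
```

The loop stops at the first failing modprobe call instead of retrying, so a module that refuses to load or unload shows up in the exit status rather than being silently skipped.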
Re: FW: ccid2/ccid3 oopses
On Wed, Jan 09, 2008 at 12:28:27PM +, Gerrit Renker wrote:
> Roland, -
>
> >> apparently, i got crashes when loading/unloading other driver modules just
> >> after ccid2 or ccid3 had been loaded/unloaded _once_ (have not used them at
> >> all, just modprobe module;modprobe -r module)
> >>
> >> the easiest way to reproduce is:
> >>
> >> while true;do modprobe dccp_ccid2/3;modprobe -r dccp_ccid2/3;done
> >> after short time, the kernel oopses (messages below)
> >>
> >> i`m not sure if this is worth to be filed at kernel bugzilla, so i`m
> >> contacting you personally first.
> >>
> The issue is known: once loaded, the DCCP modules can not be unloaded
> without causing a crash as the one you have observed. This is due to the
> fact that dccp_ipv{4,6} use control sockets which need to be released
> before the module can be unloaded.
> When the control sockets are not released then crashes will always
> result.
> In earlier versions of DCCP there was a kernel option known as "unload hack",
> which conditionally inserted
>         sock_release(dccp_v{4,6}_ctl_socket);
> in
>         dccp_v{4,6}_exit()
>
> However, as the name says, it is a hack since there are other issues to
> be considered:
>  * sockets in timewait state
>  * other wait states (e.g. half-open connections)
>  * memory which has not been released
>  * module dependencies
>
> With regard to the latter, I am normally using the Unload Hack and
> release modules in the following order:
>
>     dccp_probe => dccp_ccid2 => dccp_ccid3 => dccp_tfrc_lib =>
>     dccp_ipv6 => dccp_ipv4 => dccp_diag => dccp
>
> Long story short
>  * the CCID/DCCP modules can currently not safely be unloaded
>  * maybe we should disable module unloading for the mainline kernel
>  * if anyone is interested to use the unload hack, here is the old patch
>    http://www.erg.abdn.ac.uk/users/gerrit/dccp/testing_dccp/Unload_Hack.diff

Gerrit, the control socket isn't attached to any CCID module, so the
CCID modules should be safe to remove, and IIRC they were safe to unload.

The unload hack was for something else: the core DCCP modules. We can't
unload those because the control sock holds refcounts on them. The unload
hack simply destroyed the control sock, so the module refcount would reach
zero and the module could then be unloaded.

I've consistently been sidetracked with work (huh :-)) and couldn't look
at this issue, but the CCID modules should be safe to unload.

- Arnaldo
Re: FW: ccid2/ccid3 oopses
Roland, -

>> apparently, i got crashes when loading/unloading other driver modules just
>> after ccid2 or ccid3 had been loaded/unloaded _once_ (have not used them at
>> all, just modprobe module;modprobe -r module)
>>
>> the easiest way to reproduce is:
>>
>> while true;do modprobe dccp_ccid2/3;modprobe -r dccp_ccid2/3;done
>> after short time, the kernel oopses (messages below)
>>
>> i`m not sure if this is worth to be filed at kernel bugzilla, so i`m
>> contacting you personally first.
>>
The issue is known: once loaded, the DCCP modules cannot be unloaded
without causing a crash like the one you have observed. This is because
dccp_ipv{4,6} use control sockets which need to be released before the
module can be unloaded. When the control sockets are not released, crashes
will always result.

In earlier versions of DCCP there was a kernel option known as "unload hack",
which conditionally inserted
        sock_release(dccp_v{4,6}_ctl_socket);
in
        dccp_v{4,6}_exit()

However, as the name says, it is a hack, since there are other issues to
be considered:
 * sockets in timewait state
 * other wait states (e.g. half-open connections)
 * memory which has not been released
 * module dependencies

With regard to the latter, I normally use the Unload Hack and release
modules in the following order:

    dccp_probe => dccp_ccid2 => dccp_ccid3 => dccp_tfrc_lib =>
    dccp_ipv6 => dccp_ipv4 => dccp_diag => dccp

Long story short:
 * the CCID/DCCP modules currently cannot be safely unloaded
 * maybe we should disable module unloading for the mainline kernel
 * if anyone is interested in using the unload hack, here is the old patch:
   http://www.erg.abdn.ac.uk/users/gerrit/dccp/testing_dccp/Unload_Hack.diff

Please feel free to follow up on this issue.

Gerrit
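The unload order Gerrit lists can be scripted. The following is a minimal sketch, assuming that modules absent from /proc/modules (that is, not loaded) should simply be skipped. The variable DCCP_UNLOAD_ORDER, the function unload_dccp and the DRY_RUN switch are hypothetical names, not from the thread. The script only sequences the modprobe -r calls; it does nothing about the control sockets discussed above, which is what the Unload Hack patch addresses.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): remove the DCCP modules in
# the order Gerrit gives above. Modules that do not appear in
# /proc/modules (i.e. are not loaded) are skipped. With DRY_RUN set, the
# modprobe commands are only printed.
DCCP_UNLOAD_ORDER="dccp_probe dccp_ccid2 dccp_ccid3 dccp_tfrc_lib
dccp_ipv6 dccp_ipv4 dccp_diag dccp"

unload_dccp() {
    for mod in $DCCP_UNLOAD_ORDER; do
        if [ -n "$DRY_RUN" ]; then
            echo "modprobe -r $mod"
            continue
        fi
        # /proc/modules has one line per loaded module, name first.
        grep -q "^$mod " /proc/modules || continue
        modprobe -r "$mod" || return 1   # stop at the first failure
    done
}

# Example (as root):  unload_dccp
# Preview only:       DRY_RUN=1 unload_dccp
```

The /proc/modules check is needed because modprobe -r also tries to remove dependencies that are no longer in use. As a result, a module later in the list may already be gone by the time the loop reaches it.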
FW: ccid2/ccid3 oopses
Hello!

As suggested by Ian McDonald, I'm forwarding this to the dccp and netdev
mailing lists.

> hi !
>
> i know dccp_ccid2 and ccid3 modules are considered experimental, but i`m
> unsure if i probably triggered a bug inside or outside that modules here (i`m
> no kernel developer)
>
> apparently, i got crashes when loading/unloading other driver modules just
> after ccid2 or ccid3 had been loaded/unloaded _once_ (have not used them at
> all, just modprobe module;modprobe -r module)
>
> this was detected during some hardcore module load/unload testing session and
> apparently these modules seem to be the root cause of other modules crashing,
> so they seem to leave the system in an inconsistent state after load/unload.
>
> this can be reproduced with recent 2.6.24rc6 kernel which was mostly built
> with allmodconfig.
> i could not reproduce this with a more minimalistic configuration, e.g. the
> suse kernel of the day runs fine.
>
> the easiest way to reproduce is:
>
> while true;do modprobe dccp_ccid2/3;modprobe -r dccp_ccid2/3;done
> after short time, the kernel oopses (messages below)
>
> i`m not sure if this is worth to be filed at kernel bugzilla, so i`m
> contacting you personally first.
>
> i`d happily assist in helping debug this or provide more input (.config etc)
> if you want to take a look.
>
> regards
> Roland
>
>
> [ 2322.177054] CCID: Unregistered CCID 2 (ccid2)
> [ 2322.377927] CCID: Registered CCID 2 (ccid2)
>
> [ 2322.413793] BUG: unable to handle kernel paging request at virtual address 4864
> [ 2322.425066] printing eip: c01792e1 *pde =
> [ 2322.431523] Oops: [#1] SMP
> [ 2322.435249] Modules linked in: dccp_ccid2 dccp edd iptable_filter
> ip_tables ip6table_filter ip6_tables x_tables ipv6 microcode firmware_class
> fuse loop dm_mod ide_cd cdrom pata_acpi ata_piix ahci parport_pc floppy
> ata_generic parport pcnet32 rtc_cmos libata rtc_core rtc_lib mii pcspkr
> container thermal piix generic i2c_piix4 processor button ac i2c_core
> power_supply shpchp ide_core intel_agp pci_hotplug agpgart mousedev evdev sg
> ext3 jbd mbcache sd_mod mptspi mptscsih mptbase scsi_transport_spi ehci_hcd
> uhci_hcd scsi_mod usbcore
> [ 2322.489115]
> [ 2322.491535] Pid: 1730, comm: kjournald Not tainted (2.6.24-rc6 #4)
> [ 2322.497266] EIP: 0060:[] EFLAGS: 00010002 CPU: 0
> [ 2322.503205] EIP is at kmem_cache_alloc+0x5d/0xa6
> [ 2322.508789] EAX: EBX: 0282 ECX: c03750a0 EDX: 4864
> [ 2322.514864] ESI: c1408314 EDI: 4864 EBP: c03750a0 ESP: df9cfe94
> [ 2322.521110] DS: 007b ES: 007b FS: 00d8 GS: SS: 0068
> [ 2322.527346] Process kjournald (pid: 1730, ti=df9ce000 task=deaf31a0 task.ti=df9ce000)
> [ 2322.535722] Stack: c016094d c1408314 4864 00011200 df4032a0 df408e40 def45000 000f
> [ 2322.545350]        00011210 c016094d 0010 e08cb15f dead9c00 df161ab8 0004
> [ 2322.556833]        def45000 000f c019acdf df402940 0010
> [ 2322.565168] Call Trace:
> [ 2322.568637]  [] mempool_alloc+0x24/0xc2
> [ 2322.573169]  [] mempool_alloc+0x24/0xc2
> [ 2322.577175]  [] __journal_file_buffer+0x9b/0x11c [jbd]
> [ 2322.585033]  [] bio_alloc_bioset+0x8c/0xe6
> [ 2322.589301]  [] bio_alloc+0xb/0x17
> [ 2322.593309]  [] submit_bh+0x6e/0xf8
> [ 2322.597358]  [] journal_commit_transaction+0x6de/0xbe8 [jbd]
> [ 2322.605109]  [] lock_timer_base+0x19/0x35
> [ 2322.610478]  [] kjournald+0xae/0x1dd [jbd]
> [ 2322.616182]  [] autoremove_wake_function+0x0/0x33
> [ 2322.621341]  [] kjournald+0x0/0x1dd [jbd]
> [ 2322.628588]  [] kthread+0x38/0x60
> [ 2322.633306]  [] kthread+0x0/0x60
> [ 2322.637365]  [] kernel_thread_helper+0x7/0x10
> [ 2322.645002]  ===
> [ 2322.649049] Code: 3e 85 ff 89 7c 24 08 75 1b 89 14 24 8b 54 24 0c 83 c9 ff 89 e8 89 74 24 04 e8 2b fb ff ff 89 44 24 08 eb 0c 8b 54 24 08 8b 46 0c <8b> 04 82 89 06 89 d8 50 9d 0f 1f 84 00 00 00 00 00 66 83 7c 24
> [ 2322.673340] EIP: [] kmem_cache_alloc+0x5d/0xa6 SS:ESP 0068:df9cfe94
> [ 2322.681327] ---[ end trace 35dbcab07ee48cc5 ]---
> [ 2322.737700] [ cut here ]
> [ 2322.748822] Kernel BUG at c0199e6d [verbose debug info unavailable]
> [ 2322.755960] invalid opcode: [#2] SMP
> [ 2322.760773] Modules linked in: dccp_ccid2 dccp edd iptable_filter
> ip_tables ip6table_filter ip6_tables x_tables ipv6 microcode firmware_class
> fuse loop dm_mod ide_cd cdrom pata_acpi ata_piix ahci parport_pc floppy
> ata_generic parport pcnet32 rtc_cmos libata rtc_core rtc_lib mii pcspkr
> container thermal piix generic i2c_piix4 processor button ac i2c_core
> power_supply shpchp ide_core intel_agp pci_hotplug agpgart mousedev evdev sg
> ext3 jbd mbcache sd_mod mptspi mptscsih mptbase scsi_transport_spi ehci_hcd
> uhci_hcd scsi_mod usbcore
> [ 2322.813338]
> [ 2322.817134] Pid: 3125, comm: klogd Tainted: G D (2.6.24-rc6 #4)
> [ 2322.821416] EIP: 0060:[] EFLAGS: 00010246 CPU: 0
> [ 2322.828832] EIP