Re: FW: ccid2/ccid3 oopses

2008-01-10 Thread Gerrit Renker
|  So maybe the cause triggering this oops is somewhere else.
| 
| yes, probably.  sorry - i didn`t tell or maybe i didn`t know when writing
| my first mail to module authors and forget to add that before forwarding here.
| 
| for me , the problem does not happen with suse kernel of the day
| (2.6.24-rc6-git7-20080102160500-default, .config attached) but it happens
| with vanilla 2.6.24-rc6 (mostly allmodconfig, also attached)
| 
There are 256 differences between the two .config files. I think there are other
people on the list who will be able to give more information regarding the
.config files. The differences that struck me in the one which doesn't work are:

 -- CONFIG_DEBUG_KERNEL and
 -- CONFIG_DEBUG_BUGVERBOSE were not set. Both are very useful for bug-hunting;
    the latter in particular makes oopses much easier to decode.
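
A quick way to check options like these in a .config could look like the
sketch below. The function name config_status is made up for illustration,
and it only treats =y and =m as "set"; anything else is reported as not set.

```shell
#!/bin/sh
# Report whether the given options are enabled in a kernel .config file.
# Usage: config_status <config-file> <OPTION>...
config_status() {
    cfg=$1
    shift
    for opt in "$@"; do
        # An enabled option appears as OPTION=y or OPTION=m at line start;
        # a disabled one appears as "# OPTION is not set" or is absent.
        if grep -q "^${opt}=[ym]" "$cfg"; then
            echo "$opt: set"
        else
            echo "$opt: not set"
        fi
    done
}
```

For example: config_status .config CONFIG_DEBUG_KERNEL CONFIG_DEBUG_BUGVERBOSE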

Can't say anything about the Suse kernel. We use the plain kernel from
www.kernel.org, specifically the netdev-tree:
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
If you can't get further here, try with a kernel.org kernel or check the Suse
forums.

 1. The tests yesterday were done on the DCCP test tree, which is based on the
    above netdev-2.6 2.6.24-rc7 tree, from
    git://eden-feed.erg.abdn.ac.uk/dccp_exp (dccp subtree).
    Ran your loop for 60 seconds each for CCID3/4 -- no oops.

 2. Also repeated the tests on an unmodified 2.6.24-rc7 tree from netdev-2.6
    (today). Ran the loop for 120 seconds each -- no oops.

As said, if the above does not help, try a www.kernel.org kernel (or one of the
above trees) first.
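
For reference, a time-limited version of the load/unload loop can be sketched
as below. This is only an illustration, not the exact commands used for the
tests above: the function name stress_module and the overridable MODPROBE
variable are assumptions added so the loop can be dry-run without root.

```shell
#!/bin/sh
# Load and unload one module repeatedly for a fixed number of seconds.
# MODPROBE can be overridden (e.g. MODPROBE=true) for a dry run.
MODPROBE=${MODPROBE:-modprobe}

# Usage: stress_module <module> <seconds>
stress_module() {
    mod=$1
    secs=$2
    end=$(( $(date +%s) + secs ))
    count=0
    while [ "$(date +%s)" -lt "$end" ]; do
        # Stop at the first modprobe failure; an oops would show up in dmesg.
        $MODPROBE "$mod" || return 1
        $MODPROBE -r "$mod" || return 1
        count=$((count + 1))
    done
    echo "$mod: $count load/unload cycles, no modprobe failure"
}
```

Run as root, e.g. stress_module dccp_ccid2 60, then check the kernel log.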
|
| |   the easiest way to reproduce is:
| |
| |   while true;do modprobe dccp_ccid2/3;modprobe -r dccp_ccid2/3;done
| |   after short time, the kernel oopses (messages below)
| |
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: FW: ccid2/ccid3 oopses

2008-01-09 Thread Gerrit Renker
Roland, -

> apparently, i got crashes when loading/unloading other driver modules just
> after ccid2 or ccid3 had been loaded/unloaded _once_ (have not used them at
> all, just modprobe module;modprobe -r module)
>
<snip>
> the easiest way to reproduce is:
>
> while true;do modprobe dccp_ccid2/3;modprobe -r dccp_ccid2/3;done
> after short time, the kernel oopses (messages below)
>
> i`m not sure if this is worth to be filed at kernel bugzilla, so i`m
> contacting you personally first.

The issue is known: once loaded, the DCCP modules cannot be unloaded
without causing a crash like the one you have observed. This is due to the
fact that dccp_ipv{4,6} use control sockets which need to be released
before the module can be unloaded.
If the control sockets are not released, crashes will always result.
In earlier versions of DCCP there was a kernel option known as the unload hack,
which conditionally inserted
    sock_release(dccp_v{4,6}_ctl_socket);
in
    dccp_v{4,6}_exit()

However, as the name says, it is a hack since there are other issues to be
considered:
* sockets in timewait state
* other wait states (e.g. half-open connections)
* memory which has not been released
* module dependencies

With regard to the latter, I am normally using the Unload Hack and
release modules in the following order:

dccp_probe => dccp_ccid2 => dccp_ccid3 => dccp_tfrc_lib =>
dccp_ipv6  => dccp_ipv4  => dccp_diag  => dccp
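
As a rough sketch, the order above could be scripted like this. The function
name unload_dccp and the overridable RMMOD and PROC_MODULES variables are
assumptions for illustration; the script only walks the list and does not
make unloading any safer than described in this mail.

```shell
#!/bin/sh
# Unload the DCCP modules in the order listed above, skipping any module
# that is not currently loaded. RMMOD and PROC_MODULES are overridable so
# the order can be checked without touching the running kernel.
RMMOD=${RMMOD:-rmmod}
PROC_MODULES=${PROC_MODULES:-/proc/modules}
DCCP_UNLOAD_ORDER="dccp_probe dccp_ccid2 dccp_ccid3 dccp_tfrc_lib dccp_ipv6 dccp_ipv4 dccp_diag dccp"

unload_dccp() {
    for mod in $DCCP_UNLOAD_ORDER; do
        # Each /proc/modules line starts with the module name and a space.
        grep -q "^$mod " "$PROC_MODULES" || continue
        $RMMOD "$mod" || { echo "failed to unload $mod" >&2; return 1; }
    done
}
```

With RMMOD=echo the function just prints the modules it would remove.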

Long story short:
 * the CCID/DCCP modules currently cannot be unloaded safely
 * maybe we should disable module unloading for the mainline kernel
 * if anyone is interested in using the unload hack, here is the old patch
   http://www.erg.abdn.ac.uk/users/gerrit/dccp/testing_dccp/Unload_Hack.diff

Please feel free to come back on this issue.
Gerrit


Re: FW: ccid2/ccid3 oopses

2008-01-09 Thread Arnaldo Carvalho de Melo
On Wed, Jan 09, 2008 at 12:28:27PM +, Gerrit Renker wrote:
> Roland, -
>
> > apparently, i got crashes when loading/unloading other driver modules just
> > after ccid2 or ccid3 had been loaded/unloaded _once_ (have not used them at
> > all, just modprobe module;modprobe -r module)
> >
> <snip>
> > the easiest way to reproduce is:
> >
> > while true;do modprobe dccp_ccid2/3;modprobe -r dccp_ccid2/3;done
> > after short time, the kernel oopses (messages below)
> >
> > i`m not sure if this is worth to be filed at kernel bugzilla, so i`m
> > contacting you personally first.
>
> The issue is known: once loaded, the DCCP modules can not be unloaded
> without causing a crash as the one you have observed. This is due to the
> fact that dccp_ipv{4,6} use control sockets which need to be released
> before the module can be unloaded.
> When the control sockets are not released then crashes will always
> result.
> In earlier versions of DCCP there was a kernel option known as unload hack,
> which conditionally inserted
>   sock_release(dccp_v{4,6}_ctl_socket);
> in
>   dccp_v{4,6}_exit()
>
> However, as the name says, it is a hack since there are other issues to
> be considered:
>   * sockets in timewait state
>   * other wait states (e.g. half-open connections)
>   * memory which has not been released
>   * module dependencies
>
> With regard to the latter, I am normally using the Unload Hack and
> release modules in the following order:
>
>   dccp_probe => dccp_ccid2 => dccp_ccid3 => dccp_tfrc_lib =>
>   dccp_ipv6  => dccp_ipv4  => dccp_diag  => dccp
>
> Long story short
>  * the CCID/DCCP modules can currently not safely be unloaded
>  * maybe we should disable module unloading for the mainline kernel
>  * if anyone is interested to use the unload hack, here is the old patch
>    http://www.erg.abdn.ac.uk/users/gerrit/dccp/testing_dccp/Unload_Hack.diff

Gerrit, the control socket isn't attached to any CCID module, so the
CCID modules should be safe to remove, and IIRC they were safe to
unload.

The unload hack was for something else, for the core DCCP modules. We
can't unload because there are refcounts held by the control sock, so
the unload hack would just destroy the control sock and thus the module
refcount would reach zero and it could then be unloaded.
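
To see the refcounts involved, something like the sketch below can be used.
It assumes the kernel exports /sys/module/<name>/refcnt (which needs module
unloading support); the function name module_refcnt and the overridable
SYSFS_MODULE_DIR variable are made up for illustration.

```shell
#!/bin/sh
# Print the current reference count of a module as exported through sysfs.
# SYSFS_MODULE_DIR is overridable so the function can be tried on a copy.
SYSFS_MODULE_DIR=${SYSFS_MODULE_DIR:-/sys/module}

# Usage: module_refcnt <module>
module_refcnt() {
    f="$SYSFS_MODULE_DIR/$1/refcnt"
    if [ -r "$f" ]; then
        # A non-zero count means something still holds a reference,
        # so rmmod would refuse to unload the module.
        echo "$1 refcnt=$(cat "$f")"
    else
        echo "$1 not loaded (or refcnt not exported)"
    fi
}
```

For example: module_refcnt dccp_ipv4; module_refcnt dccp_ccid2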

I've consistently been sidetracked with work (huh :-)) and
couldn't look at this issue, but the CCID modules should be safe to
unload.

- Arnaldo


Re: FW: ccid2/ccid3 oopses

2008-01-09 Thread Gerrit Renker
|   the easiest way to reproduce is:
|   
|   while true;do modprobe dccp_ccid2/3;modprobe -r dccp_ccid2/3;done
|   after short time, the kernel oopses (messages below)
|   
<snip>
| 
| Gerrit, the control socket isn't attached to any CCID module, so the
| CCID modules should be safe to remove, and IIRC they were safe to
| unload.
| 
Ah, right. I misread the email. And I can confirm the above: running
the loop at the top of the message (60 seconds uninterrupted for
CCID2,3 each) brought no oopses.
So maybe the cause triggering this oops is somewhere else.


FW: ccid2/ccid3 oopses

2008-01-08 Thread devzero
Hello !

as suggested by Ian McDonald, i`m forwarding this to the dccp and netdev
mailing lists.


 hi !

 i know the dccp_ccid2 and ccid3 modules are considered experimental, but i`m
 unsure whether i triggered a bug inside or outside those modules here (i`m
 no kernel developer)

 apparently, i got crashes when loading/unloading other driver modules just
 after ccid2 or ccid3 had been loaded/unloaded _once_ (have not used them at
 all, just modprobe module;modprobe -r module)

 this was detected during some hardcore module load/unload testing session and
 apparently these modules seem to be the root cause of other modules crashing,
 so they seem to leave the system in an inconsistent state after load/unload.

 this can be reproduced with the recent 2.6.24-rc6 kernel, which was mostly
 built with allmodconfig.
 i could not reproduce this with a more minimalistic configuration, e.g. the
 suse kernel of the day runs fine.

 the easiest way to reproduce is:

 while true;do modprobe dccp_ccid2/3;modprobe -r dccp_ccid2/3;done
 after short time, the kernel oopses (messages below)
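
 A small sketch for spotting such an oops in the kernel log after the loop
 has run. The function name count_oops_markers is made up for illustration,
 and the patterns are simply the marker strings that appear in the messages
 below, not a complete list of what the kernel can print.

```shell
#!/bin/sh
# Count kernel log lines (read from stdin) that carry one of the oops
# markers seen in the messages below.
count_oops_markers() {
    # grep -c prints 0 and exits non-zero when nothing matches;
    # "|| true" keeps the function's exit status clean in that case.
    grep -c -E 'Oops:|BUG: unable to handle|Kernel BUG at' || true
}
```

 For example: dmesg | count_oops_markers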

 i`m not sure if this is worth to be filed at kernel bugzilla, so i`m
 contacting you personally first.

 i`d happily assist in helping debug this or provide more input (.config etc)
 if you want to take a look.

 regards
 Roland 


 [ 2322.177054] CCID: Unregistered CCID 2 (ccid2)
 [ 2322.377927] CCID: Registered CCID 2 (ccid2)

 [ 2322.413793] BUG: unable to handle kernel paging request at virtual address 4864
 [ 2322.425066] printing eip: c01792e1 *pde = 
 [ 2322.431523] Oops:  [#1] SMP
 [ 2322.435249] Modules linked in: dccp_ccid2 dccp edd iptable_filter ip_tables ip6table_filter ip6_tables x_tables ipv6 microcode firmware_class fuse loop dm_mod ide_cd cdrom pata_acpi ata_piix ahci parport_pc floppy ata_generic parport pcnet32 rtc_cmos libata rtc_core rtc_lib mii pcspkr container thermal piix generic i2c_piix4 processor button ac i2c_core power_supply shpchp ide_core intel_agp pci_hotplug agpgart mousedev evdev sg ext3 jbd mbcache sd_mod mptspi mptscsih mptbase scsi_transport_spi ehci_hcd uhci_hcd scsi_mod usbcore
 [ 2322.489115]
 [ 2322.491535] Pid: 1730, comm: kjournald Not tainted (2.6.24-rc6 #4)
 [ 2322.497266] EIP: 0060:[c01792e1] EFLAGS: 00010002 CPU: 0
 [ 2322.503205] EIP is at kmem_cache_alloc+0x5d/0xa6
 [ 2322.508789] EAX:  EBX: 0282 ECX: c03750a0 EDX: 4864
 [ 2322.514864] ESI: c1408314 EDI: 4864 EBP: c03750a0 ESP: df9cfe94
 [ 2322.521110]  DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
 [ 2322.527346] Process kjournald (pid: 1730, ti=df9ce000 task=deaf31a0 task.ti=df9ce000)
 [ 2322.535722] Stack: c016094d c1408314 4864 00011200 df4032a0 df408e40 
 def45000 000f
 [ 2322.545350]00011210 c016094d 0010 e08cb15f  dead9c00 
 df161ab8 0004
 [ 2322.556833] def45000 000f  c019acdf  
 df402940 0010
 [ 2322.565168] Call Trace:
 [ 2322.568637]  [c016094d] mempool_alloc+0x24/0xc2
 [ 2322.573169]  [c016094d] mempool_alloc+0x24/0xc2
 [ 2322.577175]  [e08cb15f] __journal_file_buffer+0x9b/0x11c [jbd]
 [ 2322.585033]  [c019acdf] bio_alloc_bioset+0x8c/0xe6
 [ 2322.589301]  [c019ad44] bio_alloc+0xb/0x17
 [ 2322.593309]  [c0197688] submit_bh+0x6e/0xf8
 [ 2322.597358]  [e08ccdba] journal_commit_transaction+0x6de/0xbe8 [jbd]
 [ 2322.605109]  [c013095c] lock_timer_base+0x19/0x35
 [ 2322.610478]  [e08cf9e6] kjournald+0xae/0x1dd [jbd]
 [ 2322.616182]  [c0139985] autoremove_wake_function+0x0/0x33
 [ 2322.621341]  [e08cf938] kjournald+0x0/0x1dd [jbd]
 [ 2322.628588]  [c01398bc] kthread+0x38/0x60
 [ 2322.633306]  [c0139884] kthread+0x0/0x60
 [ 2322.637365]  [c0107beb] kernel_thread_helper+0x7/0x10
 [ 2322.645002]  ===
 [ 2322.649049] Code: 3e 85 ff 89 7c 24 08 75 1b 89 14 24 8b 54 24 0c 83 c9 ff 89 e8 89 74 24 04 e8 2b fb ff ff 89 44 24 08 eb 0c 8b 54 24 08 8b 46 0c 8b 04 82 89 06 89 d8 50 9d 0f 1f 84 00 00 00 00 00 66 83 7c 24
 [ 2322.673340] EIP: [c01792e1] kmem_cache_alloc+0x5d/0xa6 SS:ESP 0068:df9cfe94
 [ 2322.681327] ---[ end trace 35dbcab07ee48cc5 ]---
 [ 2322.737700] [ cut here ]
 [ 2322.748822] Kernel BUG at c0199e6d [verbose debug info unavailable]
 [ 2322.755960] invalid opcode:  [#2] SMP
 [ 2322.760773] Modules linked in: dccp_ccid2 dccp edd iptable_filter ip_tables ip6table_filter ip6_tables x_tables ipv6 microcode firmware_class fuse loop dm_mod ide_cd cdrom pata_acpi ata_piix ahci parport_pc floppy ata_generic parport pcnet32 rtc_cmos libata rtc_core rtc_lib mii pcspkr container thermal piix generic i2c_piix4 processor button ac i2c_core power_supply shpchp ide_core intel_agp pci_hotplug agpgart mousedev evdev sg ext3 jbd mbcache sd_mod mptspi mptscsih mptbase scsi_transport_spi ehci_hcd uhci_hcd scsi_mod usbcore
 [ 2322.813338]
 [ 2322.817134] Pid: 3125, comm: klogd Tainted: G  D (2.6.24-rc6 #4)
 [ 2322.821416] EIP: 0060:[c0199e6d] EFLAGS: