Re: kref refcounting breakage in mainline

2007-03-19 Thread Randy Dunlap
On Thu, 15 Mar 2007 07:54:14 -0700 Greg KH wrote:

> On Thu, Mar 15, 2007 at 11:19:20AM +0100, Mike Galbraith wrote:
> > On Thu, 2007-03-15 at 01:06 -0700, Greg KH wrote:
> > 
> > > That's good.  But why don't we have a module name for this driver?
> > > 
> > > And if we don't have a module name, why would there be a symlink to
> > > remove?  That's what is keeping your module from unloading, right?
> > 
> > You keep saying "module", and that's making me a bit nervous ;-)
> > 
> > Just to be sure we're not talking past each other, when you say module,
> > don't mean the modprobe kind... i hope.  This "module" as in driver is
> > compiled in.  (said that before, but you may have missed it)
> 
> Ahh, that changes everything here, thanks for letting me know, I had
> missed this.
> 
> The problem is that the module_init() is failing, yet this isn't really
> a module, it's built into the kernel.  So some of the module teardown
> logic is dieing when it thinks that we really have a full module
> structure here (owner and such).

Urgh, it's not a "loadable" module, but it's still a logical module.

> I'll look at this further tomorrow, as I'm travelling pretty much all
> day today, sorry.


---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kref refcounting breakage in mainline

2007-03-15 Thread Greg KH
On Thu, Mar 15, 2007 at 11:19:20AM +0100, Mike Galbraith wrote:
> On Thu, 2007-03-15 at 01:06 -0700, Greg KH wrote:
> 
> > That's good.  But why don't we have a module name for this driver?
> > 
> > And if we don't have a module name, why would there be a symlink to
> > remove?  That's what is keeping your module from unloading, right?
> 
> You keep saying "module", and that's making me a bit nervous ;-)
> 
> Just to be sure we're not talking past each other, when you say module,
> don't mean the modprobe kind... i hope.  This "module" as in driver is
> compiled in.  (said that before, but you may have missed it)

Ahh, that changes everything here, thanks for letting me know, I had
missed this.

The problem is that the module_init() is failing, yet this isn't really
a module, it's built into the kernel.  So some of the module teardown
logic is dieing when it thinks that we really have a full module
structure here (owner and such).

I'll look at this further tomorrow, as I'm travelling pretty much all
day today, sorry.

thanks,

greg k-h
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kref refcounting breakage in mainline

2007-03-15 Thread Mike Galbraith
On Thu, 2007-03-15 at 09:32 +0100, Mike Galbraith wrote:
> On Thu, 2007-03-15 at 01:06 -0700, Greg KH wrote:
> 
> > That's good.  But why don't we have a module name for this driver?
> > 
> > And if we don't have a module name, why would there be a symlink to
> > remove?  That's what is keeping your module from unloading, right?
> 
> Ya got me, but according to my debug logs, what's causing my lockup is
> the reference we add while making the symlink when we hit...
>  if (driver_name) in module_add_driver().  Maybe we go through there
> twice, once with a name, and once without?  Dunno.

I found the log (i think), and even with the patched kernel, gdb still
says...
(gdb) list *module_add_driver+0x44
0xc01442e5 is in module_add_driver (kernel/module.c:2401).
2396driver_name = make_driver_name(drv);
2397if (driver_name) {
2398module_create_drivers_dir(mk);
2399no_warn = sysfs_create_link(mk->drivers_dir, &drv->kobj,
2400driver_name);
2401kfree(driver_name);
2402}
2403}
2404EXPORT_SYMBOL(module_add_driver);
2405
(gdb)

See: kobject filter function caused the event to drop!

(erm, that spot caused some hmm action here. if that drop is bad, i have
another one as well.)

[   30.397160] kobject ipmi_devintf: registering. parent: , set: module
[   30.404033] kobject_uevent_env
[   30.407098] fill_kobj_path: path = '/module/ipmi_devintf'
[   30.412524] BUG: at lib/kobject.c:448 kobject_get()
[   30.417402]  [] show_trace_log_lvl+0x1a/0x30
[   30.422564]  [] show_trace+0x12/0x14
[   30.427031]  [] dump_stack+0x16/0x18
[   30.431501]  [] kobject_get+0x66/0x87
[   30.436056]  [] kobject_shadow_add+0x10/0x1e8
[   30.441312]  [] kobject_add+0xa/0xc
[   30.445695]  [] kernel_param_sysfs_setup+0x4d/0x7f
[   30.451376]  [] param_sysfs_init+0x188/0x1c3
[   30.456538]  [] init+0x144/0x26c
[   30.460661]  [] kernel_thread_helper+0x7/0x1c
[   30.465907]  ===
[   30.469486] get: c18f65c0 count after get is 2
[   30.473927] kobject ipmi_si: registering. parent: , set: module
[   30.480372] kobject_uevent_env
[   30.483430] fill_kobj_path: path = '/module/ipmi_si'

..

[   73.266556] bus platform: add driver ipmi
[   73.278013] kobject ipmi: registering. parent: , set: drivers
[   73.291847] kobject_uevent_env
[   73.302358] fill_kobj_path: path = '/bus/platform/drivers/ipmi'
[   73.315943] ipmi message handler version 39.1
[   73.327839] ipmi device interface
[   73.338524] device class 'ipmi': registering
[   73.350158] subsystem ipmi: registering
[   73.361309] kobject ipmi: registering. parent: , set: class
[   73.374841] bus platform: add driver ipmi_si
[   73.386442] BUG: at lib/kobject.c:448 kobject_get()
[   73.398617]  [] show_trace_log_lvl+0x1a/0x30
[   73.411079]  [] show_trace+0x12/0x14
[   73.422780]  [] dump_stack+0x16/0x18
[   73.434324]  [] kobject_get+0x66/0x87
[   73.445860]  [] kobject_shadow_add+0x10/0x1e8
[   73.458088]  [] kobject_add+0xa/0xc
[   73.469286]  [] kobject_register+0x22/0xb3
[   73.480986]  [] bus_add_driver+0x77/0x1ae
[   73.492592]  [] driver_register+0x54/0x84
[   73.504101]  [] init_ipmi_si+0x4d/0x829
[   73.515335]  [] init+0x144/0x26c
[   73.525822]  [] kernel_thread_helper+0x7/0x1c
[   73.537452]  ===
[   73.547290] get: c064475c count after get is 2
[   73.557969] kobject ipmi_si: registering. parent: , set: drivers
[   73.570825] kobject_uevent_env
[   73.580011] fill_kobj_path: path = '/bus/platform/drivers/ipmi_si'
[   73.592622] BUG: at lib/kobject.c:242 kobject_register()
[   73.604312]  [] show_trace_log_lvl+0x1a/0x30
[   73.615822]  [] show_trace+0x12/0x14
[   73.626551]  [] dump_stack+0x16/0x18
[   73.637211]  [] kobject_register+0x79/0xb3
[   73.648325]  [] bus_add_driver+0x77/0x1ae
[   73.659297]  [] driver_register+0x54/0x84
[   73.670164]  [] init_ipmi_si+0x4d/0x829
[   73.680756]  [] init+0x144/0x26c
[   73.690699]  [] kernel_thread_helper+0x7/0x1c
[   73.701854]  ===
[   73.711295] register: c064475c count now is 2 error 0
[   73.722218] BUG: at lib/kobject.c:448 kobject_get()
[   73.732911]  [] show_trace_log_lvl+0x1a/0x30
[   73.743927]  [] show_trace+0x12/0x14
[   73.754112]  [] dump_stack+0x16/0x18
[   73.764253]  [] kobject_get+0x66/0x87
[   73.774532]  [] get_driver+0x11/0x18
[   73.784725]  [] driver_create_file+0xf/0x32
[   73.795559]  [] bus_add_driver+0x15a/0x1ae
[   73.806343]  [] driver_register+0x54/0x84
[   73.817055]  [] init_ipmi_si+0x4d/0x829
[   73.827595]  [] init+0x144/0x26c
[   73.837468]  [] kernel_thread_helper+0x7/0x1c
[   73.848544]  ===
[   73.857932] get: c064475c count after get is 3
[   73.868223] BUG: at lib/kobject.c:494 kobject_put()
[   73.878943]  [] show_trace_log_lvl+0x1a/0x30
[   73.890019]  [] show_trace+0x12/0x14
[   73.900239]  [] dump_stack+0x16/0x18
[   73.910294]  [] kobject_put+0x69/0x82
[   73.9203

Re: kref refcounting breakage in mainline

2007-03-15 Thread Mike Galbraith
On Thu, 2007-03-15 at 01:06 -0700, Greg KH wrote:

> That's good.  But why don't we have a module name for this driver?
> 
> And if we don't have a module name, why would there be a symlink to
> remove?  That's what is keeping your module from unloading, right?

Ya got me, but according to my debug logs, what's causing my lockup is
the reference we add while making the symlink when we hit...
 if (driver_name) in module_add_driver().  Maybe we go through there
twice, once with a name, and once without?  Dunno.
 
> > [   24.670410] ipmi message handler version 39.1
> > [   24.675000] ipmi device interface
> > [   24.678542] IPMI System Interface driver.
> > [   24.703956] BUG: at kernel/module.c:2429 module_remove_driver()
> > [   24.716837]  [] show_trace_log_lvl+0x1a/0x30
> > [   24.728909]  [] show_trace+0x12/0x14
> > [   24.740239]  [] dump_stack+0x16/0x18
> > [   24.751469]  [] module_remove_driver+0xa5/0xa7
> > [   24.763584]  [] bus_remove_driver+0x6d/0x82
> > [   24.775390]  [] driver_unregister+0xb/0x18
> > [   24.787019]  [] init_ipmi_si+0x7a9/0x7c1
> > [   24.798450]  [] init+0x144/0x26c
> > [   24.809129]  [] kernel_thread_helper+0x7/0x1c
> > [   24.820916]  ===
> > [   24.830926] ipmi_si: Unable to find any System Interface(s)
> > [   24.842952] IPMI Watchdog: driver initialized
> >   24.853749] Copyright (C) 2004 MontaVista Software - IPMI Powerdown via 
> > sys_reboot.
> 
> With the above change, it all works correctly?

I don't know about _correctly_, but my diag patch _boots_, as does your
patch plus my addon diag bits.

> If the ipmi driver is loaded, what does the /sys/module/MODULE_NAME/
> tree look like (replacing MODULE_NAME with whatever the module name
> really is, sorry, I don't know)?

Well, I will never see that, because ipmi_si finds no interfaces, so
always backs out.  After backout, with my patch and yours + my addons, I
have there leftovers.

[EMAIL PROTECTED]: ls -R /sys/module/ipmi_si
/sys/module/ipmi_si:
drivers  parameters

/sys/module/ipmi_si/drivers:

/sys/module/ipmi_si/parameters:
bt_debug  hotmod  kcs_debug  smic_debug

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kref refcounting breakage in mainline

2007-03-15 Thread Greg KH
On Thu, Mar 15, 2007 at 08:53:07AM +0100, Mike Galbraith wrote:
> On Wed, 2007-03-14 at 22:27 -0700, Greg KH wrote:
> 
> > But can you try this version instead?
> 
> It exploded in strcmp().  Unfortunately, the full oops didn't make it to
> either console or serial console.
> 
> [   30.783048] ipmi message handler version 39.1
> [   30.787632] ipmi device interface
> [   30.791166] IPMI System Interface driver.
> [   30.816961] BUG: unable to handle kernel NULL pointer dereference at 
> virtual address 
> [   30.832700]  printing eip:
> [   30.842383] c02d4098
> [   30.851458] *pde = 
> [   30.861089] Oops:  [#1]
> [   30.870585] PREEMPT SMP 
> [   30.879724] Modules linked in:
> [   30.889288] CPU:1
> [   30.889290] EIP:0060:[]Not tainted VLI
> [   30.889292] EFLAGS: 00010282   (2.6.21-rc3-smp #81)

Gah, that just happened to me too, sorry for not booting with that
change before sending it to you.

> I did this...
> 
> --- kernel/module.c.org   2007-03-15 07:20:15.0 +0100
> +++ kernel/module.c   2007-03-15 08:13:36.0 +0100
> @@ -2419,15 +2419,19 @@ void module_remove_driver(struct device_
>  
>   if (drv->owner && drv->owner->mkobj.drivers_dir)
>   mk = &drv->owner->mkobj;
> - else {
> + else if (drv->mod_name) {
>   /* Lookup built-in module entry in /sys/modules */
>   mkobj = kset_find_obj(&module_subsys.kset, drv->mod_name);
>   if (!mkobj)
> - return;
> + goto out_free;
>   mk = container_of(mkobj, struct module_kobject, kobj);
> + } else {
> + WARN_ON(1);
> + goto out_free;
>   }
>   sysfs_remove_link(mk->drivers_dir, driver_name);
>   kobject_put(mkobj);
> +out_free:
>   kfree(driver_name);
>  }
>  EXPORT_SYMBOL(module_remove_driver);
> 
> ...and it booted.

That's good.  But why don't we have a module name for this driver?

And if we don't have a module name, why would there be a symlink to
remove?  That's what is keeping your module from unloading, right?

> [   24.670410] ipmi message handler version 39.1
> [   24.675000] ipmi device interface
> [   24.678542] IPMI System Interface driver.
> [   24.703956] BUG: at kernel/module.c:2429 module_remove_driver()
> [   24.716837]  [] show_trace_log_lvl+0x1a/0x30
> [   24.728909]  [] show_trace+0x12/0x14
> [   24.740239]  [] dump_stack+0x16/0x18
> [   24.751469]  [] module_remove_driver+0xa5/0xa7
> [   24.763584]  [] bus_remove_driver+0x6d/0x82
> [   24.775390]  [] driver_unregister+0xb/0x18
> [   24.787019]  [] init_ipmi_si+0x7a9/0x7c1
> [   24.798450]  [] init+0x144/0x26c
> [   24.809129]  [] kernel_thread_helper+0x7/0x1c
> [   24.820916]  ===
> [   24.830926] ipmi_si: Unable to find any System Interface(s)
> [   24.842952] IPMI Watchdog: driver initialized
>   24.853749] Copyright (C) 2004 MontaVista Software - IPMI Powerdown via 
> sys_reboot.

With the above change, it all works correctly?

If the ipmi driver is loaded, what does the /sys/module/MODULE_NAME/
tree look like (replacing MODULE_NAME with whatever the module name
really is, sorry, I don't know)?

thanks,

greg k-h

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kref refcounting breakage in mainline

2007-03-14 Thread Mike Galbraith
On Wed, 2007-03-14 at 22:27 -0700, Greg KH wrote:

> But can you try this version instead?

It exploded in strcmp().  Unfortunately, the full oops didn't make it to
either console or serial console.

[   30.783048] ipmi message handler version 39.1
[   30.787632] ipmi device interface
[   30.791166] IPMI System Interface driver.
[   30.816961] BUG: unable to handle kernel NULL pointer dereference at virtual 
address 
[   30.832700]  printing eip:
[   30.842383] c02d4098
[   30.851458] *pde = 
[   30.861089] Oops:  [#1]
[   30.870585] PREEMPT SMP 
[   30.879724] Modules linked in:
[   30.889288] CPU:1
[   30.889290] EIP:0060:[]Not tainted VLI
[   30.889292] EFLAGS: 00010282   (2.6.21-rc3-smp #81)


I did this...

--- kernel/module.c.org 2007-03-15 07:20:15.0 +0100
+++ kernel/module.c 2007-03-15 08:13:36.0 +0100
@@ -2419,15 +2419,19 @@ void module_remove_driver(struct device_
 
if (drv->owner && drv->owner->mkobj.drivers_dir)
mk = &drv->owner->mkobj;
-   else {
+   else if (drv->mod_name) {
/* Lookup built-in module entry in /sys/modules */
mkobj = kset_find_obj(&module_subsys.kset, drv->mod_name);
if (!mkobj)
-   return;
+   goto out_free;
mk = container_of(mkobj, struct module_kobject, kobj);
+   } else {
+   WARN_ON(1);
+   goto out_free;
}
sysfs_remove_link(mk->drivers_dir, driver_name);
kobject_put(mkobj);
+out_free:
kfree(driver_name);
 }
 EXPORT_SYMBOL(module_remove_driver);

...and it booted.

[   24.670410] ipmi message handler version 39.1
[   24.675000] ipmi device interface
[   24.678542] IPMI System Interface driver.
[   24.703956] BUG: at kernel/module.c:2429 module_remove_driver()
[   24.716837]  [] show_trace_log_lvl+0x1a/0x30
[   24.728909]  [] show_trace+0x12/0x14
[   24.740239]  [] dump_stack+0x16/0x18
[   24.751469]  [] module_remove_driver+0xa5/0xa7
[   24.763584]  [] bus_remove_driver+0x6d/0x82
[   24.775390]  [] driver_unregister+0xb/0x18
[   24.787019]  [] init_ipmi_si+0x7a9/0x7c1
[   24.798450]  [] init+0x144/0x26c
[   24.809129]  [] kernel_thread_helper+0x7/0x1c
[   24.820916]  ===
[   24.830926] ipmi_si: Unable to find any System Interface(s)
[   24.842952] IPMI Watchdog: driver initialized
  24.853749] Copyright (C) 2004 MontaVista Software - IPMI Powerdown via 
sys_reboot.

-Mike

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kref refcounting breakage in mainline

2007-03-14 Thread Greg KH
On Sat, Mar 10, 2007 at 04:44:06PM +0100, Mike Galbraith wrote:
> On Wed, 2007-03-07 at 06:39 +0100, Mike Galbraith wrote:
> > On Tue, 2007-03-06 at 13:04 -0800, Greg KH wrote:
> > > On Tue, Mar 06, 2007 at 06:43:22AM +0100, Mike Galbraith wrote:
> > > > On Mon, 2007-03-05 at 16:25 -0800, Greg KH wrote:
> > > > 
> > > > > Mike, I've reverted this patch, and I don't see any references 
> > > > > leaking.
> > > > > And, as your patch released the reference on the driver, and the
> > > > > module_add_driver() call would not grab a reference to the driver, 
> > > > > only
> > > > > the module kobject, I don't see what you were trying to fix with this
> > > > > patch.
> > > > > 
> > > > > Do you have a test case that this fixes?
> > > > 
> > > > What it fixed for me was the hard hang reported below.
> > > > 
> > > > http://lkml.org/lkml/2007/2/16/96
> > > 
> > > What specific module are you trying to unload that causes the hang?  I
> > > think it might just be a problem with that module, and not with all
> > > others.
> > 
> > It's ipmi_si that's hanging, waits for completion that never comes.
> > 
> > > So, I'm going to revert your patch and work to try to find the real
> > > cause of this problem.
> > 
> > Yeah, my stab at it seems busted.  I'll take another poke at it to see
> > if I can find out why (post 725522b5453dd680412f2b6463a988e4fd148757)
> > I'm left with a reference.
> 
> Ok, stab #2.
> 
> My reference count woes stem from module_remove_driver() not removing
> the link created in module_add_driver().  With the below, my box boots
> fine.  Since I obviously know spit about driver layer glue, I'll just
> call this one a diagnostic, and head for the hills :)

Does ipmi_si not have a "owner"?  Ah, that makes sense, not all modules
do...

> --- linux-2.6.20-rc3/kernel/module.c.org  2007-03-10 15:16:47.0 
> +0100
> +++ linux-2.6.20-rc3/kernel/module.c  2007-03-10 15:43:09.0 +0100
> @@ -2411,14 +2411,28 @@ void module_remove_driver(struct device_
>   return;
>  
>   sysfs_remove_link(&drv->kobj, "module");
> - if (drv->owner && drv->owner->mkobj.drivers_dir) {
> - driver_name = make_driver_name(drv);
> - if (driver_name) {
> - sysfs_remove_link(drv->owner->mkobj.drivers_dir,
> + driver_name = make_driver_name(drv);
> + if (!driver_name)
> + return;
> + if (drv->owner && drv->owner->mkobj.drivers_dir)
> + sysfs_remove_link(drv->owner->mkobj.drivers_dir,
> driver_name);
> - kfree(driver_name);
> - }
> + else if (drv->mod_name) {
> + struct module_kobject *mk;
> + struct kobject *mkobj;
> +
> + /* Lookup built-in module entry in /sys/modules */
> + mkobj = kset_find_obj(&module_subsys.kset, drv->mod_name);
> + if (!mkobj)
> + goto out_free;
> + mk = container_of(mkobj, struct module_kobject, kobj);
> + module_create_drivers_dir(mk);
> + sysfs_remove_link(mk->drivers_dir, driver_name);
> + /* Release reference taken via lookup */
> + kobject_put(mkobj);
>   }
> +out_free:
> + kfree(driver_name);
>  }
>  EXPORT_SYMBOL(module_remove_driver);
>  #endif

That's pretty good for not knowing much about the subject matter here.
But can you try this version instead?  It should work a bit better than
yours.

thanks for your patience,

greg k-h

Subject: modules: fix reference counting logic for drivers without module 
pointers.

We weren't dropping the sysfs link for the module driver name if we
didn't happen to have the "owner" pointer in the driver.

Based on a patch from Mike Galbraith <[EMAIL PROTECTED]>

Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>

---
 kernel/module.c |   24 +---
 1 file changed, 17 insertions(+), 7 deletions(-)

--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2405,20 +2405,30 @@ EXPORT_SYMBOL(module_add_driver);
 
 void module_remove_driver(struct device_driver *drv)
 {
+   struct module_kobject *mk = NULL;
+   struct kobject *mkobj = NULL;
char *driver_name;
 
if (!drv)
return;
 
sysfs_remove_link(&drv->kobj, "module");
-   if (drv->owner && drv->owner->mkobj.drivers_dir) {
-   driver_name = make_driver_name(drv);
-   if (driver_name) {
-   sysfs_remove_link(drv->owner->mkobj.drivers_dir,
- driver_name);
-   kfree(driver_name);
-   }
+   driver_name = make_driver_name(drv);
+   if (!driver_name)
+   return;
+
+   if (drv->owner && drv->owner->mkobj.drivers_dir)
+   mk = &drv->owner->mkobj;
+   else {
+   /* Lookup built-in module entry in /sys/modules */
+   mkobj = kset_find_obj(&module_subsys.kset

Re: kref refcounting breakage in mainline

2007-03-10 Thread Mike Galbraith
P.S.  forgot to include diagnostic log.  Kobject c0644890 is the source
of my woes.  Printk's come below WARN_ON(is_ipmi_si_kobj).  Post-tinker
log is huge, and probably not interesting.

[   30.397160] kobject ipmi_devintf: registering. parent: , set: module
[   30.404033] kobject_uevent_env
[   30.407098] fill_kobj_path: path = '/module/ipmi_devintf'
[   30.412524] BUG: at lib/kobject.c:448 kobject_get()
[   30.417402]  [] show_trace_log_lvl+0x1a/0x30
[   30.422564]  [] show_trace+0x12/0x14
[   30.427031]  [] dump_stack+0x16/0x18
[   30.431501]  [] kobject_get+0x66/0x87
[   30.436056]  [] kobject_shadow_add+0x10/0x1e8
[   30.441312]  [] kobject_add+0xa/0xc
[   30.445695]  [] kernel_param_sysfs_setup+0x4d/0x7f
[   30.451376]  [] param_sysfs_init+0x188/0x1c3
[   30.456538]  [] init+0x144/0x26c
[   30.460661]  [] kernel_thread_helper+0x7/0x1c
[   30.465907]  ===
[   30.469486] get: c18f65c0 count after get is 2
[   30.473927] kobject ipmi_si: registering. parent: , set: module
[   30.480372] kobject_uevent_env
[   30.483430] fill_kobj_path: path = '/module/ipmi_si'


..


[   73.266556] bus platform: add driver ipmi
[   73.278013] kobject ipmi: registering. parent: , set: drivers
[   73.291847] kobject_uevent_env
[   73.302358] fill_kobj_path: path = '/bus/platform/drivers/ipmi'
[   73.315943] ipmi message handler version 39.1
[   73.327839] ipmi device interface
[   73.338524] device class 'ipmi': registering
[   73.350158] subsystem ipmi: registering
[   73.361309] kobject ipmi: registering. parent: , set: class
[   73.374841] bus platform: add driver ipmi_si
[   73.386442] BUG: at lib/kobject.c:448 kobject_get()
[   73.398617]  [] show_trace_log_lvl+0x1a/0x30
[   73.411079]  [] show_trace+0x12/0x14
[   73.422780]  [] dump_stack+0x16/0x18
[   73.434324]  [] kobject_get+0x66/0x87
[   73.445860]  [] kobject_shadow_add+0x10/0x1e8
[   73.458088]  [] kobject_add+0xa/0xc
[   73.469286]  [] kobject_register+0x22/0xb3
[   73.480986]  [] bus_add_driver+0x77/0x1ae
[   73.492592]  [] driver_register+0x54/0x84
[   73.504101]  [] init_ipmi_si+0x4d/0x829
[   73.515335]  [] init+0x144/0x26c
[   73.525822]  [] kernel_thread_helper+0x7/0x1c
[   73.537452]  ===
[   73.547290] get: c064475c count after get is 2
[   73.557969] kobject ipmi_si: registering. parent: , set: drivers
[   73.570825] kobject_uevent_env
[   73.580011] fill_kobj_path: path = '/bus/platform/drivers/ipmi_si'
[   73.592622] BUG: at lib/kobject.c:242 kobject_register()
[   73.604312]  [] show_trace_log_lvl+0x1a/0x30
[   73.615822]  [] show_trace+0x12/0x14
[   73.626551]  [] dump_stack+0x16/0x18
[   73.637211]  [] kobject_register+0x79/0xb3
[   73.648325]  [] bus_add_driver+0x77/0x1ae
[   73.659297]  [] driver_register+0x54/0x84
[   73.670164]  [] init_ipmi_si+0x4d/0x829
[   73.680756]  [] init+0x144/0x26c
[   73.690699]  [] kernel_thread_helper+0x7/0x1c
[   73.701854]  ===
[   73.711295] register: c064475c count now is 2 error 0
[   73.722218] BUG: at lib/kobject.c:448 kobject_get()
[   73.732911]  [] show_trace_log_lvl+0x1a/0x30
[   73.743927]  [] show_trace+0x12/0x14
[   73.754112]  [] dump_stack+0x16/0x18
[   73.764253]  [] kobject_get+0x66/0x87
[   73.774532]  [] get_driver+0x11/0x18
[   73.784725]  [] driver_create_file+0xf/0x32
[   73.795559]  [] bus_add_driver+0x15a/0x1ae
[   73.806343]  [] driver_register+0x54/0x84
[   73.817055]  [] init_ipmi_si+0x4d/0x829
[   73.827595]  [] init+0x144/0x26c
[   73.837468]  [] kernel_thread_helper+0x7/0x1c
[   73.848544]  ===
[   73.857932] get: c064475c count after get is 3
[   73.868223] BUG: at lib/kobject.c:494 kobject_put()
[   73.878943]  [] show_trace_log_lvl+0x1a/0x30
[   73.890019]  [] show_trace+0x12/0x14
[   73.900239]  [] dump_stack+0x16/0x18
[   73.910294]  [] kobject_put+0x69/0x82
[   73.920304]  [] put_driver+0xb/0xd
[   73.929909]  [] driver_create_file+0x2b/0x32
[   73.940362]  [] bus_add_driver+0x15a/0x1ae
[   73.950686]  [] driver_register+0x54/0x84
[   73.960966]  [] init_ipmi_si+0x4d/0x829
[   73.971055]  [] init+0x144/0x26c
[   73.980478]  [] kernel_thread_helper+0x7/0x1c
[   73.991119]  ===
[   74.50] put: c064475c count before put is 3
[   74.010026] BUG: at lib/kobject.c:448 kobject_get()
[   74.020367]  [] show_trace_log_lvl+0x1a/0x30
[   74.031011]  [] show_trace+0x12/0x14
[   74.040867]  [] dump_stack+0x16/0x18
[   74.050602]  [] kobject_get+0x66/0x87
[   74.060353]  [] get_driver+0x11/0x18
[   74.070034]  [] driver_create_file+0xf/0x32
[   74.080418]  [] bus_add_driver+0x16c/0x1ae
[   74.090742]  [] driver_register+0x54/0x84
[   74.100979]  [] init_ipmi_si+0x4d/0x829
[   74.111041]  [] init+0x144/0x26c
[   74.120431]  [] kernel_thread_helper+0x7/0x1c
[   74.131047]  ===
[   74.139949] get: c064475c count after get is 3
[   74.149807] BUG: at lib/kobject.c:494 kobject_put()
[   74.160111]  [] show_trace_log_lvl+0x1a/0x30
[   74.170739]  [] show_trace+0x12/0x1

Re: kref refcounting breakage in mainline

2007-03-10 Thread Mike Galbraith
On Wed, 2007-03-07 at 06:39 +0100, Mike Galbraith wrote:
> On Tue, 2007-03-06 at 13:04 -0800, Greg KH wrote:
> > On Tue, Mar 06, 2007 at 06:43:22AM +0100, Mike Galbraith wrote:
> > > On Mon, 2007-03-05 at 16:25 -0800, Greg KH wrote:
> > > 
> > > > Mike, I've reverted this patch, and I don't see any references leaking.
> > > > And, as your patch released the reference on the driver, and the
> > > > module_add_driver() call would not grab a reference to the driver, only
> > > > the module kobject, I don't see what you were trying to fix with this
> > > > patch.
> > > > 
> > > > Do you have a test case that this fixes?
> > > 
> > > What it fixed for me was the hard hang reported below.
> > > 
> > > http://lkml.org/lkml/2007/2/16/96
> > 
> > What specific module are you trying to unload that causes the hang?  I
> > think it might just be a problem with that module, and not with all
> > others.
> 
> It's ipmi_si that's hanging, waits for completion that never comes.
> 
> > So, I'm going to revert your patch and work to try to find the real
> > cause of this problem.
> 
> Yeah, my stab at it seems busted.  I'll take another poke at it to see
> if I can find out why (post 725522b5453dd680412f2b6463a988e4fd148757)
> I'm left with a reference.

Ok, stab #2.

My reference count woes stem from module_remove_driver() not removing
the link created in module_add_driver().  With the below, my box boots
fine.  Since I obviously know spit about driver layer glue, I'll just
call this one a diagnostic, and head for the hills :)

--- linux-2.6.20-rc3/kernel/module.c.org2007-03-10 15:16:47.0 
+0100
+++ linux-2.6.20-rc3/kernel/module.c2007-03-10 15:43:09.0 +0100
@@ -2411,14 +2411,28 @@ void module_remove_driver(struct device_
return;
 
sysfs_remove_link(&drv->kobj, "module");
-   if (drv->owner && drv->owner->mkobj.drivers_dir) {
-   driver_name = make_driver_name(drv);
-   if (driver_name) {
-   sysfs_remove_link(drv->owner->mkobj.drivers_dir,
+   driver_name = make_driver_name(drv);
+   if (!driver_name)
+   return;
+   if (drv->owner && drv->owner->mkobj.drivers_dir)
+   sysfs_remove_link(drv->owner->mkobj.drivers_dir,
  driver_name);
-   kfree(driver_name);
-   }
+   else if (drv->mod_name) {
+   struct module_kobject *mk;
+   struct kobject *mkobj;
+
+   /* Lookup built-in module entry in /sys/modules */
+   mkobj = kset_find_obj(&module_subsys.kset, drv->mod_name);
+   if (!mkobj)
+   goto out_free;
+   mk = container_of(mkobj, struct module_kobject, kobj);
+   module_create_drivers_dir(mk);
+   sysfs_remove_link(mk->drivers_dir, driver_name);
+   /* Release reference taken via lookup */
+   kobject_put(mkobj);
}
+out_free:
+   kfree(driver_name);
 }
 EXPORT_SYMBOL(module_remove_driver);
 #endif


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kref refcounting breakage in mainline

2007-03-06 Thread Mike Galbraith
On Tue, 2007-03-06 at 13:04 -0800, Greg KH wrote:
> On Tue, Mar 06, 2007 at 06:43:22AM +0100, Mike Galbraith wrote:
> > On Mon, 2007-03-05 at 16:25 -0800, Greg KH wrote:
> > 
> > > Mike, I've reverted this patch, and I don't see any references leaking.
> > > And, as your patch released the reference on the driver, and the
> > > module_add_driver() call would not grab a reference to the driver, only
> > > the module kobject, I don't see what you were trying to fix with this
> > > patch.
> > > 
> > > Do you have a test case that this fixes?
> > 
> > What it fixed for me was the hard hang reported below.
> > 
> > http://lkml.org/lkml/2007/2/16/96
> 
> What specific module are you trying to unload that causes the hang?  I
> think it might just be a problem with that module, and not with all
> others.

It's ipmi_si that's hanging, waits for completion that never comes.

> So, I'm going to revert your patch and work to try to find the real
> cause of this problem.

Yeah, my stab at it seems busted.  I'll take another poke at it to see
if I can find out why (post 725522b5453dd680412f2b6463a988e4fd148757)
I'm left with a reference.

-Mike

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kref refcounting breakage in mainline

2007-03-06 Thread Greg KH
On Tue, Mar 06, 2007 at 06:43:22AM +0100, Mike Galbraith wrote:
> On Mon, 2007-03-05 at 16:25 -0800, Greg KH wrote:
> 
> > Mike, I've reverted this patch, and I don't see any references leaking.
> > And, as your patch released the reference on the driver, and the
> > module_add_driver() call would not grab a reference to the driver, only
> > the module kobject, I don't see what you were trying to fix with this
> > patch.
> > 
> > Do you have a test case that this fixes?
> 
> What it fixed for me was the hard hang reported below.
> 
> http://lkml.org/lkml/2007/2/16/96

What specific module are you trying to unload that causes the hang?  I
think it might just be a problem with that module, and not with all
others.

So, I'm going to revert your patch and work to try to find the real
cause of this problem.

thanks,

greg k-h
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kref refcounting breakage in mainline

2007-03-06 Thread Mel Gorman
On (05/03/07 16:25), Greg KH didst pronounce:
> On Fri, Mar 02, 2007 at 12:58:33AM -0800, Andrew Morton wrote:
> > 
> > -mm has a debugging patch which warns when atomic_dec_and_test() takes an
> > atomic_t negative
> > (ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.20/2.6.20-mm2/broken-out/detect-atomic-counter-underflows.patch).
> > 
> > 
> > When it is applied to current mainline, a simple `rmmod ipw2200' gives:
> > 
> > [   75.825072] BUG: atomic counter underflow at:
> > [   75.825180]  [] kref_put+0x66/0x82
> > [   75.825278]  [] bus_remove_driver+0x66/0x75
> > [   75.825383]  [] driver_unregister+0x8/0x13
> > [   75.825484]  [] pci_unregister_driver+0xc/0x45
> > [   75.825593]  [] sys_delete_module+0x157/0x17c
> > [   75.825703]  [] audit_syscall_entry+0x10d/0x137
> > [   75.825818]  [] syscall_call+0x7/0xb
> > [   75.825913]  [] xfrm4_dst_destroy+0xe/0xd5
> > 
> > This didn't happen in 2.6.20-mm2, so this bug was introduced by a patch
> > which was not in the -mm lineup twelve days ago.
> > 
> > Presumably the effect of this is a memory leak or a use-after-free.
> 
> Ok, after a zillion bisects, I've tracked this down to:
>   commit 63ce18cfe685115ff8d341bae4c9204a79043cf0
>   Author: Mike Galbraith <[EMAIL PROTECTED]>
>   Date:   Wed Feb 21 12:45:35 2007 -0800
> 
>   driver core: refcounting fix
>   
>   Fix a reference counting bug exposed by commit
>   725522b5453dd680412f2b6463a988e4fd148757.  If driver.mod_name 
> exists, we
>   take a reference in module_add_driver(), and never release it.  
> Undo that
>   reference in module_remove_driver().
>   
>   Signed-off-by: Mike Galbraith <[EMAIL PROTECTED]>
>   Cc: Kay Sievers <[EMAIL PROTECTED]>
>   Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
>   Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>
>   
> 

On http://test.kernel.org, elm3b132 is showing a similar error message for
the IPS driver on 2.6.21-rc2-mm1. There is no such device in the machine so
this is a normal error path for no devices found. The warning looks like

BUG: atomic counter underflow at:
 [] kref_put+0x7d/0x9d
 [] bus_remove_driver+0x36/0x41
 [] driver_unregister+0xb/0x13
 [] pci_unregister_driver+0xb/0x13
 [] ips_module_init+0x41/0x57
 [] do_initcalls+0x58/0xf5
 [] register_irq_proc+0x75/0x92
 [] init+0x0/0x8b
 [] init+0x49/0x8b
 [] kernel_thread_helper+0x7/0x10

This is essentially identical to the warning on ipw2200. It occurs whether
the driver is compiled into the kernel or as a module.  Reverting Mike's
patch fixes the problem - or at least the warning has disappeared.

However, I've cc'd the IPS maintainers so they can confirm they are
calling pci_unregister_driver() correctly. I note that many drivers in
drivers/scsi do not call pci_unregister_driver() in the module_init path.
ipw2200 also calls pci_unregister_driver() in the module_init path when
an error is encountered.

So, maybe this is an error in how the drivers use pci_[un]register_driver()
instead of a problem with Mike's patch?

> Mike, I've reverted this patch, and I don't see any references leaking.
> And, as your patch released the reference on the driver, and the
> module_add_driver() call would not grab a reference to the driver, only
> the module kobject, I don't see what you were trying to fix with this
> patch.
> 
> Do you have a test case that this fixes?
> 
> Otherwise, I'll just revert it.
> 
> thanks,
> 
> greg k-h
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-- 
-- 
Mel Gorman
Part-time Phd Student  Linux Technology Center
University of Limerick IBM Dublin Software Lab
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kref refcounting breakage in mainline

2007-03-05 Thread Mike Galbraith
On Mon, 2007-03-05 at 16:25 -0800, Greg KH wrote:

> Mike, I've reverted this patch, and I don't see any references leaking.
> And, as your patch released the reference on the driver, and the
> module_add_driver() call would not grab a reference to the driver, only
> the module kobject, I don't see what you were trying to fix with this
> patch.
> 
> Do you have a test case that this fixes?

What it fixed for me was the hard hang reported below.

http://lkml.org/lkml/2007/2/16/96


-Mike


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kref refcounting breakage in mainline

2007-03-05 Thread Greg KH
On Fri, Mar 02, 2007 at 12:58:33AM -0800, Andrew Morton wrote:
> 
> -mm has a debugging patch which warns when atomic_dec_and_test() takes an
> atomic_t negative
> (ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.20/2.6.20-mm2/broken-out/detect-atomic-counter-underflows.patch).
> 
> 
> When it is applied to current mainline, a simple `rmmod ipw2200' gives:
> 
> [   75.825072] BUG: atomic counter underflow at:
> [   75.825180]  [] kref_put+0x66/0x82
> [   75.825278]  [] bus_remove_driver+0x66/0x75
> [   75.825383]  [] driver_unregister+0x8/0x13
> [   75.825484]  [] pci_unregister_driver+0xc/0x45
> [   75.825593]  [] sys_delete_module+0x157/0x17c
> [   75.825703]  [] audit_syscall_entry+0x10d/0x137
> [   75.825818]  [] syscall_call+0x7/0xb
> [   75.825913]  [] xfrm4_dst_destroy+0xe/0xd5
> 
> This didn't happen in 2.6.20-mm2, so this bug was introduced by a patch
> which was not in the -mm lineup twelve days ago.
> 
> Presumably the effect of this is a memory leak or a use-after-free.

Ok, after a zillion bisects, I've tracked this down to:
commit 63ce18cfe685115ff8d341bae4c9204a79043cf0
Author: Mike Galbraith <[EMAIL PROTECTED]>
Date:   Wed Feb 21 12:45:35 2007 -0800

driver core: refcounting fix

Fix a reference counting bug exposed by commit
725522b5453dd680412f2b6463a988e4fd148757.  If driver.mod_name 
exists, we
take a reference in module_add_driver(), and never release it.  
Undo that
reference in module_remove_driver().

Signed-off-by: Mike Galbraith <[EMAIL PROTECTED]>
Cc: Kay Sievers <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>


Mike, I've reverted this patch, and I don't see any references leaking.
And, as your patch released the reference on the driver, and the
module_add_driver() call would not grab a reference to the driver, only
the module kobject, I don't see what you were trying to fix with this
patch.

Do you have a test case that this fixes?

Otherwise, I'll just revert it.

thanks,

greg k-h
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kref refcounting breakage in mainline

2007-03-02 Thread Greg KH
On Fri, Mar 02, 2007 at 12:58:33AM -0800, Andrew Morton wrote:
> 
> -mm has a debugging patch which warns when atomic_dec_and_test() takes an
> atomic_t negative
> (ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.20/2.6.20-mm2/broken-out/detect-atomic-counter-underflows.patch).
> 
> 
> When it is applied to current mainline, a simple `rmmod ipw2200' gives:
> 
> [   75.825072] BUG: atomic counter underflow at:
> [   75.825180]  [] kref_put+0x66/0x82
> [   75.825278]  [] bus_remove_driver+0x66/0x75
> [   75.825383]  [] driver_unregister+0x8/0x13
> [   75.825484]  [] pci_unregister_driver+0xc/0x45
> [   75.825593]  [] sys_delete_module+0x157/0x17c
> [   75.825703]  [] audit_syscall_entry+0x10d/0x137
> [   75.825818]  [] syscall_call+0x7/0xb
> [   75.825913]  [] xfrm4_dst_destroy+0xe/0xd5
> 
> This didn't happen in 2.6.20-mm2, so this bug was introduced by a patch
> which was not in the -mm lineup twelve days ago.
> 
> Presumably the effect of this is a memory leak or a use-after-free.

Ugh.

I'll add it to my local tree here and try to bisect to find the problem.

thanks for the pointer,

greg k-h
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


kref refcounting breakage in mainline

2007-03-02 Thread Andrew Morton

-mm has a debugging patch which warns when atomic_dec_and_test() takes an
atomic_t negative
(ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.20/2.6.20-mm2/broken-out/detect-atomic-counter-underflows.patch).


When it is applied to current mainline, a simple `rmmod ipw2200' gives:

[   75.825072] BUG: atomic counter underflow at:
[   75.825180]  [] kref_put+0x66/0x82
[   75.825278]  [] bus_remove_driver+0x66/0x75
[   75.825383]  [] driver_unregister+0x8/0x13
[   75.825484]  [] pci_unregister_driver+0xc/0x45
[   75.825593]  [] sys_delete_module+0x157/0x17c
[   75.825703]  [] audit_syscall_entry+0x10d/0x137
[   75.825818]  [] syscall_call+0x7/0xb
[   75.825913]  [] xfrm4_dst_destroy+0xe/0xd5

This didn't happen in 2.6.20-mm2, so this bug was introduced by a patch
which was not in the -mm lineup twelve days ago.

Presumably the effect of this is a memory leak or a use-after-free.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/