Re: 2.6.21-rc suspend regression: sysfs deadlock

2007-03-15 Thread Linus Torvalds


On Thu, 15 Mar 2007, Hugh Dickins wrote:
> 
> sysfs_access_in_other_task() left me wondering what this "other" task
> was, and what kind of "access" it's trying to get - or is the calling
> task the other, and it's trying to access something it wouldn't
> directly have access to?

For naming clashes, I'd suggest:

 - try to name according to *why* something is done, not necessarily what 
   it does.

   For example, is it really in "another task"? Maybe it's just an 
   on-demand thread of the same task?  Do you actually care how the 
   deferred work is done?

 - avoid being vague. I agree with not liking the name much, and the 
   "other" thing bothers me. Like Hugh, it makes me ask "_What_ other 
   task?"

So I would suggest not concentrating on some implementation issue, but on 
the reason why you need it in the first place. Namely that you want to 
defer the actual action to avoid deadlock due to recursive locking. So 
that "why do I actually do this" thing implies something like 
"sysfs_store_async()" or "sysfs_store_deferred()" or maybe actually 
concentrate on the locking angle and say something like 
"sysfs_store_needs_to_reacquire_lock()".

(That last one wasn't really serious - it's too long and cumbersome, but 
it's an example of not caring _how_ you do it, just about what you want 
done).

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.21-rc suspend regression: sysfs deadlock

2007-03-15 Thread Hugh Dickins
On Thu, 15 Mar 2007, Alan Stern wrote:
> 
> Personally I don't understand what was wrong with my name.  What's weird 
> or unintuitive about doing something in a different task's context?

The only thing wrong with sysfs_do_something_in_a_different_task_context()
is the length of the name.  "do", that's good, much better than "access".

sysfs_access_in_other_task() left me wondering what this "other" task
was, and what kind of "access" it's trying to get - or is the calling
task the other, and it's trying to access something it wouldn't
directly have access to?

> 
> Dmitry's suggestion is slightly inappropriate because the function doesn't
> take a workstruct as an argument and it isn't itself a workqueue callback.  

True, though since he's saying "work" rather than "workstruct",
I was okay with that: it's a sysfs wrapper to schedule_work().

> 
> Would people be happier with sysfs_schedule_callback() and
> device_schedule_callback()?  At least the functions do take a callback 
> pointer as an argument, even though they aren't callbacks themselves.

A lot happier than with sysfs_access_in_other_task() -
if you prefer this to Dmitry's, it's okay by me.

Hugh


Re: 2.6.21-rc suspend regression: sysfs deadlock

2007-03-15 Thread Cornelia Huck
On Thu, 15 Mar 2007 10:27:19 -0400 (EDT),
Alan Stern <[EMAIL PROTECTED]> wrote:

> Fair enough.  One use of "delay" is in a comment you wrote; I'll change it 
> as well.

Fine with me.

> Would people be happier with sysfs_schedule_callback() and
> device_schedule_callback()?  At least the functions do take a callback 
> pointer as an argument, even though they aren't callbacks themselves.

Count one happy person here.


Re: 2.6.21-rc suspend regression: sysfs deadlock

2007-03-15 Thread Alan Stern
On Thu, 15 Mar 2007, Cornelia Huck wrote:

> > > The naming seems a bit unintuitive, but I don't have a good
> > > alternative idea. Perhaps sysfs_work_struct, sysfs_delayed_work()?
> > 
> > sysfs_work_struct is too generic; other parts of sysfs might also want to
> > use workqueues for different purposes.
> 
> > I don't like calling it "delayed"-anything, because the operations aren't
> > necessarily delayed!  On an SMP system they might even execute before the
> > sysfs_access_in_other_task() call returns.  (Although the two examples we
> > have so far can't do that because of lock contention.)
> 
> Sure. But then you shouldn't refer to "delay" in the comments for the
> functions as well :)

Fair enough.  One use of "delay" is in a comment you wrote; I'll change it 
as well.

> > The major feature added here is that the work takes place in a different 
> > task's context, not that it is delayed.  Hence the choice of names.
> 
> Hm. Perhaps device_schedule_access()?

On Thu, 15 Mar 2007, Hugh Dickins wrote:

> It's really none of my business, I'm merely the reporter of the
> deadlock being fixed, and I don't know my way around sysfs at all ...
> 
> ... but I have to say I share your discomfort with Alan's
> "sysfs_access_in_other_task" naming, it sounded very weird to me.
> 
> Quite apart from this mysterious "other task", I don't understand
> "access" either.
> 
> Perhaps "defer" would best capture the idea of another-task and
> maybe-delay?  sysfs_defer_work(), struct sysfs_deferred_work?  

On Thu, 15 Mar 2007, Oliver Neukum wrote:

> But we do not wish to defer or delay anything.
> How about: sysfs_action_from_neutral_context  

On Thu, 15 Mar 2007, Dmitry Torokhov wrote:

> How about sysfs_schedule_work? That is what it does - schedules a work
> on a sysfs object and everyone here knows what schedule_work() does.  

On Thu, 15 Mar 2007, Hugh Dickins wrote:

> I'm ashamed to have suggested anything else: certainly gets my vote.

Personally I don't understand what was wrong with my name.  What's weird 
or unintuitive about doing something in a different task's context?

Dmitry's suggestion is slightly inappropriate because the function doesn't
take a workstruct as an argument and it isn't itself a workqueue callback.  

Would people be happier with sysfs_schedule_callback() and
device_schedule_callback()?  At least the functions do take a callback 
pointer as an argument, even though they aren't callbacks themselves.

Alan Stern



Re: 2.6.21-rc suspend regression: sysfs deadlock

2007-03-15 Thread Hugh Dickins
On Thu, 15 Mar 2007, Dmitry Torokhov wrote:
> 
> How about sysfs_schedule_work? That is what it does - schedules a work
> on a sysfs object and everyone here knows what schedule_work() does.

I'm ashamed to have suggested anything else: certainly gets my vote.

Hugh


Re: 2.6.21-rc suspend regression: sysfs deadlock

2007-03-15 Thread Dmitry Torokhov

On 3/15/07, Oliver Neukum <[EMAIL PROTECTED]> wrote:
> On Thursday, 15 March 2007 13:31, Hugh Dickins wrote:
> > Quite apart from this mysterious "other task", I don't understand
> > "access" either.
> >
> > Perhaps "defer" would best capture the idea of another-task and
> > maybe-delay? sysfs_defer_work(), struct sysfs_deferred_work?
>
> But we do not wish to defer or delay anything.
> How about: sysfs_action_from_neutral_context

How about sysfs_schedule_work? That is what it does - schedules a work
on a sysfs object and everyone here knows what schedule_work() does.

--
Dmitry


Re: 2.6.21-rc suspend regression: sysfs deadlock

2007-03-15 Thread Oliver Neukum
On Thursday, 15 March 2007 13:31, Hugh Dickins wrote:
> Quite apart from this mysterious "other task", I don't understand
> "access" either.
> 
> Perhaps "defer" would best capture the idea of another-task and
> maybe-delay?  sysfs_defer_work(), struct sysfs_deferred_work?

But we do not wish to defer or delay anything.
How about: sysfs_action_from_neutral_context

Regards
Oliver


Re: 2.6.21-rc suspend regression: sysfs deadlock

2007-03-15 Thread Hugh Dickins
On Thu, 15 Mar 2007, Cornelia Huck wrote:
> On Wed, 14 Mar 2007 15:23:10 -0400 (EDT),
> Alan Stern <[EMAIL PROTECTED]> wrote:
> > 
> > sysfs_work_struct is too generic; other parts of sysfs might also want to
> > use workqueues for different purposes.
> 
> > I don't like calling it "delayed"-anything, because the operations aren't
> > necessarily delayed!  On an SMP system they might even execute before the
> > sysfs_access_in_other_task() call returns.  (Although the two examples we
> > have so far can't do that because of lock contention.)
> 
> Sure. But then you shouldn't refer to "delay" in the comments for the
> functions as well :)
> 
> > The major feature added here is that the work takes place in a different 
> > task's context, not that it is delayed.  Hence the choice of names.
> 
> Hm. Perhaps device_schedule_access()?

It's really none of my business, I'm merely the reporter of the
deadlock being fixed, and I don't know my way around sysfs at all ...

... but I have to say I share your discomfort with Alan's
"sysfs_access_in_other_task" naming, it sounded very weird to me.

Quite apart from this mysterious "other task", I don't understand
"access" either.

Perhaps "defer" would best capture the idea of another-task and
maybe-delay?  sysfs_defer_work(), struct sysfs_deferred_work?

Hugh


Re: 2.6.21-rc suspend regression: sysfs deadlock

2007-03-15 Thread Cornelia Huck
On Wed, 14 Mar 2007 15:23:10 -0400 (EDT),
Alan Stern <[EMAIL PROTECTED]> wrote:

> > > +struct other_task_struct {
> > > + struct kobject  *kobj;
> > > + void(*func)(void *);
> > > + void*data;
> > > + struct work_struct  work;
> > > +};
> > > +
> > > +static void other_task_work(struct work_struct *work)
> > > +{
> > > + struct other_task_struct *ots = container_of(work,
> > > + struct other_task_struct, work);
> > > +
> > > + (ots->func)(ots->data);
> > > + kobject_put(ots->kobj);
> > > + kfree(ots);
> > > +}
> > 
> > The naming seems a bit unintuitive, but I don't have a good
> > alternative idea. Perhaps sysfs_work_struct, sysfs_delayed_work()?
> 
> sysfs_work_struct is too generic; other parts of sysfs might also want to
> use workqueues for different purposes.

> I don't like calling it "delayed"-anything, because the operations aren't
> necessarily delayed!  On an SMP system they might even execute before the
> sysfs_access_in_other_task() call returns.  (Although the two examples we
> have so far can't do that because of lock contention.)

Sure. But then you shouldn't refer to "delay" in the comments for the
functions as well :)

> The major feature added here is that the work takes place in a different 
> task's context, not that it is delayed.  Hence the choice of names.

Hm. Perhaps device_schedule_access()?


Re: 2.6.21-rc suspend regression: sysfs deadlock

2007-03-14 Thread Alan Stern
On Wed, 14 Mar 2007, Cornelia Huck wrote:

> On Wed, 14 Mar 2007 12:12:37 -0400 (EDT),
> Alan Stern <[EMAIL PROTECTED]> wrote:
> 
> > This seems more elegant (not yet tested).  Cornelia, does it look okay to 
> > you?
> 
> Works for me (grouping & ungrouping ctc) and looks sane. Some more
> comments below.

Thank you.

> > +struct other_task_struct {
> > +   struct kobject  *kobj;
> > +   void(*func)(void *);
> > +   void*data;
> > +   struct work_struct  work;
> > +};
> > +
> > +static void other_task_work(struct work_struct *work)
> > +{
> > +   struct other_task_struct *ots = container_of(work,
> > +   struct other_task_struct, work);
> > +
> > +   (ots->func)(ots->data);
> > +   kobject_put(ots->kobj);
> > +   kfree(ots);
> > +}
> 
> The naming seems a bit unintuitive, but I don't have a good
> alternative idea. Perhaps sysfs_work_struct, sysfs_delayed_work()?

sysfs_work_struct is too generic; other parts of sysfs might also want to
use workqueues for different purposes.

I don't like calling it "delayed"-anything, because the operations aren't
necessarily delayed!  On an SMP system they might even execute before the
sysfs_access_in_other_task() call returns.  (Although the two examples we
have so far can't do that because of lock contention.)

The major feature added here is that the work takes place in a different 
task's context, not that it is delayed.  Hence the choice of names.

Alan Stern



Re: 2.6.21-rc suspend regression: sysfs deadlock

2007-03-14 Thread Cornelia Huck
On Wed, 14 Mar 2007 12:12:37 -0400 (EDT),
Alan Stern <[EMAIL PROTECTED]> wrote:

> This seems more elegant (not yet tested).  Cornelia, does it look okay to 
> you?

Works for me (grouping & ungrouping ctc) and looks sane. Some more
comments below.


> +struct other_task_struct {
> + struct kobject  *kobj;
> + void(*func)(void *);
> + void*data;
> + struct work_struct  work;
> +};
> +
> +static void other_task_work(struct work_struct *work)
> +{
> + struct other_task_struct *ots = container_of(work,
> + struct other_task_struct, work);
> +
> + (ots->func)(ots->data);
> + kobject_put(ots->kobj);
> + kfree(ots);
> +}

The naming seems a bit unintuitive, but I don't have a good
alternative idea. Perhaps sysfs_work_struct, sysfs_delayed_work()?

> +
> +/**
> + * sysfs_access_in_other_task - delay access from an attribute method.
> + * @kobj: object we're acting for.
> + * @func: callback function to invoke later.
> + * @data: argument to pass to @func.
> + *
> + * sysfs attribute methods must not unregister themselves or their parent
> + * kobject (which would amount to the same thing).  Attempts to do so will
> + * deadlock, since unregistration is mutually exclusive with driver
> + * callbacks.
> + *
> + * Instead methods can call this routine, which will attempt to allocate
> + * and schedule a workqueue request to carry out the requested function
> + * in the workqueue's process context.
> + *
> + * Returns 0 if the request was submitted, -ENOMEM if storage could not
> + * be allocated.
> + */
> +int sysfs_access_in_other_task(struct kobject *kobj, void (*func)(void *),
> + void *data)

sysfs_delay_access()?


> +/**
> + * device_access_in_other_task - delay access from an attribute method.
> + * @dev: device.
> + * @func: callback function to invoke later.
> + *
> + * Attribute methods must not unregister themselves or their parent device
> + * (which would amount to the same thing).  Attempts to do so will deadlock,
> + * since unregistration is mutually exclusive with driver callbacks.
> + *
> + * Instead methods can call this routine, which will attempt to allocate
> + * and schedule a workqueue request to carry out the requested function
> + * in the workqueue's process context.
> + *
> + * Returns 0 if the request was submitted, -ENOMEM if storage could not
> + * be allocated.
> + */
> +int device_access_in_other_task(struct device *dev,
> + void (*func)(struct device *))
> +{
> + return sysfs_access_in_other_task(&dev->kobj,
> + (void (*)(void *)) func, dev);
> +}
> +EXPORT_SYMBOL_GPL(device_access_in_other_task);

device_delay_access()?



Re: 2.6.21-rc suspend regression: sysfs deadlock

2007-03-14 Thread Alan Stern
On Tue, 13 Mar 2007, Linus Torvalds wrote:

> Could we please make this easier to use by having some common sysfs helper 
> routine for this kind of "delayed_store()" functionality.
> 
> I'm not a huge fan of delayed work at all, but if we have to have it, at 
> least make it one generic function rather than having multiple functions 
> all doing their own workqueue logic for it.

This seems more elegant (not yet tested).  Cornelia, does it look okay to 
you?

Alan Stern


Index: usb-2.6/include/linux/sysfs.h
===================================================================
--- usb-2.6.orig/include/linux/sysfs.h
+++ usb-2.6/include/linux/sysfs.h
@@ -78,6 +78,9 @@ struct sysfs_ops {
 
 #ifdef CONFIG_SYSFS
 
+extern int sysfs_access_in_other_task(struct kobject *kobj,
+   void (*func)(void *), void *data);
+
 extern int __must_check
 sysfs_create_dir(struct kobject *, struct dentry *);
 
@@ -133,6 +136,12 @@ extern int __must_check sysfs_init(void)
 
 #else /* CONFIG_SYSFS */
 
+static inline int sysfs_access_in_other_task(struct kobject *kobj,
+   void (*func)(void *), void *data)
+{
+   return -ENOSYS;
+}
+
 static inline int sysfs_create_dir(struct kobject * k, struct dentry *shadow)
 {
 	return 0;
Index: usb-2.6/fs/sysfs/file.c
===================================================================
--- usb-2.6.orig/fs/sysfs/file.c
+++ usb-2.6/fs/sysfs/file.c
@@ -643,6 +643,59 @@ void sysfs_remove_file_from_group(struct
 }
 EXPORT_SYMBOL_GPL(sysfs_remove_file_from_group);
 
+struct other_task_struct {
+	struct kobject		*kobj;
+	void			(*func)(void *);
+	void			*data;
+	struct work_struct	work;
+};
+
+static void other_task_work(struct work_struct *work)
+{
+	struct other_task_struct *ots = container_of(work,
+			struct other_task_struct, work);
+
+	(ots->func)(ots->data);
+	kobject_put(ots->kobj);
+	kfree(ots);
+}
+
+/**
+ * sysfs_access_in_other_task - delay access from an attribute method.
+ * @kobj: object we're acting for.
+ * @func: callback function to invoke later.
+ * @data: argument to pass to @func.
+ *
+ * sysfs attribute methods must not unregister themselves or their parent
+ * kobject (which would amount to the same thing).  Attempts to do so will
+ * deadlock, since unregistration is mutually exclusive with driver
+ * callbacks.
+ *
+ * Instead methods can call this routine, which will attempt to allocate
+ * and schedule a workqueue request to carry out the requested function
+ * in the workqueue's process context.
+ *
+ * Returns 0 if the request was submitted, -ENOMEM if storage could not
+ * be allocated.
+ */
+int sysfs_access_in_other_task(struct kobject *kobj, void (*func)(void *),
+		void *data)
+{
+	struct other_task_struct *ots;
+
+	ots = kmalloc(sizeof(*ots), GFP_KERNEL);
+	if (!ots)
+		return -ENOMEM;
+	kobject_get(kobj);
+	ots->kobj = kobj;
+	ots->func = func;
+	ots->data = data;
+	INIT_WORK(&ots->work, other_task_work);
+	schedule_work(&ots->work);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(sysfs_access_in_other_task);
+
 
 EXPORT_SYMBOL_GPL(sysfs_create_file);
 EXPORT_SYMBOL_GPL(sysfs_remove_file);
Index: usb-2.6/include/linux/device.h
===================================================================
--- usb-2.6.orig/include/linux/device.h
+++ usb-2.6/include/linux/device.h
@@ -356,6 +356,8 @@ extern int __must_check device_create_bi
   struct bin_attribute *attr);
 extern void device_remove_bin_file(struct device *dev,
   struct bin_attribute *attr);
+extern int device_access_in_other_task(struct device *dev,
+   void (*func)(struct device *));
 
 /* device resource management */
 typedef void (*dr_release_t)(struct device *dev, void *res);
Index: usb-2.6/drivers/base/core.c
===================================================================
--- usb-2.6.orig/drivers/base/core.c
+++ usb-2.6/drivers/base/core.c
@@ -407,6 +407,30 @@ void device_remove_bin_file(struct devic
 }
 EXPORT_SYMBOL_GPL(device_remove_bin_file);
 
+/**
+ * device_access_in_other_task - delay access from an attribute method.
+ * @dev: device.
+ * @func: callback function to invoke later.
+ *
+ * Attribute methods must not unregister themselves or their parent device
+ * (which would amount to the same thing).  Attempts to do so will deadlock,
+ * since unregistration is mutually exclusive with driver callbacks.
+ *
+ * Instead methods can call this routine, which will attempt to allocate
+ * and schedule a workqueue request to carry out the requested function
+ * in the workqueue's process context.
+ *
+ * Returns 0 if the request was submitted, -ENOMEM if storage could not
+ * be allocated.
+ */
+int device_access_in_other_task(struct device *dev,
+   void (*func)(struct device *))
+{

Re: 2.6.21-rc suspend regression: sysfs deadlock

2007-03-13 Thread Alan Stern
On Tue, 13 Mar 2007, Hugh Dickins wrote:

> On Tue, 13 Mar 2007, Alan Stern wrote:
> > 
> > On the other hand, a quick survey of the kernel source shows that
> > DEVICE_ATTR is used over 1500 times.  Auditing all of them is not a job
> > for the faint-of-heart!
> 
> Indeed, and faint-hearted Hugh wasn't intending to do so: but
> stout-hearted Alan will need to, won't he, before his patch can go in?

Allow me to point out that the original patch is Oliver's (although I
helped), and it doesn't need to go in -- it needs not to be removed.

Furthermore, I have better things to do with the next month of my time 
than auditing hundreds of routines I don't understand for behavior I 
probably won't be able to recognize.  (Although at 50 a day... hmmm, 
maybe.)

This sounds more like a job for kernel-janitors!


On Tue, 13 Mar 2007, Dmitry Torokhov wrote:

> I think we could rely on subsystems maintainers to let us know if
> there are potential problems. For example I can tell that neither
> input, serio nor gameport subsystems use sysfs to destroy their  
> devices (action on sysfs may cause some other device to be destroyed
> but that should be ok, only self-destruction is not allowed, right?)

Very good points.  USB doesn't do anything like that either.  And right, 
it's okay for a method to destroy other devices; it just can't do anything 
that would lead to its own unregistration.

Alan Stern



Re: 2.6.21-rc suspend regression: sysfs deadlock

2007-03-13 Thread Linus Torvalds


On Tue, 13 Mar 2007, Cornelia Huck wrote:
> 
> Another call that deadlocked with Oliver's patch is ungroup for s390
> ccwgroup devices. It can be made to work again with a similar patch.

Could we please make this easier to use by having some common sysfs helper 
routine for this kind of "delayed_store()" functionality.

I'm not a huge fan of delayed work at all, but if we have to have it, at 
least make it one generic function rather than having multiple functions 
all doing their own workqueue logic for it.

Linus
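
The single generic helper requested above can be pictured with a small
user-space sketch. It is only an illustration under stated assumptions: every
name in it (toy_obj, toy_work, toy_schedule_callback, toy_run_pending,
toy_bump) is hypothetical, a plain linked list stands in for the kernel
workqueue, and an integer counter stands in for the kobject reference count.
It is not the kernel implementation.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted kobject. */
struct toy_obj {
	int refcount;
};

/* One queued request: which object it acts for, what to call, and with what. */
struct toy_work {
	struct toy_obj *obj;
	void (*func)(void *);
	void *data;
	struct toy_work *next;
};

/* A simple FIFO standing in for the shared workqueue (assumption). */
static struct toy_work *toy_head, *toy_tail;

/*
 * Queue func(data) to run later, outside the caller's locking context.
 * A reference on obj is held until the callback has run.
 * Returns 0 on success, -1 if the request could not be allocated.
 */
int toy_schedule_callback(struct toy_obj *obj, void (*func)(void *), void *data)
{
	struct toy_work *w = malloc(sizeof(*w));

	if (!w)
		return -1;
	obj->refcount++;
	w->obj = obj;
	w->func = func;
	w->data = data;
	w->next = NULL;
	if (toy_tail)
		toy_tail->next = w;
	else
		toy_head = w;
	toy_tail = w;
	return 0;
}

/*
 * Run every queued callback, drop the reference taken at schedule time,
 * and free the request. Returns the number of callbacks that ran.
 */
int toy_run_pending(void)
{
	int ran = 0;

	while (toy_head) {
		struct toy_work *w = toy_head;

		toy_head = w->next;
		if (!toy_head)
			toy_tail = NULL;
		w->func(w->data);
		w->obj->refcount--;
		free(w);
		ran++;
	}
	return ran;
}

/* Example callback: increments the int that data points to. */
void toy_bump(void *data)
{
	++*(int *)data;
}
```

A caller would invoke toy_schedule_callback() from the method that must not
do the work itself, and something playing the role of the worker thread would
later call toy_run_pending(). Taking the reference before queueing is what
keeps the object alive until the callback has finished.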


Re: 2.6.21-rc suspend regression: sysfs deadlock

2007-03-13 Thread Dmitry Torokhov

On 3/13/07, Hugh Dickins <[EMAIL PROTECTED]> wrote:
> On Tue, 13 Mar 2007, Alan Stern wrote:
> >
> > On the other hand, a quick survey of the kernel source shows that
> > DEVICE_ATTR is used over 1500 times.  Auditing all of them is not a job
> > for the faint-of-heart!
>
> Indeed, and faint-hearted Hugh wasn't intending to do so: but
> stout-hearted Alan will need to, won't he, before his patch can go in?

I think we could rely on subsystems maintainers to let us know if
there are potential problems. For example I can tell that neither
input, serio nor gameport subsystems use sysfs to destroy their
devices (action on sysfs may cause some other device to be destroyed
but that should be ok, only self-destruction is not allowed, right?)

--
Dmitry


Re: 2.6.21-rc suspend regression: sysfs deadlock

2007-03-13 Thread Hugh Dickins
On Tue, 13 Mar 2007, Alan Stern wrote:
> 
> On the other hand, a quick survey of the kernel source shows that
> DEVICE_ATTR is used over 1500 times.  Auditing all of them is not a job
> for the faint-of-heart!

Indeed, and faint-hearted Hugh wasn't intending to do so: but
stout-hearted Alan will need to, won't he, before his patch can go in?


Re: 2.6.21-rc suspend regression: sysfs deadlock

2007-03-13 Thread Alan Stern
On Tue, 13 Mar 2007, Hugh Dickins wrote:

> On Tue, 13 Mar 2007, Alan Stern wrote:
> > 
> > The consensus is that we would be better off keeping Oliver's original 
> > patch without your silly change, and instead fixing the particular method 
> > call that deadlocked.  Can you please try out the patch below with 
> > everything else as it was before?  It should solve your problem.
> 
> Yep, it works fine with your patch in and my silly reverted, thanks.
> But (I was about to say, even before seeing Cornelia's reply, honest!)
> I think you do need to check (audit the source? or is some runtime
> check possible?) for other such "suicidal" sysfs files, which
> seemed to (sysfs-ignorant) me to pose the real problem.

A runtime check wouldn't detect anything until someone tried to use the 
file -- at which point the process would deadlock anyway.

On the other hand, a quick survey of the kernel source shows that
DEVICE_ATTR is used over 1500 times.  Auditing all of them is not a job
for the faint-of-heart!

Alan Stern



Re: 2.6.21-rc suspend regression: sysfs deadlock

2007-03-13 Thread Hugh Dickins
On Tue, 13 Mar 2007, Alan Stern wrote:
> 
> The consensus is that we would be better off keeping Oliver's original 
> patch without your silly change, and instead fixing the particular method 
> call that deadlocked.  Can you please try out the patch below with 
> everything else as it was before?  It should solve your problem.

Yep, it works fine with your patch in and my silly reverted, thanks.
But (I was about to say, even before seeing Cornelia's reply, honest!)
I think you do need to check (audit the source? or is some runtime
check possible?) for other such "suicidal" sysfs files, which
seemed to (sysfs-ignorant) me to pose the real problem.

Hugh


Re: 2.6.21-rc suspend regression: sysfs deadlock

2007-03-13 Thread Cornelia Huck
On Tue, 13 Mar 2007 11:00:21 -0400 (EDT),
Alan Stern <[EMAIL PROTECTED]> wrote:

> The consensus is that we would be better off keeping Oliver's original 
> patch without your silly change, and instead fixing the particular method 
> call that deadlocked.

Another call that deadlocked with Oliver's patch is ungroup for s390
ccwgroup devices. It can be made to work again with a similar patch.

Signed-off-by: Cornelia Huck <[EMAIL PROTECTED]>

---
 drivers/s390/cio/ccwgroup.c |   35 +++
 1 file changed, 31 insertions(+), 4 deletions(-)

--- linux-2.6.orig/drivers/s390/cio/ccwgroup.c
+++ linux-2.6/drivers/s390/cio/ccwgroup.c
@@ -67,22 +67,49 @@ __ccwgroup_remove_symlinks(struct ccwgro

 }
 
+struct ccwgroup_work_struct {
+   struct ccwgroup_device *gdev;
+   struct work_struct work;
+};
+
+static void ccwgroup_ungroup_work(struct work_struct *work)
+{
+   struct ccwgroup_work_struct *ungroup_work
+   = container_of(work, struct ccwgroup_work_struct, work);
+
+   __ccwgroup_remove_symlinks(ungroup_work->gdev);
+   device_unregister(&ungroup_work->gdev->dev);
+   put_device(&ungroup_work->gdev->dev);
+   kfree(ungroup_work);
+}
+
 /*
  * Provide an 'ungroup' attribute so the user can remove group devices no
  * longer needed or accidentially created. Saves memory :)
+ * Note that we cannot unregister the device from one of its attribute
+ * methods, so we have to delay it.
  */
-static ssize_t
-ccwgroup_ungroup_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t count)
+static ssize_t ccwgroup_ungroup_store(struct device *dev,
+				      struct device_attribute *attr,
+				      const char *buf, size_t count)
 {
 	struct ccwgroup_device *gdev;
+	struct ccwgroup_work_struct *ungroup_work;
 
 	gdev = to_ccwgroupdev(dev);
 
 	if (gdev->state != CCWGROUP_OFFLINE)
 		return -EINVAL;
 
-	__ccwgroup_remove_symlinks(gdev);
-	device_unregister(dev);
+	ungroup_work = kmalloc(sizeof(*ungroup_work), GFP_KERNEL);
+	if (!ungroup_work)
+		return -ENOMEM;
+	ungroup_work->gdev = gdev;
+	INIT_WORK(&ungroup_work->work, ccwgroup_ungroup_work);
+	if (!get_device(&gdev->dev))
+		kfree(ungroup_work);
+	else
+		schedule_work(&ungroup_work->work);
 
 	return count;
 }


Re: 2.6.21-rc suspend regression: sysfs deadlock

2007-03-13 Thread Alan Stern
On Tue, 6 Mar 2007, Hugh Dickins wrote:

> But suspend to RAM still hanging, unless I "chmod a-x /usr/sbin/docker"
> on SuSE 10.2: docker undock tries to unregister /sys/block/sr0 and hangs:
> 
> 60x60 D B0415080 0 10778  10771 (NOTLB)
>e8227e04 0086 e80c60b0 b0415080 ef3f5454 b041dc20 ef3f5430 
> 0001 
>e80c60b0 72af360e 0085 1941 e80c61bc e8227e00 b01606bf 
> ef47d3c0 
>ed07c1dc ed07c1e4 0246 e8227e30 b02f6ef0 e80c60b0 0001 
> e80c60b0 
> Call Trace:
>  [] __down+0xaa/0xb8
>  [] __down_failed+0xa/0x10
>  [] sysfs_drop_dentry+0xa2/0xda
>  [] __sysfs_remove_dir+0x6d/0xf8
>  [] sysfs_remove_dir+0x15/0x20
>  [] kobject_del+0x16/0x22
>  [] device_del+0x1c9/0x1e2
>  [] __scsi_remove_device+0x43/0x7a
>  [] scsi_remove_device+0x1f/0x2b
>  [] sdev_store_delete+0x16/0x1b
>  [] dev_attr_store+0x32/0x34
>  [] flush_write_buffer+0x37/0x3d
>  [] sysfs_write_file+0x5e/0x82
>  [] vfs_write+0xa7/0x150
>  [] sys_write+0x47/0x6b
>  [] sysenter_past_esp+0x5f/0x85
>   /usr/lib/dockutils/hooks/thinkpad/60x60 undock
>   /usr/lib/dockutils/dockhandler undock
>   /usr/sbin/docker undock
>   /etc/pm/hooks/23dock suspend
> 
> This comes from Oliver's commit 94bebf4d1b8e7719f0f3944c037a21cfd99a4af7
> Driver core: fix race in sysfs between sysfs_remove_file() and read()/write()
> in 2.6.21-rc1.  It looks to me like sysfs_write_file downs buffer->sem
> while calling flush_write_buffer, and flushing that particular write
> buffer entails downing buffer->sem in orphan_all_buffers.
> 
> Suspend no longer deadlocks with the following silly patch, but I expect
> this either pokes a small hole in your scheme, or renders it pointless.
> Maybe that commit needs to be reverted, or maybe you can see how to fix
> it up for -rc3.
> 
> Thanks,
> Hugh
> 
> --- 2.6.21-rc2-git5/fs/sysfs/inode.c	2007-02-28 08:30:26.0 +
> +++ linux/fs/sysfs/inode.c	2007-03-06 18:03:13.0 +
> @@ -227,11 +227,8 @@ static inline void orphan_all_buffers(st
>  
>  	mutex_lock_nested(&node->i_mutex, I_MUTEX_CHILD);
>  	if (node->i_private) {
> -		list_for_each_entry(buf, &set->associates, associates) {
> -			down(&buf->sem);
> +		list_for_each_entry(buf, &set->associates, associates)
>  			buf->orphaned = 1;
> -			up(&buf->sem);
> -		}
>  	}
>  	mutex_unlock(&node->i_mutex);
>  }

Hugh, there has been a long discussion among several people concerning 
this issue.  See for example this thread:

http://marc.info/?t=11733593521&r=1&w=2

and also:

http://marc.info/?l=linux-kernel&m=117355959020831&w=2

The consensus is that we would be better off keeping Oliver's original 
patch without your silly change, and instead fixing the particular method 
call that deadlocked.  Can you please try out the patch below with 
everything else as it was before?  It should solve your problem.

Alan Stern


Index: usb-2.6/drivers/scsi/scsi_sysfs.c
===
--- usb-2.6.orig/drivers/scsi/scsi_sysfs.c
+++ usb-2.6/drivers/scsi/scsi_sysfs.c
@@ -452,10 +452,39 @@ store_rescan_field (struct device *dev, 
 }
 static DEVICE_ATTR(rescan, S_IWUSR, NULL, store_rescan_field);
 
+/* An attribute method cannot unregister itself, so this workaround for
+ * sdev_store_delete() is necessary.
+ */
+struct sdev_work_struct {
+	struct scsi_device *sdev;
+	struct work_struct work;
+};
+
+static void sdev_store_delete_work(struct work_struct *work)
+{
+	struct sdev_work_struct *sdw = container_of(work,
+			struct sdev_work_struct, work);
+
+	scsi_remove_device(sdw->sdev);
+	scsi_device_put(sdw->sdev);
+	kfree(sdw);
+}
+
 static ssize_t sdev_store_delete(struct device *dev, struct device_attribute *attr, const char *buf,
 				 size_t count)
 {
-	scsi_remove_device(to_scsi_device(dev));
+	struct scsi_device *sdev = to_scsi_device(dev);
+	struct sdev_work_struct *sdw;
+
+	sdw = kmalloc(sizeof(*sdw), GFP_KERNEL);
+	if (!sdw)
+		return -ENOMEM;
+	sdw->sdev = sdev;
+	INIT_WORK(&sdw->work, sdev_store_delete_work);
+	if (scsi_device_get(sdev) != 0)
+		kfree(sdw);
+	else
+		schedule_work(&sdw->work);
 	return count;
 };
 static DEVICE_ATTR(delete, S_IWUSR, NULL, sdev_store_delete);
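
For readers unfamiliar with the container_of() step in sdev_store_delete_work(): the work handler is only handed a pointer to the embedded work member and has to step back to the enclosing structure. The following is a rough userspace sketch of that idea, not kernel code. The macro is a simplified version of the kernel's, and struct work, struct sdev_work and sdev_id_from_work() are made-up names for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace version of the kernel's container_of(): step back
 * from a member pointer to the start of the enclosing structure. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for struct work_struct and struct sdev_work_struct. */
struct work {
	int pending;
};

struct sdev_work {
	int sdev_id;		/* stands in for the scsi_device pointer */
	struct work work;	/* embedded, as in the patch */
};

/* The handler only receives &sdw->work and recovers sdw from it. */
static int sdev_id_from_work(struct work *work)
{
	struct sdev_work *sdw;

	assert(work != NULL);
	sdw = container_of(work, struct sdev_work, work);
	return sdw->sdev_id;
}
```

Because the work item is embedded rather than pointed to, one allocation carries both the callback state and its argument, which is why the handler can kfree() the whole thing in one go.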



Re: 2.6.21-rc suspend regression: sysfs deadlock

2007-03-10 Thread Alan Stern
[For the start of this thread, see 
.]

On Wed, 7 Mar 2007, Linus Torvalds wrote:

> So you just pointed to *another* data structure that apparently violates 
> the "you MUST use refcounting" rule.
> 
> What is it with you people? It's really simple. Data structures must be 
> refcounted if you can reach them two different ways.
> 
> If you don't use refcounting, then you'd better make sure that the data 
> can be reached only one way (for example, by *not* exposing it for sysfs).
> 
> It really *is* that simple. Read the CodingStyle rules.

Linus's analysis is correct as far as it goes, but it misses some very 
important points.  The _real_ problem here, which nobody has pointed out 
so far, is not device removal or driver unloading.  It is driver 
unbinding -- with its consequent issue of access rights.

When a driver is unbound from a device, when should the driver stop trying 
to access that device?  The obvious answer is that it must stop before its 
release() method returns.  Otherwise the device might get bound to 
another driver and we would have both drivers trying to talk to it at the 
same time.

In other words, when a driver unbinds from a device, it loses its right to
access that device.  Same goes for any device-related data structures that
weren't created by the driver itself.  When you realize this, it becomes
obvious that the driver faces a synchronization problem.  All its entry
points must be synchronized with release(), to avoid races.

So there actually are two things a driver has to worry about:

	The lifetime of its private data structures (which can be solved
	using refcounts as Linus advocated);

	The race between release() and other activities (which cannot
	be solved by refcounts but needs a true synchronization technique,
	such as a mutex).
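
The distinction between the two concerns can be sketched in userspace C. This is only an illustration under assumptions, not driver code: struct drv_state, drv_entry() and drv_release() are hypothetical names, and a pthread mutex stands in for whatever lock a real driver would use. The refcount keeps the memory valid, but only the mutex-protected "bound" flag stops the driver from touching the device after release().

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

struct drv_state {
	pthread_mutex_t lock;	/* serializes entry points against release() */
	int refs;		/* lifetime only: memory may go when this hits 0 */
	int bound;		/* access right: cleared by release() */
	int hw_accesses;	/* counts simulated device accesses */
};

/* Any entry point (show/store/open/...) checks the access right first. */
static int drv_entry(struct drv_state *s)
{
	int ret = -ENODEV;

	pthread_mutex_lock(&s->lock);
	if (s->bound) {
		s->hw_accesses++;	/* "talk to the device" */
		ret = 0;
	}
	pthread_mutex_unlock(&s->lock);
	return ret;
}

/* Once this returns, no entry point will touch the device again. */
static void drv_release(struct drv_state *s)
{
	pthread_mutex_lock(&s->lock);
	s->bound = 0;
	pthread_mutex_unlock(&s->lock);
}

/* Returns 1 if access works before release() and is refused afterwards. */
static int unbind_demo(void)
{
	struct drv_state s = { .refs = 1, .bound = 1 };
	int first, second;

	pthread_mutex_init(&s.lock, NULL);
	first = drv_entry(&s);
	drv_release(&s);
	second = drv_entry(&s);
	assert(s.refs == 1);	/* the memory is still referenced and valid */
	pthread_mutex_destroy(&s.lock);
	return first == 0 && second == -ENODEV && s.hw_accesses == 1;
}
```

Note that the second drv_entry() call is perfectly safe as far as memory goes; it is refused only because the access right is gone.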

No doubt some of this sounds familiar; the race between open() and
disconnect() for char device drivers is one we have faced many times and
not always solved perfectly.  Also note that this is a fundamental
problem, affecting many facilities in addition to sysfs.


One way to solve these problems is to put all the responsibility on the 
driver.  Make it refcount its data structures and use mutexes.  This is 
not very attractive for several reasons:

	_Lots_ of drivers are affected.  Pretty much any driver which
	registers a char device or a sysfs attribute file.

	_Lots_ of code would need to be changed, adding considerable
	bloat.  Every show()/store() method would need to acquire a mutex,
	and many would need to be passed an additional argument, requiring
	a change in the sysfs API.  (I can explain why in a follow-up 
	email if anyone is interested.)

	Most importantly, doing all the refcounting and mutual exclusion
	correctly is quite hard.  It's amazingly easy to make mistakes
	in these areas.  The chances of getting it right while changing
	multiple functions in every single driver are infinitesimal.

Another approach is to put all the responsibility on the core subsystems
that handle driver registration.  They should enforce rigidly two
principles: "No driver callbacks occur after unregistration" and its
prerequisite, "Unregistration is mutually exclusive with driver
callbacks".  (This is exactly what Oliver's original patch did for sysfs.)

	The number of core subsystems affected is much smaller than the
	total number of drivers.  Sysfs, debugfs, the char device
	subsystem, maybe a few others.

	Drivers would no longer have to worry about doing their own
	synchronization or refcounts.  It would be guaranteed that a
	private data structure would never be accessed from sysfs after
	device_remove_file() returned, so the structure could safely and
	easily be deallocated as part of release().

At the expense of complicating a few central subsystems, we could simplify
a lot of drivers.  I think this is a worthwhile tradeoff.
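
One possible shape for such a core-side guarantee, sketched in userspace C: the core counts callbacks in flight, and unregistration marks the attribute dead and then waits for the count to drain. This is an assumption-laden illustration, not how sysfs is implemented; struct attr_gate and the gate_*() helpers are invented names.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

struct attr_gate {
	pthread_mutex_t lock;
	pthread_cond_t idle;	/* signalled when the last callback leaves */
	int active;		/* callbacks currently running */
	int dead;		/* set once unregistration has started */
};

static void gate_init(struct attr_gate *g)
{
	pthread_mutex_init(&g->lock, NULL);
	pthread_cond_init(&g->idle, NULL);
	g->active = 0;
	g->dead = 0;
}

/* The core calls this before invoking a driver callback. */
static int gate_enter(struct attr_gate *g)
{
	int ret = -ENODEV;

	pthread_mutex_lock(&g->lock);
	if (!g->dead) {
		g->active++;
		ret = 0;
	}
	pthread_mutex_unlock(&g->lock);
	return ret;
}

/* The core calls this after the driver callback has returned. */
static void gate_exit(struct attr_gate *g)
{
	pthread_mutex_lock(&g->lock);
	assert(g->active > 0);
	if (--g->active == 0)
		pthread_cond_broadcast(&g->idle);
	pthread_mutex_unlock(&g->lock);
}

/* After this returns, no callback is running and none will start. */
static void gate_unregister(struct attr_gate *g)
{
	pthread_mutex_lock(&g->lock);
	g->dead = 1;
	while (g->active > 0)
		pthread_cond_wait(&g->idle, &g->lock);
	pthread_mutex_unlock(&g->lock);
}

/* Returns 1 if a callback is admitted before and refused after. */
static int gate_demo(void)
{
	struct attr_gate g;
	int before, after;

	gate_init(&g);
	before = gate_enter(&g);	/* callback admitted */
	if (before == 0)
		gate_exit(&g);
	gate_unregister(&g);		/* returns at once: nothing in flight */
	after = gate_enter(&g);		/* refused from now on */
	pthread_cond_destroy(&g.idle);
	pthread_mutex_destroy(&g.lock);
	return before == 0 && after == -ENODEV;
}
```

In this sketch a callback that called gate_unregister() on its own gate would wait forever for its own gate_exit(), which is the userspace analogue of an entry point trying to unregister itself.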

It does have a small disadvantage; it means that an entry point would
deadlock if it tried to unregister itself.  (The example which started
this whole thread was sdev_store_delete() in the SCSI core.  Writing to
that attribute unregisters the device to which it belongs.)  Clearly the
actual unregistration would have to be performed separately in a workqueue.  
I think the number of places where this occurs is pretty small.
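
The deferral can be sketched in userspace with a thread standing in for the workqueue. This is only an illustration under assumptions: struct device here is a toy, attr_lock models whatever the core holds while a store() method runs, and pthread_create() plays the part of schedule_work(). The store method never unregisters directly; it hands the job to a worker, which can take the lock only after the method has returned.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct device {
	pthread_mutex_t attr_lock;	/* held while a store() method runs */
	int registered;
};

struct delete_work {
	struct device *dev;
};

/* Needs attr_lock, so it must not be called from inside store(). */
static void device_unregister(struct device *dev)
{
	pthread_mutex_lock(&dev->attr_lock);
	dev->registered = 0;
	pthread_mutex_unlock(&dev->attr_lock);
}

/* Runs in the worker, outside the store() call chain. */
static void *delete_work_fn(void *arg)
{
	struct delete_work *w = arg;

	device_unregister(w->dev);
	free(w);
	return NULL;
}

/* store(): called with attr_lock held; only queues the real work. */
static int store_delete(struct device *dev, pthread_t *worker)
{
	struct delete_work *w = malloc(sizeof(*w));

	if (!w)
		return -1;
	w->dev = dev;
	if (pthread_create(worker, NULL, delete_work_fn, w) != 0) {
		free(w);
		return -1;
	}
	return 0;
}

/* Returns 1 if the device outlives store() and is gone once the worker ran. */
static int deferred_delete_demo(void)
{
	struct device dev = { .registered = 1 };
	pthread_t worker;
	int still_registered;

	pthread_mutex_init(&dev.attr_lock, NULL);
	pthread_mutex_lock(&dev.attr_lock);	/* the core takes the lock */
	if (store_delete(&dev, &worker) != 0) {
		pthread_mutex_unlock(&dev.attr_lock);
		return -1;
	}
	still_registered = dev.registered;	/* worker is blocked on the lock */
	pthread_mutex_unlock(&dev.attr_lock);	/* store() has returned */
	pthread_join(worker, NULL);
	assert(still_registered == 1);
	pthread_mutex_destroy(&dev.attr_lock);
	return still_registered == 1 && dev.registered == 0;
}
```

A real implementation would also need to pin the device while the work is pending, as the get_device()/scsi_device_get() calls in the patches in this thread do; the sketch sidesteps that by joining the worker before the device goes out of scope.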


It's true that this approach goes against the general philosophy used
elsewhere in the kernel.  Refcounting without synchronization is the
general rule.

However unbinding is a special case.  Normally with refcounting, it
doesn't matter when a driver tries to read or write a data structure.  So
long as the driver still holds a reference, the data will be there and the
access will be okay.

But not with unbinding!  After unbinding, the data will still be there but 
it m

Fwd: Re: 2.6.21-rc suspend regression: sysfs deadlock

2007-03-08 Thread Oliver Neukum
To let you hear the verdict.

Regards
Oliver

--  Forwarded Message  --

Subject: Re: 2.6.21-rc suspend regression: sysfs deadlock
Date: Wednesday, 7 March 2007 19:02
From: Linus Torvalds <[EMAIL PROTECTED]>
To: Oliver Neukum <[EMAIL PROTECTED]>
Cc: Dmitry Torokhov <[EMAIL PROTECTED]>, Hugh Dickins <[EMAIL PROTECTED]>, 
Oliver Neukum <[EMAIL PROTECTED]>, Maneesh Soni <[EMAIL PROTECTED]>, Greg 
Kroah-Hartman <[EMAIL PROTECTED]>, Adrian Bunk <[EMAIL PROTECTED]>, 
linux-kernel@vger.kernel.org


On Wed, 7 Mar 2007, Oliver Neukum wrote:
>
> The problem also exists with unplugging devices. Drivers get no feedback
> to tell them when it is safe to free the data structures associated with
> an attribute.

So you just pointed to *another* data structure that apparently violates 
the "you MUST use refcounting" rule.

What is it with you people? It's really simple. Data structures must be 
refcounted if you can reach them two different ways.

If you don't use refcounting, then you'd better make sure that the data 
can be reached only one way (for example, by *not* exposing it for sysfs).

It really *is* that simple. Read the CodingStyle rules.

Linus


---


Re: 2.6.21-rc suspend regression: sysfs deadlock

2007-03-07 Thread Oliver Neukum
On Wednesday, 7 March 2007 19:02, Linus Torvalds wrote:
> 
> On Wed, 7 Mar 2007, Oliver Neukum wrote:
> >
> > The problem also exists with unplugging devices. Drivers get no feedback
> > to tell them when it is safe to free the data structures associated with
> > an attribute.
> 
> So you just pointed to *another* data structure that apparently violates 
> the "you MUST use refcounting" rule.
> 
> What is it with you people? It's really simple. Data structures must be 
> refcounted if you can reach them two different ways.
> 
> If you don't use refcounting, then you'd better make sure that the data 
> can be reached only one way (for example, by *not* exposing it for sysfs).
> 
> It really *is* that simple. Read the CodingStyle rules.

Very well, there seems to be no clean way to avoid that work.

Regards
Oliver


Re: 2.6.21-rc suspend regression: sysfs deadlock

2007-03-07 Thread Linus Torvalds


On Wed, 7 Mar 2007, Oliver Neukum wrote:
>
> The problem also exists with unplugging devices. Drivers get no feedback
> to tell them when it is safe to free the data structures associated with
> an attribute.

So you just pointed to *another* data structure that apparently violates 
the "you MUST use refcounting" rule.

What is it with you people? It's really simple. Data structures must be 
refcounted if you can reach them two different ways.

If you don't use refcounting, then you'd better make sure that the data 
can be reached only one way (for example, by *not* exposing it for sysfs).

It really *is* that simple. Read the CodingStyle rules.

Linus


Re: 2.6.21-rc suspend regression: sysfs deadlock

2007-03-07 Thread Oliver Neukum
On Wednesday, 7 March 2007 17:52, Linus Torvalds wrote:
> 
> On Wed, 7 Mar 2007, Dmitry Torokhov wrote:
> > 
> > ... with the exception that it will again make data associated with
> > sysfs attributes accessible past the point of returning from
> > sysfs_remove_file. And that was the point so drivers would not have to
> > care about handling access to extra data (such as static strings) past
> > the driver unload.
> 
> Drivers are unloaded by stopping the whole machine (exactly because module 
> unload is otherwise so hard to handle), so that never happens unless you 
> actively block. In other words, if you do something as simple as

The problem also exists with unplugging devices. Drivers get no feedback
to tell them when it is safe to free the data structures associated with
an attribute.

Regards
Oliver


Re: 2.6.21-rc suspend regression: sysfs deadlock

2007-03-07 Thread Linus Torvalds


On Wed, 7 Mar 2007, Dmitry Torokhov wrote:
> 
> ... with the exception that it will again make data associated with
> sysfs attributes accessible past the point of returning from
> sysfs_remove_file. And that was the point so drivers would not have to
> care about handling access to extra data (such as static strings) past
> the driver unload.

Drivers are unloaded by stopping the whole machine (exactly because module 
unload is otherwise so hard to handle), so that never happens unless you 
actively block. In other words, if you do something as simple as

	if (inode->i_private_data)
		sysfs_flush_buffer(buffer);

then there is no race with unloading (unless the driver itself does 
something stupid, of course - but the whole point of having a kernel 
buffer is so that it does *not* have to make user accesses etc).

But the one thing you should *not* do is to depend on a sleeping lock, 
because that breaks the whole model!

Linus


Re: 2.6.21-rc suspend regression: sysfs deadlock

2007-03-07 Thread Dmitry Torokhov

On 3/6/07, Linus Torvalds <[EMAIL PROTECTED]> wrote:
>
>  - removing the buffer is now just
>
>	mutex_lock(&inode->i_mutex);
>	buffer = inode->i_private;
>	inode->i_private = NULL;
>	mutex_unlock(&inode->i_mutex);
>
>	put_sysfs_buffer(buffer);
>
>  - everybody is happy!
>

... with the exception that it will again make data associated with
sysfs attributes accessible past the point of returning from
sysfs_remove_file. And that was the point so drivers would not have to
care about handling access to extra data (such as static strings) past
the driver unload.

I wonder if we should keep Oliver's change and require attribute
implementations to offload "delete me" kind of actions to workqueues.

--
Dmitry


Re: 2.6.21-rc suspend regression: sysfs deadlock

2007-03-07 Thread Oliver Neukum
On Wednesday, 7 March 2007 02:56, Linus Torvalds wrote:
> Anyway, I'm unable to revert the broken commit, since there are now other 
> changes that depend on it, but can somebody *please* do that? I'll apply 
> Hugh's silly patch in the meantime, just to avoid the lockup.

As you like it. This patch reverts it.

Regards
Oliver

Signed-off-by: Oliver Neukum <[EMAIL PROTECTED]>
-

--- orig/fs/sysfs/inode.c   2007-03-07 10:49:42.0 +0100
+++ linux-2.6.21-rc3/fs/sysfs/inode.c   2007-03-07 10:52:56.0 +0100
@@ -220,20 +220,6 @@
 	return NULL;
 }
 
-static inline void orphan_all_buffers(struct inode *node)
-{
-	struct sysfs_buffer_collection *set = node->i_private;
-	struct sysfs_buffer *buf;
-
-	mutex_lock_nested(&node->i_mutex, I_MUTEX_CHILD);
-	if (node->i_private) {
-		list_for_each_entry(buf, &set->associates, associates)
-			buf->orphaned = 1;
-	}
-	mutex_unlock(&node->i_mutex);
-}
-
-
 /*
  * Unhashes the dentry corresponding to given sysfs_dirent
  * Called with parent inode's i_mutex held.
@@ -241,23 +227,16 @@
 void sysfs_drop_dentry(struct sysfs_dirent * sd, struct dentry * parent)
 {
 	struct dentry * dentry = sd->s_dentry;
-	struct inode *inode;
 
 	if (dentry) {
 		spin_lock(&dcache_lock);
 		spin_lock(&dentry->d_lock);
 		if (!(d_unhashed(dentry) && dentry->d_inode)) {
-			inode = dentry->d_inode;
-			spin_lock(&inode->i_lock);
-			__iget(inode);
-			spin_unlock(&inode->i_lock);
 			dget_locked(dentry);
 			__d_drop(dentry);
 			spin_unlock(&dentry->d_lock);
 			spin_unlock(&dcache_lock);
 			simple_unlink(parent->d_inode, dentry);
-			orphan_all_buffers(inode);
-			iput(inode);
 		} else {
 			spin_unlock(&dentry->d_lock);
 			spin_unlock(&dcache_lock);
--- orig/fs/sysfs/file.c	2007-03-07 10:37:28.0 +0100
+++ linux-2.6.21-rc3/fs/sysfs/file.c	2007-03-07 10:54:00.0 +0100
@@ -7,7 +7,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 
@@ -52,30 +51,6 @@
 };
 
 /**
- * add_to_collection - add buffer to a collection
- * @buffer:	buffer to be added
- * @node:	inode of set to add to
- */
-
-static inline void
-add_to_collection(struct sysfs_buffer *buffer, struct inode *node)
-{
-	struct sysfs_buffer_collection *set = node->i_private;
-
-	mutex_lock(&node->i_mutex);
-	list_add(&buffer->associates, &set->associates);
-	mutex_unlock(&node->i_mutex);
-}
-
-static inline void
-remove_from_collection(struct sysfs_buffer *buffer, struct inode *node)
-{
-	mutex_lock(&node->i_mutex);
-	list_del(&buffer->associates);
-	mutex_unlock(&node->i_mutex);
-}
-
-/**
  * fill_read_buffer - allocate and fill buffer from object.
  * @dentry:	dentry pointer.
  * @buffer:	data buffer for file.
@@ -168,10 +143,6 @@
 	ssize_t retval = 0;
 
 	down(&buffer->sem);
-	if (buffer->orphaned) {
-		retval = -ENODEV;
-		goto out;
-	}
 	if (buffer->needs_read_fill) {
 		if ((retval = fill_read_buffer(file->f_path.dentry,buffer)))
 			goto out;
@@ -261,16 +232,11 @@
 	ssize_t len;
 
 	down(&buffer->sem);
-	if (buffer->orphaned) {
-		len = -ENODEV;
-		goto out;
-	}
 	len = fill_write_buffer(buffer, buf, count);
 	if (len > 0)
 		len = flush_write_buffer(file->f_path.dentry, buffer, len);
 	if (len > 0)
 		*ppos += len;
-out:
 	up(&buffer->sem);
 	return len;
 }
@@ -279,7 +245,6 @@
 {
 	struct kobject *kobj = sysfs_get_kobject(file->f_path.dentry->d_parent);
 	struct attribute * attr = to_attr(file->f_path.dentry);
-	struct sysfs_buffer_collection *set;
 	struct sysfs_buffer * buffer;
 	struct sysfs_ops * ops = NULL;
 	int error = 0;
@@ -309,18 +274,6 @@
 	if (!ops)
 		goto Eaccess;
 
-	/* make sure we have a collection to add our buffers to */
-	mutex_lock(&inode->i_mutex);
-	if (!(set = inode->i_private)) {
-		if (!(set = inode->i_private = kmalloc(sizeof(struct sysfs_buffer_collection), GFP_KERNEL))) {
-			error = -ENOMEM;
-			goto Done;
-		} else {
-			INIT_LIST_HEAD(&set->associates);
-		}
-	}
-	mutex_unlock(&inode->i_mutex);
-
 	/* File needs write support.
 	 * The inode's perms must say it's ok, 
 	 * and we must have a store method.
@@ -346,11 +299,9 @@
  

Re: 2.6.21-rc suspend regression: sysfs deadlock

2007-03-06 Thread Linus Torvalds


On Tue, 6 Mar 2007, Hugh Dickins wrote:
> 
> This comes from Oliver's commit 94bebf4d1b8e7719f0f3944c037a21cfd99a4af7
> Driver core: fix race in sysfs between sysfs_remove_file() and read()/write()
> in 2.6.21-rc1.  It looks to me like sysfs_write_file downs buffer->sem
> while calling flush_write_buffer, and flushing that particular write
> buffer entails downing buffer->sem in orphan_all_buffers.

Gaah. What a crock.

I really don't see any alternative to just reverting the whole change. 
Hugh's patch is simple, but rather pointless.

The fact is, the whole change is *bogus*.

We don't "lock" datastructures. We *reference count* them!

This is so fundamental that it's even mentioned in the file 
Documentation/CodingStyle in "Chapter 11: Data structures".

The whole "orphaned" kind of locking is broken. It's stupid. The way we do 
races between removal and use is that initial setup sets a reference count 
of 1, and something really simple like:

	static inline struct sysfs_buffer *get_sysfs_buffer(struct inode *inode)
	{
		struct sysfs_buffer *buffer = inode->i_private;

		BUG_ON(!mutex_is_locked(&inode->i_mutex));
		if (buffer)
			atomic_inc(&buffer->count);
		return buffer;
	}

	static inline void put_sysfs_buffer(struct sysfs_buffer *buffer)
	{
		if (atomic_dec_and_test(&buffer->count))
			kfree(buffer);
	}

and then the rule is:

 - everybody uses "get_sysfs_buffer()" to follow the reference (and yes, 
   you obviously have to hold "inode->i_mutex" for this to be safe! I 
   added the BUG_ON() as an example)

 - everybody uses "put_sysfs_buffer()" to release it (and we simply don't *care* 
   whether somebody else released it too, since everybody has a reference 
   count)

 - removing the buffer is now just

	mutex_lock(&inode->i_mutex);
	buffer = inode->i_private;
	inode->i_private = NULL;
	mutex_unlock(&inode->i_mutex);

	put_sysfs_buffer(buffer);

 - everybody is happy!
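
The get/put rule above can be tried out in userspace. The following is a rough sketch with C11 atomics and a pthread mutex, not kernel code: struct buffer, struct node and the buffer_*() helpers are illustrative names, and unlike the kernel snippet, buffer_get() takes the lock itself instead of asserting that the caller holds it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

struct buffer {
	atomic_int count;	/* starts at 1: the reference held by the node */
	int data;
};

struct node {
	pthread_mutex_t lock;	/* plays the role of inode->i_mutex */
	struct buffer *private;	/* plays the role of inode->i_private */
};

static int frees;		/* lets the demo observe when free() happens */

/* Follow the node's pointer and take a reference, under the lock. */
static struct buffer *buffer_get(struct node *n)
{
	struct buffer *b;

	pthread_mutex_lock(&n->lock);
	b = n->private;
	if (b)
		atomic_fetch_add(&b->count, 1);
	pthread_mutex_unlock(&n->lock);
	return b;
}

static void buffer_put(struct buffer *b)
{
	/* atomic_fetch_sub() returns the old value: 1 means we were last. */
	if (b && atomic_fetch_sub(&b->count, 1) == 1) {
		free(b);
		frees++;
	}
}

/* Unhook the buffer from the node and drop the node's own reference. */
static void buffer_remove(struct node *n)
{
	struct buffer *b;

	pthread_mutex_lock(&n->lock);
	b = n->private;
	n->private = NULL;
	pthread_mutex_unlock(&n->lock);
	buffer_put(b);
}

/* Returns the data read through a reference that outlives the removal. */
static int refcount_demo(void)
{
	struct node n;
	struct buffer *user;
	int before = frees, seen;

	pthread_mutex_init(&n.lock, NULL);
	n.private = malloc(sizeof(*n.private));
	assert(n.private);
	atomic_init(&n.private->count, 1);
	n.private->data = 42;

	user = buffer_get(&n);		/* count 2 */
	buffer_remove(&n);		/* count 1, nothing freed yet */
	assert(frees == before && buffer_get(&n) == NULL);
	seen = user->data;		/* still valid memory */
	buffer_put(user);		/* count 0, freed here */
	assert(frees == before + 1);
	pthread_mutex_destroy(&n.lock);
	return seen;
}
```

The point of the demo is the ordering: removal does not wait for users and users do not block removal; whoever drops the last reference does the free.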

Anyway, I'm unable to revert the broken commit, since there are now other 
changes that depend on it, but can somebody *please* do that? I'll apply 
Hugh's silly patch in the meantime, just to avoid the lockup.

Linus


Re: 2.6.21-rc suspend regression: sysfs deadlock

2007-03-06 Thread Oliver Neukum
On Tuesday, 6 March 2007 20:20, Hugh Dickins wrote:
> This comes from Oliver's commit 94bebf4d1b8e7719f0f3944c037a21cfd99a4af7
> Driver core: fix race in sysfs between sysfs_remove_file() and read()/write()
> in 2.6.21-rc1.  It looks to me like sysfs_write_file downs buffer->sem
> while calling flush_write_buffer, and flushing that particular write
> buffer entails downing buffer->sem in orphan_all_buffers.

I had not thought about sysfs removing files in sysfs.

> Suspend no longer deadlocks with the following silly patch, but I expect
> this either pokes a small hole in your scheme, or renders it pointless.

The latter.
 
> Maybe that commit needs to be reverted, or maybe you can see how to fix
> it up for -rc3.

If you want a quick fix a work queue could be used, but it's a kludge.
Suggestions, anybody?

Regards
Oliver


2.6.21-rc suspend regression: sysfs deadlock

2007-03-06 Thread Hugh Dickins
Resume from RAM on a ThinkPad T43p is now happy with Thomas' periodic
tick fix - the most unusable aspect of that for me had been how slow
repeat keys were to start repeating, but that's all fine now.

But suspend to RAM still hanging, unless I "chmod a-x /usr/sbin/docker"
on SuSE 10.2: docker undock tries to unregister /sys/block/sr0 and hangs:

60x60 D B0415080 0 10778  10771 (NOTLB)
   e8227e04 0086 e80c60b0 b0415080 ef3f5454 b041dc20 ef3f5430 0001 
   e80c60b0 72af360e 0085 1941 e80c61bc e8227e00 b01606bf ef47d3c0 
   ed07c1dc ed07c1e4 0246 e8227e30 b02f6ef0 e80c60b0 0001 e80c60b0 
Call Trace:
 [] __down+0xaa/0xb8
 [] __down_failed+0xa/0x10
 [] sysfs_drop_dentry+0xa2/0xda
 [] __sysfs_remove_dir+0x6d/0xf8
 [] sysfs_remove_dir+0x15/0x20
 [] kobject_del+0x16/0x22
 [] device_del+0x1c9/0x1e2
 [] __scsi_remove_device+0x43/0x7a
 [] scsi_remove_device+0x1f/0x2b
 [] sdev_store_delete+0x16/0x1b
 [] dev_attr_store+0x32/0x34
 [] flush_write_buffer+0x37/0x3d
 [] sysfs_write_file+0x5e/0x82
 [] vfs_write+0xa7/0x150
 [] sys_write+0x47/0x6b
 [] sysenter_past_esp+0x5f/0x85
  /usr/lib/dockutils/hooks/thinkpad/60x60 undock
  /usr/lib/dockutils/dockhandler undock
  /usr/sbin/docker undock
  /etc/pm/hooks/23dock suspend

This comes from Oliver's commit 94bebf4d1b8e7719f0f3944c037a21cfd99a4af7
Driver core: fix race in sysfs between sysfs_remove_file() and read()/write()
in 2.6.21-rc1.  It looks to me like sysfs_write_file downs buffer->sem
while calling flush_write_buffer, and flushing that particular write
buffer entails downing buffer->sem in orphan_all_buffers.
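
The shape of that self-deadlock can be shown in a small userspace sketch. It is an illustration only: a pthread mutex stands in for buffer->sem, the function names are invented stand-ins for the kernel call chain, and pthread_mutex_trylock() is used so that the nested acquisition reports EBUSY instead of hanging the way the kernel's down() does.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the per-buffer semaphore (buffer->sem). */
static pthread_mutex_t buf_sem = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for orphan_all_buffers(): wants the same lock again.
 * trylock is used so the demo reports the conflict instead of hanging. */
static int orphan_buffer(void)
{
	int err = pthread_mutex_trylock(&buf_sem);

	assert(err == 0 || err == EBUSY);
	if (err)
		return err;	/* EBUSY: the lock is already held */
	pthread_mutex_unlock(&buf_sem);
	return 0;
}

/* Stand-in for a store() method that removes its own device. */
static int store_delete(void)
{
	return orphan_buffer();
}

/* Stand-in for sysfs_write_file(): holds the lock across store(). */
static int write_file(void)
{
	int err;

	pthread_mutex_lock(&buf_sem);
	err = store_delete();
	pthread_mutex_unlock(&buf_sem);
	return err;
}
```

Calling orphan_buffer() on its own succeeds; calling it through write_file() fails with EBUSY, because the same thread already holds the lock further up its own call chain.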

Suspend no longer deadlocks with the following silly patch, but I expect
this either pokes a small hole in your scheme, or renders it pointless.
Maybe that commit needs to be reverted, or maybe you can see how to fix
it up for -rc3.

Thanks,
Hugh

--- 2.6.21-rc2-git5/fs/sysfs/inode.c	2007-02-28 08:30:26.0 +
+++ linux/fs/sysfs/inode.c	2007-03-06 18:03:13.0 +
@@ -227,11 +227,8 @@ static inline void orphan_all_buffers(st
 
 	mutex_lock_nested(&node->i_mutex, I_MUTEX_CHILD);
 	if (node->i_private) {
-		list_for_each_entry(buf, &set->associates, associates) {
-			down(&buf->sem);
+		list_for_each_entry(buf, &set->associates, associates)
 			buf->orphaned = 1;
-			up(&buf->sem);
-		}
 	}
 	mutex_unlock(&node->i_mutex);
 }