from:"Ming Lei"

Re: [PATCH v6 5/6] PM / Runtime: force memory allocation with no I/O during Runtime PM callbcack

2012-11-27 Thread Ming Lei

On Wed, Nov 28, 2012 at 5:24 AM, Rafael J. Wysocki  wrote:
>
> Please don't duplicate code this way.
>
> You can move that whole thing to rpm_callback().  Yes, you'll probably need to
> check dev->power.memalloc_noio twice in there, but that's OK.

Good idea, I will update it in v7.

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v6 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio()

2012-11-27 Thread Ming Lei

On Wed, Nov 28, 2012 at 5:19 AM, Rafael J. Wysocki  wrote:
> On Saturday, November 24, 2012 08:59:14 PM Ming Lei wrote:
>> The patch introduces the flag of memalloc_noio in 'struct dev_pm_info'
>> to help PM core to teach mm not allocating memory with GFP_KERNEL
>> flag for avoiding probable deadlock.
>>
>> As explained in the comment, any GFP_KERNEL allocation inside
>> runtime_resume() or runtime_suspend() on any one of device in
>> the path from one block or network device to the root device
>> in the device tree may cause deadlock, the introduced
>> pm_runtime_set_memalloc_noio() sets or clears the flag on
>> device in the path recursively.
>>
>> Cc: Alan Stern 
>> Cc: "Rafael J. Wysocki" 
>> Signed-off-by: Ming Lei 
>> ---
>> v5:
>>   - fix code style error
>>   - add comment on clear the device memalloc_noio flag
>> v4:
>>   - rename memalloc_noio_resume as memalloc_noio
>>   - remove pm_runtime_get_memalloc_noio()
>>   - add comments on pm_runtime_set_memalloc_noio
>> v3:
>>   - introduce pm_runtime_get_memalloc_noio()
>>   - hold one global lock on pm_runtime_set_memalloc_noio
>>   - hold device power lock when accessing memalloc_noio_resume
>> flag suggested by Alan Stern
>>   - implement pm_runtime_set_memalloc_noio without recursion
>> suggested by Alan Stern
>> v2:
>>   - introduce pm_runtime_set_memalloc_noio()
>> ---
>>  drivers/base/power/runtime.c |   60 
>> ++
>>  include/linux/pm.h   |1 +
>>  include/linux/pm_runtime.h   |3 +++
>>  3 files changed, 64 insertions(+)
>>
>> diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
>> index 3148b10..3e198a0 100644
>> --- a/drivers/base/power/runtime.c
>> +++ b/drivers/base/power/runtime.c
>> @@ -124,6 +124,66 @@ unsigned long pm_runtime_autosuspend_expiration(struct 
>> device *dev)
>>  }
>>  EXPORT_SYMBOL_GPL(pm_runtime_autosuspend_expiration);
>>
>> +static int dev_memalloc_noio(struct device *dev, void *data)
>> +{
>> + return dev->power.memalloc_noio;
>> +}
>> +
>> +/*
>> + * pm_runtime_set_memalloc_noio - Set a device's memalloc_noio flag.
>> + * @dev: Device to handle.
>> + * @enable: True for setting the flag and False for clearing the flag.
>> + *
>> + * Set the flag for all devices in the path from the device to the
>> + * root device in the device tree if @enable is true, otherwise clear
>> + * the flag for devices in the path whose siblings don't set the flag.
>> + *
>
> Please use counters instead of walking the whole path every time.  Ie. in
> addition to the flag add a counter to store the number of the device's
> children having that flag set.

Thanks for your review.

IMO, pm_runtime_set_memalloc_noio() is only called in
probe() and release() of block device and network device, which is
in a very infrequent path, so I am wondering if it is worthy of introducing
another counter for all devices.

Also looks the current implementation of pm_runtime_set_memalloc_noio()
is simple and clean enough with the flag, IMO.

> I would use the flag only to store the information that
> pm_runtime_set_memalloc_noio(dev, true) has been run for this device directly
> and I'd use a counter for everything else.
>
> That is, have power.memalloc_count that would be incremented when (1)
> pm_runtime_set_memalloc_noio(dev, true) is called for that device and (2) when
> power.memalloc_count for one of its children changes from 0 to 1 (and
> analogously for decrementation).  Then, check the counter in rpm_callback().

Sorry, could you explain in a bit detail why we need the counter? Looks only
checking the flag in rpm_callback() is enough, doesn't it?

>
> Besides, don't you need to check children for the arg device itself?

It isn't needed since the children of network/block device can't be
involved of the deadlock in runtime PM path.

Also, the function is only called by network device or block device
subsystem, both the two kind of device are class device and should
have no children.

>
>> + * The function should only be called by block device, or network
>> + * device driver for solving the deadlock problem during runtime
>> + * resume/suspend:
>> + *
>> + * If memory allocation with GFP_KERNEL is called inside runtime
>> + * resume/suspend callback of any one of its ancestors(or the
>> + * block device itself), the deadlock may be triggered inside the
>> + * memory allocation since it mi

Re: [PATCH v6 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio()

2012-11-27 Thread Ming Lei

On Wed, Nov 28, 2012 at 5:19 AM, Rafael J. Wysocki  wrote:
>
> Please use counters instead of walking the whole path every time.  Ie. in
> addition to the flag add a counter to store the number of the device's
> children having that flag set.

Even though counter is added, walking the whole path can't be avoided too,
and may be a explicit walking or recursion, because pm_runtime_set_memalloc_noio
is required to set or clear the flag(or increase/decrease the counter) of
devices in the whole path.

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v6 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio()

2012-11-28 Thread Ming Lei

On Wed, Nov 28, 2012 at 5:29 PM, Rafael J. Wysocki  wrote:
>
> But it doesn't have to walk the children.  Moreover, with counters it only

Yeah, I got it, it is the advantage of counter, but with extra 'int'
field introduced
in 'struct device'.

> needs to walk the whole path if all devices in it need to be updated.  For
> example, if you call pm_runtime_set_memalloc_noio(dev, true) for a device
> whose parent's counter is greater than zero already, you don't need to
> walk the path above the parent.

We still can do it with the flag only, pm_runtime_set_memalloc_noio(dev, true)
can return immediately if one parent or the 'dev' flag is true.

But considered that the pm_runtime_set_memalloc_noio(dev, false) is only
called in a very infrequent path(network/block device->remove()), looks the
introduced cost isn't worthy of the obtained advantage.

So could you accept not introducing counter? and I will update with the
above improvement you suggested.

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v6 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio()

2012-11-28 Thread Ming Lei

On Wed, Nov 28, 2012 at 6:06 PM, Rafael J. Wysocki  wrote:
>
> Well, it may be unfrequent, but does it mean it has to do things that may
> be avoided (ie. walking the children of every node in the path in some cases)?

I agree so without introducing extra cost, :-)

> I don't really think that the counters would cost us that much anyway.

On ARM v7, sizeof(struct device) becomes 376 from 368 after introducing
'unsigned intnoio_cnt;' to 'struct dev_pm_info', and total memory
increases about 3752bytes in a small configuration(about 494 device instance).
The actual memory increase should be more than the data because 'struct device'
is generally embedded into other concrete device structure.

>> Also looks the current implementation of pm_runtime_set_memalloc_noio()
>> is simple and clean enough with the flag, IMO.
>
> I know you always know better. :-)

We still need to consider cost and the function calling frequency, :-)

>
>> > I would use the flag only to store the information that
>> > pm_runtime_set_memalloc_noio(dev, true) has been run for this device 
>> > directly
>> > and I'd use a counter for everything else.
>> >
>> > That is, have power.memalloc_count that would be incremented when (1)
>> > pm_runtime_set_memalloc_noio(dev, true) is called for that device and (2) 
>> > when
>> > power.memalloc_count for one of its children changes from 0 to 1 (and
>> > analogously for decrementation).  Then, check the counter in 
>> > rpm_callback().
>>
>> Sorry, could you explain in a bit detail why we need the counter? Looks only
>> checking the flag in rpm_callback() is enough, doesn't it?
>
> Why would I want to use power.memalloc_count in addition to the
> power.memalloc_noio flag?
>
> Consider this:
>
> pm_runtime_set_memalloc_noio(dev):
> return if power.memalloc_noio is set
> set power.memalloc_noio
>   loop:
> increment power.memalloc_count
> if power.memalloc_count is 1 now switch to parent and go to loop

I am wondering if the above should be changed to below because the child
count of memalloc_noio device need to be recorded.

pm_runtime_set_memalloc_noio(dev):
 return if power.memalloc_noio is set
 set power.memalloc_noio
loop:
 increment power.memalloc_count
 switch to parent and go to loop

So pm_runtime_set_memalloc_noio(dev) will become worse than
the improved pm_runtime_set_memalloc_noio(dev, true), which
can return immediately if one dev or parent's flag is true.

> pm_runtime_clear_memalloc_noio(dev):
> return if power.memalloc_noio is unset
> unset power.memalloc_noio
>   loop:
> decrement power.memalloc_count
> if power.memalloc_count is 0 now switch to parent and go to loop

The above will perform well than pm_runtime_set_memalloc_noio(dev, false),
because the above avoids to walk children of device.

So one becomes worse and another becomes better, :-)

Also the children count of one device is generally very small, less than
10 for most devices, see the data obtained in one common x86 pc(thinkpad
t410) from below link:

http://kernel.ubuntu.com/~ming/up/t410-dev-child-cnt.log

- about 8 devices whose child count is more than 10, top three are 18, 17 ,12,
and all the three are root devices.

- about 117 devices whose child count is between 1 and 9

- other 501 devices whose child count is zero

>From above data, walking device children should have not much effect on
performance of pm_runtime_set_memalloc_noio(), which is also called in
very infrequent path.

> Looks kind of simpler, doesn't it?

Looks simpler, but more code lines than single
pm_runtime_set_memalloc_noio(), :-)

>
> And why rpm_callback() should check power.memalloc_count instead of the count?
> Because power.memalloc_noio will only be set for devices that
> pm_runtime_set_memalloc_noio(dev) was called for directly (not necessarily for
> the parents).
>
> And that works even if someone calls any of them twice in a row for the same
> device (presumably by mistake) and doesn't have to make any assumptions
> about devices it is called for.

IMO, we can ignore the mistake usage because the function is called only
in network/block core code currently, not by individual driver.

>
>> > Besides, don't you need to check children for the arg device itself?
>>
>> It isn't needed since the children of network/block device can't be
>> involved of the deadlock in runtime PM path.
>>
>> Also, the function is only called by network device or block device
>> subsystem, both the two kind of device are class device and should
>> have no children.
>
> OK, so n

Re: use after free in sysfs_find_dirent

2013-03-18 Thread Ming Lei

Hi Sasha,

On Tue, Mar 19, 2013 at 10:06 AM, Sasha Levin  wrote:
> [  232.822703] sysfs_dir_pos-973 sysfs_dirent use after free: 
> vx855(vx855)-bind, 0-25520352

Looks filp->f_pos is changed as zero by llseek(), so may leave
filp->private_data
point to one refcount-balanced sysfs_dirent object, which will be put
again afterwards.

Hope we are luck this time, please try the attachment patch.

Thanks,
-- 
Ming Lei

sysfs-fix-readdir-v2.patch
Description: Binary data

Re: use after free in sysfs_find_dirent

2013-03-19 Thread Ming Lei

Hi Sasha,

On Tue, Mar 19, 2013 at 11:40 AM, Ming Lei  wrote:
> Hi Sasha,
>
> On Tue, Mar 19, 2013 at 10:06 AM, Sasha Levin  wrote:
>> [  232.822703] sysfs_dir_pos-973 sysfs_dirent use after free: 
>> vx855(vx855)-bind, 0-25520352
>
> Looks filp->f_pos is changed as zero by llseek(), so may leave
> filp->private_data
> point to one refcount-balanced sysfs_dirent object, which will be put
> again afterwards.
>
> Hope we are luck this time, please try the attachment patch.

Looks the better and simpler way is to hold the i_mutex for llseek.
If you haven't test the v2, please ignore it and just test the attachment
v3 patch.


Thanks,
-- 
Ming Lei


sysfs-fix-readdir-v3.patch
Description: Binary data

Re: use after free in sysfs_find_dirent

2013-03-19 Thread Ming Lei

Hi Sasha,

On Wed, Mar 20, 2013 at 12:28 AM, Sasha Levin  wrote:
> On 03/19/2013 07:54 AM, Ming Lei wrote:
>
> With v3 of the patch:
>
> [ 1275.665758] sysfs_dir_pos-973 sysfs_dirent use after free: 
> tun(tun)-uevent, 2-1472641949

Thanks again for your test.

Looks it is caused by another bug in sysfs_readdir:  if filldir() returns
failure(such as small buffer length passed from userspace, very probably
for trinity) in case of 'if (filp->f_pos ==  0 or 1)',
filp->private_data still will
point to one refcount-balanced sysfs_dirent object.

V4 adds fix for this situation, please test attachment v4 patch.

Thanks,
-- 
Ming Lei

sysfs-fix-readdir-v4.patch
Description: Binary data

[PATCH 1/2] sysfs: fix race between readdir and lseek

2013-03-20 Thread Ming Lei

While readdir() is running, lseek() may set filp->f_pos as zero,
then may leave filp->private_data pointing to one sysfs_dirent
object without holding its reference counter, so the sysfs_dirent
object may be used after free in next readdir().

This patch holds inode->i_mutex to avoid the problem since
the lock is always held in readdir path.

Reported-by: Dave Jones 
Tested-by: Sasha Levin 
Cc: 
Signed-off-by: Ming Lei 
---
 fs/sysfs/dir.c |   13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 2fbdff6..c9e1660 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -1058,10 +1058,21 @@ static int sysfs_readdir(struct file * filp, void * 
dirent, filldir_t filldir)
return 0;
 }
 
+static loff_t sysfs_dir_llseek(struct file *file, loff_t offset, int whence)
+{
+   struct inode *inode = file_inode(file);
+   loff_t ret;
+
+   mutex_lock(&inode->i_mutex);
+   ret = generic_file_llseek(file, offset, whence);
+   mutex_unlock(&inode->i_mutex);
+
+   return ret;
+}
 
 const struct file_operations sysfs_dir_operations = {
.read   = generic_read_dir,
.readdir= sysfs_readdir,
.release= sysfs_dir_release,
-   .llseek = generic_file_llseek,
+   .llseek = sysfs_dir_llseek,
 };
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 2/2] sysfs: handle failure path correctly for readdir()

2013-03-20 Thread Ming Lei

In case of 'if (filp->f_pos ==  0 or 1)' of sysfs_readdir(),
the failure from filldir() isn't handled, and the reference counter
of the sysfs_dirent object pointed by filp->private_data will be
released without clearing filp->private_data, so use after free
bug will be triggered later.

This patch returns immeadiately under the situation for fixing the bug,
and it is reasonable to return from readdir() when filldir() fails.

Reported-by: Dave Jones 
Tested-by: Sasha Levin 
Cc: 
Signed-off-by: Ming Lei 
---
 fs/sysfs/dir.c |4 
 1 file changed, 4 insertions(+)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index c9e1660..e145126 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -1020,6 +1020,8 @@ static int sysfs_readdir(struct file * filp, void * 
dirent, filldir_t filldir)
ino = parent_sd->s_ino;
if (filldir(dirent, ".", 1, filp->f_pos, ino, DT_DIR) == 0)
filp->f_pos++;
+   else
+   return 0;
}
if (filp->f_pos == 1) {
if (parent_sd->s_parent)
@@ -1028,6 +1030,8 @@ static int sysfs_readdir(struct file * filp, void * 
dirent, filldir_t filldir)
ino = parent_sd->s_ino;
if (filldir(dirent, "..", 2, filp->f_pos, ino, DT_DIR) == 0)
filp->f_pos++;
+   else
+   return 0;
}
mutex_lock(&sysfs_mutex);
for (pos = sysfs_dir_pos(ns, parent_sd, filp->f_pos, pos);
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 0/2] sysfs: fix use after free in sysfs_readdir()

2013-03-20 Thread Ming Lei

Hi,

These two patches fix two bugs inside sysfs_readdir(), and both
the two bugs may cause use after free problem, which is reported
by Dave on trinity fuzz test on syscall.


Thanks,
--
Ming Lei

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/2] sysfs: fix race between readdir and lseek

2013-03-20 Thread Ming Lei

On Thu, Mar 21, 2013 at 10:41 AM, Li Zefan  wrote:
>
> In fact the same race exists between readdir() and read()/write()...

Fortunately, no read()/write() are implemented on sysfs directory, :-)

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/2] sysfs: fix race between readdir and lseek

2013-03-20 Thread Ming Lei

On Thu, Mar 21, 2013 at 11:28 AM, Li Zefan  wrote:
> On 2013/3/21 11:17, Ming Lei wrote:
>> On Thu, Mar 21, 2013 at 10:41 AM, Li Zefan  wrote:
>>>
>>> In fact the same race exists between readdir() and read()/write()...
>>
>> Fortunately, no read()/write() are implemented on sysfs directory, :-)
>>
>
> That's irrelevant...

As far as sysfs is concerned, the filp->f_ops can't be changed in
read/write path.

>
> See my report:
>
> https://patchwork.kernel.org/patch/2160771/

Yes, I know there might be some mess after the commit ef3d0fd2
(vfs: do (nearly) lockless generic_file_llseek).

Also looks it has been stated in Documentation/filesystems/Locking:

->llseek() locking has moved from llseek to the individual llseek
implementations.  If your fs is not using generic_file_llseek, you
need to acquire and release the appropriate locks in your ->llseek().
For many filesystems, it is probably safe to acquire the inode
mutex or just to use i_size_read() instead.
Note: this does not protect the file->f_pos against concurrent modifications
since this is something the userspace has to take care about.


Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] bisected: PandaBoard smsc95xx ethernet driver error from USB timeout

2013-03-21 Thread Ming Lei

Hi Frank,

On Thu, Mar 21, 2013 at 11:29 AM, Frank Rowand  wrote:
>
> I found the problem on 3.6.11, but have not replicated it on 3.9-rcX
> yet because my config fails to build on 3.9-rc1 and 3.9-rc2.  I'll try
> to work on that issue tomorrow.

I play upstream kernel on Pandaboard A1 frequently, looks not
see the failure problem before. Maybe the problem is config dependent.

If you may share your config file, I'd like to do the test too.


Thanks,
-- 
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] sysfs: check if one entry has been removed before freeing

2013-04-02 Thread Ming Lei

It might be a kernel disaster if one sysfs entry is freed but
still referenced by sysfs tree.

Recently Dave and Sasha reported one use-after-free problem on
sysfs entry, and the problem has been troubleshooted with help
of debug message added in this patch.

Given sysfs_get_dirent/sysfs_put are exported APIs, even inside
sysfs they are called in many contexts(kobject/attribe add/delete,
inode init/drop, dentry lookup/release, readdir, ...), it is healthful
to check the removed flag before freeing one entry and dump message
if it is freeing without being removed first.

Cc: Dave Jones 
Cc: Sasha Levin 
Signed-off-by: Ming Lei 
---
 fs/sysfs/dir.c |   12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 1bf016b..328ef9b 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -268,6 +268,13 @@ void release_sysfs_dirent(struct sysfs_dirent * sd)
 */
parent_sd = sd->s_parent;
 
+   if (unlikely(!(sd->s_flags & SYSFS_FLAG_REMOVED))) {
+   printk(KERN_ERR "sysfs: free using entry: %s/%s\n",
+   parent_sd ? parent_sd->s_name : "",
+   sd->s_name);
+   BUG();
+   }
+
if (sysfs_type(sd) == SYSFS_KOBJ_LINK)
sysfs_put(sd->s_symlink.target_sd);
if (sysfs_type(sd) & SYSFS_COPY_NAME)
@@ -386,7 +393,7 @@ struct sysfs_dirent *sysfs_new_dirent(const char *name, 
umode_t mode, int type)
 
sd->s_name = name;
sd->s_mode = mode;
-   sd->s_flags = type;
+   sd->s_flags = type | SYSFS_FLAG_REMOVED;
 
return sd;
 
@@ -466,6 +473,9 @@ int __sysfs_add_one(struct sysfs_addrm_cxt *acxt, struct 
sysfs_dirent *sd)
ps_iattrs->ia_ctime = ps_iattrs->ia_mtime = CURRENT_TIME;
}
 
+   /* Mark the entry added into directory tree */
+   sd->s_flags &= ~SYSFS_FLAG_REMOVED;
+
return 0;
 }
 
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] sysfs: check if one entry has been removed before freeing

2013-04-02 Thread Ming Lei

On Wed, Apr 3, 2013 at 11:04 AM, Dave Jones  wrote:
> On Wed, Apr 03, 2013 at 10:58:23AM +0800, Ming Lei wrote:
>
>  > diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
>  > index 1bf016b..328ef9b 100644
>  > --- a/fs/sysfs/dir.c
>  > +++ b/fs/sysfs/dir.c
>  > @@ -268,6 +268,13 @@ void release_sysfs_dirent(struct sysfs_dirent * sd)
>  >   */
>  >  parent_sd = sd->s_parent;
>  >
>  > +if (unlikely(!(sd->s_flags & SYSFS_FLAG_REMOVED))) {
>  > +printk(KERN_ERR "sysfs: free using entry: %s/%s\n",
>  > +parent_sd ? parent_sd->s_name : "",
>  > +sd->s_name);
>  > +BUG();
>  > +}
>
> Please use WARN instead of BUG.  For an in-ram filesystem like
> sysfs, there's no real reason to lock-up the machine in this way
> making it harder to debug.

If WARN is used, the freed memory will be allocated to other
kernel components, then sysfs may change the memory and cause
destruction, so maybe it is better to use BUG to stop kernel.

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] sysfs: check if one entry has been removed before freeing

2013-04-03 Thread Ming Lei

On Wed, Apr 3, 2013 at 1:35 PM, Greg Kroah-Hartman
 wrote:
> On Wed, Apr 03, 2013 at 11:52:39AM +0800, Ming Lei wrote:
>> On Wed, Apr 3, 2013 at 11:04 AM, Dave Jones  wrote:
>> > On Wed, Apr 03, 2013 at 10:58:23AM +0800, Ming Lei wrote:
>> >
>> >  > diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
>> >  > index 1bf016b..328ef9b 100644
>> >  > --- a/fs/sysfs/dir.c
>> >  > +++ b/fs/sysfs/dir.c
>> >  > @@ -268,6 +268,13 @@ void release_sysfs_dirent(struct sysfs_dirent * sd)
>> >  >   */
>> >  >  parent_sd = sd->s_parent;
>> >  >
>> >  > +if (unlikely(!(sd->s_flags & SYSFS_FLAG_REMOVED))) {
>> >  > +printk(KERN_ERR "sysfs: free using entry: %s/%s\n",
>> >  > +parent_sd ? parent_sd->s_name : "",
>> >  > +sd->s_name);
>> >  > +BUG();
>> >  > +}
>> >
>> > Please use WARN instead of BUG.  For an in-ram filesystem like
>> > sysfs, there's no real reason to lock-up the machine in this way
>> > making it harder to debug.
>>
>> If WARN is used, the freed memory will be allocated to other
>> kernel components, then sysfs may change the memory and cause
>> destruction, so maybe it is better to use BUG to stop kernel.
>
> No, it's never ok to call BUG(), sorry, please fix this.

Sorry, could you explain it in a bit detail? IMO, it is really a bug
when code runs here, and there are much similar BUG_ON()
uses in current sysfs code too.

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v1] ARM: keep __my_cpu_offset consistent with generic one

2013-04-03 Thread Ming Lei

Hi,

On Thu, Apr 4, 2013 at 12:19 AM, Russell King - ARM Linux
 wrote:
>
> Still not convinced this is a proper fix.  Look, the problem is this:
>
> - Initially, set the CPU percpu offset to zero.  This means the boot
>   CPU reads and writes to the percpu data section in the kernel image.
>
> - The percpu areas are initialized, and the percpu data copied to each
>   percpu data - this will have any writes from the boot CPU included as
>   changes to the percpu data.
>
> - The boot CPU continues to read/write to the percpu data section.

No, the boot CPU switches to access percpu area after the area is
created(after setup_per_cpu_areas() and smp_prepare_boot_cpu()).

>
> - If the boot CPU suspends/resumes, cpu_init() gets called, which will
>   call set_my_cpu_offset(per_cpu_offset(cpu)); for the boot CPU.
>
> - The boot CPU now uses the allocated percpu data section and any
>   updates it did in the percpu data section in the kernel image are
>   lost to it.
>
> Your patch may be right on its own to solve the initial problem, but
> it leaves a _big_ hole.

Without the patch, looks the hols was still here, at least before commit
14318efb(ARM: 7587/1: implement optimized percpu variable access),
the per_cpu_offset() always return 0 before percpu area is created on ARM.
Same on other ARCHs too.

>
> Now, the big question here: is it right that the boot CPU should ever
> write to the static percpu data section in the kernel image?  What if

IMO, it isn't right, but as Tejun said, "it mostly due to historical
reasons rather than by design, as long as the data is in consistent
state by and during percpu setup, nothing will break."

As far as the lockdep case, it is a bit special since lock can
be used very early, and surely more early than creating percpu
area.

> there's a pointer in there, initially NULL, which then gets checked
> by each CPU and initialized if NULL - we'll end up sharing the same
> allocation amongst all CPUs, which probably isn't what was intended.
> If there's a list_head which gets added to, that too will be very bad.

Once lockstat is disabled, arm can boot well, that means no others might
access percpu data section early, so could we just treat it as a special case?

>
> Although you have uncovered a problem, I still think by setting the
> offset to zero initially, you're just papering over a much bigger
> can of worms.
>
> So, should percpu data be used this early in boot before the percpu
> stuff is properly initialized?  That feels _extremely_ unsafe.
>
> This, I think, needs to be addressed properly.  And part of that is
> knowing where things went wrong.  Will Deacon asked you for a backtrace
> showing where this problem occured.  Your response seems to be to
> resend the patch with a "v1" tag a no new information.

I have explained to Will Deacon, and looks no oops trace is generated
on the memory access exception during booting. And the hang
happened in mutex_unlock(&cgroup_mutex)(cgroup_init_subsys<-
cgroup_init_early).

> Sorry, not applying this until the above issue has been discussed
> and the location of these percpu accesses has been properly analysed.

No problem.

Thanks,
-- 
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] sysfs: check if one entry has been removed before freeing

2013-04-04 Thread Ming Lei

On Thu, Apr 4, 2013 at 12:08 AM, Greg Kroah-Hartman
 wrote:
>
> Then make it a WARN() call, like David said, to give us a chance to get
> the report from a user so we can fix it.  If the machine crashes after
> that, fine, but hopefully we will get a oops report out of it.

OK, got it, and console might not be available at that time.


Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v1] sysfs: check if one entry has been removed before freeing

2013-04-04 Thread Ming Lei

It might be a kernel disaster if one sysfs entry is freed but
still referenced by sysfs tree.

Recently Dave and Sasha reported one use-after-free problem on
sysfs entry, and the problem has been troubleshooted with help
of debug message added in this patch.

Given sysfs_get_dirent/sysfs_put are exported APIs, even inside
sysfs they are called in many contexts(kobject/attribe add/delete,
inode init/drop, dentry lookup/release, readdir, ...), it is healthful
to check the removed flag before freeing one entry and dump message
if it is freeing without being removed first.

Cc: Dave Jones 
Cc: Sasha Levin 
Signed-off-by: Ming Lei 
---
v1:
- use WARN() instead of BUG() as suggested by Dave and Greg

 fs/sysfs/dir.c |9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 1bf016b..e8e0e71 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -268,6 +268,10 @@ void release_sysfs_dirent(struct sysfs_dirent * sd)
 */
parent_sd = sd->s_parent;
 
+   WARN(!(sd->s_flags & SYSFS_FLAG_REMOVED),
+   "sysfs: free using entry: %s/%s\n",
+   parent_sd ? parent_sd->s_name : "", sd->s_name);
+
if (sysfs_type(sd) == SYSFS_KOBJ_LINK)
sysfs_put(sd->s_symlink.target_sd);
if (sysfs_type(sd) & SYSFS_COPY_NAME)
@@ -386,7 +390,7 @@ struct sysfs_dirent *sysfs_new_dirent(const char *name, 
umode_t mode, int type)
 
sd->s_name = name;
sd->s_mode = mode;
-   sd->s_flags = type;
+   sd->s_flags = type | SYSFS_FLAG_REMOVED;
 
return sd;
 
@@ -466,6 +470,9 @@ int __sysfs_add_one(struct sysfs_addrm_cxt *acxt, struct 
sysfs_dirent *sd)
ps_iattrs->ia_ctime = ps_iattrs->ia_mtime = CURRENT_TIME;
}
 
+   /* Mark the entry added into directory tree */
+   sd->s_flags &= ~SYSFS_FLAG_REMOVED;
+
return 0;
 }
 
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/4] firmware: Refactoring for splitting user-mode helper code

2013-01-29 Thread Ming Lei

e {
> -   ret = _request_firmware_load(fw_priv, true,
> -   firmware_loading_timeout());
> -   usermodehelper_read_unlock();
> -   }
> -   if (ret)
> -   _request_firmware_cleanup(firmware_p);
> -
> -   return ret;
> +   return _request_firmware(firmware_p, name, device, true, false);
>  }
>
>  /**
> @@ -1046,31 +1098,11 @@ static void request_firmware_work_func(struct 
> work_struct *work)
>  {
> struct firmware_work *fw_work;
> const struct firmware *fw;
> -   struct firmware_priv *fw_priv;
> -   long timeout;
> -   int ret;
>
> fw_work = container_of(work, struct firmware_work, work);
> -   fw_priv = _request_firmware_prepare(&fw, fw_work->name, 
> fw_work->device,
> -   fw_work->uevent, true);
> -   if (IS_ERR_OR_NULL(fw_priv)) {
> -   ret = PTR_RET(fw_priv);
> -   goto out;
> -   }
> -
> -   timeout = usermodehelper_read_lock_wait(firmware_loading_timeout());
> -   if (timeout) {
> -   ret = _request_firmware_load(fw_priv, fw_work->uevent, 
> timeout);
> -   usermodehelper_read_unlock();
> -   } else {
> -   dev_dbg(fw_work->device, "firmware: %s loading timed out\n",
> -   fw_work->name);
> -   ret = -EAGAIN;
> -   }
> -   if (ret)
> -   _request_firmware_cleanup(&fw);
>
> - out:
> +   _request_firmware(&fw, fw_work->name, fw_work->device,
> + fw_work->uevent, true);
> fw_work->cont(fw, fw_work->context);
> put_device(fw_work->device);
>

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/4] firmware: Refactoring for splitting user-mode helper code

2013-01-29 Thread Ming Lei

   }
> -
> -   /* pass the pages buffer to driver at the last minute */
> -   fw_set_page_data(buf, fw_priv->fw);
> -
> fw_priv->buf = NULL;
> -   mutex_unlock(&fw_lock);
> -
> -   if (direct_load)
> -   goto err_put_dev;
>
> device_remove_file(f_dev, &dev_attr_loading);
>  err_del_bin_attr:
> @@ -972,6 +965,77 @@ err_put_dev:
> return retval;
>  }
>
> +static int fw_load_from_user_helper(struct firmware *firmware,
> +   const char *name, struct device *device,
> +   bool uevent, bool nowait)
> +{
> +   struct firmware_priv *fw_priv;
> +   long timeout;
> +   int ret;
> +
> +   fw_priv = fw_create_instance(firmware, name, device, uevent, nowait);
> +   if (IS_ERR(fw_priv))
> +   return PTR_ERR(fw_priv);
> +
> +   fw_priv->buf = firmware->priv;
> +
> +   timeout = firmware_loading_timeout();
> +   if (nowait) {
> +   timeout = usermodehelper_read_lock_wait(timeout);
> +   if (!timeout) {
> +   dev_dbg(device, "firmware: %s loading timed out\n",
> +   name);
> +   kfree(fw_priv);
> +   return -EAGAIN;
> +   }
> +   } else {
> +   ret = usermodehelper_read_trylock();
> +   if (WARN_ON(ret)) {
> +   dev_err(device, "firmware: %s will not be loaded\n",
> +   name);
> +   kfree(fw_priv);
> +   return ret;
> +   }
> +   }

The above usermodehelper_read_lock thing may be a functional change,
and looks not what you claimed in commit log, :-). The lock is currently held in
direct loading case, but your patch change the rule. Without holding the lock,
request_firmware() may touch filesystem / storage too early during
kernel boot or system resume in direct loading case.

> +
> +   ret = _request_firmware_load(fw_priv, uevent, timeout);
> +   usermodehelper_read_unlock();
> +   return ret;
> +}
> +
> +/* called from request_firmware() and request_firmware_work_func() */
> +static int
> +_request_firmware(const struct firmware **firmware_p, const char *name,
> + struct device *device, bool uevent, bool nowait)
> +{
> +   struct firmware *fw;
> +   int ret;
> +
> +   if (!firmware_p)
> +   return -EINVAL;
> +
> +   ret = _request_firmware_prepare(&fw, name, device);
> +   if (ret <= 0) /* error or already assigned */
> +   goto out;
> +
> +   ret = 0;
> +   if (!fw_get_filesystem_firmware(device, fw->priv))
> +   ret = fw_load_from_user_helper(fw, name, device,
> +  uevent, nowait);
> +
> +   if (!ret)
> +   ret = assign_firmware_buf(fw, device);
> +
> + out:
> +   if (ret < 0) {
> +   release_firmware(fw);
> +   fw = NULL;
> +   }
> +
> +   *firmware_p = fw;
> +   return ret;
> +}
> +
>  /**
>   * request_firmware: - send firmware request and wait for it
>   * @firmware_p: pointer to firmware image
> @@ -996,26 +1060,7 @@ int
>  request_firmware(const struct firmware **firmware_p, const char *name,
>   struct device *device)
>  {
> -   struct firmware_priv *fw_priv;
> -   int ret;
> -
> -   fw_priv = _request_firmware_prepare(firmware_p, name, device, true,
> -   false);
> -   if (IS_ERR_OR_NULL(fw_priv))
> -   return PTR_RET(fw_priv);
> -
> -   ret = usermodehelper_read_trylock();
> -   if (WARN_ON(ret)) {
> -   dev_err(device, "firmware: %s will not be loaded\n", name);
> -   } else {
> -   ret = _request_firmware_load(fw_priv, true,
> -   firmware_loading_timeout());
> -   usermodehelper_read_unlock();
> -   }
> -   if (ret)
> -   _request_firmware_cleanup(firmware_p);
> -
> -   return ret;
> +   return _request_firmware(firmware_p, name, device, true, false);
>  }
>
>  /**
> @@ -1046,33 +1091,12 @@ static void request_firmware_work_func(struct 
> work_struct *work)
>  {
> struct firmware_work *fw_work;
> const struct firmware *fw;
> -   struct firmware_priv *fw_priv;
> -   long timeout;
> -   int ret;
>
> fw_work = container_of(work, struct firmware_work, work);
> -   fw_priv = _request_firmware_prepare(&fw, fw_work->name, 
> fw_work->device,
> -   fw_work->uevent, true);
> -   if (IS_ERR_OR_NULL(fw_priv)) {
> -   ret = PTR_RET(fw_priv);
> -   goto out;
> -   }
>
> -   timeout = usermodehelper_read_lock_wait(firmware_loading_timeout());
> -   if (timeout) {
> -   ret = _request_firmware_load(fw_priv, fw_work->uevent, 
> timeout);
> -   usermodehelper_read_unlock();
> -   } else {
> -   dev_dbg(fw_work->device, "firmware: %s loading timed out\n",
> -   fw_work->name);
> -   ret = -EAGAIN;
> -   }
> -   if (ret)
> -   _request_firmware_cleanup(&fw);
> -
> - out:
> +   _request_firmware(&fw, fw_work->name, fw_work->device,
> + fw_work->uevent, true);
> fw_work->cont(fw, fw_work->context);
> -   put_device(fw_work->device);

The above put_device is the pair of get_device inside request_firmware_nowait(),
I am wondering why you think it is not balanced, and to be removed . Did you
observe a double free?

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/4] firmware: Refactoring for splitting user-mode helper code

2013-01-30 Thread Ming Lei

On Wed, Jan 30, 2013 at 3:17 PM, Takashi Iwai  wrote:
>> The above usermodehelper_read_lock thing may be a functional change,
>> and looks not what you claimed in commit log, :-). The lock is currently 
>> held in
>> direct loading case, but your patch change the rule. Without holding the 
>> lock,
>> request_firmware() may touch filesystem / storage too early during
>> kernel boot or system resume in direct loading case.
>
> Does it really happen in a real scenario?

Some crazy drivers might call request_firmware inside resume callback,
with usermodehelper_read_lock, we can find the mistake easily and log it.

>
> If so, using usermode helper lock for that purpose sounds like an
> abuse to be fixed differently or replaced with a better one.

Might be, but looks not good to introduce this change in a code
refactoring patch. Or you can do it in another patch for discussion
if you have better way to handle the situation.


Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/4] firmware: Refactoring for splitting user-mode helper code

2013-01-30 Thread Ming Lei

On Wed, Jan 30, 2013 at 6:31 PM, Takashi Iwai  wrote:
>
> But it's supposed to be cached, no?

Generally it will be cached, but some crazy devices might come as new
device during resume, so we still need to handle the situation.

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/4] firmware: Refactoring for splitting user-mode helper code

2013-01-30 Thread Ming Lei

On Wed, Jan 30, 2013 at 6:53 PM, Takashi Iwai  wrote:
> At Wed, 30 Jan 2013 18:50:05 +0800,
> Ming Lei wrote:
>>
>> On Wed, Jan 30, 2013 at 6:31 PM, Takashi Iwai  wrote:
>> >
>> > But it's supposed to be cached, no?
>>
>> Generally it will be cached, but some crazy devices might come as new
>> device during resume, so we still need to handle the situation.
>
> In that case, shouldn't we simply return an error instead of
> usermodehelper lock (at least for direct loading)?

The check depends on usermodehelper_read_trylock now, is there
other simpler way to do the check?

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/4] firmware: Refactoring for splitting user-mode helper code

2013-01-30 Thread Ming Lei

On Wed, Jan 30, 2013 at 7:08 PM, Takashi Iwai  wrote:
> At Wed, 30 Jan 2013 11:53:14 +0100,
> Takashi Iwai wrote:
>>
>> At Wed, 30 Jan 2013 18:50:05 +0800,
>> Ming Lei wrote:
>> >
>> > On Wed, Jan 30, 2013 at 6:31 PM, Takashi Iwai  wrote:
>> > >
>> > > But it's supposed to be cached, no?
>> >
>> > Generally it will be cached, but some crazy devices might come as new
>> > device during resume, so we still need to handle the situation.
>>
>> In that case, shouldn't we simply return an error instead of
>> usermodehelper lock (at least for direct loading)?
>
> The patch below is what I have in my mind...
>
>
> Takashi
>
> ---
> From: Takashi Iwai 
> Subject: [PATCH] firmware: Skip usermodehelper_lock for direct loading
>
> We don't need a user mode helper lock for the direct fw loading.
> This also reduces the use of timeout when user mode helper is
> disabled, thus we can move the code into ifdef block, too.
>
> For avoiding the possible call of request_firmware() without caching,
> add a check of suspend state for the direct loading case, and returns
> immediately if it's called during suspend/resume.  Then it proceeds to
> the user mode helper if enabled, or returns the error.
>
> Signed-off-by: Takashi Iwai 
> ---
>  drivers/base/firmware_class.c | 97 
> ---
>  1 file changed, 54 insertions(+), 43 deletions(-)
>
> diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
> index f1caad7..c973bb0 100644
> --- a/drivers/base/firmware_class.c
> +++ b/drivers/base/firmware_class.c
> @@ -88,13 +88,6 @@ enum {
> FW_STATUS_ABORT,
>  };
>
> -static int loading_timeout = 60;   /* In seconds */
> -
> -static inline long firmware_loading_timeout(void)
> -{
> -   return loading_timeout > 0 ? loading_timeout * HZ : 
> MAX_SCHEDULE_TIMEOUT;
> -}
> -
>  struct firmware_cache {
> /* firmware_buf instance will be added into the below list */
> spinlock_t lock;
> @@ -315,6 +308,9 @@ static bool fw_get_filesystem_firmware(struct device 
> *device,
> bool success = false;
> char *path = __getname();
>
> +   if (device->power.is_suspended)
> +   return false;

The device which is requesting firmware is resumed  does not mean the
storage device has been resumed, so this check isn't enough.

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Page allocation failure on v3.8-rc5

2013-01-30 Thread Ming Lei

sprintf(state->name, "p");

-   state->limit = disk_max_parts(hd);
i = res = err = 0;
while (!res && check_part[i]) {
-   memset(&state->parts, 0, sizeof(state->parts));
+   memset(state->parts, 0, state->limit * sizeof(state->parts[0]));
res = check_part[i++](state);
if (res < 0) {
/* We have hit an I/O error which we don't report now.
@@ -161,6 +186,6 @@ check_partition(struct gendisk *hd, struct
block_device *bdev)
printk(KERN_INFO "%s", state->pp_buf);

free_page((unsigned long)state->pp_buf);
-   kfree(state);
+   release_partitions(state);
return ERR_PTR(res);
 }
diff --git a/block/partitions/check.h b/block/partitions/check.h
index 52b1003..8323808 100644
--- a/block/partitions/check.h
+++ b/block/partitions/check.h
@@ -15,13 +15,15 @@ struct parsed_partitions {
int flags;
bool has_info;
struct partition_meta_info info;
-   } parts[DISK_MAX_PARTS];
+   } *parts;
        int next;
int limit;
bool access_beyond_eod;
char *pp_buf;
 };

+extern void release_partitions(struct parsed_partitions *state);
+
 struct parsed_partitions *
 check_partition(struct gendisk *, struct block_device *);



Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/4] firmware: Refactoring for splitting user-mode helper code

2013-01-31 Thread Ming Lei

 }
> -
> -   /* pass the pages buffer to driver at the last minute */
> -   fw_set_page_data(buf, fw_priv->fw);
> -
> fw_priv->buf = NULL;
> -   mutex_unlock(&fw_lock);
> -
> -   if (direct_load)
> -   goto err_put_dev;
>
> device_remove_file(f_dev, &dev_attr_loading);
>  err_del_bin_attr:
> @@ -972,6 +965,74 @@ err_put_dev:
> return retval;
>  }
>
> +static int fw_load_from_user_helper(struct firmware *firmware,
> +   const char *name, struct device *device,
> +   bool uevent, bool nowait, long timeout)
> +{
> +   struct firmware_priv *fw_priv;
> +
> +   fw_priv = fw_create_instance(firmware, name, device, uevent, nowait);
> +   if (IS_ERR(fw_priv))
> +   return PTR_ERR(fw_priv);
> +
> +   fw_priv->buf = firmware->priv;
> +   return _request_firmware_load(fw_priv, uevent, timeout);
> +}
> +
> +/* called from request_firmware() and request_firmware_work_func() */
> +static int
> +_request_firmware(const struct firmware **firmware_p, const char *name,
> + struct device *device, bool uevent, bool nowait)
> +{
> +   struct firmware *fw;
> +   long timeout;
> +   int ret;
> +
> +   if (!firmware_p)
> +   return -EINVAL;
> +
> +   ret = _request_firmware_prepare(&fw, name, device);
> +   if (ret <= 0) /* error or already assigned */
> +   goto out;
> +
> +   ret = 0;
> +   timeout = firmware_loading_timeout();

The above line may be moved below the line of 'if (nowait) '.

> +   if (nowait) {
> +   timeout = usermodehelper_read_lock_wait(timeout);
> +   if (!timeout) {
> +   dev_dbg(device, "firmware: %s loading timed out\n",
> +   name);
> +   ret = -EBUSY;
> +   }
> +   } else {
> +   ret = usermodehelper_read_trylock();
> +   if (WARN_ON(ret)) {
> +   dev_err(device, "firmware: %s will not be loaded\n",
> +   name);
> +   }
> +   }
> +
> +   if (!ret) {
> +   if (!fw_get_filesystem_firmware(device, fw->priv))
> +   ret = fw_load_from_user_helper(fw, name, device,
> +  uevent, nowait,
> +  timeout);
> +   if (!ret)
> +   ret = assign_firmware_buf(fw, device);
> +   }
> +
> +   usermodehelper_read_unlock();

The above should be move inside the above 'if (!ret) {...}'

Except for the above two comments, looks the patch v4 is fine.

Please CC greg-kh when you send v5, and you may add my ack
for the patchset.

Acked-by: Ming Lei 

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/4] firmware: Make user-mode helper optional (v5)

2013-01-31 Thread Ming Lei

On Fri, Feb 1, 2013 at 12:15 AM, Greg Kroah-Hartman
 wrote:
> On Thu, Jan 31, 2013 at 11:13:53AM +0100, Takashi Iwai wrote:
>> Hi,
>>
>> this is a series of revised patches for making the user-mode helper
>> call in the firmware loader optional.  Hopefully all things settled
>> down now.
>>
>> v4->v5: fix invalid call of usermodehelper_unlock()
>
> Yeah!  I was working on doing just this right as you sent this series
> in.  I'll take your patches instead of mine, you did all of the hard
> work here, thanks :)
>
> Ming, any objection to me taking them?

No objections, and I have reviewed the patch set and acked them all.

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Page allocation failure on v3.8-rc5

2013-01-31 Thread Ming Lei

On Fri, Feb 1, 2013 at 7:43 AM, Andrew Morton  wrote:
> On Wed, 30 Jan 2013 19:53:22 +0800
> Ming Lei  wrote:
>
>> The allocation failure is caused by the big sizeof(struct parsed_partitions),
>> which is 64K in my 32bit box,
>
> Geeze.
>
> We could fix that nicely by making parsed_partitions.parts an array of
> pointers to a single `struct parsed_partition' and allocating those
> on-demand.
>
> But given the short-lived nature of this storage and the infrequency of
> check_partition(), that isn't necessary.
>
>> could you test the blow patch to see
>> if it can fix the allocation failure?
>
> (The patch is wordwrapped)

Sorry for that, I send out it for test.

>
>> ...
>>
>> @@ -106,18 +107,43 @@ static int (*check_part[])(struct parsed_partitions *) 
>> = {
>>   NULL
>>  };
>>
>> +struct parsed_partitions *allocate_partitions(int nr)
>> +{
>> + struct parsed_partitions *state;
>> +
>> + state = kzalloc(sizeof(struct parsed_partitions), GFP_KERNEL);
>
> I personally prefer sizefo(*state) here.  It means the reader doesn't
> have to scroll back to check things.

OK, will use sizeof(*state).

>> + if (!state)
>> + return NULL;
>> +
>> + state->parts = vzalloc(nr * sizeof(state->parts[0]));
>> + if (!state->parts) {
>> + kfree(state);
>> + return NULL;
>> + }
>
> It doesn't really need to be this complex - we could just vmalloc the
> entire `struct parsed_partitions'.  But I see that your change will

The above approach can save one 32K allocation approximately.

> cause us to allcoate much less memory in many situations, which is
> good.  It should be mentioned in the changelog!

OK, I will add the changelog later.


Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] block: partition: optimize memory allocation in check_partition

2013-01-31 Thread Ming Lei

Currently, sizeof(struct parsed_partitions) may be 64KB in 32bit arch,
so it is easy to trigger page allocation failure by check_partition,
especially in hotplug block device situation(such as, USB mass storage,
MMC card, ...), and Felipe Balbi has observed the failure.

This patch does below optimizations on the allocation of struct
parsed_partitions to try to address the issue:

- make parsed_partitions.parts as pointer so that the pointed memory
can fit in 32KB buffer, then approximate 32KB memory can be saved

- vmalloc the buffer pointed by parsed_partitions.parts because
32KB is still a bit big for kmalloc

- given that many devices have the partition count limit, so only
allocate disk_max_parts() partitions instead of 256 partitions

Cc: sta...@vger.kernel.org
Signed-off-by: Ming Lei 
---
 block/partition-generic.c |2 +-
 block/partitions/check.c  |   35 ++-
 block/partitions/check.h  |4 +++-
 3 files changed, 34 insertions(+), 7 deletions(-)

diff --git a/block/partition-generic.c b/block/partition-generic.c
index f1d1451..043d0bd 100644
--- a/block/partition-generic.c
+++ b/block/partition-generic.c
@@ -525,7 +525,7 @@ rescan:
md_autodetect_dev(part_to_dev(part)->devt);
 #endif
}
-   kfree(state);
+   release_partitions(state);
return 0;
 }
 
diff --git a/block/partitions/check.c b/block/partitions/check.c
index bc90867..86c9450 100644
--- a/block/partitions/check.c
+++ b/block/partitions/check.c
@@ -14,6 +14,7 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 
@@ -106,18 +107,43 @@ static int (*check_part[])(struct parsed_partitions *) = {
NULL
 };
 
+static struct parsed_partitions *allocate_partitions(int nr)
+{
+   struct parsed_partitions *state;
+
+   state = kzalloc(sizeof(*state), GFP_KERNEL);
+   if (!state)
+   return NULL;
+
+   state->parts = vzalloc(nr * sizeof(state->parts[0]));
+   if (!state->parts) {
+   kfree(state);
+   return NULL;
+   }
+
+   return state;
+}
+
+void release_partitions(struct parsed_partitions *state)
+{
+   vfree(state->parts);
+   kfree(state);
+}
+
 struct parsed_partitions *
 check_partition(struct gendisk *hd, struct block_device *bdev)
 {
struct parsed_partitions *state;
int i, res, err;
 
-   state = kzalloc(sizeof(struct parsed_partitions), GFP_KERNEL);
+   i = disk_max_parts(hd);
+   state = allocate_partitions(i);
if (!state)
return NULL;
+   state->limit = i;
state->pp_buf = (char *)__get_free_page(GFP_KERNEL);
if (!state->pp_buf) {
-   kfree(state);
+   release_partitions(state);
return NULL;
}
state->pp_buf[0] = '\0';
@@ -128,10 +154,9 @@ check_partition(struct gendisk *hd, struct block_device 
*bdev)
if (isdigit(state->name[strlen(state->name)-1]))
sprintf(state->name, "p");
 
-   state->limit = disk_max_parts(hd);
i = res = err = 0;
while (!res && check_part[i]) {
-   memset(&state->parts, 0, sizeof(state->parts));
+   memset(state->parts, 0, state->limit * sizeof(state->parts[0]));
res = check_part[i++](state);
if (res < 0) {
/* We have hit an I/O error which we don't report now.
@@ -161,6 +186,6 @@ check_partition(struct gendisk *hd, struct block_device 
*bdev)
printk(KERN_INFO "%s", state->pp_buf);
 
free_page((unsigned long)state->pp_buf);
-   kfree(state);
+   release_partitions(state);
return ERR_PTR(res);
 }
diff --git a/block/partitions/check.h b/block/partitions/check.h
index 52b1003..4fefdbb 100644
--- a/block/partitions/check.h
+++ b/block/partitions/check.h
@@ -15,13 +15,15 @@ struct parsed_partitions {
int flags;
bool has_info;
struct partition_meta_info info;
-   } parts[DISK_MAX_PARTS];
+   } *parts;
int next;
int limit;
bool access_beyond_eod;
char *pp_buf;
 };
 
+void release_partitions(struct parsed_partitions *state);
+
 struct parsed_partitions *
 check_partition(struct gendisk *, struct block_device *);
 
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] block: partition: optimize memory allocation in check_partition

2013-01-31 Thread Ming Lei

On Fri, Feb 1, 2013 at 11:20 AM, Joe Perches  wrote:
>> @@ -15,13 +15,15 @@ struct parsed_partitions {
>>   int flags;
>>   bool has_info;
>>   struct partition_meta_info info;
>> - } parts[DISK_MAX_PARTS];
>> + } *parts;
>
> This is relatively unusual style.
> Could you break out this struct instead?

IMO, looks it isn't necessary since no one uses the type of the embedded
structure directly, and the style isn't introduced by this patch, which only
follows the previous one.

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] block: partition: optimize memory allocation in check_partition

2013-02-01 Thread Ming Lei

On Fri, Feb 1, 2013 at 4:33 PM, Yasuaki Ishimatsu
 wrote:

> At least, you should use release_partitions() instead of kfree() here.

Good catch, thank you for pointing it out, and I will post v1 later
with the update.

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v1] block: partition: optimize memory allocation in check_partition

2013-02-01 Thread Ming Lei

Currently, sizeof(struct parsed_partitions) may be 64KB in 32bit arch,
so it is easy to trigger page allocation failure by check_partition,
especially in hotplug block device situation(such as, USB mass storage,
MMC card, ...), and Felipe Balbi has observed the failure.

This patch does below optimizations on the allocation of struct
parsed_partitions to try to address the issue:

- make parsed_partitions.parts as pointer so that the pointed memory
can fit in 32KB buffer, then approximate 32KB memory can be saved

- vmalloc the buffer pointed by parsed_partitions.parts because
32KB is still a bit big for kmalloc

- given that many devices have the partition count limit, so only
allocate disk_max_parts() partitions instead of 256 partitions

Cc: sta...@vger.kernel.org
Signed-off-by: Ming Lei 
---
v1:
- add one missing free_partitions
- rename release_partitions as free_partitions

 block/partition-generic.c |4 ++--
 block/partitions/check.c  |   35 ++-
 block/partitions/check.h  |4 +++-
 3 files changed, 35 insertions(+), 8 deletions(-)

diff --git a/block/partition-generic.c b/block/partition-generic.c
index f1d1451..17d44ea 100644
--- a/block/partition-generic.c
+++ b/block/partition-generic.c
@@ -418,7 +418,7 @@ int rescan_partitions(struct gendisk *disk, struct 
block_device *bdev)
int p, highest, res;
 rescan:
if (state && !IS_ERR(state)) {
-   kfree(state);
+   free_partitions(state);
state = NULL;
}
 
@@ -525,7 +525,7 @@ rescan:
md_autodetect_dev(part_to_dev(part)->devt);
 #endif
}
-   kfree(state);
+   free_partitions(state);
return 0;
 }
 
diff --git a/block/partitions/check.c b/block/partitions/check.c
index bc90867..944fe22 100644
--- a/block/partitions/check.c
+++ b/block/partitions/check.c
@@ -14,6 +14,7 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 
@@ -106,18 +107,43 @@ static int (*check_part[])(struct parsed_partitions *) = {
NULL
 };
 
+static struct parsed_partitions *allocate_partitions(int nr)
+{
+   struct parsed_partitions *state;
+
+   state = kzalloc(sizeof(*state), GFP_KERNEL);
+   if (!state)
+   return NULL;
+
+   state->parts = vzalloc(nr * sizeof(state->parts[0]));
+   if (!state->parts) {
+   kfree(state);
+   return NULL;
+   }
+
+   return state;
+}
+
+void free_partitions(struct parsed_partitions *state)
+{
+   vfree(state->parts);
+   kfree(state);
+}
+
 struct parsed_partitions *
 check_partition(struct gendisk *hd, struct block_device *bdev)
 {
struct parsed_partitions *state;
int i, res, err;
 
-   state = kzalloc(sizeof(struct parsed_partitions), GFP_KERNEL);
+   i = disk_max_parts(hd);
+   state = allocate_partitions(i);
if (!state)
return NULL;
+   state->limit = i;
state->pp_buf = (char *)__get_free_page(GFP_KERNEL);
if (!state->pp_buf) {
-   kfree(state);
+   free_partitions(state);
return NULL;
}
state->pp_buf[0] = '\0';
@@ -128,10 +154,9 @@ check_partition(struct gendisk *hd, struct block_device 
*bdev)
if (isdigit(state->name[strlen(state->name)-1]))
sprintf(state->name, "p");
 
-   state->limit = disk_max_parts(hd);
i = res = err = 0;
while (!res && check_part[i]) {
-   memset(&state->parts, 0, sizeof(state->parts));
+   memset(state->parts, 0, state->limit * sizeof(state->parts[0]));
res = check_part[i++](state);
if (res < 0) {
/* We have hit an I/O error which we don't report now.
@@ -161,6 +186,6 @@ check_partition(struct gendisk *hd, struct block_device 
*bdev)
printk(KERN_INFO "%s", state->pp_buf);
 
free_page((unsigned long)state->pp_buf);
-   kfree(state);
+   free_partitions(state);
return ERR_PTR(res);
 }
diff --git a/block/partitions/check.h b/block/partitions/check.h
index 52b1003..eade17e 100644
--- a/block/partitions/check.h
+++ b/block/partitions/check.h
@@ -15,13 +15,15 @@ struct parsed_partitions {
int flags;
bool has_info;
struct partition_meta_info info;
-   } parts[DISK_MAX_PARTS];
+   } *parts;
int next;
int limit;
bool access_beyond_eod;
char *pp_buf;
 };
 
+void free_partitions(struct parsed_partitions *state);
+
 struct parsed_partitions *
 check_partition(struct gendisk *, struct block_device *);
 
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] firmware: Make user-mode helper optional

2013-01-24 Thread Ming Lei

On Tue, Jan 22, 2013 at 7:51 PM, Takashi Iwai  wrote:
> Since 3.7 kernel, the firmware loader can read the firmware files
> directly, and the traditional user-mode helper is invoked only as a
> fallback.  This seems working pretty well, and the next step would be
> to reduce the redundant user-mode helper stuff in future.

The idea is good.

>
> This patch is a preparation for that, to make the user-mode helper
> invocation optional via kconfig.  Other than that, there is no
> functional change here.

In fact, your patch does much code refactoring work, which isn't
strictly related with what you claimed in the commit log, at least
it should be separated from introducing FW_LOADER_USER_HELPER.

So could you split the patch into several parts and let one part do one thing
so that we can review them a bit easily?

Also it is not good to introduce too many '#ifdef #endif', IMO.

>
> Signed-off-by: Takashi Iwai 
> ---
>  drivers/base/Kconfig  |  11 ++
>  drivers/base/firmware_class.c | 350 
> +-
>  2 files changed, 218 insertions(+), 143 deletions(-)
>
> diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
> index c8b4539..07abd9d 100644
> --- a/drivers/base/Kconfig
> +++ b/drivers/base/Kconfig
> @@ -145,6 +145,17 @@ config EXTRA_FIRMWARE_DIR
>   this option you can point it elsewhere, such as /lib/firmware/ or
>   some other directory containing the firmware files.
>
> +config FW_LOADER_USER_HELPER
> +   bool "Fallback user-helper invocation for firmware loading"
> +   depends on FW_LOADER
> +   default y
> +   help
> + This option enables / disables the invocation of user-helper
> + (e.g. udev) for loading firmware files as a fallback after the
> + direct file loading in kernel fails.  The user-mode helper is
> + no longer required unless you have a special firmware file that
> + resides in a non-standard path.
> +
>  config DEBUG_DRIVER
> bool "Driver Core verbose debug messages"
> depends on DEBUG_KERNEL
> diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
> index b392b35..6c6f714 100644
> --- a/drivers/base/firmware_class.c
> +++ b/drivers/base/firmware_class.c
> @@ -88,6 +88,7 @@ enum {
> FW_STATUS_ABORT,
>  };
>
> +#ifdef CONFIG_FW_LOADER_USER_HELPER
>  enum fw_buf_fmt {
> VMALLOC_BUF,/* used in direct loading */
> PAGE_BUF,   /* used in loading via userspace */
> @@ -99,6 +100,7 @@ static inline long firmware_loading_timeout(void)
>  {
> return loading_timeout > 0 ? loading_timeout * HZ : 
> MAX_SCHEDULE_TIMEOUT;
>  }
> +#endif /* CONFIG_FW_LOADER_USER_HELPER */
>
>  struct firmware_cache {
> /* firmware_buf instance will be added into the below list */
> @@ -128,12 +130,14 @@ struct firmware_buf {
> struct completion completion;
> struct firmware_cache *fwc;
> unsigned long status;
> -   enum fw_buf_fmt fmt;
> void *data;
> size_t size;
> +#ifdef CONFIG_FW_LOADER_USER_HELPER
> +   enum fw_buf_fmt fmt;
> struct page **pages;
> int nr_pages;
> int page_array_size;
> +#endif
> char fw_id[];
>  };
>
> @@ -142,6 +146,7 @@ struct fw_cache_entry {
> char name[];
>  };
>
> +#ifdef CONFIG_FW_LOADER_USER_HELPER
>  struct firmware_priv {
> struct delayed_work timeout_work;
> bool nowait;
> @@ -149,6 +154,7 @@ struct firmware_priv {
> struct firmware_buf *buf;
> struct firmware *fw;
>  };
> +#endif
>
>  struct fw_name_devm {
> unsigned long magic;
> @@ -182,7 +188,9 @@ static struct firmware_buf *__allocate_fw_buf(const char 
> *fw_name,
> strcpy(buf->fw_id, fw_name);
> buf->fwc = fwc;
> init_completion(&buf->completion);
> +#ifdef CONFIG_FW_LOADER_USER_HELPER
> buf->fmt = VMALLOC_BUF;
> +#endif
>
> pr_debug("%s: fw-%s buf=%p\n", __func__, fw_name, buf);
>
> @@ -240,7 +248,6 @@ static void __fw_free_buf(struct kref *ref)
>  {
> struct firmware_buf *buf = to_fwbuf(ref);
> struct firmware_cache *fwc = buf->fwc;
> -   int i;
>
> pr_debug("%s: fw-%s buf=%p data=%p size=%u\n",
>  __func__, buf->fw_id, buf, buf->data,
> @@ -250,12 +257,15 @@ static void __fw_free_buf(struct kref *ref)
> spin_unlock(&fwc->lock);
>
>
> +#ifdef CONFIG_FW_LOADER_USER_HELPER
> if (buf->fmt == PAGE_BUF) {
> +   int i;
> vunmap(buf->data);
> for (i = 0; i < buf->nr_pages; i++)
> __free_page(buf->pages[i]);
> kfree(buf->pages);
> } else
> +#endif
> vfree(buf->data);
> kfree(buf);
>  }
> @@ -319,7 +329,8 @@ static bool fw_read_file_contents(struct file *file, 
> struct firmware_buf *fw_buf
> return true;
>  }
>
> -static bool fw_get_filesystem_firmware(struct firmware_buf *buf)
> +static bool

lockdep: access percpu variable too early

2013-02-28 Thread Ming Lei

Hi,

The percpu variables of 'lockdep_stats' and 'cpu_lock_stat'
may be accessed before percpu area is brought up in case of
CONFIG_DEBUG_LOCKDEP/CONFIG_LOCK_STAT, so these variables
in non-boot CPUs will be initialized incorrectly.

As far as I think of, there are several solutions for the
problem:

1, just enable 'debug_locks' until percpu area is built
2, define the two percpu variables in a simple/stupid
percpu way inside lockdep
3, clear these two percpu variables of non-boot CPUs
just after percpu area is built.
4, ignore the problem

Looks each approache have its own disadvantage, so what
is your opinion on the problem? or better approach?

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 1/2] block: partitions: mac: obey the state->limit constraint

2013-02-20 Thread Ming Lei

It isn't necessary to read the information of partitions whose No.
is equal and more than state->limit since only maximum state->limit
partitions will be added inside rescan_partitions().

That is also what other kind of partitions are doing.

Signed-off-by: Ming Lei 
---
 block/partitions/mac.c |4 
 1 file changed, 4 insertions(+)

diff --git a/block/partitions/mac.c b/block/partitions/mac.c
index 11f688b..76d8ba6 100644
--- a/block/partitions/mac.c
+++ b/block/partitions/mac.c
@@ -63,6 +63,10 @@ int mac_partition(struct parsed_partitions *state)
put_dev_sector(sect);
return 0;
}
+
+   if (blocks_in_map >= state->limit)
+   blocks_in_map = state->limit - 1;
+
strlcat(state->pp_buf, " [mac]", PAGE_SIZE);
for (slot = 1; slot <= blocks_in_map; ++slot) {
int pos = slot * secsize;
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 2/2] block: partition: optimize memory allocation in check_partition

2013-02-20 Thread Ming Lei

Currently, sizeof(struct parsed_partitions) may be 64KB in 32bit arch,
so it is easy to trigger page allocation failure by check_partition,
especially in hotplug block device situation(such as, USB mass storage,
MMC card, ...), and Felipe Balbi has observed the failure.

This patch does below optimizations on the allocation of struct
parsed_partitions to try to address the issue:

- make parsed_partitions.parts as pointer so that the pointed memory
can fit in 32KB buffer, then approximate 32KB memory can be saved

- vmalloc the buffer pointed by parsed_partitions.parts because
32KB is still a bit big for kmalloc

- given that many devices have the partition count limit, so only
allocate disk_max_parts() partitions instead of 256 partitions always

Reported-by: Felipe Balbi 
Signed-off-by: Ming Lei 
---
 block/partition-generic.c |4 ++--
 block/partitions/check.c  |   37 -
 block/partitions/check.h  |4 +++-
 3 files changed, 37 insertions(+), 8 deletions(-)

diff --git a/block/partition-generic.c b/block/partition-generic.c
index 1cb4dec..789cdea 100644
--- a/block/partition-generic.c
+++ b/block/partition-generic.c
@@ -418,7 +418,7 @@ int rescan_partitions(struct gendisk *disk, struct 
block_device *bdev)
int p, highest, res;
 rescan:
if (state && !IS_ERR(state)) {
-   kfree(state);
+   free_partitions(state);
state = NULL;
}
 
@@ -525,7 +525,7 @@ rescan:
md_autodetect_dev(part_to_dev(part)->devt);
 #endif
}
-   kfree(state);
+   free_partitions(state);
return 0;
 }
 
diff --git a/block/partitions/check.c b/block/partitions/check.c
index bc90867..19ba207 100644
--- a/block/partitions/check.c
+++ b/block/partitions/check.c
@@ -14,6 +14,7 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 
@@ -106,18 +107,45 @@ static int (*check_part[])(struct parsed_partitions *) = {
NULL
 };
 
+static struct parsed_partitions *allocate_partitions(struct gendisk *hd)
+{
+   struct parsed_partitions *state;
+   int nr;
+
+   state = kzalloc(sizeof(*state), GFP_KERNEL);
+   if (!state)
+   return NULL;
+
+   nr = disk_max_parts(hd);
+   state->parts = vzalloc(nr * sizeof(state->parts[0]));
+   if (!state->parts) {
+   kfree(state);
+   return NULL;
+   }
+
+   state->limit = nr;
+
+   return state;
+}
+
+void free_partitions(struct parsed_partitions *state)
+{
+   vfree(state->parts);
+   kfree(state);
+}
+
 struct parsed_partitions *
 check_partition(struct gendisk *hd, struct block_device *bdev)
 {
struct parsed_partitions *state;
int i, res, err;
 
-   state = kzalloc(sizeof(struct parsed_partitions), GFP_KERNEL);
+   state = allocate_partitions(hd);
if (!state)
return NULL;
state->pp_buf = (char *)__get_free_page(GFP_KERNEL);
if (!state->pp_buf) {
-   kfree(state);
+   free_partitions(state);
return NULL;
}
state->pp_buf[0] = '\0';
@@ -128,10 +156,9 @@ check_partition(struct gendisk *hd, struct block_device 
*bdev)
if (isdigit(state->name[strlen(state->name)-1]))
sprintf(state->name, "p");
 
-   state->limit = disk_max_parts(hd);
i = res = err = 0;
while (!res && check_part[i]) {
-   memset(&state->parts, 0, sizeof(state->parts));
+   memset(state->parts, 0, state->limit * sizeof(state->parts[0]));
res = check_part[i++](state);
if (res < 0) {
/* We have hit an I/O error which we don't report now.
@@ -161,6 +188,6 @@ check_partition(struct gendisk *hd, struct block_device 
*bdev)
printk(KERN_INFO "%s", state->pp_buf);
 
free_page((unsigned long)state->pp_buf);
-   kfree(state);
+   free_partitions(state);
return ERR_PTR(res);
 }
diff --git a/block/partitions/check.h b/block/partitions/check.h
index 52b1003..eade17e 100644
--- a/block/partitions/check.h
+++ b/block/partitions/check.h
@@ -15,13 +15,15 @@ struct parsed_partitions {
int flags;
bool has_info;
struct partition_meta_info info;
-   } parts[DISK_MAX_PARTS];
+   } *parts;
int next;
int limit;
bool access_beyond_eod;
char *pp_buf;
 };
 
+void free_partitions(struct parsed_partitions *state);
+
 struct parsed_partitions *
 check_partition(struct gendisk *, struct block_device *);
 
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v1] ARM: keep __my_cpu_offset consistent with generic one

2013-03-12 Thread Ming Lei

On Tue, Mar 12, 2013 at 6:56 PM, Russell King - ARM Linux
 wrote:
>>
>> Looks no one objects the patch, so I has submitted it into Russell's
>> patch system, and hope it can be pushed to linus tree soon and
>> make LOCK_STAT/DEBUG_LOCKDEP usable on ARMv7.
>
> I'm not convinced it is correct.  Is the percpu data as stored in the
> kernel image (in other words, at offset zero) supposed to be read only?

It should have been used after setup_per_cpu_areas().

> If so, the above means that we'll be accessing that rather than the
> copy of the percpu data we should be accessing.

I admit the patch is a work around for the problem, but it is harmless
to make lockdep workable on arm at least.

> The percpu data areas are allocated by setup_per_cpu_areas() - that's
> where we should be initializing this, just like it's done on PowerPC.

>From the entry of start_kernel to setup_per_cpu_areas, there are many
locks which will be acquired/released, so the percpu variable in lock_release
has to be used early now.  Either disabling lockdep during the period or
introducing stupid/simple percpu variable inside lockdep may fix the probem,
but looks both aren't perfect.

So the workaround is proposed in this patch...

Ingo and Peter, what is your opinion on the problem?

Thanks
-- 
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v1] ARM: keep __my_cpu_offset consistent with generic one

2013-03-12 Thread Ming Lei

On Tue, Mar 12, 2013 at 7:30 PM, Russell King - ARM Linux
 wrote:
>>
>> Ingo and Peter, what is your opinion on the problem?
>
> Having discussed this with Ben Herrenschmidt, it seems that we do need
> to have a more complex patch to sort this out - we need to setup our
> private pointer inside setup_per_cpu_areas().

Suppose so, seems the patch is still needed to make CPU0 see
static variables in '.data..percpu' section correctly.


Thanks,
-- 
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v1] ARM: keep __my_cpu_offset consistent with generic one

2013-03-12 Thread Ming Lei

On Wed, Mar 13, 2013 at 1:25 AM, Tejun Heo  wrote:
> Hello,
>
> On Tue, Mar 12, 2013 at 07:44:55PM +0800, Ming Lei wrote:
>> On Tue, Mar 12, 2013 at 7:30 PM, Russell King - ARM Linux
>>  wrote:
>> >>
>> >> Ingo and Peter, what is your opinion on the problem?
>> >
>> > Having discussed this with Ben Herrenschmidt, it seems that we do need
>> > to have a more complex patch to sort this out - we need to setup our
>> > private pointer inside setup_per_cpu_areas().
>>
>> Suppose so, seems the patch is still needed to make CPU0 see
>> static variables in '.data..percpu' section correctly.
>
> If my memory serves me right, x86 also has places where CPU0 accesses
> its per-cpu data in .data.percpu.  While those existed (not sure
> they're still there but probably they're) mostly due to historical
> reasons rather than by design, as long as the data is in consistent
> state by and during percpu setup, nothing will break.

Tejun, thanks for your input, yes, nothing will break.  For lockdep,
the percpu variables in non-boot-CPUs may be initialized again after
percpu area is set up.


Thanks,
-- 
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: use after free in sysfs_find_dirent

2013-03-13 Thread Ming Lei

On Thu, Mar 7, 2013 at 2:26 PM, Dave Jones  wrote:
> Could be some of those that caused these bugs.
>
> I just retried rerunning the test a few times.  Every time I run for a while
> I end up with different crashes.  It's raining bugs over here.
> (Here's another sysfs one below)
>
> Running 'trinity -c mkdirat -V /sys' doesn't seem to trigger it, so it's an
> interaction with something else maybe.

Maybe it is helpful for trinity to output some previous recorded operations
on /sysfs when the crash is triggered.

Thanks,
-- 
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: + atomic-improve-atomic_inc_unless_negative-atomic_dec_unless_positive .patch added to -mm tree

2013-03-14 Thread Ming Lei

On Fri, Mar 15, 2013 at 12:24 AM, Oleg Nesterov  wrote:
>> From: Ming Lei 
>> Subject: atomic: improve 
>> atomic_inc_unless_negative/atomic_dec_unless_positive
>>
>> Generally, both atomic_inc_unless_negative() and
>> atomic_dec_unless_positive() need at least two atomic_cmpxchg() to
>> complete the atomic operation.  In fact, the 1st atomic_cmpxchg() is just
>> used to read current value of the atomic variable at most times.
>
> Agreed, this looks ugly...
>
> But can't we make a simpler patch and keep the code simple/readable ?
>
> Oleg.
>
> --- x/include/linux/atomic.h
> +++ x/include/linux/atomic.h
> @@ -64,7 +64,7 @@ static inline int atomic_inc_not_zero_hi
>  static inline int atomic_inc_unless_negative(atomic_t *p)
>  {
> int v, v1;
> -   for (v = 0; v >= 0; v = v1) {
> +   for (v = atomic_read(p); v >= 0; v = v1) {
> v1 = atomic_cmpxchg(p, v, v + 1);

Unfortunately, the above will exchange the current value even though
it is negative, so it isn't correct.

> if (likely(v1 == v))
> return 1;
> @@ -77,7 +77,7 @@ static inline int atomic_inc_unless_nega
>  static inline int atomic_dec_unless_positive(atomic_t *p)
>  {
> int v, v1;
> -   for (v = 0; v <= 0; v = v1) {
> +   for (v = atomic_read(p); v <= 0; v = v1) {
> v1 = atomic_cmpxchg(p, v, v - 1);

Similar with above.

> if (likely(v1 == v))
> return 1;
>


Thanks,
-- 
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: use after free in sysfs_find_dirent

2013-03-15 Thread Ming Lei

Hi,

On Fri, Mar 15, 2013 at 1:04 PM, Sasha Levin  wrote:
> On 03/15/2013 12:03 AM, Sasha Levin wrote:
>>
>> [  350.140100] general protection fault:  [#1] PREEMPT SMP 
>> DEBUG_PAGEALLOC
>> [  350.141468] Dumping ftrace buffer:
>> [  350.142048](ftrace buffer empty)
>> [  350.142619] Modules linked in:
>> [  350.143128] CPU 0
>> [  350.143434] Pid: 25064, comm: trinity-child14 Tainted: GW
>> 3.9.0-rc2-next-20130314-sasha-00046-g3897511 #295
>> [  350.145415] RIP: 0010:[]  [] 
>> rb_next+0x23/0x60
>> [  350.146680] RSP: 0018:88007b9dde48  EFLAGS: 00010202
>> [  350.147528] RAX: 6b6b6b6b6b6b6b6b RBX: 8800b8524b70 RCX: 
>> 8800b8524b70
>> [  350.148738] RDX: 6b6b6b6b6b6b6b6b RSI: 8800b63b96e0 RDI: 
>> 8800b8524bb8
>> [  350.149939] RBP: 88007b9dde48 R08:  R09: 
>> 
>> [  350.150035] R10:  R11:  R12: 
>> 88008c5cb180
>> [  350.150035] R13:  R14:  R15: 
>> 0010
>> [  350.150035] FS:  7fec4eae2700() GS:8800bb80() 
>> knlGS:
>> [  350.150035] CS:  0010 DS:  ES:  CR0: 80050033
>> [  350.150035] CR2: 0001 CR3: 7c32d000 CR4: 
>> 000406f0
>> [  350.150035] DR0:  DR1:  DR2: 
>> 
>> [  350.150035] DR3:  DR6: 0ff0 DR7: 
>> 0400
>> [  350.150035] Process trinity-child14 (pid: 25064, threadinfo 
>> 88007b9dc000, task 880096413000)
>> [  350.150035] Stack:
>> [  350.150035]  88007b9ddeb8 812fa959  
>> 0008
>> [  350.150035]  293e 8128cca0 88007b9ddf28 
>> 8800b63b96e0
>> [  350.150035]  8800a14e9b78 88008c5cb180 88007b9ddf28 
>> 8128cca0
>> [  350.150035] Call Trace:
>> [  350.150035]  [] sysfs_readdir+0x219/0x280
>> [  350.150035]  [] ? filldir+0x100/0x100
>> [  350.150035]  [] ? filldir+0x100/0x100
>> [  350.150035]  [] vfs_readdir+0x78/0xc0
>> [  350.150035]  [] ? trace_hardirqs_on+0xd/0x10
>> [  350.150035]  [] SyS_getdents64+0x90/0x120
>> [  350.150035]  [] tracesys+0xe1/0xe6
>> [  350.150035] Code: 85 d2 75 f4 5d c3 66 90 55 31 c0 48 8b 17 48 89 e5 48 
>> 39 d7 74 4a 48 8b 47 08 48 85 c0 75 0c eb 17 0f 1f 80
>> 00 00 00 00 48 89 d0 <48> 8b 50 10 48 85 d2 75 f4 eb 2a 66 90 48 89 d1 48 83 
>> e1 fc 74
>> [  350.150035] RIP  [] rb_next+0x23/0x60
>> [  350.150035]  RSP 
>> [  350.179705] ---[ end trace a39f58a515b594d5 ]---
>
> And on the bright side, unlike in Dave's case, similar issues reproduce 
> rather easily
> over here:

Could you share how to reproduce that easily?

Thanks,
-- 
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: + atomic-improve-atomic_inc_unless_negative-atomic_dec_unless_positive .patch added to -mm tree

2013-03-15 Thread Ming Lei

On Fri, Mar 15, 2013 at 9:46 PM, Oleg Nesterov  wrote:
> On 03/15, Ming Lei wrote:
>>
>> On Fri, Mar 15, 2013 at 12:24 AM, Oleg Nesterov  wrote:
>> >  static inline int atomic_inc_unless_negative(atomic_t *p)
>> >  {
>> > int v, v1;
>> > -   for (v = 0; v >= 0; v = v1) {
>> > +   for (v = atomic_read(p); v >= 0; v = v1) {
>> > v1 = atomic_cmpxchg(p, v, v + 1);
>>
>> Unfortunately, the above will exchange the current value even though
>> it is negative, so it isn't correct.
>
> Hmm, why? We always check "v >= 0" before we try to do
> atomic_cmpxchg(old => v) ?

Sorry, yes, you are right. But then your patch is basically same with the
previous one, isn't it?  And has same problem, see below discussion:

http://marc.info/?t=13628436691&r=1&w=2


Thanks,
-- 
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: use after free in sysfs_find_dirent

2013-03-16 Thread Ming Lei

On Sat, Mar 16, 2013 at 8:39 PM, Hillf Danton  wrote:
> init rb node before use due to empty node checked by rb_next().
>
> --- a/fs/sysfs/dir.cSat Mar 16 20:12:16 2013
> +++ b/fs/sysfs/dir.cSat Mar 16 20:37:10 2013
> @@ -396,6 +396,7 @@ struct sysfs_dirent *sysfs_new_dirent(co
>
> atomic_set(&sd->s_count, 1);
> atomic_set(&sd->s_active, 0);
> +   rb_init_node(&sd->s_rb);

Looks there is no the symbol in linus tree, and not necessary since
rb_link_node() may initialize the node correctly.

Also I can't reproduce the problem on my Pandaboard with trinity, maybe
the below debug code can find the name of the affected node if anyone
would like to test it.

--
diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 2fbdff6..9aa11c7 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -281,6 +281,10 @@ void release_sysfs_dirent(struct sysfs_dirent * sd)
 */
parent_sd = sd->s_parent;

+   if(!(sd->s_flags & SYSFS_FLAG_REMOVED))
+   printk("%s sysfs_dirent use after free: %s-%s\n",
+   __func__, parent_sd->s_name, sd->s_name);
+
if (sysfs_type(sd) == SYSFS_KOBJ_LINK)
sysfs_put(sd->s_symlink.target_sd);
if (sysfs_type(sd) & SYSFS_COPY_NAME)
@@ -962,6 +966,9 @@ static struct sysfs_dirent *sysfs_dir_pos(const void *ns,
int valid = !(pos->s_flags & SYSFS_FLAG_REMOVED) &&
pos->s_parent == parent_sd &&
hash == pos->s_hash;
+   if (valid && (atomic_read(&pos->s_count) <= 1))
+   printk("%s sysfs_dirent use after free: %s-%s\n",
+   __func__, parent_sd->s_name, pos->s_name);
sysfs_put(pos);
if (!valid)
pos = NULL;


Thanks,
-- 
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: use after free in sysfs_find_dirent

2013-03-16 Thread Ming Lei

On Sat, Mar 16, 2013 at 11:07 PM, Sasha Levin  wrote:
>
> Hi Ming,
>
> With your patch:
>
>
> [ 1525.874312] release_sysfs_dirent sysfs_dirent use after free: ptysb-uevent

Sasha, thanks for your test.

So is the oops always triggered on this node of 'ptysb-uevent' or the node name
is changed every time?


Thanks,
-- 
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] mm/fremap.c: fix another oops on error path

2013-03-16 Thread Ming Lei

Since find_vma() may return NULL, so don't dereference the
returned 'vma' until it is valid.

The problem is introduced by the commit in linus tree:

 6d7825b(mm/fremap.c: fix oops on error path).

Also mark vm_flags as ninitialized_var() to avoid compile
warning.

Cc: Tommi Rantala 
Cc: Michel Lespinasse 
Cc: Andrew Morton 
Cc: Linus Torvalds 
Signed-off-by: Ming Lei 
---
 mm/fremap.c |5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/mm/fremap.c b/mm/fremap.c
index 6a8da7e..80088e9 100644
--- a/mm/fremap.c
+++ b/mm/fremap.c
@@ -129,7 +129,7 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, 
unsigned long, size,
struct vm_area_struct *vma;
int err = -EINVAL;
int has_write_lock = 0;
-   vm_flags_t vm_flags;
+   vm_flags_t uninitialized_var(vm_flags);
 
if (prot)
return err;
@@ -163,8 +163,7 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, 
unsigned long, size,
 * and that the remapped range is valid and fully within
 * the single existing vma.
 */
-   vm_flags = vma->vm_flags;
-   if (!vma || !(vm_flags & VM_SHARED))
+   if (!vma || !(vma->vm_flags & VM_SHARED))
goto out;
 
if (!vma->vm_ops || !vma->vm_ops->remap_pages)
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: use after free in sysfs_find_dirent

2013-03-16 Thread Ming Lei

On Sat, Mar 16, 2013 at 11:22 PM, Ming Lei  wrote:
> On Sat, Mar 16, 2013 at 11:07 PM, Sasha Levin  wrote:
>>
>> Hi Ming,
>>
>> With your patch:
>>
>>
>> [ 1525.874312] release_sysfs_dirent sysfs_dirent use after free: ptysb-uevent
>
> Sasha, thanks for your test.
>
> So is the oops always triggered on this node of 'ptysb-uevent' or the node 
> name
> is changed every time?

Also, we may dump stack on the release path with the below patch to see who
did the ugly free.

--
diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 2fbdff6..993671d 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -281,6 +281,12 @@ void release_sysfs_dirent(struct sysfs_dirent * sd)
 */
parent_sd = sd->s_parent;

+   if(!(sd->s_flags & SYSFS_FLAG_REMOVED)) {
+   printk("%s sysfs_dirent use after free: %s-%s\n",
+   __func__, parent_sd->s_name, sd->s_name);
+   dump_stack();
+   }
+
if (sysfs_type(sd) == SYSFS_KOBJ_LINK)
sysfs_put(sd->s_symlink.target_sd);
if (sysfs_type(sd) & SYSFS_COPY_NAME)
@@ -962,6 +968,9 @@ static struct sysfs_dirent *sysfs_dir_pos(const void *ns,
int valid = !(pos->s_flags & SYSFS_FLAG_REMOVED) &&
pos->s_parent == parent_sd &&
hash == pos->s_hash;
+   if (valid && (atomic_read(&pos->s_count) <= 1))
+   printk("%s sysfs_dirent use after free: %s-%s\n",
+       __func__, parent_sd->s_name, pos->s_name);
sysfs_put(pos);
if (!valid)
pos = NULL;


Thanks,
-- 
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: use after free in sysfs_find_dirent

2013-03-16 Thread Ming Lei

On Sun, Mar 17, 2013 at 2:33 AM, Sasha Levin  wrote:
>
> I don't think it shows what we want it to show thought:
>
> [  327.416905] Pid: 10504, comm: trinity-child98 Tainted: GW
> 3.9.0-rc2-next-20130315-sasha-00046-gecde602-dirty #301
> [  327.418815] Call Trace:
> [  327.419255]  [] release_sysfs_dirent+0x4e/0x120
> [  327.420595]  [] sysfs_dir_pos+0x92/0x130
> [  327.421608]  [] sysfs_readdir+0x11d/0x280
> [  327.422562]  [] ? SyS_ioctl+0xa0/0xa0
> [  327.423441]  [] ? SyS_ioctl+0xa0/0xa0
> [  327.424314]  [] vfs_readdir+0x78/0xc0
> [  327.425263]  [] SyS_getdents+0x8c/0x110
> [  327.426173]  [] tracesys+0xe1/0xe6
>

Sasha, looks there is a race when sys_readdir() is run concurrently
on same directory, and the below patch may fix the race, could you test the
attachment patch to see if the use after free can be fixed?


Thanks,
-- 
Ming Lei


sysfs-fix-readdir.patch
Description: Binary data

Re: [PATCH] mm/fremap.c: fix another oops on error path

2013-03-17 Thread Ming Lei

On Sun, Mar 17, 2013 at 12:44 PM, Michel Lespinasse  wrote:
> On Sat, Mar 16, 2013 at 8:23 AM, Ming Lei  wrote:
>> Since find_vma() may return NULL, so don't dereference the
>> returned 'vma' until it is valid.
>
> Agree this was an issue. This is fixed with commit a2362d24764a.

It is fine if the issue has been fixed, but I didn't see the commit a2362d
or other fix on both linus and next tree.


Thanks,
-- 
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: use after free in sysfs_find_dirent

2013-03-17 Thread Ming Lei

On Sun, Mar 17, 2013 at 10:24 PM, Sasha Levin  wrote:
>
> I still see it going on with the patch applied:

Looks the previous patch still has the race problem, so could you just
apply the attachment patch and cancel all previous patches for the
test? If there is still the problem, please post out the log.

BTW, the attachment patch is only for verifying if the current problem
is caused by 'filp->private_data' race, and not for merge.

Thanks,
-- 
Ming Lei

sysfs-fix-readdir-v1.patch
Description: Binary data

Re: A workaround for request_firmware() stuck in module_init

2012-09-04 Thread Ming Lei

On Tue, Sep 4, 2012 at 9:06 PM, Takashi Iwai  wrote:
> Hi,
>
> as I've got recently a few bug reports regarding the stuck with
> request_firmware() in module_init of some sound drivers, I started
> looking at the issue.  Strangely, the problem doesn't happen on
> openSUSE 12.2 although it has the same udev version with libkmod as
> Fedora.  So I installed Fedora 17, and indeed I could see a problem
> there.

It should be a bug in udev, as discussed in the below link:

  http://marc.info/?t=13455274512&r=1&w=2

>
> Obviously a solution would be to rewrite the driver code to use
> request_firmware_nowait() instead.  But it'd need a lot of code
> shuffling, and most of such drivers are old stuff I don't want to do a
> serious surgery.
>
> So I tried an easier workaround by using the deferred probing.
> An experimental patch is below.  As you can see, from the driver side,
> it's simple: just add two lines at the head of each probe function.

In fact, the defer probe should only be applied in situations which
driver is built in kernel and request_firmware is called in .probe().

>
> Do you think this kind of hack is OK?  If not, any better (IOW easier)
> solution?

Looks your solution is a bit complicated, and I have wrote a similar
patch to address the problem, but Linus thought that it may hide the
problem of drivers.

http://marc.info/?t=13427879084&r=1&w=2

IMO, driver core needn't to be changed, and the defer probe can be
supported just by changes in request_firmware() and its caller.

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: A workaround for request_firmware() stuck in module_init

2012-09-04 Thread Ming Lei

On Wed, Sep 5, 2012 at 12:10 AM, Takashi Iwai  wrote:
> At Tue, 4 Sep 2012 23:52:15 +0800,
> Ming Lei wrote:
>>
>> On Tue, Sep 4, 2012 at 9:06 PM, Takashi Iwai  wrote:
>> > Hi,
>> >
>> > as I've got recently a few bug reports regarding the stuck with
>> > request_firmware() in module_init of some sound drivers, I started
>> > looking at the issue.  Strangely, the problem doesn't happen on
>> > openSUSE 12.2 although it has the same udev version with libkmod as
>> > Fedora.  So I installed Fedora 17, and indeed I could see a problem
>> > there.
>>
>> It should be a bug in udev, as discussed in the below link:
>>
>>   http://marc.info/?t=13455274512&r=1&w=2
>
> Yeah, if udev has a fix, I'm fine.  I'll proactively ignore these bug
> reports.  But did it really happen...?

Linus has expressed that it should be fixed by udev, maybe we can
wait some time to see if it will happen, :-)

There are more than 300 request_firmware called in probe(), even
adding 2 line code in these drivers will involve much workload, since
you need to find the matched probe()  for one request_firmware and
sometimes it is not easy.

>
>> > Obviously a solution would be to rewrite the driver code to use
>> > request_firmware_nowait() instead.  But it'd need a lot of code
>> > shuffling, and most of such drivers are old stuff I don't want to do a
>> > serious surgery.
>> >
>> > So I tried an easier workaround by using the deferred probing.
>> > An experimental patch is below.  As you can see, from the driver side,
>> > it's simple: just add two lines at the head of each probe function.
>>
>> In fact, the defer probe should only be applied in situations which
>> driver is built in kernel and request_firmware is called in .probe().
>
> Is it?  I thought the deferred probe is basically not for this problem
> but rather for the dependency problem with other modules.  That's the
> reason it's triggered only upon the successful binding.

Sorry, could you explain the dependency in a bit detail?

>From your patch, I understand you just convert the caller of
request_firmware from module_init into deferred_probe_work_func(),
so the udev problem is avoided, isn't it?

Looks making all .probe() run non-module_init context is still a solution, :-)

>
> And IMO, the deferred probe for the built-in kernel is also silly.
> Did anyone really make it working for built-in kernel driver and
> external firmware files?  (For the resume, it's of course a different

Yes, my original patch does work for the built-in situations.

> issue.  And I guess it's been solved by your fw cache patch, right?)
>
>> > Do you think this kind of hack is OK?  If not, any better (IOW easier)
>> > solution?
>>
>> Looks your solution is a bit complicated, and I have wrote a similar
>> patch to address the problem, but Linus thought that it may hide the
>> problem of drivers.
>>
>> http://marc.info/?t=13427879084&r=1&w=2
>>
>> IMO, driver core needn't to be changed, and the defer probe can be
>> supported just by changes in request_firmware() and its caller.
>
> Ideally, yes.  But it won't work in some drivers like sound drivers,
> that have its own device number counting changed at each probe call.
> For such drivers, the deferred probe check must be done before
> entering the normal probe procedure, and returning -EPROBE_DEFER would
> be too late (or need more complex fallbacks).

Simply, you can put the firmware loading at the start of probe to
avoid the specific
sound problem, :-)


Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: A workaround for request_firmware() stuck in module_init

2012-09-05 Thread Ming Lei

On Wed, Sep 5, 2012 at 1:53 PM, Takashi Iwai  wrote:
> At Wed, 5 Sep 2012 09:15:34 +0800,
> Ming Lei wrote:
>>
>> On Wed, Sep 5, 2012 at 12:10 AM, Takashi Iwai  wrote:
>> > At Tue, 4 Sep 2012 23:52:15 +0800,
>> > Ming Lei wrote:
>> >>
>> >> On Tue, Sep 4, 2012 at 9:06 PM, Takashi Iwai  wrote:
>> >> > Hi,
>> >> >
>> >> > as I've got recently a few bug reports regarding the stuck with
>> >> > request_firmware() in module_init of some sound drivers, I started
>> >> > looking at the issue.  Strangely, the problem doesn't happen on
>> >> > openSUSE 12.2 although it has the same udev version with libkmod as
>> >> > Fedora.  So I installed Fedora 17, and indeed I could see a problem
>> >> > there.
>> >>
>> >> It should be a bug in udev, as discussed in the below link:
>> >>
>> >>   http://marc.info/?t=13455274512&r=1&w=2
>> >
>> > Yeah, if udev has a fix, I'm fine.  I'll proactively ignore these bug
>> > reports.  But did it really happen...?
>>
>> Linus has expressed that it should be fixed by udev, maybe we can
>> wait some time to see if it will happen, :-)
>
> Kay, what is the status?
>
>> There are more than 300 request_firmware called in probe(), even
>> adding 2 line code in these drivers will involve much workload, since
>> you need to find the matched probe()  for one request_firmware and
>> sometimes it is not easy.
>
> Well, this depends on from which perspective you look at the issue.
> See below.
>
>> >> > Obviously a solution would be to rewrite the driver code to use
>> >> > request_firmware_nowait() instead.  But it'd need a lot of code
>> >> > shuffling, and most of such drivers are old stuff I don't want to do a
>> >> > serious surgery.
>> >> >
>> >> > So I tried an easier workaround by using the deferred probing.
>> >> > An experimental patch is below.  As you can see, from the driver side,
>> >> > it's simple: just add two lines at the head of each probe function.
>> >>
>> >> In fact, the defer probe should only be applied in situations which
>> >> driver is built in kernel and request_firmware is called in .probe().
>> >
>> > Is it?  I thought the deferred probe is basically not for this problem
>> > but rather for the dependency problem with other modules.  That's the
>> > reason it's triggered only upon the successful binding.
>>
>> Sorry, could you explain the dependency in a bit detail?
>
> When a device has some implicit dependency on another hardware
> component (e.g. SDHCI depends on I2C GPIO controller, as in comment in
> drivers/base/dd.c), the driver needs to wait until another one gets
> ready.  -EPROBE_DEFER was introduced for such a purpose.
>
> So, using this mechanism for the firmware issue is a kind of abuse.
>
>> >From your patch, I understand you just convert the caller of
>> request_firmware from module_init into deferred_probe_work_func(),
>> so the udev problem is avoided, isn't it?
>
> Yes, that was a hack.  The idea is just offloading the probe, and the
> deferred probe is an existing simple way for that.
>
>> Looks making all .probe() run non-module_init context is still a solution, 
>> :-)
>
> Sounds interesting.
>
>> > And IMO, the deferred probe for the built-in kernel is also silly.
>> > Did anyone really make it working for built-in kernel driver and
>> > external firmware files?  (For the resume, it's of course a different
>>
>> Yes, my original patch does work for the built-in situations.
>>
>> > issue.  And I guess it's been solved by your fw cache patch, right?)
>> >
>> >> > Do you think this kind of hack is OK?  If not, any better (IOW easier)
>> >> > solution?
>> >>
>> >> Looks your solution is a bit complicated, and I have wrote a similar
>> >> patch to address the problem, but Linus thought that it may hide the
>> >> problem of drivers.
>> >>
>> >> http://marc.info/?t=13427879084&r=1&w=2
>> >>
>> >> IMO, driver core needn't to be changed, and the defer probe can be
>> >> supported just by changes in request_firmware() and its caller.
>> >
>> > Ideally, yes.  But it won't work in some drivers like sound drivers,
>> > that have its

Question on irq autoprobe

2012-09-13 Thread Ming Lei

Hi,

I see the below comments on probe_irq_off:

 *  BUGS: When used in a module (which arguably shouldn't happen)
 *  nothing prevents two IRQ probe callers from overlapping. The
 *  results of this are non-optimal.
 */

But from the code of probe_irq_on and probe_irq_off, the mutex of
probing_active is held during the whole irq probe procedure, so
I don't understand why the above said that 'nothing prevents two
IRQ probe callers from overlapping", and why isn't the mutex of
probing_active enough to avoid the overlapping?


Thanks,
-- 
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] firmware loader: cancel uncache work before caching firmware

2012-10-02 Thread Ming Lei

Under 'Opportunistic sleep' situation, system sleep might be
triggered very frequently, so the uncahce work may not be completed
before caching firmware during next suspend.

This patch cancels the uncache work before caching firmware to
fix the problem above.

Also this patch optimizes the cacheing firmware mechanism a bit by
only storing one firmware cache entry for one firmware image.

So if the firmware is still cached during suspend, it doesn't need
to be loaded from user space any more.

Signed-off-by: Ming Lei 
---
 drivers/base/firmware_class.c |   30 --
 1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
index 6e21080..ddc17bc 100644
--- a/drivers/base/firmware_class.c
+++ b/drivers/base/firmware_class.c
@@ -1068,17 +1068,27 @@ exit:
return fce;
 }
 
-static int fw_cache_piggyback_on_request(const char *name)
+static int __fw_entry_found(const char *name)
 {
struct firmware_cache *fwc = &fw_cache;
struct fw_cache_entry *fce;
-   int ret = 0;
 
-   spin_lock(&fwc->name_lock);
list_for_each_entry(fce, &fwc->fw_names, list) {
if (!strcmp(fce->name, name))
-   goto found;
+   return 1;
}
+   return 0;
+}
+
+static int fw_cache_piggyback_on_request(const char *name)
+{
+   struct firmware_cache *fwc = &fw_cache;
+   struct fw_cache_entry *fce;
+   int ret = 0;
+
+   spin_lock(&fwc->name_lock);
+   if (__fw_entry_found(name))
+   goto found;
 
fce = alloc_fw_cache_entry(name);
if (fce) {
@@ -1155,8 +1165,13 @@ static void dev_cache_fw_image(struct device *dev, void 
*data)
list_del(&fce->list);
 
spin_lock(&fwc->name_lock);
-   fwc->cnt++;
-   list_add(&fce->list, &fwc->fw_names);
+   /* only one cache entry for one firmware */
+   if (!__fw_entry_found(fce->name)) {
+   fwc->cnt++;
+   list_add(&fce->list, &fwc->fw_names);
+   } else {
+   free_fw_cache_entry(fce);
+   }
spin_unlock(&fwc->name_lock);
 
async_schedule(__async_dev_cache_fw_image, (void *)fce);
@@ -1201,6 +1216,9 @@ static void device_cache_fw_images(void)
 
pr_debug("%s\n", __func__);
 
+   /* cancel uncache work */
+   cancel_delayed_work_sync(&fwc->work);
+
/*
 * use small loading timeout for caching devices' firmware
 * because all these firmware images have been loaded
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] firmware loader: cancel uncache work before caching firmware

2012-10-02 Thread Ming Lei

On Tue, Oct 2, 2012 at 4:35 PM, Ming Lei  wrote:
> ---
>  drivers/base/firmware_class.c |   30 --
>  1 file changed, 24 insertions(+), 6 deletions(-)

Please ignore the buggy patch, sorry for the noise.


Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH v2 4/6] net/core: apply pm_runtime_set_memalloc_noio on network devices

2012-10-23 Thread Ming Lei

On Tue, Oct 23, 2012 at 3:18 AM, Alan Stern  wrote:
> On Mon, 22 Oct 2012, Ming Lei wrote:

> Is this really needed?  Even with iSCSI, doesn't register_disk() have
> to be called for the underlying block device?  And given your 3/6
> patch, wouldn't that mark the network device?

The problem is that network device is not one ancestor of the
iSCSI disk device, which transfers data over tcp stack.


Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH v2 6/6] USB: forbid memory allocation with I/O during bus reset

2012-10-23 Thread Ming Lei

On Mon, Oct 22, 2012 at 10:37 PM, Alan Stern  wrote:
> On Mon, 22 Oct 2012, Ming Lei wrote:

>>
>> + /*
>> +  * Don't allocate memory with GFP_KERNEL in current
>> +  * context to avoid possible deadlock if usb mass
>> +  * storage interface or usbnet interface(iSCSI case)
>> +  * is included in current configuration. The easiest
>> +  * approach is to do it for all devices.
>> +  */
>> + memalloc_noio_save(noio_flag);
>
> Why not check dev->power.memalloc_noio_resume here too?

Yes, we can use the flag here too even though it is introduced
for rutime_resume case.

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH v2 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio()

2012-10-23 Thread Ming Lei

On Mon, Oct 22, 2012 at 10:33 PM, Alan Stern  wrote:
>
> Tail recursion should be implemented as a loop, not as an explicit
> recursion.  That is, the function should be:
>
> void pm_runtime_set_memalloc_noio(struct device *dev, bool enable)
> {
> do {
> dev->power.memalloc_noio_resume = enable;
>
> if (!enable) {
> /*
>  * Don't clear the parent's flag if any of the
>  * parent's children have their flag set.
>  */
> if (device_for_each_child(dev->parent, NULL,
>   dev_memalloc_noio))
> return;
> }
> dev = dev->parent;
> } while (dev);
> }

OK, will take the non-recursion implementation for saving kernel
stack space.

>
> except that you need to add locking, for two reasons:
>
> There's a race.  What happens if another child sets the flag
> between the time device_for_each_child() runs and the next loop
> iteration?

Yes, I know the race, and not adding a lock because the function
is mostly called in .probe() or .remove() callback and its parent's device
lock is held to avoid this race.

Considered that it may be called in async probe() (scsi disk), one lock
is needed, the simplest way is to add a global lock. Any suggestion?

>
> Even without a race, access to bitfields is not SMP-safe
> without locking.

You mean one ancestor device might not be in active when
one of its descendants is being probed or removed?

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/3 v2] firmware: Convert firmware path setup from an array to a list

2012-10-23 Thread Ming Lei

On Tue, Oct 23, 2012 at 8:52 PM, Dimitris Papastamos
 wrote:
> In preparation to support dynamic listing/updating of firmware
> paths via procfs, this patch converts the firmware path configuration
> from an array to a list.

I remembered that I have questioned on dynamic listing/updating of
firmware via procfs, looks you don't reply, :-(

  http://marc.info/?l=linux-kernel&m=134988208617179&w=2

IMO, it is not a good idea to update firmware path dynamically via proc
because it may be too late to update a dynamic search path via /proc
and drivers need request firmware just after rootfs is mounted.

kernel parameter should be a good way to pass one customerized
path.

Thanks,
-- 
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/3 v2] firmware: Convert firmware path setup from an array to a list

2012-10-23 Thread Ming Lei

On Tue, Oct 23, 2012 at 9:29 PM, Dimitris Papastamos
 wrote:

>
> Hi sorry for being unclear, I mentioned in the old thread that
> I've not had time to fix this properly (or think about a proper
> fix), Greg requested a rebase on top of linux-next, however, at
> least for firmware path listing.
>
> Should I amend the commit description so it is not misleading and
> remove the 'updating' part?

In fact, I also questioned on path listing too in last thread too.


Thanks,
-- 
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH -next] printk: fix broken 'console' kernel parameter

2012-10-23 Thread Ming Lei

If CONFIG_A11Y_BRAILLE_CONSOLE is not enabled, _braille_console_setup()
should return NULL to parse the parameter further in console_setup().

This patch fixes the broken 'console' kernel parameter, which makes the
Pandaboard not boot with 'console=ttyO2'.

Cc: Andrew Morton 
Cc: Samuel Thibault 
Cc: Joe Perches 

Signed-off-by: Ming Lei 
---
 kernel/printk/braille.h |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/printk/braille.h b/kernel/printk/braille.h
index d2e6bc3..769d771 100644
--- a/kernel/printk/braille.h
+++ b/kernel/printk/braille.h
@@ -28,7 +28,7 @@ braille_set_options(struct console_cmdline *c, char 
*brl_options)
 static inline char *
 _braille_console_setup(char **str, char **brl_options)
 {
-   return *str;
+   return NULL;
 }
 
 static inline int
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/3 v2] firmware: Convert firmware path setup from an array to a list

2012-10-23 Thread Ming Lei

On Tue, Oct 23, 2012 at 9:44 PM, Dimitris Papastamos
 wrote:
>
> I don't currently have a use-case for this, so not sure how useful
> it is to list the default fw paths.

OK, got it, thanks for your effort.

Greg, so could we just hold on until it is useful?

Thanks,
-- 
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH v2 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio()

2012-10-23 Thread Ming Lei

On Tue, Oct 23, 2012 at 10:46 PM, Alan Stern  wrote:
> On Tue, 23 Oct 2012, Ming Lei wrote:
>
>> On Mon, Oct 22, 2012 at 10:33 PM, Alan Stern  
>> wrote:
>> >
>> > Tail recursion should be implemented as a loop, not as an explicit
>> > recursion.  That is, the function should be:
>> >
>> > void pm_runtime_set_memalloc_noio(struct device *dev, bool enable)
>> > {
>> > do {
>> > dev->power.memalloc_noio_resume = enable;
>> >
>> > if (!enable) {
>> > /*
>> >  * Don't clear the parent's flag if any of the
>> >  * parent's children have their flag set.
>> >  */
>> > if (device_for_each_child(dev->parent, NULL,
>> >   dev_memalloc_noio))
>> > return;
>> > }
>> > dev = dev->parent;
>> > } while (dev);
>> > }
>>
>> OK, will take the non-recursion implementation for saving kernel
>> stack space.
>>
>> >
>> > except that you need to add locking, for two reasons:
>> >
>> > There's a race.  What happens if another child sets the flag
>> > between the time device_for_each_child() runs and the next loop
>> > iteration?
>>
>> Yes, I know the race, and not adding a lock because the function
>> is mostly called in .probe() or .remove() callback and its parent's device
>> lock is held to avoid this race.
>>
>> Considered that it may be called in async probe() (scsi disk), one lock
>> is needed, the simplest way is to add a global lock. Any suggestion?
>
> No.  Because of where you put the new flag, it must be protected by
> dev->power.lock.  And this means the iterative implementation shown
> above can't be used as is.  It will have to be more like this:
>
> void pm_runtime_set_memalloc_noio(struct device *dev, bool enable)
> {
> spin_lock_irq(&dev->power.lock);
> dev->power.memalloc_noio_resume = enable;
>
> while (dev->parent) {
> spin_unlock_irq(&dev->power.lock);
> dev = dev->parent;
>
> spin_lock_irq(&dev->power.lock);
> /*
>  * Don't clear the parent's flag if any of the
>  * parent's children have their flag set.
>  */
> if (!enable && device_for_each_child(dev->parent, NULL,

s/dev->parent/dev

> dev_memalloc_noio))
> break;
> dev->power.memalloc_noio_resume = enable;
> }
> spin_unlock_irq(&dev->power.lock);
> }

With the problem of non-SMP-safe bitfields access, the power.lock should
be held, but that is not enough to prevent children from being probed or
disconnected. Looks another lock is still needed. I think a global lock
is OK in the infrequent path.

>
>> > Even without a race, access to bitfields is not SMP-safe
>> > without locking.
>>
>> You mean one ancestor device might not be in active when
>> one of its descendants is being probed or removed?
>
> No.  Consider this example:
>
> struct foo {
> int a:1;
> int b:1;
> } x;
>
> Consider what happens if CPU 0 does "x.a = 1" at the same time as
> another CPU 1 does "x.b = 1".  The compiler might produce object code
> looking like this for CPU 0:
>
>     movex, reg1
> or  0x1, reg1
> movereg1, x
>
> and this for CPU 1:
>
> movex, reg2
> or  0x2, reg2
> movereg2, x
>
> With no locking, the two "or" instructions could execute
> simultaneously.  What will the final value of x be?
>
> The two CPUs will interfere, even though they are touching different
> bitfields.

Got it, thanks for your detailed explanation.

Looks the problem is worse than above, not only bitfields are affected, the
adjacent fields might be involved too, see:

   http://lwn.net/Articles/478657/


Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 1/2] firmware loader: document kernel direct loading

2012-10-23 Thread Ming Lei

This patch adds description on recently introduced direct firmware
loading by Linus.

Cc: Linus Torvalds 
Signed-off-by: Ming Lei 
---
 Documentation/firmware_class/README |   26 +-
 1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/Documentation/firmware_class/README 
b/Documentation/firmware_class/README
index 7eceaff..815b711 100644
--- a/Documentation/firmware_class/README
+++ b/Documentation/firmware_class/README
@@ -18,32 +18,40 @@
  High level behavior (mixed):
  
 
- kernel(driver): calls request_firmware(&fw_entry, $FIRMWARE, device)
-
- userspace:
+ 1), kernel(driver):
+   - calls request_firmware(&fw_entry, $FIRMWARE, device)
+   - kernel searchs the fimware image with name $FIRMWARE directly
+   in the below search path of root filesystem:
+   "/lib/firmware/updates/" UTS_RELEASE,
+   "/lib/firmware/updates",
+   "/lib/firmware/" UTS_RELEASE,
+   "/lib/firmware"
+   - If found, goto 7), else goto 2)
+
+ 2), userspace:
- /sys/class/firmware/xxx/{loading,data} appear.
- hotplug gets called with a firmware identifier in $FIRMWARE
  and the usual hotplug environment.
- hotplug: echo 1 > /sys/class/firmware/xxx/loading
 
- kernel: Discard any previous partial load.
+ 3), kernel: Discard any previous partial load.
 
- userspace:
+ 4), userspace:
- hotplug: cat appropriate_firmware_image > \
/sys/class/firmware/xxx/data
 
- kernel: grows a buffer in PAGE_SIZE increments to hold the image as it
+ 5), kernel: grows a buffer in PAGE_SIZE increments to hold the image as it
 comes in.
 
- userspace:
+ 6), userspace:
- hotplug: echo 0 > /sys/class/firmware/xxx/loading
 
- kernel: request_firmware() returns and the driver has the firmware
+ 7), kernel: request_firmware() returns and the driver has the firmware
 image in fw_entry->{data,size}. If something went wrong
 request_firmware() returns non-zero and fw_entry is set to
 NULL.
 
- kernel(driver): Driver code calls release_firmware(fw_entry) releasing
+ 8), kernel(driver): Driver code calls release_firmware(fw_entry) releasing
 the firmware image and any related resource.
 
  High level behavior (driver code):
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 2/2] firmware loader: introduce kernel parameter to customize fw search path

2012-10-23 Thread Ming Lei

This patch introduces one kernel parameter of 'fw_path' to support
customizing firmware image search path, so that people can use its
own firmware path if the default built-in paths can't meet their demand[1].

[1], https://lkml.org/lkml/2012/10/11/337

Cc: Linus Torvalds 
Signed-off-by: Ming Lei 
---
 Documentation/firmware_class/README |1 +
 Documentation/kernel-parameters.txt |9 +
 drivers/base/firmware_class.c   |   24 ++--
 3 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/Documentation/firmware_class/README 
b/Documentation/firmware_class/README
index 815b711..af63ad3 100644
--- a/Documentation/firmware_class/README
+++ b/Documentation/firmware_class/README
@@ -22,6 +22,7 @@
- calls request_firmware(&fw_entry, $FIRMWARE, device)
- kernel searchs the fimware image with name $FIRMWARE directly
in the below search path of root filesystem:
+   User customized search path by kernel parameter of 'fw_path'
"/lib/firmware/updates/" UTS_RELEASE,
"/lib/firmware/updates",
"/lib/firmware/" UTS_RELEASE,
diff --git a/Documentation/kernel-parameters.txt 
b/Documentation/kernel-parameters.txt
index 9776f06..d1125c5 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -52,6 +52,7 @@ parameter is applicable:
EVM Extended Verification Module
FB  The frame buffer device is enabled.
FTRACE  Function tracing enabled.
+   FW  The firmware loader is enabled.
GCOVGCOV profiling is enabled.
HW  Appropriate hardware is enabled.
IA-64   IA-64 architecture is enabled.
@@ -843,6 +844,14 @@ bytes respectively. Such letter suffixes can also be 
entirely omitted.
Format: ,,,
See also Documentation/fault-injection/.
 
+   fw_path=[FW]
+   Customized firmware image search path for kernel
+   direct loading, which has higher priority than
+   built-in default search path.
+   Format: , length < 256
+   direcory path, such as /lib/firmware.
+   See Documentation/firmware_class/README.
+
floppy= [HW]
See Documentation/blockdev/floppy.txt.
 
diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
index 8945f4e..e7db931 100644
--- a/drivers/base/firmware_class.c
+++ b/drivers/base/firmware_class.c
@@ -274,6 +274,17 @@ static const char *fw_path[] = {
"/lib/firmware"
 };
 
+static char fw_path_para[256];
+static int __init customized_path(char *str)
+{
+   if (strlen(str) < 256)
+   strlcpy(fw_path_para, str, sizeof(fw_path_para));
+   return 1;
+}
+__setup("fw_path=", customized_path);
+module_param_string(path, fw_path_para, sizeof(fw_path_para), 0644);
+MODULE_PARM_DESC(path, "customized firmware image search path with a higher 
priority than default path");
+
 /* Don't inline this: 'struct kstat' is biggish */
 static noinline long fw_file_size(struct file *file)
 {
@@ -313,9 +324,18 @@ static bool fw_get_filesystem_firmware(struct firmware_buf 
*buf)
bool success = false;
char *path = __getname();
 
-   for (i = 0; i < ARRAY_SIZE(fw_path); i++) {
+   for (i = -1; i < ARRAY_SIZE(fw_path); i++) {
struct file *file;
-   snprintf(path, PATH_MAX, "%s/%s", fw_path[i], buf->fw_id);
+
+   if (i < 0) {
+   if (!fw_path_para[0])   /* No customized path */
+   continue;
+   snprintf(path, PATH_MAX, "%s/%s", fw_path_para,
+buf->fw_id);
+   } else {
+   snprintf(path, PATH_MAX, "%s/%s", fw_path[i],
+buf->fw_id);
+   }
 
file = filp_open(path, O_RDONLY, 0);
if (IS_ERR(file))
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/2] firmware loader: introduce kernel parameter to customize fw search path

2012-10-23 Thread Ming Lei

On Wed, Oct 24, 2012 at 11:18 AM, Greg Kroah-Hartman
 wrote:
>
> Ick, no, not another kernel boot parameter.  I would prefer the /proc/
> file solution instead, just like /proc/sys/kernel/hotplug is.

OK, got it, but we could use the module parameter(firmware_class.path
via kernel command) to customize the search path too, couldn't we?

>
> But, that really wouldn't work for boot time, so, again just like
> hotplug works, make it also a Kconfig option for those distros that
> build their own kernels and keep their firmware in odd places, or don't
> have a /lib/ directory, can override it easily.

Sound like still a solution, :-)

If you agree on module parameter, that might be more flexible, IMO


Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/2] firmware loader: introduce kernel parameter to customize fw search path

2012-10-23 Thread Ming Lei

On Wed, Oct 24, 2012 at 1:42 PM, Greg Kroah-Hartman
 wrote:

> Yes, but why?  Let's not overdesign this, get the basics working and
> then, if someone really needs it, we can add it.

OK, I will remove the kernel parameter and related documentation,
and just keep it as module parameter, sorry about that.

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Q] "console=" command line ignored?

2012-10-24 Thread Ming Lei

On Wed, Oct 24, 2012 at 6:49 PM, Guennadi Liakhovetski
 wrote:
> Hi
>
> I cannot seem to force selection of a serial console by specifying
> "console=ttySC0..." on the command line with modern (yesterday's "next")
> kernels. The kernel insists on
>
> console [tty0] enabled

Please see https://lkml.org/lkml/2012/10/23/307

>
> The reason might also well be, that the respective "sh-sci" driver is
> only registered later. I'll try to look at this in more detail - since
> when this is broken and how the registration order looked before. This is
> just a first heads-up in case someone has a solution immediately at hand.



Thanks,
-- 
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v1] firmware loader: introduce module parameter to customize fw search path

2012-10-25 Thread Ming Lei

This patch introduces one module parameter of 'path' in firmware_class
to support customizing firmware image search path, so that people can
use its own firmware path if the default built-in paths can't meet their
demand[1], and the typical usage is passing the below from kernel command
parameter when 'firmware_class' is built in kernel:

firmware_class.path=$CUSTOMIZED_PATH

[1], https://lkml.org/lkml/2012/10/11/337

Cc: Linus Torvalds 
Signed-off-by: Ming Lei 
---
V1:
- remove kernel boot parameter and only support the feature by
module parameter as suggested by Greg

 Documentation/firmware_class/README |5 +
 drivers/base/firmware_class.c   |   23 +--
 2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/Documentation/firmware_class/README 
b/Documentation/firmware_class/README
index 815b711..ce02744 100644
--- a/Documentation/firmware_class/README
+++ b/Documentation/firmware_class/README
@@ -22,12 +22,17 @@
- calls request_firmware(&fw_entry, $FIRMWARE, device)
- kernel searchs the fimware image with name $FIRMWARE directly
in the below search path of root filesystem:
+   User customized search path by module parameter 'path'[1]
"/lib/firmware/updates/" UTS_RELEASE,
"/lib/firmware/updates",
"/lib/firmware/" UTS_RELEASE,
"/lib/firmware"
- If found, goto 7), else goto 2)
 
+   [1], the 'path' is a string parameter which length should be less
+   than 256, user should pass 'firmware.path=$CUSTOMIZED_PATH' if
+   firmware_class is built in kernel(the general situation)
+
  2), userspace:
- /sys/class/firmware/xxx/{loading,data} appear.
- hotplug gets called with a firmware identifier in $FIRMWARE
diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
index 8945f4e..b363103 100644
--- a/drivers/base/firmware_class.c
+++ b/drivers/base/firmware_class.c
@@ -274,6 +274,16 @@ static const char *fw_path[] = {
"/lib/firmware"
 };
 
+static char fw_path_para[256];
+
+/*
+ * Typical usage is that passing 'firmware_class.path=$CUSTOMIZED_PATH'
+ * from kernel command because firmware_class is generally built in
+ * kernel instead of module.
+ */
+module_param_string(path, fw_path_para, sizeof(fw_path_para), 0644);
+MODULE_PARM_DESC(path, "customized firmware image search path with a higher 
priority than default path");
+
 /* Don't inline this: 'struct kstat' is biggish */
 static noinline long fw_file_size(struct file *file)
 {
@@ -313,9 +323,18 @@ static bool fw_get_filesystem_firmware(struct firmware_buf 
*buf)
bool success = false;
char *path = __getname();
 
-   for (i = 0; i < ARRAY_SIZE(fw_path); i++) {
+   for (i = -1; i < ARRAY_SIZE(fw_path); i++) {
struct file *file;
-   snprintf(path, PATH_MAX, "%s/%s", fw_path[i], buf->fw_id);
+
+   if (i < 0) {
+   if (!fw_path_para[0])   /* No customized path */
+   continue;
+   snprintf(path, PATH_MAX, "%s/%s", fw_path_para,
+buf->fw_id);
+   } else {
+   snprintf(path, PATH_MAX, "%s/%s", fw_path[i],
+buf->fw_id);
+   }
 
file = filp_open(path, O_RDONLY, 0);
if (IS_ERR(file))
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v1] firmware loader: introduce module parameter to customize fw search path

2012-10-25 Thread Ming Lei

On Fri, Oct 26, 2012 at 10:32 AM, Linus Torvalds
 wrote:
>
> Please just make "fw_path[0]" just be the pointer to fw_path_para[]
> (which sounds like the cleanest fix) and get rid of the negative 'i'
> and conditional entirely.

Yes, it should be the cleanest, I don't do it because I thought that might
have caused one compile warning('const char *' points to memory
without 'const', like below)

static char fw_path_para[256];
static const char *fw_path[] = {
fw_path_para,
"/lib/firmware/updates/" UTS_RELEASE,
"/lib/firmware/updates",
"/lib/firmware/" UTS_RELEASE,
"/lib/firmware"
};

but in fact there isn't any warning with above change and it does work, still
don't know why? :-(

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v3 2/6] PM / Runtime: introduce pm_runtime_set[get]_memalloc_noio()

2012-10-31 Thread Ming Lei

On Wed, Oct 31, 2012 at 11:41 PM, Alan Stern  wrote:
>
> Sorry, I misread your message.  You are setting the device's flag, not
> the thread's flag.

Never mind.

>
> This still doesn't help in this case where CONFIG_PM_RUNTIME is
> disabled.  I think it will be simpler to set the noio flag during every
> device reset.

Yes, it's better to set the flag during every device reset now.

Also pppoe or network interface over serial port is a bit difficult to
deal with, as Oliver pointed out.


Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v4 0/6] solve deadlock caused by memory allocation with I/O

2012-11-03 Thread Ming Lei

This patchset try to solve one deadlock problem which might be caused
by memory allocation with block I/O during runtime PM and block device
error handling path. Traditionly, the problem is addressed by passing
GFP_NOIO statically to mm, but that is not a effective solution, see
detailed description in patch 1's commit log.

This patch set introduces one process flag and trys to fix the deadlock
problem on block device/network device during runtime PM or usb bus reset.

The 1st one is the change on include/sched.h and mm.

The 2nd patch introduces the flag of memalloc_noio on 'dev_pm_info',
and pm_runtime_set_memalloc_noio(), so that PM Core can teach mm to not
allocate mm with GFP_IOFS during the runtime_resume callback only on
device with the flag set.

The following 2 patches apply the introduced pm_runtime_set_memalloc_noio()
to mark all devices as memalloc_noio_resume in the path from the block or
network device to the root device in device tree.

The last 2 patches are applied again PM and USB subsystem to demonstrate
how to use the introduced mechanism to fix the deadlock problem.

Change logs:
V4:
- patches from the 2nd to the 6th changed   
- call pm_runtime_set_memalloc_noio() after device_add() as pointed
by Alan
- set PF_MEMALLOC_NOIO during runtime_suspend()

V3:
- patch 2/6 and 5/6 changed, see their commit log
- remove RFC from title since several guys have expressed that
it is a reasonable solution
V2:
- remove changes on 'may_writepage' and 'may_swap'(1/6)
- unset GFP_IOFS in try_to_free_pages() path(1/6)
- introduce pm_runtime_set_memalloc_noio()
- only apply the meachnism on block/network device and its ancestors
for runtime resume context
V1:
- take Minchan's change to avoid the check in alloc_page hot path
- change the helpers' style into save/restore as suggested by Alan
- memory allocation with no io in usb bus reset path for all devices
as suggested by Greg and Oliver

 block/genhd.c|9 +
 drivers/base/power/runtime.c |   89 +-
 drivers/usb/core/hub.c   |   13 ++
 include/linux/pm.h   |1 +
 include/linux/pm_runtime.h   |3 ++
 include/linux/sched.h|   10 +
 mm/page_alloc.c  |   10 -
 mm/vmscan.c  |   12 ++
 net/core/net-sysfs.c |5 +++


Thanks,
--
Ming Lei

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v4 1/6] mm: teach mm by current context info to not do I/O during memory allocation

2012-11-03 Thread Ming Lei

This patch introduces PF_MEMALLOC_NOIO on process flag('flags' field of
'struct task_struct'), so that the flag can be set by one task
to avoid doing I/O inside memory allocation in the task's context.

The patch trys to solve one deadlock problem caused by block device,
and the problem may happen at least in the below situations:

- during block device runtime resume, if memory allocation with
GFP_KERNEL is called inside runtime resume callback of any one
of its ancestors(or the block device itself), the deadlock may be
triggered inside the memory allocation since it might not complete
until the block device becomes active and the involed page I/O finishes.
The situation is pointed out first by Alan Stern. It is not a good
approach to convert all GFP_KERNEL[1] in the path into GFP_NOIO because
several subsystems may be involved(for example, PCI, USB and SCSI may
be involved for usb mass stoarage device, network devices involved too
in the iSCSI case)

- during block device runtime suspend, because runtime resume need
to wait for completion of concurrent runtime suspend.

- during error handling of usb mass storage deivce, USB bus reset
will be put on the device, so there shouldn't have any
memory allocation with GFP_KERNEL during USB bus reset, otherwise
the deadlock similar with above may be triggered. Unfortunately, any
usb device may include one mass storage interface in theory, so it
requires all usb interface drivers to handle the situation. In fact,
most usb drivers don't know how to handle bus reset on the device
and don't provide .pre_set() and .post_reset() callback at all, so
USB core has to unbind and bind driver for these devices. So it
is still not practical to resort to GFP_NOIO for solving the problem.

Also the introduced solution can be used by block subsystem or block
drivers too, for example, set the PF_MEMALLOC_NOIO flag before doing
actual I/O transfer.

It is not a good idea to convert all these GFP_KERNEL in the
affected path into GFP_NOIO because these functions doing that may be
implemented as library and will be called in many other contexts.

In fact, memalloc_noio() can convert some of current static GFP_NOIO
allocation into GFP_KERNEL back in other non-affected contexts, at least
almost all GFP_NOIO in USB subsystem can be converted into GFP_KERNEL
after applying the approach and make allocation with GFP_IO
only happen in runtime resume/bus reset/block I/O transfer contexts
generally.

[1], several GFP_KERNEL allocation examples in runtime resume path

- pci subsystem
acpi_os_allocate
<-acpi_ut_allocate
<-ACPI_ALLOCATE_ZEROED
<-acpi_evaluate_object
<-__acpi_bus_set_power
<-acpi_bus_set_power
<-acpi_pci_set_power_state

<-platform_pci_set_power_state

<-pci_platform_power_transition

<-__pci_complete_power_transition

<-pci_set_power_state

<-pci_restore_standard_config

<-pci_pm_runtime_resume
- usb subsystem
usb_get_status
<-finish_port_resume
<-usb_port_resume
<-generic_resume
<-usb_resume_device
<-usb_resume_both
<-usb_runtime_resume

- some individual usb drivers
usblp, uvc, gspca, most of dvb-usb-v2 media drivers, cpia2, az6007, 

That is just what I have found.  Unfortunately, this allocation can
only be found by human being now, and there should be many not found
since any function in the resume path(call tree) may allocate memory
with GFP_KERNEL.

Cc: Alan Stern 
Cc: Oliver Neukum 
Cc: Jiri Kosina 
Cc: Andrew Morton 
Cc: Mel Gorman 
Cc: KAMEZAWA Hiroyuki 
Cc: Michal Hocko 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: "Rafael J. Wysocki" 
Signed-off-by: Minchan Kim 
Signed-off-by: Ming Lei 
---
v4:
- fix comment
v3:
- no change
v2:
- remove changes on 'may_writepage' and 'may_swap' because that
  isn't related with the patchset, and can't introduce I/O in
  allocation path if GFP_IOFS is unset, so handing 'may_swap'
  and may_writepage on GFP_NOIO or GFP_NOFS  should be a
  mm internal thing, and let mm guys deal with that, :-).

  Looks clearing the two may_XXX flag only excludes dirty

[PATCH v4 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio()

2012-11-03 Thread Ming Lei

The patch introduces the flag of memalloc_noio in 'struct dev_pm_info'
to help PM core to teach mm not allocating memory with GFP_KERNEL
flag for avoiding probable deadlock.

As explained in the comment, any GFP_KERNEL allocation inside
runtime_resume() or runtime_suspend() on any one of device in
the path from one block or network device to the root device
in the device tree may cause deadlock, the introduced
pm_runtime_set_memalloc_noio() sets or clears the flag on
device in the path recursively.

Cc: Alan Stern 
Cc: "Rafael J. Wysocki" 
Signed-off-by: Ming Lei 
---
v4:
- rename memalloc_noio_resume as memalloc_noio
- remove pm_runtime_get_memalloc_noio()
- add comments on pm_runtime_set_memalloc_noio
v3:
- introduce pm_runtime_get_memalloc_noio()
- hold one global lock on pm_runtime_set_memalloc_noio
- hold device power lock when accessing memalloc_noio_resume
  flag suggested by Alan Stern
- implement pm_runtime_set_memalloc_noio without recursion
  suggested by Alan Stern
v2:
- introduce pm_runtime_set_memalloc_noio()
---
 drivers/base/power/runtime.c |   57 ++
 include/linux/pm.h   |1 +
 include/linux/pm_runtime.h   |3 +++
 3 files changed, 61 insertions(+)

diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
index 3148b10..d477924 100644
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -124,6 +124,63 @@ unsigned long pm_runtime_autosuspend_expiration(struct 
device *dev)
 }
 EXPORT_SYMBOL_GPL(pm_runtime_autosuspend_expiration);
 
+static int dev_memalloc_noio(struct device *dev, void *data)
+{
+   return dev->power.memalloc_noio;
+}
+
+/*
+ * pm_runtime_set_memalloc_noio - Set a device's memalloc_noio flag.
+ * @dev: Device to handle.
+ * @enable: True for setting the flag and False for clearing the flag.
+ *
+ * Set the flag for all devices in the path from the device to the
+ * root device in the device tree if @enable is true, otherwise clear
+ * the flag for devices in the path whose siblings don't set the flag.
+ *
+ * The function should only be called by block device, or network
+ * device driver for solving the deadlock problem during runtime
+ * resume/suspend:
+ * if memory allocation with GFP_KERNEL is called inside runtime
+ * resume/suspend callback of any one of its ancestors(or the
+ * block device itself), the deadlock may be triggered inside the
+ * memory allocation since it might not complete until the block
+ * device becomes active and the involed page I/O finishes. The
+ * situation is pointed out first by Alan Stern. Network device
+ * are involved in iSCSI kind of situation.
+ *
+ * The lock of dev_hotplug_mutex is held in the function for handling
+ * hotplug race because pm_runtime_set_memalloc_noio() may be called
+ * in async probe().
+ *
+ * The function should be called between device_add() and device_del()
+ * on the affected device(block/network device).
+ */
+void pm_runtime_set_memalloc_noio(struct device *dev, bool enable)
+{
+   static DEFINE_MUTEX(dev_hotplug_mutex);
+
+   mutex_lock(&dev_hotplug_mutex);
+   for(;;) {
+   /* hold power lock since bitfield is not SMP-safe. */
+   spin_lock_irq(&dev->power.lock);
+   dev->power.memalloc_noio = enable;
+   spin_unlock_irq(&dev->power.lock);
+
+   dev = dev->parent;
+
+   /* only clear the flag for one device if all
+* children of the device don't set the flag.
+*/
+   if (!dev || (!enable &&
+device_for_each_child(dev, NULL,
+  dev_memalloc_noio)))
+   break;
+   }
+   mutex_unlock(&dev_hotplug_mutex);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_set_memalloc_noio);
+
 /**
  * rpm_check_suspend_allowed - Test whether a device may be suspended.
  * @dev: Device to test.
diff --git a/include/linux/pm.h b/include/linux/pm.h
index 03d7bb1..1a8a69d 100644
--- a/include/linux/pm.h
+++ b/include/linux/pm.h
@@ -538,6 +538,7 @@ struct dev_pm_info {
unsigned intirq_safe:1;
unsigned intuse_autosuspend:1;
unsigned inttimer_autosuspends:1;
+   unsigned intmemalloc_noio:1;
enum rpm_requestrequest;
enum rpm_status runtime_status;
int runtime_error;
diff --git a/include/linux/pm_runtime.h b/include/linux/pm_runtime.h
index f271860..775e063 100644
--- a/include/linux/pm_runtime.h
+++ b/include/linux/pm_runtime.h
@@ -47,6 +47,7 @@ extern void pm_runtime_set_autosuspend_delay(struct device 
*dev, int delay);
 extern unsigned long pm_runtime_autosuspend_expiration(struct device *

[PATCH v4 3/6] block/genhd.c: apply pm_runtime_set_memalloc_noio on block devices

2012-11-03 Thread Ming Lei

This patch applyes the introduced pm_runtime_set_memalloc_noio on
block device so that PM core will teach mm to not allocate memory with
GFP_IOFS when calling the runtime_resume and runtime_suspend callback
for block devices and its ancestors.

Cc: Jens Axboe 
Signed-off-by: Ming Lei 
---
v4:
- call pm_runtime_set_memalloc_noio(ddev, true) after device_add
---
 block/genhd.c |9 +
 1 file changed, 9 insertions(+)

diff --git a/block/genhd.c b/block/genhd.c
index 9e02cd6..f3fe3aa 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "blk.h"
 
@@ -532,6 +533,13 @@ static void register_disk(struct gendisk *disk)
return;
}
}
+
+   /* avoid probable deadlock caused by allocating memory with
+* GFP_KERNEL in runtime_resume callback of its all ancestor
+* deivces
+*/
+   pm_runtime_set_memalloc_noio(ddev, true);
+
disk->part0.holder_dir = kobject_create_and_add("holders", &ddev->kobj);
disk->slave_dir = kobject_create_and_add("slaves", &ddev->kobj);
 
@@ -661,6 +669,7 @@ void del_gendisk(struct gendisk *disk)
disk->driverfs_dev = NULL;
if (!sysfs_deprecated)
sysfs_remove_link(block_depr, dev_name(disk_to_dev(disk)));
+   pm_runtime_set_memalloc_noio(disk_to_dev(disk), false);
device_del(disk_to_dev(disk));
 }
 EXPORT_SYMBOL(del_gendisk);
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v4 4/6] net/core: apply pm_runtime_set_memalloc_noio on network devices

2012-11-03 Thread Ming Lei

Deadlock might be caused by allocating memory with GFP_KERNEL in
runtime_resume and runtime_suspend callback of network devices in
iSCSI situation, so mark network devices and its ancestor as
'memalloc_noio' with the introduced pm_runtime_set_memalloc_noio().

Cc: "David S. Miller" 
Cc: Eric Dumazet 
Cc: David Decotigny 
Cc: Tom Herbert 
Cc: Ingo Molnar 
Signed-off-by: Ming Lei 
---
v4:
 - call pm_runtime_set_memalloc_noio(ddev, true) after
   device_add
---
 net/core/net-sysfs.c |5 +
 1 file changed, 5 insertions(+)

diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index bcf02f6..a55d255 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "net-sysfs.h"
@@ -1386,6 +1387,8 @@ void netdev_unregister_kobject(struct net_device * net)
 
remove_queue_kobjects(net);
 
+   pm_runtime_set_memalloc_noio(dev, false);
+
device_del(dev);
 }
 
@@ -1421,6 +1424,8 @@ int netdev_register_kobject(struct net_device *net)
return error;
}
 
+   pm_runtime_set_memalloc_noio(dev, true);
+
return error;
 }
 
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v4 5/6] PM / Runtime: force memory allocation with no I/O during Runtime PM callbcack

2012-11-03 Thread Ming Lei

This patch applies the introduced memalloc_noio_save() and
memalloc_noio_restore() to force memory allocation with no I/O
during runtime_resume/runtime_suspend callback on device with
the flag of 'memalloc_noio' set.

Cc: Alan Stern 
Cc: Oliver Neukum 
Cc: Rafael J. Wysocki 
Signed-off-by: Ming Lei 
---
v4:
- runtime_suspend need this too because rpm_resume may wait for
completion of concurrent runtime_suspend, so deadlock still may
be triggered in runtime_suspend path.
---
 drivers/base/power/runtime.c |   32 ++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
index d477924..7ed17a9 100644
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -368,6 +368,7 @@ static int rpm_suspend(struct device *dev, int rpmflags)
int (*callback)(struct device *);
struct device *parent = NULL;
int retval;
+   unsigned int noio_flag;
 
trace_rpm_suspend(dev, rpmflags);
 
@@ -477,7 +478,20 @@ static int rpm_suspend(struct device *dev, int rpmflags)
if (!callback && dev->driver && dev->driver->pm)
callback = dev->driver->pm->runtime_suspend;
 
-   retval = rpm_callback(callback, dev);
+   /*
+* Deadlock might be caused if memory allocation with GFP_KERNEL
+* happens inside runtime_suspend callback of one block device's
+* ancestor or the block device itself. Network device might be
+* thought as part of iSCSI block device, so network device and
+* its ancestor should be marked as memalloc_noio.
+*/
+   if (dev->power.memalloc_noio) {
+   memalloc_noio_save(noio_flag);
+   retval = rpm_callback(callback, dev);
+   memalloc_noio_restore(noio_flag);
+   } else {
+   retval = rpm_callback(callback, dev);
+   }
if (retval)
goto fail;
 
@@ -560,6 +574,7 @@ static int rpm_resume(struct device *dev, int rpmflags)
int (*callback)(struct device *);
struct device *parent = NULL;
int retval = 0;
+   unsigned int noio_flag;
 
trace_rpm_resume(dev, rpmflags);
 
@@ -709,7 +724,20 @@ static int rpm_resume(struct device *dev, int rpmflags)
if (!callback && dev->driver && dev->driver->pm)
callback = dev->driver->pm->runtime_resume;
 
-   retval = rpm_callback(callback, dev);
+   /*
+* Deadlock might be caused if memory allocation with GFP_KERNEL
+* happens inside runtime_resume callback of one block device's
+* ancestor or the block device itself. Network device might be
+* thought as part of iSCSI block device, so network device and
+* its ancestor should be marked as memalloc_noio.
+*/
+   if (dev->power.memalloc_noio) {
+   memalloc_noio_save(noio_flag);
+   retval = rpm_callback(callback, dev);
+   memalloc_noio_restore(noio_flag);
+   } else {
+   retval = rpm_callback(callback, dev);
+   }
if (retval) {
__update_runtime_status(dev, RPM_SUSPENDED);
pm_runtime_cancel_pending(dev);
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v4 6/6] USB: forbid memory allocation with I/O during bus reset

2012-11-03 Thread Ming Lei

If one storage interface or usb network interface(iSCSI case)
exists in current configuration, memory allocation with
GFP_KERNEL during usb_device_reset() might trigger I/O transfer
on the storage interface itself and cause deadlock because
the 'us->dev_mutex' is held in .pre_reset() and the storage
interface can't do I/O transfer when the reset is triggered
by other interface, or the error handling can't be completed
if the reset is triggered by the storage itself(error handling path).

Cc: Alan Stern 
Cc: Oliver Neukum 
Signed-off-by: Ming Lei 
---
v4:
- mark current memalloc_noio for every usb device reset
---
 drivers/usb/core/hub.c |   13 +
 1 file changed, 13 insertions(+)

diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 5b131b6..788e652 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -5044,6 +5044,7 @@ int usb_reset_device(struct usb_device *udev)
 {
int ret;
int i;
+   unsigned int noio_flag;
struct usb_host_config *config = udev->actconfig;
 
if (udev->state == USB_STATE_NOTATTACHED ||
@@ -5053,6 +5054,17 @@ int usb_reset_device(struct usb_device *udev)
return -EINVAL;
}
 
+   /*
+* Don't allocate memory with GFP_KERNEL in current
+* context to avoid possible deadlock if usb mass
+* storage interface or usbnet interface(iSCSI case)
+* is included in current configuration. The easist
+* approach is to do it for every device reset,
+* because the device 'memalloc_noio' flag may have
+* not been set before reseting the usb device.
+*/
+   memalloc_noio_save(noio_flag);
+
/* Prevent autosuspend during the reset */
usb_autoresume_device(udev);
 
@@ -5097,6 +5109,7 @@ int usb_reset_device(struct usb_device *udev)
}
 
usb_autosuspend_device(udev);
+   memalloc_noio_restore(noio_flag);
return ret;
 }
 EXPORT_SYMBOL_GPL(usb_reset_device);
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 1/2] firmware loader: introduce module parameter to customize(v4) fw search path

2012-11-03 Thread Ming Lei

This patch introduces one module parameter of 'path' in firmware_class
to support customizing firmware image search path, so that people can
use its own firmware path if the default built-in paths can't meet their
demand[1], and the typical usage is passing the below from kernel command
parameter when 'firmware_class' is built in kernel:

firmware_class.path=$CUSTOMIZED_PATH

[1], https://lkml.org/lkml/2012/10/11/337

Cc: Linus Torvalds 
Signed-off-by: Ming Lei 
---
v4:
- fix one comment and rebase on the latest next tree
v3:
- fix one mistake on checking unset firmware path
v2:
- take a cleaner approach suggested by Linus
- mark the path array as const because it needn't be changed
- fix one error in Document about the module name
v1:
- remove kernel boot parameter and only support the feature by
module parameter as suggested by Greg
---
 Documentation/firmware_class/README |5 +
 drivers/base/firmware_class.c   |   17 -
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/Documentation/firmware_class/README 
b/Documentation/firmware_class/README
index 815b711..e9fce78 100644
--- a/Documentation/firmware_class/README
+++ b/Documentation/firmware_class/README
@@ -22,12 +22,17 @@
- calls request_firmware(&fw_entry, $FIRMWARE, device)
- kernel searchs the fimware image with name $FIRMWARE directly
in the below search path of root filesystem:
+   User customized search path by module parameter 'path'[1]
"/lib/firmware/updates/" UTS_RELEASE,
"/lib/firmware/updates",
"/lib/firmware/" UTS_RELEASE,
"/lib/firmware"
- If found, goto 7), else goto 2)
 
+   [1], the 'path' is a string parameter which length should be less
+   than 256, user should pass 'firmware_class.path=$CUSTOMIZED_PATH'
+   if firmware_class is built in kernel(the general situation)
+
  2), userspace:
- /sys/class/firmware/xxx/{loading,data} appear.
- hotplug gets called with a firmware identifier in $FIRMWARE
diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
index 620b876..2067514 100644
--- a/drivers/base/firmware_class.c
+++ b/drivers/base/firmware_class.c
@@ -267,13 +267,23 @@ static void fw_free_buf(struct firmware_buf *buf)
 }
 
 /* direct firmware loading support */
-static const char *fw_path[] = {
+static char fw_path_para[256];
+static const char * const fw_path[] = {
+   fw_path_para,
"/lib/firmware/updates/" UTS_RELEASE,
"/lib/firmware/updates",
"/lib/firmware/" UTS_RELEASE,
"/lib/firmware"
 };
 
+/*
+ * Typical usage is that passing 'firmware_class.path=$CUSTOMIZED_PATH'
+ * from kernel command line because firmware_class is generally built in
+ * kernel instead of module.
+ */
+module_param_string(path, fw_path_para, sizeof(fw_path_para), 0644);
+MODULE_PARM_DESC(path, "customized firmware image search path with a higher 
priority than default path");
+
 /* Don't inline this: 'struct kstat' is biggish */
 static noinline_for_stack long fw_file_size(struct file *file)
 {
@@ -315,6 +325,11 @@ static bool fw_get_filesystem_firmware(struct firmware_buf 
*buf)
 
for (i = 0; i < ARRAY_SIZE(fw_path); i++) {
struct file *file;
+
+   /* skip the unset customized path */
+   if (!fw_path[i][0])
+   continue;
+
snprintf(path, PATH_MAX, "%s/%s", fw_path[i], buf->fw_id);
 
file = filp_open(path, O_RDONLY, 0);
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 2/2] firmware loader: document firmware cache mechanism

2012-11-03 Thread Ming Lei

From: Ming Lei 

This patch documents the firmware cache mechanism so that
users of request_firmware() know that it can be called
safely inside device's suspend and resume callback, and
the device's firmware needn't be cached any more by individual
driver itself to deal with firmware loss during system resume.

Signed-off-by: Ming Lei 
---
 Documentation/firmware_class/README |7 +++
 drivers/base/firmware_class.c   |3 +++
 2 files changed, 10 insertions(+)

diff --git a/Documentation/firmware_class/README 
b/Documentation/firmware_class/README
index e9fce78..43fada9 100644
--- a/Documentation/firmware_class/README
+++ b/Documentation/firmware_class/README
@@ -119,3 +119,10 @@
on the setup, so I think that the choice on what firmware to make
persistent should be left to userspace.
 
+ about firmware cache:
+ 
+ After firmware cache mechanism is introduced during system sleep,
+ request_firmware can be called safely inside device's suspend and
+ resume callback, and callers need't cache the firmware by
+ themselves any more for dealing with firmware loss during system
+ resume.
diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
index 2067514..15f2822 100644
--- a/drivers/base/firmware_class.c
+++ b/drivers/base/firmware_class.c
@@ -978,6 +978,9 @@ err_put_dev:
  *  firmware image for this or any other device.
  *
  * Caller must hold the reference count of @device.
+ *
+ * The function can be called safely inside device's suspend and
+ * resume callback.
  **/
 int
 request_firmware(const struct firmware **firmware_p, const char *name,
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH 09/13] driver core: firmware loader: store firmware name into devres list

2012-07-26 Thread Ming Lei

On Thu, Jul 26, 2012 at 12:15 AM, Borislav Petkov  wrote:
> On Wed, Jul 25, 2012 at 01:00:09AM +0800, Ming Lei wrote:
>> This patch will store firmware name into devres list of the device
>> which is requesting firmware loading, so that we can implement
>> auto cache firmware for devices in need.
>
> Stupid question: does this mean that once the firmware name is in the
> devres list, it is being cached automatically and device drivers which
> don't want that need to explicitly uncache it?

Both the auto cache and auto uncache actions are not triggered by device
driver, and will be triggered by some system state, for example, in 13/13,
you will find the cache is done before system suspend and the uncache is
done after system resume.

If device drivers want to cache its firmware explicitly, they have to uncache
it explicitly too, see cache_firmware/uncache_firmware in 7/13.

Thanks
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH 11/13] driver core: firmware: introduce devices_cache/uncache_firmwares

2012-07-26 Thread Ming Lei

On Thu, Jul 26, 2012 at 12:52 AM, Borislav Petkov  wrote:
> On Wed, Jul 25, 2012 at 01:00:11AM +0800, Ming Lei wrote:
>> This patches introduces the three helpers below:
>>
>>   void device_cache_firmwares(void)
>>   void device_uncache_firmwares(void)
>>   void device_uncache_firmwares_delay(unsigned long)
>
> I kinda don't like the plural of firmware: "firmwares". Can we call
> those
> device_cache_fw_images
> device_uncache_fw_images

Looks fine.

>
> or
> device_cache_fw_blobs
>
> or whatever?
>
>> so we can call device_cache_firmwares() to cache firmwares for
>> all devices which need firmwares to work, and the device driver
>> can get the firmware easily from kernel memory when system isn't
>> readly for completing their requests of loading firmwares.
>>
>> When system is ready for completing firmware loading, driver core
>> can call device_uncache_firmwares() or its delay version to free
>> the cached firmwares.
>>
>> The above helpers should be used to cache device firmwares during
>> system suspend/resume cycle in the following patches.
>>
>> Signed-off-by: Ming Lei 
>> ---
>>  drivers/base/firmware_class.c |  236 
>> +++--
>>  1 file changed, 230 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
>> index c181e6d..7a96e75 100644
>> --- a/drivers/base/firmware_class.c
>> +++ b/drivers/base/firmware_class.c
>> @@ -22,6 +22,10 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>> +#include 
>> +
>> +#include "base.h"
>>
>>  MODULE_AUTHOR("Manuel Estrada Sainz");
>>  MODULE_DESCRIPTION("Multi purpose firmware loading support");
>> @@ -91,6 +95,17 @@ struct firmware_cache {
>>   /* firmware_buf instance will be added into the below list */
>>   spinlock_t lock;
>>   struct list_head head;
>> +
>> + /*
>> +  * Name of firmware which has been cached successfully will be
>> +  * added into the below list so that device uncache helper can
>> +  * trace which firmware has been cached before.
>> +  */
>
> Comment above the list_head and maybe call the list_head fw_names or so?
>
>> + spinlock_t name_lock;
>> + struct list_head name_head;
>> + wait_queue_head_t  wait_queue;
>
> Stray \t
>
>> + int cnt;
>> + struct delayed_work work;
>>  };
>>
>>  struct firmware_buf {
>> @@ -107,6 +122,11 @@ struct firmware_buf {
>>   char fw_id[];
>>  };
>>
>> +struct fw_name_for_cache {
>> + struct list_head list;
>> + char name[];
>> +};
>
> Maybe fw_cache_entry?

Looks fine.

>
>> +
>>  struct firmware_priv {
>>   struct timer_list timeout;
>>   bool nowait;
>> @@ -214,12 +234,6 @@ static void fw_free_buf(struct firmware_buf *buf)
>>   kref_put(&buf->ref, __fw_free_buf);
>>  }
>>
>> -static void fw_cache_init(void)
>> -{
>> - spin_lock_init(&fw_cache.lock);
>> - INIT_LIST_HEAD(&fw_cache.head);
>> -}
>> -
>>  static struct firmware_priv *to_firmware_priv(struct device *dev)
>>  {
>>   return container_of(dev, struct firmware_priv, dev);
>> @@ -981,6 +995,216 @@ int uncache_firmware(const char *fw_name)
>>   return -EINVAL;
>>  }
>>
>> +static struct fw_name_for_cache *alloc_fw_name_cache(const char *name)
>> +{
>> + struct fw_name_for_cache *nc;
>> +
>> + nc = kzalloc(sizeof(nc) + strlen(name) + 1, GFP_KERNEL);
>> + if (!nc)
>> + goto exit;
>> +
>> + strcpy(nc->name, name);
>> +exit:
>> + return nc;
>> +}
>> +
>> +static void free_fw_name_cache(struct fw_name_for_cache *nc)
>> +{
>> + kfree(nc);
>> +}
>> +
>> +static void __async_dev_cache_firmware(void *fw_name,
>> + async_cookie_t cookie)
>
> Arg alignment.

I am wondering why checkpatch.pl doesn't add the check...

>
>> +{
>> + struct fw_name_for_cache *nc;
>> + struct firmware_cache *fwc = &fw_cache;
>> + char *curr_name;
>> + int ret;
>> +
>> + /* 'fw_name' is stored in devres, and it may be released,
>> +  * so allocate buffer to store the firmware name
>> +  */
>> + curr_name = kstrdup((const char *)fw_name, GFP_KERNEL)

Re: [RFC PATCH 08/13] driver core: firmware loader: fix device lifetime

2012-07-26 Thread Ming Lei

On Thu, Jul 26, 2012 at 8:20 PM, Borislav Petkov  wrote:
>
> Ok, here's what I got from looking at the patch:
>
> Your commit message says: "Also request_firmware_nowait should be called
> in atomic context now, so fix the obsolete comments."
>
> Atomic context in my book means you're not allowed to sleep at all.

In fact, I mean the function can be called in atomic context now, and
I know some time ago the function will create kthread to execute
the request_firmware, and atomic context is not allowed.

So I remove the obsolete comment.

>
> But the comment says that it is possible to sleep a little. This is very
> wrongly formulated AFAICT.

The function can be run in both contexts, and I don't see any words which
says the function will sleep.

>
> But, since request_firmware_nowait receives a GFP mask as one of its
> arguments and some of its callers don't supply GFP_ATOMIC then this
> has nothing to do with atomic contexts at all. Then, you should simply
> explain in the comment why exactly callers aren't allowed to be sleeping
> for a long time. And using adjectives like "long" or "short" is very
> misleading in such explanations so please be more specific as to why the

It is the original one, and I don't think it is wrong. Also it
shouldn't be covered
by this patch.

Maybe I shouldn't have fixed the comment in this patch.


Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH 12/13] driver core: firmware loader: use small timeout for cache device firmware

2012-07-26 Thread Ming Lei

On Thu, Jul 26, 2012 at 8:36 PM, Borislav Petkov  wrote:
> On Wed, Jul 25, 2012 at 01:00:12AM +0800, Ming Lei wrote:
>> Because device_cache_firmwares only cache the firmware which has been
>> loaded sucessfully at leat once, using a small loading timeout should
>
> least
>
>> be OK.
>
> Your commit message doesn't explain why exactly we decrease the timeout:

I have explained it. Because the firmware has been loaded successfully at least
once, so it is very probably to not timeout.

> you should probably say that this patch overrides the default 60s
> timeout because we're in pre-suspend/-hibernate mode where we have
> userspace and are able to load the firmware quickly.

No, it is not what I was saying.

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH 13/13] driver core: firmware loader: cache devices firmware during suspend/resume cycle

2012-07-26 Thread Ming Lei

On Thu, Jul 26, 2012 at 8:43 PM, Borislav Petkov  wrote:
>> +#else
>> +static int fw_pm_notify(struct notifier_block *notify_block,
>> + unsigned long mode, void *unused)
>> +{}
>
> static inline int fw_pm...

Will add inline in -v1.


Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH 10/13] driver core: devres: introduce devres_for_each_res

2012-07-26 Thread Ming Lei

On Thu, Jul 26, 2012 at 12:25 AM, Borislav Petkov  wrote:
> On Wed, Jul 25, 2012 at 01:00:10AM +0800, Ming Lei wrote:
>> This patch introduces one devres API of devres_for_each_res
>> so that the device's driver can iterate each resource it has
>> interest in.
>>
>> The firmware loader will use the API to get each firmware name
>> from the device instance.
>>
>> Signed-off-by: Ming Lei 
>> ---
>>  drivers/base/devres.c  |   42 ++
>>  include/linux/device.h |3 +++
>>  2 files changed, 45 insertions(+)
>>
>> diff --git a/drivers/base/devres.c b/drivers/base/devres.c
>> index 2360adb..8273ba5 100644
>> --- a/drivers/base/devres.c
>> +++ b/drivers/base/devres.c
>> @@ -144,6 +144,48 @@ EXPORT_SYMBOL_GPL(devres_alloc);
>>  #endif
>>
>>  /**
>> + * devres_for_each_res - Resource iterator
>> + * @dev: Device to iterate resource from
>> + * @release: Look for resources associated with this release function
>> + * @match: Match function (optional)
>> + * @match_data: Data for the match function
>> + * @fn: function to be called for each matched resource.
>> + *
>> + * Call @fn for each devres of @dev which is associated with @release
>> + * and for which @match returns 1.
>> + *
>> + * RETURNS:
>> + *   void
>> + */
>> +void devres_for_each_res(struct device *dev, dr_release_t release,
>> + dr_match_t match, void *match_data,
>> + void (*fn)(struct device *, void *))
>> +{
>> + struct devres_node *node;
>> + struct devres_node *tmp;
>> + unsigned long flags;
>> +
>> + if (!fn)
>> + return;
>> +
>> + spin_lock_irqsave(&dev->devres_lock, flags);
>> + list_for_each_entry_safe_reverse(node, tmp,
>> + &dev->devres_head, entry) {
>
> Why break this line?
>
> list_for_each_entry_safe_reverse(node, tmp, &dev->devres_head, entry) 
> {
>
> is perfectly fine.
>
>> + struct devres *dr = container_of(node, struct devres, node);
>> +
>> + if (node->release != release)
>> + continue;
>> + if (match && !match(dev, dr->data, match_data))
>> + continue;
>> + spin_unlock_irqrestore(&dev->devres_lock, flags);
>> + fn(dev, dr->data);
>> + spin_lock_irqsave(&dev->devres_lock, flags);
>> + }
>> + spin_unlock_irqrestore(&dev->devres_lock, flags);
>
> This looks strange. For the last node on the list, we're grabbing the
> lock and releasing it right after.
>
> Probably check whether node is the last element and only re-grab the
> lock if it isn't?

It is not necessary since the lock isn't held in hot path.

>
> And btw, don't we need to hold the ->devres_lock during the whole search
> like callers of find_dr do, for example?

Because I don't want to put more constraint on the 'fn' about lock use, memory
allocation flag(gfp)..., from the view of API's user.

In fact, there is problem wrt. releasing lock since add_dr may add new node
during releasing lock.

So I plan to just hold the lock to cover calling 'fn' in -v1 instead
of using rcu list
helpers, which may introduce a lot change on devres.

Anyway the callers can copy the resources interested into a temporary list
in 'fn' and handle it later, which may simplify devres_for_each_res and other
devres helpers a lot.

Also I will introduce another parameter of 'void *data' to 'fn' in -v1.

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] kernel/watchdog.c : fix smp_processor_id() warning

2012-07-26 Thread Ming Lei

On Fri, Jul 27, 2012 at 3:43 AM, Don Zickus  wrote:
> On Wed, Jul 25, 2012 at 12:39:45PM +0800, Ming Lei wrote:
>> Use raw_smp_processor_id in lockup_detector_bootcpu_resume()
>> because it is enough when non-boot CPUs are offline.
>>
>> This patch fixes the following warning when DEBUG_PREEMPT
>> is enabled.
>
> Is this patched on top of linux-next?

Yes.

The warning is introduced by the commit below:

  7fb860ff90ae970cf62cf676dfc1addcf8415674
  (NMI watchdog: fix for lockup detector breakage on resume)


> It seems right based on the code usage.  Though it makes me sad the resume
> code has to hack into the cpu notifiers like that.

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH 08/13] driver core: firmware loader: fix device lifetime

2012-07-26 Thread Ming Lei

On Fri, Jul 27, 2012 at 1:46 AM, Borislav Petkov  wrote:
> On Thu, Jul 26, 2012 at 11:44:48PM +0800, Ming Lei wrote:
>> On Thu, Jul 26, 2012 at 8:20 PM, Borislav Petkov  wrote:
>> >
>> > Ok, here's what I got from looking at the patch:
>> >
>> > Your commit message says: "Also request_firmware_nowait should be called
>> > in atomic context now, so fix the obsolete comments."
>> >
>> > Atomic context in my book means you're not allowed to sleep at all.
>>
>> In fact, I mean the function can be called in atomic context now, and
>> I know some time ago the function will create kthread to execute
>> the request_firmware, and atomic context is not allowed.
>
> Right, but when called with GFP_KERNEL mask, it can sleep, right?

>
>> > But the comment says that it is possible to sleep a little. This is very
>> > wrongly formulated AFAICT.
>>
>> The function can be run in both contexts, and I don't see any words which
>> says the function will sleep.
>
> "
> ...
>  *  Asynchronous variant of request_firmware() for user contexts where
>  *  it is not possible to sleep for long time.
>  **/
> "
>
> Not possible to sleep for a long time means the function still *can*
> sleep... even for short time. For a certain definion of "short."

In fact, many drivers like to use it in its probe() because too long sleep
in probe will slow down kernel boot if driver is built in kernel. During
kernel boot, rootfs is not mounted and user space is not ready, request_firmware
will timeout to cause problem.

Also after introducing work in this function, it is allowed to be called in
atomic context if 'gfp' is passed as GFP_ATOMIC, so I removed the obsolete
comments.

>
>> > But, since request_firmware_nowait receives a GFP mask as one of its
>> > arguments and some of its callers don't supply GFP_ATOMIC then this
>> > has nothing to do with atomic contexts at all. Then, you should simply
>> > explain in the comment why exactly callers aren't allowed to be sleeping
>> > for a long time. And using adjectives like "long" or "short" is very
>> > misleading in such explanations so please be more specific as to why the
>>
>> It is the original one, and I don't think it is wrong. Also it
>> shouldn't be covered
>> by this patch.
>>
>> Maybe I shouldn't have fixed the comment in this patch.
>
> Why, simply fix the comment to adhere to what the function does. And
> since it can sleep, maybe the easiest fix is to say simply that:
> "function can sleep", right?

No, the comment above is misleading and not useless, and I think the below
is good:

 *  Asynchronous variant of request_firmware() for user contexts where
 *  it is not possible to sleep for long time or can't sleep at all, depends
 *  on the @gfp flag passed.

Anyway, the original part of 'It can't be called in atomic contexts.' is wrong
and should be removed.


Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH 12/13] driver core: firmware loader: use small timeout for cache device firmware

2012-07-26 Thread Ming Lei

On Fri, Jul 27, 2012 at 1:54 AM, Borislav Petkov  wrote:

>> No, it is not what I was saying.

I just mean the point is not mentioned in my commit log, but I admit it should
be a appropriate cause.

>
> Ok, maybe I'm not understanding this then. So explain to me this: why
> do you need that timeout value of 10, how did we decide it to be 10

If one firmware image was loaded successfully before, the probability of
loading it successfully at this time should be much higher than the 1st time
because something crazy(for example, the firmware is deleted) happens
with low probability.

Choosing 10 secs is just a estimation for loading time because the maximum
size of firmware in current distributions is about 2M bytes, since we know
it has been loaded successfully before.

> (and not 20 or 30 or whatever)? Generally, why do we need to reprogram
> the timer to a smaller timeout instead of simply doing the completion
> without a timeout?

No, it should be crazy without a timeout, and it can be triggered in init call
easily.

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH 12/13] driver core: firmware loader: use small timeout for cache device firmware

2012-07-28 Thread Ming Lei

On Fri, Jul 27, 2012 at 6:35 PM, Borislav Petkov  wrote:
> On Fri, Jul 27, 2012 at 09:54:25AM +0800, Ming Lei wrote:
>> On Fri, Jul 27, 2012 at 1:54 AM, Borislav Petkov  wrote:
>>
>> >> No, it is not what I was saying.
>>
>> I just mean the point is not mentioned in my commit log, but I admit it 
>> should
>> be a appropriate cause.
>>
>> >
>> > Ok, maybe I'm not understanding this then. So explain to me this: why
>> > do you need that timeout value of 10, how did we decide it to be 10
>>
>> If one firmware image was loaded successfully before, the probability of
>> loading it successfully at this time should be much higher than the 1st time
>> because something crazy(for example, the firmware is deleted) happens
>> with low probability.
>
> Believe it or not, I'm addressing exactly the possibility of the
> firmware disappearing from under us in the AMD microcode driver
> currently :) (and some other annoyances, of course).

Of course, it is possible since user can delete it anytime, but with very
low probability.

>
>> Choosing 10 secs is just a estimation for loading time because the maximum
>> size of firmware in current distributions is about 2M bytes, since we know
>> it has been loaded successfully before.
>
> This is exactly the comment we want over the code to explain to others
> why we're choosing 10 secs. Simply add that sentence above the 10s
> assignment and we're perfect! :-)

OK, will add the comments in -v1.

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH 08/13] driver core: firmware loader: fix device lifetime

2012-07-28 Thread Ming Lei

On Fri, Jul 27, 2012 at 6:32 PM, Borislav Petkov  wrote:

>
> I still don't like too much the "not possible to sleep for long time"
> expression.
>
> Maybe change it to "should sleep for as small periods as possible since
> it increases boot time of device drivers requesting firmware in their
> ->probe() methods."

Fairly enough, will add this kind of description in -v1, and I will introduce
one extra patch to fix this comment since it is nothing related with the
device lifetime fix and its comments.

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v2] firmware loader: introduce module parameter to customize fw search path

2012-10-26 Thread Ming Lei

This patch introduces one module parameter of 'path' in firmware_class
to support customizing firmware image search path, so that people can
use its own firmware path if the default built-in paths can't meet their
demand[1], and the typical usage is passing the below from kernel command
parameter when 'firmware_class' is built in kernel:

firmware_class.path=$CUSTOMIZED_PATH

[1], https://lkml.org/lkml/2012/10/11/337

Cc: Linus Torvalds 
Signed-off-by: Ming Lei 
---
v2
- take a cleaner approach suggested by Linus
- mark the path array as const because it needn't be changed
- fix one error in Document about the module name

v1:
- remove kernel boot parameter and only support the feature by
module parameter as suggested by Greg
---
 Documentation/firmware_class/README |5 +
 drivers/base/firmware_class.c   |   17 -
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/Documentation/firmware_class/README 
b/Documentation/firmware_class/README
index 815b711..e9fce78 100644
--- a/Documentation/firmware_class/README
+++ b/Documentation/firmware_class/README
@@ -22,12 +22,17 @@
- calls request_firmware(&fw_entry, $FIRMWARE, device)
- kernel searchs the fimware image with name $FIRMWARE directly
in the below search path of root filesystem:
+   User customized search path by module parameter 'path'[1]
"/lib/firmware/updates/" UTS_RELEASE,
"/lib/firmware/updates",
"/lib/firmware/" UTS_RELEASE,
"/lib/firmware"
- If found, goto 7), else goto 2)
 
+   [1], the 'path' is a string parameter which length should be less
+   than 256, user should pass 'firmware_class.path=$CUSTOMIZED_PATH'
+   if firmware_class is built in kernel(the general situation)
+
  2), userspace:
- /sys/class/firmware/xxx/{loading,data} appear.
- hotplug gets called with a firmware identifier in $FIRMWARE
diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
index 8945f4e..f5a2c40 100644
--- a/drivers/base/firmware_class.c
+++ b/drivers/base/firmware_class.c
@@ -267,13 +267,23 @@ static void fw_free_buf(struct firmware_buf *buf)
 }
 
 /* direct firmware loading support */
-static const char *fw_path[] = {
+static char fw_path_para[256];
+static const char * const fw_path[] = {
+   fw_path_para,
"/lib/firmware/updates/" UTS_RELEASE,
"/lib/firmware/updates",
"/lib/firmware/" UTS_RELEASE,
"/lib/firmware"
 };
 
+/*
+ * Typical usage is that passing 'firmware_class.path=$CUSTOMIZED_PATH'
+ * from kernel command because firmware_class is generally built in
+ * kernel instead of module.
+ */
+module_param_string(path, fw_path_para, sizeof(fw_path_para), 0644);
+MODULE_PARM_DESC(path, "customized firmware image search path with a higher 
priority than default path");
+
 /* Don't inline this: 'struct kstat' is biggish */
 static noinline long fw_file_size(struct file *file)
 {
@@ -315,6 +325,11 @@ static bool fw_get_filesystem_firmware(struct firmware_buf 
*buf)
 
for (i = 0; i < ARRAY_SIZE(fw_path); i++) {
struct file *file;
+
+   /* skip the unset customized path */
+   if (!fw_path[0])
+   continue;
+
snprintf(path, PATH_MAX, "%s/%s", fw_path[i], buf->fw_id);
 
file = filp_open(path, O_RDONLY, 0);
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2] firmware loader: introduce module parameter to customize fw search path

2012-10-26 Thread Ming Lei

On Fri, Oct 26, 2012 at 10:51 PM, Ming Lei  wrote:
> +
> +   /* skip the unset customized path */
> +   if (!fw_path[0])

There is one mistake above and should be below, sorry for the noise.

 if (!fw_path[i][0])


Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v3] firmware loader: introduce module parameter to customize fw search path

2012-10-26 Thread Ming Lei

This patch introduces one module parameter of 'path' in firmware_class
to support customizing firmware image search path, so that people can
use its own firmware path if the default built-in paths can't meet their
demand[1], and the typical usage is passing the below from kernel command
parameter when 'firmware_class' is built in kernel:

firmware_class.path=$CUSTOMIZED_PATH

[1], https://lkml.org/lkml/2012/10/11/337

Cc: Linus Torvalds 
Signed-off-by: Ming Lei 
---
v3
- fix one mistake on checking unset firmware path

v2
- take a cleaner approach suggested by Linus
- mark the path array as const because it needn't be changed
- fix one error in Document about the module name

v1:
- remove kernel boot parameter and only support the feature by
module parameter as suggested by Greg
---
 Documentation/firmware_class/README |5 +
 drivers/base/firmware_class.c   |   17 -
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/Documentation/firmware_class/README 
b/Documentation/firmware_class/README
index 815b711..e9fce78 100644
--- a/Documentation/firmware_class/README
+++ b/Documentation/firmware_class/README
@@ -22,12 +22,17 @@
- calls request_firmware(&fw_entry, $FIRMWARE, device)
- kernel searchs the fimware image with name $FIRMWARE directly
in the below search path of root filesystem:
+   User customized search path by module parameter 'path'[1]
"/lib/firmware/updates/" UTS_RELEASE,
"/lib/firmware/updates",
"/lib/firmware/" UTS_RELEASE,
"/lib/firmware"
- If found, goto 7), else goto 2)
 
+   [1], the 'path' is a string parameter which length should be less
+   than 256, user should pass 'firmware_class.path=$CUSTOMIZED_PATH'
+   if firmware_class is built in kernel(the general situation)
+
  2), userspace:
- /sys/class/firmware/xxx/{loading,data} appear.
- hotplug gets called with a firmware identifier in $FIRMWARE
diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
index 8945f4e..62568c2 100644
--- a/drivers/base/firmware_class.c
+++ b/drivers/base/firmware_class.c
@@ -267,13 +267,23 @@ static void fw_free_buf(struct firmware_buf *buf)
 }
 
 /* direct firmware loading support */
-static const char *fw_path[] = {
+static char fw_path_para[256];
+static const char * const fw_path[] = {
+   fw_path_para,
"/lib/firmware/updates/" UTS_RELEASE,
"/lib/firmware/updates",
"/lib/firmware/" UTS_RELEASE,
"/lib/firmware"
 };
 
+/*
+ * Typical usage is that passing 'firmware_class.path=$CUSTOMIZED_PATH'
+ * from kernel command because firmware_class is generally built in
+ * kernel instead of module.
+ */
+module_param_string(path, fw_path_para, sizeof(fw_path_para), 0644);
+MODULE_PARM_DESC(path, "customized firmware image search path with a higher 
priority than default path");
+
 /* Don't inline this: 'struct kstat' is biggish */
 static noinline long fw_file_size(struct file *file)
 {
@@ -315,6 +325,11 @@ static bool fw_get_filesystem_firmware(struct firmware_buf 
*buf)
 
for (i = 0; i < ARRAY_SIZE(fw_path); i++) {
struct file *file;
+
+   /* skip the unset customized path */
+   if (!fw_path[i][0])
+   continue;
+
snprintf(path, PATH_MAX, "%s/%s", fw_path[i], buf->fw_id);
 
file = filp_open(path, O_RDONLY, 0);
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v3] firmware loader: introduce module parameter to customize fw search path

2012-10-27 Thread Ming Lei

On Sat, Oct 27, 2012 at 1:20 PM, anish kumar
 wrote:
>>
>> + [1], the 'path' is a string parameter which length should be less
> whose length should be less...

Maybe, but it is not a big deal, you may find some drivers can
keep one module string parameter length as 4096.

>> + * Typical usage is that passing 'firmware_class.path=$CUSTOMIZED_PATH'
>> + * from kernel command because firmware_class is generally built in
> do you mean kernel command line?

Yes.

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] firmware: use noinline_for_stack

2012-10-28 Thread Ming Lei

On Sun, Oct 28, 2012 at 6:37 AM, Cesar Eduardo Barros  wrote:
> The comment above fw_file_size() suggests it is noinline for stack size
> reasons. Use noinline_for_stack to make this more clear.

Acked-by: Ming Lei 



Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v3 0/6] solve deadlock caused by memory allocation with I/O

2012-10-29 Thread Ming Lei

This patchset try to solve one deadlock problem which might be caused
by memory allocation with block I/O during runtime resume and block device
error handling path. Traditionly, the problem is addressed by passing
GFP_NOIO statically to mm, but that is not a effective solution, see
detailed description in patch 1's commit log.

This patch set introduces one process flag and trys to fix one deadlock
problem on block device/network device during runtime resume or usb bus reset.

The 1st one is the change on include/sched.h and mm.

The 2nd patch introduces the flag of memalloc_noio_resume on 'dev_pm_info',
and pm_runtime_set_memalloc_noio(), so that PM Core can teach mm to not
allocate mm with GFP_IOFS during the runtime_resume callback only on
device with the flag set.

The following 2 patches apply the introduced pm_runtime_set_memalloc_noio()
to mark all devices as memalloc_noio_resume in the path from the block or
network device to the root device in device tree.

The last 2 patches are applied again PM and USB subsystem to demonstrate
how to use the introduced mechanism to fix the deadlock problem.

V3:
- patch 2/6 and 5/6 changed, see their commit log
- remove RFC from title since several guys have expressed that
it is a reasonable solution
V2:
- remove changes on 'may_writepage' and 'may_swap'(1/6)
- unset GFP_IOFS in try_to_free_pages() path(1/6)
- introduce pm_runtime_set_memalloc_noio()
- only apply the meachnism on block/network device and its ancestors
for runtime resume context
V1:
- take Minchan's change to avoid the check in alloc_page hot path
- change the helpers' style into save/restore as suggested by Alan
- memory allocation with no io in usb bus reset path for all devices
as suggested by Greg and Oliver

 block/genhd.c|8 
 drivers/base/power/runtime.c |   88 +-
 drivers/usb/core/hub.c   |   15 +++
 include/linux/pm.h   |1 +
 include/linux/pm_runtime.h   |5 +++
 include/linux/sched.h|   10 +
 mm/page_alloc.c  |   10 -
 mm/vmscan.c  |   12 ++
 net/core/net-sysfs.c |5 +++
 9 files changed, 152 insertions(+), 2 deletions(-)


Thanks,
--
Ming Lei

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

1 2 3 4 5 6 7 8 9 10 >

1 - 100 of 2944 matches

Mail list logo