Re: [PATCH v6 5/6] PM / Runtime: force memory allocation with no I/O during Runtime PM callbcack
On Wed, Nov 28, 2012 at 5:24 AM, Rafael J. Wysocki wrote: > > Please don't duplicate code this way. > > You can move that whole thing to rpm_callback(). Yes, you'll probably need to > check dev->power.memalloc_noio twice in there, but that's OK. Good idea, I will update it in v7. Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v6 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio()
On Wed, Nov 28, 2012 at 5:19 AM, Rafael J. Wysocki wrote: > On Saturday, November 24, 2012 08:59:14 PM Ming Lei wrote: >> The patch introduces the flag of memalloc_noio in 'struct dev_pm_info' >> to help PM core to teach mm not allocating memory with GFP_KERNEL >> flag for avoiding probable deadlock. >> >> As explained in the comment, any GFP_KERNEL allocation inside >> runtime_resume() or runtime_suspend() on any one of device in >> the path from one block or network device to the root device >> in the device tree may cause deadlock, the introduced >> pm_runtime_set_memalloc_noio() sets or clears the flag on >> device in the path recursively. >> >> Cc: Alan Stern >> Cc: "Rafael J. Wysocki" >> Signed-off-by: Ming Lei >> --- >> v5: >> - fix code style error >> - add comment on clear the device memalloc_noio flag >> v4: >> - rename memalloc_noio_resume as memalloc_noio >> - remove pm_runtime_get_memalloc_noio() >> - add comments on pm_runtime_set_memalloc_noio >> v3: >> - introduce pm_runtime_get_memalloc_noio() >> - hold one global lock on pm_runtime_set_memalloc_noio >> - hold device power lock when accessing memalloc_noio_resume >> flag suggested by Alan Stern >> - implement pm_runtime_set_memalloc_noio without recursion >> suggested by Alan Stern >> v2: >> - introduce pm_runtime_set_memalloc_noio() >> --- >> drivers/base/power/runtime.c | 60 >> ++ >> include/linux/pm.h |1 + >> include/linux/pm_runtime.h |3 +++ >> 3 files changed, 64 insertions(+) >> >> diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c >> index 3148b10..3e198a0 100644 >> --- a/drivers/base/power/runtime.c >> +++ b/drivers/base/power/runtime.c >> @@ -124,6 +124,66 @@ unsigned long pm_runtime_autosuspend_expiration(struct >> device *dev) >> } >> EXPORT_SYMBOL_GPL(pm_runtime_autosuspend_expiration); >> >> +static int dev_memalloc_noio(struct device *dev, void *data) >> +{ >> + return dev->power.memalloc_noio; >> +} >> + >> +/* >> + * pm_runtime_set_memalloc_noio - Set a device's memalloc_noio flag. >> + * @dev: Device to handle. >> + * @enable: True for setting the flag and False for clearing the flag. >> + * >> + * Set the flag for all devices in the path from the device to the >> + * root device in the device tree if @enable is true, otherwise clear >> + * the flag for devices in the path whose siblings don't set the flag. >> + * > > Please use counters instead of walking the whole path every time. Ie. in > addition to the flag add a counter to store the number of the device's > children having that flag set. Thanks for your review. IMO, pm_runtime_set_memalloc_noio() is only called in probe() and release() of block device and network device, which is in a very infrequent path, so I am wondering if it is worthy of introducing another counter for all devices. Also looks the current implementation of pm_runtime_set_memalloc_noio() is simple and clean enough with the flag, IMO. > I would use the flag only to store the information that > pm_runtime_set_memalloc_noio(dev, true) has been run for this device directly > and I'd use a counter for everything else. > > That is, have power.memalloc_count that would be incremented when (1) > pm_runtime_set_memalloc_noio(dev, true) is called for that device and (2) when > power.memalloc_count for one of its children changes from 0 to 1 (and > analogously for decrementation). Then, check the counter in rpm_callback(). Sorry, could you explain in a bit detail why we need the counter? Looks only checking the flag in rpm_callback() is enough, doesn't it? > > Besides, don't you need to check children for the arg device itself? It isn't needed since the children of network/block device can't be involved of the deadlock in runtime PM path. Also, the function is only called by network device or block device subsystem, both the two kind of device are class device and should have no children. > >> + * The function should only be called by block device, or network >> + * device driver for solving the deadlock problem during runtime >> + * resume/suspend: >> + * >> + * If memory allocation with GFP_KERNEL is called inside runtime >> + * resume/suspend callback of any one of its ancestors(or the >> + * block device itself), the deadlock may be triggered inside the >> + * memory allocation since it mi
Re: [PATCH v6 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio()
On Wed, Nov 28, 2012 at 5:19 AM, Rafael J. Wysocki wrote: > > Please use counters instead of walking the whole path every time. Ie. in > addition to the flag add a counter to store the number of the device's > children having that flag set. Even though counter is added, walking the whole path can't be avoided too, and may be a explicit walking or recursion, because pm_runtime_set_memalloc_noio is required to set or clear the flag(or increase/decrease the counter) of devices in the whole path. Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v6 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio()
On Wed, Nov 28, 2012 at 5:29 PM, Rafael J. Wysocki wrote: > > But it doesn't have to walk the children. Moreover, with counters it only Yeah, I got it, it is the advantage of counter, but with extra 'int' field introduced in 'struct device'. > needs to walk the whole path if all devices in it need to be updated. For > example, if you call pm_runtime_set_memalloc_noio(dev, true) for a device > whose parent's counter is greater than zero already, you don't need to > walk the path above the parent. We still can do it with the flag only, pm_runtime_set_memalloc_noio(dev, true) can return immediately if one parent or the 'dev' flag is true. But considered that the pm_runtime_set_memalloc_noio(dev, false) is only called in a very infrequent path(network/block device->remove()), looks the introduced cost isn't worthy of the obtained advantage. So could you accept not introducing counter? and I will update with the above improvement you suggested. Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v6 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio()
On Wed, Nov 28, 2012 at 6:06 PM, Rafael J. Wysocki wrote: > > Well, it may be unfrequent, but does it mean it has to do things that may > be avoided (ie. walking the children of every node in the path in some cases)? I agree so without introducing extra cost, :-) > I don't really think that the counters would cost us that much anyway. On ARM v7, sizeof(struct device) becomes 376 from 368 after introducing 'unsigned intnoio_cnt;' to 'struct dev_pm_info', and total memory increases about 3752bytes in a small configuration(about 494 device instance). The actual memory increase should be more than the data because 'struct device' is generally embedded into other concrete device structure. >> Also looks the current implementation of pm_runtime_set_memalloc_noio() >> is simple and clean enough with the flag, IMO. > > I know you always know better. :-) We still need to consider cost and the function calling frequency, :-) > >> > I would use the flag only to store the information that >> > pm_runtime_set_memalloc_noio(dev, true) has been run for this device >> > directly >> > and I'd use a counter for everything else. >> > >> > That is, have power.memalloc_count that would be incremented when (1) >> > pm_runtime_set_memalloc_noio(dev, true) is called for that device and (2) >> > when >> > power.memalloc_count for one of its children changes from 0 to 1 (and >> > analogously for decrementation). Then, check the counter in >> > rpm_callback(). >> >> Sorry, could you explain in a bit detail why we need the counter? Looks only >> checking the flag in rpm_callback() is enough, doesn't it? > > Why would I want to use power.memalloc_count in addition to the > power.memalloc_noio flag? > > Consider this: > > pm_runtime_set_memalloc_noio(dev): > return if power.memalloc_noio is set > set power.memalloc_noio > loop: > increment power.memalloc_count > if power.memalloc_count is 1 now switch to parent and go to loop I am wondering if the above should be changed to below because the child count of memalloc_noio device need to be recorded. pm_runtime_set_memalloc_noio(dev): return if power.memalloc_noio is set set power.memalloc_noio loop: increment power.memalloc_count switch to parent and go to loop So pm_runtime_set_memalloc_noio(dev) will become worse than the improved pm_runtime_set_memalloc_noio(dev, true), which can return immediately if one dev or parent's flag is true. > pm_runtime_clear_memalloc_noio(dev): > return if power.memalloc_noio is unset > unset power.memalloc_noio > loop: > decrement power.memalloc_count > if power.memalloc_count is 0 now switch to parent and go to loop The above will perform well than pm_runtime_set_memalloc_noio(dev, false), because the above avoids to walk children of device. So one becomes worse and another becomes better, :-) Also the children count of one device is generally very small, less than 10 for most devices, see the data obtained in one common x86 pc(thinkpad t410) from below link: http://kernel.ubuntu.com/~ming/up/t410-dev-child-cnt.log - about 8 devices whose child count is more than 10, top three are 18, 17 ,12, and all the three are root devices. - about 117 devices whose child count is between 1 and 9 - other 501 devices whose child count is zero >From above data, walking device children should have not much effect on performance of pm_runtime_set_memalloc_noio(), which is also called in very infrequent path. > Looks kind of simpler, doesn't it? Looks simpler, but more code lines than single pm_runtime_set_memalloc_noio(), :-) > > And why rpm_callback() should check power.memalloc_count instead of the count? > Because power.memalloc_noio will only be set for devices that > pm_runtime_set_memalloc_noio(dev) was called for directly (not necessarily for > the parents). > > And that works even if someone calls any of them twice in a row for the same > device (presumably by mistake) and doesn't have to make any assumptions > about devices it is called for. IMO, we can ignore the mistake usage because the function is called only in network/block core code currently, not by individual driver. > >> > Besides, don't you need to check children for the arg device itself? >> >> It isn't needed since the children of network/block device can't be >> involved of the deadlock in runtime PM path. >> >> Also, the function is only called by network device or block device >> subsystem, both the two kind of device are class device and should >> have no children. > > OK, so n
Re: use after free in sysfs_find_dirent
Hi Sasha, On Tue, Mar 19, 2013 at 10:06 AM, Sasha Levin wrote: > [ 232.822703] sysfs_dir_pos-973 sysfs_dirent use after free: > vx855(vx855)-bind, 0-25520352 Looks filp->f_pos is changed as zero by llseek(), so may leave filp->private_data point to one refcount-balanced sysfs_dirent object, which will be put again afterwards. Hope we are luck this time, please try the attachment patch. Thanks, -- Ming Lei sysfs-fix-readdir-v2.patch Description: Binary data
Re: use after free in sysfs_find_dirent
Hi Sasha, On Tue, Mar 19, 2013 at 11:40 AM, Ming Lei wrote: > Hi Sasha, > > On Tue, Mar 19, 2013 at 10:06 AM, Sasha Levin wrote: >> [ 232.822703] sysfs_dir_pos-973 sysfs_dirent use after free: >> vx855(vx855)-bind, 0-25520352 > > Looks filp->f_pos is changed as zero by llseek(), so may leave > filp->private_data > point to one refcount-balanced sysfs_dirent object, which will be put > again afterwards. > > Hope we are luck this time, please try the attachment patch. Looks the better and simpler way is to hold the i_mutex for llseek. If you haven't test the v2, please ignore it and just test the attachment v3 patch. Thanks, -- Ming Lei sysfs-fix-readdir-v3.patch Description: Binary data
Re: use after free in sysfs_find_dirent
Hi Sasha, On Wed, Mar 20, 2013 at 12:28 AM, Sasha Levin wrote: > On 03/19/2013 07:54 AM, Ming Lei wrote: > > With v3 of the patch: > > [ 1275.665758] sysfs_dir_pos-973 sysfs_dirent use after free: > tun(tun)-uevent, 2-1472641949 Thanks again for your test. Looks it is caused by another bug in sysfs_readdir: if filldir() returns failure(such as small buffer length passed from userspace, very probably for trinity) in case of 'if (filp->f_pos == 0 or 1)', filp->private_data still will point to one refcount-balanced sysfs_dirent object. V4 adds fix for this situation, please test attachment v4 patch. Thanks, -- Ming Lei sysfs-fix-readdir-v4.patch Description: Binary data
[PATCH 1/2] sysfs: fix race between readdir and lseek
While readdir() is running, lseek() may set filp->f_pos as zero, then may leave filp->private_data pointing to one sysfs_dirent object without holding its reference counter, so the sysfs_dirent object may be used after free in next readdir(). This patch holds inode->i_mutex to avoid the problem since the lock is always held in readdir path. Reported-by: Dave Jones Tested-by: Sasha Levin Cc: Signed-off-by: Ming Lei --- fs/sysfs/dir.c | 13 - 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c index 2fbdff6..c9e1660 100644 --- a/fs/sysfs/dir.c +++ b/fs/sysfs/dir.c @@ -1058,10 +1058,21 @@ static int sysfs_readdir(struct file * filp, void * dirent, filldir_t filldir) return 0; } +static loff_t sysfs_dir_llseek(struct file *file, loff_t offset, int whence) +{ + struct inode *inode = file_inode(file); + loff_t ret; + + mutex_lock(&inode->i_mutex); + ret = generic_file_llseek(file, offset, whence); + mutex_unlock(&inode->i_mutex); + + return ret; +} const struct file_operations sysfs_dir_operations = { .read = generic_read_dir, .readdir= sysfs_readdir, .release= sysfs_dir_release, - .llseek = generic_file_llseek, + .llseek = sysfs_dir_llseek, }; -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 2/2] sysfs: handle failure path correctly for readdir()
In case of 'if (filp->f_pos == 0 or 1)' of sysfs_readdir(), the failure from filldir() isn't handled, and the reference counter of the sysfs_dirent object pointed by filp->private_data will be released without clearing filp->private_data, so use after free bug will be triggered later. This patch returns immeadiately under the situation for fixing the bug, and it is reasonable to return from readdir() when filldir() fails. Reported-by: Dave Jones Tested-by: Sasha Levin Cc: Signed-off-by: Ming Lei --- fs/sysfs/dir.c |4 1 file changed, 4 insertions(+) diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c index c9e1660..e145126 100644 --- a/fs/sysfs/dir.c +++ b/fs/sysfs/dir.c @@ -1020,6 +1020,8 @@ static int sysfs_readdir(struct file * filp, void * dirent, filldir_t filldir) ino = parent_sd->s_ino; if (filldir(dirent, ".", 1, filp->f_pos, ino, DT_DIR) == 0) filp->f_pos++; + else + return 0; } if (filp->f_pos == 1) { if (parent_sd->s_parent) @@ -1028,6 +1030,8 @@ static int sysfs_readdir(struct file * filp, void * dirent, filldir_t filldir) ino = parent_sd->s_ino; if (filldir(dirent, "..", 2, filp->f_pos, ino, DT_DIR) == 0) filp->f_pos++; + else + return 0; } mutex_lock(&sysfs_mutex); for (pos = sysfs_dir_pos(ns, parent_sd, filp->f_pos, pos); -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 0/2] sysfs: fix use after free in sysfs_readdir()
Hi, These two patches fix two bugs inside sysfs_readdir(), and both the two bugs may cause use after free problem, which is reported by Dave on trinity fuzz test on syscall. Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] sysfs: fix race between readdir and lseek
On Thu, Mar 21, 2013 at 10:41 AM, Li Zefan wrote: > > In fact the same race exists between readdir() and read()/write()... Fortunately, no read()/write() are implemented on sysfs directory, :-) Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] sysfs: fix race between readdir and lseek
On Thu, Mar 21, 2013 at 11:28 AM, Li Zefan wrote: > On 2013/3/21 11:17, Ming Lei wrote: >> On Thu, Mar 21, 2013 at 10:41 AM, Li Zefan wrote: >>> >>> In fact the same race exists between readdir() and read()/write()... >> >> Fortunately, no read()/write() are implemented on sysfs directory, :-) >> > > That's irrelevant... As far as sysfs is concerned, the filp->f_ops can't be changed in read/write path. > > See my report: > > https://patchwork.kernel.org/patch/2160771/ Yes, I know there might be some mess after the commit ef3d0fd2 (vfs: do (nearly) lockless generic_file_llseek). Also looks it has been stated in Documentation/filesystems/Locking: ->llseek() locking has moved from llseek to the individual llseek implementations. If your fs is not using generic_file_llseek, you need to acquire and release the appropriate locks in your ->llseek(). For many filesystems, it is probably safe to acquire the inode mutex or just to use i_size_read() instead. Note: this does not protect the file->f_pos against concurrent modifications since this is something the userspace has to take care about. Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] bisected: PandaBoard smsc95xx ethernet driver error from USB timeout
Hi Frank, On Thu, Mar 21, 2013 at 11:29 AM, Frank Rowand wrote: > > I found the problem on 3.6.11, but have not replicated it on 3.9-rcX > yet because my config fails to build on 3.9-rc1 and 3.9-rc2. I'll try > to work on that issue tomorrow. I play upstream kernel on Pandaboard A1 frequently, looks not see the failure problem before. Maybe the problem is config dependent. If you may share your config file, I'd like to do the test too. Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] sysfs: check if one entry has been removed before freeing
It might be a kernel disaster if one sysfs entry is freed but still referenced by sysfs tree. Recently Dave and Sasha reported one use-after-free problem on sysfs entry, and the problem has been troubleshooted with help of debug message added in this patch. Given sysfs_get_dirent/sysfs_put are exported APIs, even inside sysfs they are called in many contexts(kobject/attribe add/delete, inode init/drop, dentry lookup/release, readdir, ...), it is healthful to check the removed flag before freeing one entry and dump message if it is freeing without being removed first. Cc: Dave Jones Cc: Sasha Levin Signed-off-by: Ming Lei --- fs/sysfs/dir.c | 12 +++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c index 1bf016b..328ef9b 100644 --- a/fs/sysfs/dir.c +++ b/fs/sysfs/dir.c @@ -268,6 +268,13 @@ void release_sysfs_dirent(struct sysfs_dirent * sd) */ parent_sd = sd->s_parent; + if (unlikely(!(sd->s_flags & SYSFS_FLAG_REMOVED))) { + printk(KERN_ERR "sysfs: free using entry: %s/%s\n", + parent_sd ? parent_sd->s_name : "", + sd->s_name); + BUG(); + } + if (sysfs_type(sd) == SYSFS_KOBJ_LINK) sysfs_put(sd->s_symlink.target_sd); if (sysfs_type(sd) & SYSFS_COPY_NAME) @@ -386,7 +393,7 @@ struct sysfs_dirent *sysfs_new_dirent(const char *name, umode_t mode, int type) sd->s_name = name; sd->s_mode = mode; - sd->s_flags = type; + sd->s_flags = type | SYSFS_FLAG_REMOVED; return sd; @@ -466,6 +473,9 @@ int __sysfs_add_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd) ps_iattrs->ia_ctime = ps_iattrs->ia_mtime = CURRENT_TIME; } + /* Mark the entry added into directory tree */ + sd->s_flags &= ~SYSFS_FLAG_REMOVED; + return 0; } -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] sysfs: check if one entry has been removed before freeing
On Wed, Apr 3, 2013 at 11:04 AM, Dave Jones wrote: > On Wed, Apr 03, 2013 at 10:58:23AM +0800, Ming Lei wrote: > > > diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c > > index 1bf016b..328ef9b 100644 > > --- a/fs/sysfs/dir.c > > +++ b/fs/sysfs/dir.c > > @@ -268,6 +268,13 @@ void release_sysfs_dirent(struct sysfs_dirent * sd) > > */ > > parent_sd = sd->s_parent; > > > > +if (unlikely(!(sd->s_flags & SYSFS_FLAG_REMOVED))) { > > +printk(KERN_ERR "sysfs: free using entry: %s/%s\n", > > +parent_sd ? parent_sd->s_name : "", > > +sd->s_name); > > +BUG(); > > +} > > Please use WARN instead of BUG. For an in-ram filesystem like > sysfs, there's no real reason to lock-up the machine in this way > making it harder to debug. If WARN is used, the freed memory will be allocated to other kernel components, then sysfs may change the memory and cause destruction, so maybe it is better to use BUG to stop kernel. Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] sysfs: check if one entry has been removed before freeing
On Wed, Apr 3, 2013 at 1:35 PM, Greg Kroah-Hartman wrote: > On Wed, Apr 03, 2013 at 11:52:39AM +0800, Ming Lei wrote: >> On Wed, Apr 3, 2013 at 11:04 AM, Dave Jones wrote: >> > On Wed, Apr 03, 2013 at 10:58:23AM +0800, Ming Lei wrote: >> > >> > > diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c >> > > index 1bf016b..328ef9b 100644 >> > > --- a/fs/sysfs/dir.c >> > > +++ b/fs/sysfs/dir.c >> > > @@ -268,6 +268,13 @@ void release_sysfs_dirent(struct sysfs_dirent * sd) >> > > */ >> > > parent_sd = sd->s_parent; >> > > >> > > +if (unlikely(!(sd->s_flags & SYSFS_FLAG_REMOVED))) { >> > > +printk(KERN_ERR "sysfs: free using entry: %s/%s\n", >> > > +parent_sd ? parent_sd->s_name : "", >> > > +sd->s_name); >> > > +BUG(); >> > > +} >> > >> > Please use WARN instead of BUG. For an in-ram filesystem like >> > sysfs, there's no real reason to lock-up the machine in this way >> > making it harder to debug. >> >> If WARN is used, the freed memory will be allocated to other >> kernel components, then sysfs may change the memory and cause >> destruction, so maybe it is better to use BUG to stop kernel. > > No, it's never ok to call BUG(), sorry, please fix this. Sorry, could you explain it in a bit detail? IMO, it is really a bug when code runs here, and there are much similar BUG_ON() uses in current sysfs code too. Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v1] ARM: keep __my_cpu_offset consistent with generic one
Hi, On Thu, Apr 4, 2013 at 12:19 AM, Russell King - ARM Linux wrote: > > Still not convinced this is a proper fix. Look, the problem is this: > > - Initially, set the CPU percpu offset to zero. This means the boot > CPU reads and writes to the percpu data section in the kernel image. > > - The percpu areas are initialized, and the percpu data copied to each > percpu data - this will have any writes from the boot CPU included as > changes to the percpu data. > > - The boot CPU continues to read/write to the percpu data section. No, the boot CPU switches to access percpu area after the area is created(after setup_per_cpu_areas() and smp_prepare_boot_cpu()). > > - If the boot CPU suspends/resumes, cpu_init() gets called, which will > call set_my_cpu_offset(per_cpu_offset(cpu)); for the boot CPU. > > - The boot CPU now uses the allocated percpu data section and any > updates it did in the percpu data section in the kernel image are > lost to it. > > Your patch may be right on its own to solve the initial problem, but > it leaves a _big_ hole. Without the patch, looks the hols was still here, at least before commit 14318efb(ARM: 7587/1: implement optimized percpu variable access), the per_cpu_offset() always return 0 before percpu area is created on ARM. Same on other ARCHs too. > > Now, the big question here: is it right that the boot CPU should ever > write to the static percpu data section in the kernel image? What if IMO, it isn't right, but as Tejun said, "it mostly due to historical reasons rather than by design, as long as the data is in consistent state by and during percpu setup, nothing will break." As far as the lockdep case, it is a bit special since lock can be used very early, and surely more early than creating percpu area. > there's a pointer in there, initially NULL, which then gets checked > by each CPU and initialized if NULL - we'll end up sharing the same > allocation amongst all CPUs, which probably isn't what was intended. > If there's a list_head which gets added to, that too will be very bad. Once lockstat is disabled, arm can boot well, that means no others might access percpu data section early, so could we just treat it as a special case? > > Although you have uncovered a problem, I still think by setting the > offset to zero initially, you're just papering over a much bigger > can of worms. > > So, should percpu data be used this early in boot before the percpu > stuff is properly initialized? That feels _extremely_ unsafe. > > This, I think, needs to be addressed properly. And part of that is > knowing where things went wrong. Will Deacon asked you for a backtrace > showing where this problem occured. Your response seems to be to > resend the patch with a "v1" tag a no new information. I have explained to Will Deacon, and looks no oops trace is generated on the memory access exception during booting. And the hang happened in mutex_unlock(&cgroup_mutex)(cgroup_init_subsys<- cgroup_init_early). > Sorry, not applying this until the above issue has been discussed > and the location of these percpu accesses has been properly analysed. No problem. Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] sysfs: check if one entry has been removed before freeing
On Thu, Apr 4, 2013 at 12:08 AM, Greg Kroah-Hartman wrote: > > Then make it a WARN() call, like David said, to give us a chance to get > the report from a user so we can fix it. If the machine crashes after > that, fine, but hopefully we will get a oops report out of it. OK, got it, and console might not be available at that time. Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v1] sysfs: check if one entry has been removed before freeing
It might be a kernel disaster if one sysfs entry is freed but still referenced by sysfs tree. Recently Dave and Sasha reported one use-after-free problem on sysfs entry, and the problem has been troubleshooted with help of debug message added in this patch. Given sysfs_get_dirent/sysfs_put are exported APIs, even inside sysfs they are called in many contexts(kobject/attribe add/delete, inode init/drop, dentry lookup/release, readdir, ...), it is healthful to check the removed flag before freeing one entry and dump message if it is freeing without being removed first. Cc: Dave Jones Cc: Sasha Levin Signed-off-by: Ming Lei --- v1: - use WARN() instead of BUG() as suggested by Dave and Greg fs/sysfs/dir.c |9 - 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c index 1bf016b..e8e0e71 100644 --- a/fs/sysfs/dir.c +++ b/fs/sysfs/dir.c @@ -268,6 +268,10 @@ void release_sysfs_dirent(struct sysfs_dirent * sd) */ parent_sd = sd->s_parent; + WARN(!(sd->s_flags & SYSFS_FLAG_REMOVED), + "sysfs: free using entry: %s/%s\n", + parent_sd ? parent_sd->s_name : "", sd->s_name); + if (sysfs_type(sd) == SYSFS_KOBJ_LINK) sysfs_put(sd->s_symlink.target_sd); if (sysfs_type(sd) & SYSFS_COPY_NAME) @@ -386,7 +390,7 @@ struct sysfs_dirent *sysfs_new_dirent(const char *name, umode_t mode, int type) sd->s_name = name; sd->s_mode = mode; - sd->s_flags = type; + sd->s_flags = type | SYSFS_FLAG_REMOVED; return sd; @@ -466,6 +470,9 @@ int __sysfs_add_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd) ps_iattrs->ia_ctime = ps_iattrs->ia_mtime = CURRENT_TIME; } + /* Mark the entry added into directory tree */ + sd->s_flags &= ~SYSFS_FLAG_REMOVED; + return 0; } -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/4] firmware: Refactoring for splitting user-mode helper code
e { > - ret = _request_firmware_load(fw_priv, true, > - firmware_loading_timeout()); > - usermodehelper_read_unlock(); > - } > - if (ret) > - _request_firmware_cleanup(firmware_p); > - > - return ret; > + return _request_firmware(firmware_p, name, device, true, false); > } > > /** > @@ -1046,31 +1098,11 @@ static void request_firmware_work_func(struct > work_struct *work) > { > struct firmware_work *fw_work; > const struct firmware *fw; > - struct firmware_priv *fw_priv; > - long timeout; > - int ret; > > fw_work = container_of(work, struct firmware_work, work); > - fw_priv = _request_firmware_prepare(&fw, fw_work->name, > fw_work->device, > - fw_work->uevent, true); > - if (IS_ERR_OR_NULL(fw_priv)) { > - ret = PTR_RET(fw_priv); > - goto out; > - } > - > - timeout = usermodehelper_read_lock_wait(firmware_loading_timeout()); > - if (timeout) { > - ret = _request_firmware_load(fw_priv, fw_work->uevent, > timeout); > - usermodehelper_read_unlock(); > - } else { > - dev_dbg(fw_work->device, "firmware: %s loading timed out\n", > - fw_work->name); > - ret = -EAGAIN; > - } > - if (ret) > - _request_firmware_cleanup(&fw); > > - out: > + _request_firmware(&fw, fw_work->name, fw_work->device, > + fw_work->uevent, true); > fw_work->cont(fw, fw_work->context); > put_device(fw_work->device); > Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/4] firmware: Refactoring for splitting user-mode helper code
} > - > - /* pass the pages buffer to driver at the last minute */ > - fw_set_page_data(buf, fw_priv->fw); > - > fw_priv->buf = NULL; > - mutex_unlock(&fw_lock); > - > - if (direct_load) > - goto err_put_dev; > > device_remove_file(f_dev, &dev_attr_loading); > err_del_bin_attr: > @@ -972,6 +965,77 @@ err_put_dev: > return retval; > } > > +static int fw_load_from_user_helper(struct firmware *firmware, > + const char *name, struct device *device, > + bool uevent, bool nowait) > +{ > + struct firmware_priv *fw_priv; > + long timeout; > + int ret; > + > + fw_priv = fw_create_instance(firmware, name, device, uevent, nowait); > + if (IS_ERR(fw_priv)) > + return PTR_ERR(fw_priv); > + > + fw_priv->buf = firmware->priv; > + > + timeout = firmware_loading_timeout(); > + if (nowait) { > + timeout = usermodehelper_read_lock_wait(timeout); > + if (!timeout) { > + dev_dbg(device, "firmware: %s loading timed out\n", > + name); > + kfree(fw_priv); > + return -EAGAIN; > + } > + } else { > + ret = usermodehelper_read_trylock(); > + if (WARN_ON(ret)) { > + dev_err(device, "firmware: %s will not be loaded\n", > + name); > + kfree(fw_priv); > + return ret; > + } > + } The above usermodehelper_read_lock thing may be a functional change, and looks not what you claimed in commit log, :-). The lock is currently held in direct loading case, but your patch change the rule. Without holding the lock, request_firmware() may touch filesystem / storage too early during kernel boot or system resume in direct loading case. > + > + ret = _request_firmware_load(fw_priv, uevent, timeout); > + usermodehelper_read_unlock(); > + return ret; > +} > + > +/* called from request_firmware() and request_firmware_work_func() */ > +static int > +_request_firmware(const struct firmware **firmware_p, const char *name, > + struct device *device, bool uevent, bool nowait) > +{ > + struct firmware *fw; > + int ret; > + > + if (!firmware_p) > + return -EINVAL; > + > + ret = _request_firmware_prepare(&fw, name, device); > + if (ret <= 0) /* error or already assigned */ > + goto out; > + > + ret = 0; > + if (!fw_get_filesystem_firmware(device, fw->priv)) > + ret = fw_load_from_user_helper(fw, name, device, > + uevent, nowait); > + > + if (!ret) > + ret = assign_firmware_buf(fw, device); > + > + out: > + if (ret < 0) { > + release_firmware(fw); > + fw = NULL; > + } > + > + *firmware_p = fw; > + return ret; > +} > + > /** > * request_firmware: - send firmware request and wait for it > * @firmware_p: pointer to firmware image > @@ -996,26 +1060,7 @@ int > request_firmware(const struct firmware **firmware_p, const char *name, > struct device *device) > { > - struct firmware_priv *fw_priv; > - int ret; > - > - fw_priv = _request_firmware_prepare(firmware_p, name, device, true, > - false); > - if (IS_ERR_OR_NULL(fw_priv)) > - return PTR_RET(fw_priv); > - > - ret = usermodehelper_read_trylock(); > - if (WARN_ON(ret)) { > - dev_err(device, "firmware: %s will not be loaded\n", name); > - } else { > - ret = _request_firmware_load(fw_priv, true, > - firmware_loading_timeout()); > - usermodehelper_read_unlock(); > - } > - if (ret) > - _request_firmware_cleanup(firmware_p); > - > - return ret; > + return _request_firmware(firmware_p, name, device, true, false); > } > > /** > @@ -1046,33 +1091,12 @@ static void request_firmware_work_func(struct > work_struct *work) > { > struct firmware_work *fw_work; > const struct firmware *fw; > - struct firmware_priv *fw_priv; > - long timeout; > - int ret; > > fw_work = container_of(work, struct firmware_work, work); > - fw_priv = _request_firmware_prepare(&fw, fw_work->name, > fw_work->device, > - fw_work->uevent, true); > - if (IS_ERR_OR_NULL(fw_priv)) { > - ret = PTR_RET(fw_priv); > - goto out; > - } > > - timeout = usermodehelper_read_lock_wait(firmware_loading_timeout()); > - if (timeout) { > - ret = _request_firmware_load(fw_priv, fw_work->uevent, > timeout); > - usermodehelper_read_unlock(); > - } else { > - dev_dbg(fw_work->device, "firmware: %s loading timed out\n", > - fw_work->name); > - ret = -EAGAIN; > - } > - if (ret) > - _request_firmware_cleanup(&fw); > - > - out: > + _request_firmware(&fw, fw_work->name, fw_work->device, > + fw_work->uevent, true); > fw_work->cont(fw, fw_work->context); > - put_device(fw_work->device); The above put_device is the pair of get_device inside request_firmware_nowait(), I am wondering why you think it is not balanced, and to be removed . Did you observe a double free? Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/4] firmware: Refactoring for splitting user-mode helper code
On Wed, Jan 30, 2013 at 3:17 PM, Takashi Iwai wrote: >> The above usermodehelper_read_lock thing may be a functional change, >> and looks not what you claimed in commit log, :-). The lock is currently >> held in >> direct loading case, but your patch change the rule. Without holding the >> lock, >> request_firmware() may touch filesystem / storage too early during >> kernel boot or system resume in direct loading case. > > Does it really happen in a real scenario? Some crazy drivers might call request_firmware inside resume callback, with usermodehelper_read_lock, we can find the mistake easily and log it. > > If so, using usermode helper lock for that purpose sounds like an > abuse to be fixed differently or replaced with a better one. Might be, but looks not good to introduce this change in a code refactoring patch. Or you can do it in another patch for discussion if you have better way to handle the situation. Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/4] firmware: Refactoring for splitting user-mode helper code
On Wed, Jan 30, 2013 at 6:31 PM, Takashi Iwai wrote: > > But it's supposed to be cached, no? Generally it will be cached, but some crazy devices might come as new device during resume, so we still need to handle the situation. Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/4] firmware: Refactoring for splitting user-mode helper code
On Wed, Jan 30, 2013 at 6:53 PM, Takashi Iwai wrote: > At Wed, 30 Jan 2013 18:50:05 +0800, > Ming Lei wrote: >> >> On Wed, Jan 30, 2013 at 6:31 PM, Takashi Iwai wrote: >> > >> > But it's supposed to be cached, no? >> >> Generally it will be cached, but some crazy devices might come as new >> device during resume, so we still need to handle the situation. > > In that case, shouldn't we simply return an error instead of > usermodehelper lock (at least for direct loading)? The check depends on usermodehelper_read_trylock now, is there other simpler way to do the check? Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/4] firmware: Refactoring for splitting user-mode helper code
On Wed, Jan 30, 2013 at 7:08 PM, Takashi Iwai wrote: > At Wed, 30 Jan 2013 11:53:14 +0100, > Takashi Iwai wrote: >> >> At Wed, 30 Jan 2013 18:50:05 +0800, >> Ming Lei wrote: >> > >> > On Wed, Jan 30, 2013 at 6:31 PM, Takashi Iwai wrote: >> > > >> > > But it's supposed to be cached, no? >> > >> > Generally it will be cached, but some crazy devices might come as new >> > device during resume, so we still need to handle the situation. >> >> In that case, shouldn't we simply return an error instead of >> usermodehelper lock (at least for direct loading)? > > The patch below is what I have in my mind... > > > Takashi > > --- > From: Takashi Iwai > Subject: [PATCH] firmware: Skip usermodehelper_lock for direct loading > > We don't need a user mode helper lock for the direct fw loading. > This also reduces the use of timeout when user mode helper is > disabled, thus we can move the code into ifdef block, too. > > For avoiding the possible call of request_firmware() without caching, > add a check of suspend state for the direct loading case, and returns > immediately if it's called during suspend/resume. Then it proceeds to > the user mode helper if enabled, or returns the error. > > Signed-off-by: Takashi Iwai > --- > drivers/base/firmware_class.c | 97 > --- > 1 file changed, 54 insertions(+), 43 deletions(-) > > diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c > index f1caad7..c973bb0 100644 > --- a/drivers/base/firmware_class.c > +++ b/drivers/base/firmware_class.c > @@ -88,13 +88,6 @@ enum { > FW_STATUS_ABORT, > }; > > -static int loading_timeout = 60; /* In seconds */ > - > -static inline long firmware_loading_timeout(void) > -{ > - return loading_timeout > 0 ? loading_timeout * HZ : > MAX_SCHEDULE_TIMEOUT; > -} > - > struct firmware_cache { > /* firmware_buf instance will be added into the below list */ > spinlock_t lock; > @@ -315,6 +308,9 @@ static bool fw_get_filesystem_firmware(struct device > *device, > bool success = false; > char *path = __getname(); > > + if (device->power.is_suspended) > + return false; The device which is requesting firmware is resumed does not mean the storage device has been resumed, so this check isn't enough. Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Page allocation failure on v3.8-rc5
sprintf(state->name, "p"); - state->limit = disk_max_parts(hd); i = res = err = 0; while (!res && check_part[i]) { - memset(&state->parts, 0, sizeof(state->parts)); + memset(state->parts, 0, state->limit * sizeof(state->parts[0])); res = check_part[i++](state); if (res < 0) { /* We have hit an I/O error which we don't report now. @@ -161,6 +186,6 @@ check_partition(struct gendisk *hd, struct block_device *bdev) printk(KERN_INFO "%s", state->pp_buf); free_page((unsigned long)state->pp_buf); - kfree(state); + release_partitions(state); return ERR_PTR(res); } diff --git a/block/partitions/check.h b/block/partitions/check.h index 52b1003..8323808 100644 --- a/block/partitions/check.h +++ b/block/partitions/check.h @@ -15,13 +15,15 @@ struct parsed_partitions { int flags; bool has_info; struct partition_meta_info info; - } parts[DISK_MAX_PARTS]; + } *parts; int next; int limit; bool access_beyond_eod; char *pp_buf; }; +extern void release_partitions(struct parsed_partitions *state); + struct parsed_partitions * check_partition(struct gendisk *, struct block_device *); Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/4] firmware: Refactoring for splitting user-mode helper code
} > - > - /* pass the pages buffer to driver at the last minute */ > - fw_set_page_data(buf, fw_priv->fw); > - > fw_priv->buf = NULL; > - mutex_unlock(&fw_lock); > - > - if (direct_load) > - goto err_put_dev; > > device_remove_file(f_dev, &dev_attr_loading); > err_del_bin_attr: > @@ -972,6 +965,74 @@ err_put_dev: > return retval; > } > > +static int fw_load_from_user_helper(struct firmware *firmware, > + const char *name, struct device *device, > + bool uevent, bool nowait, long timeout) > +{ > + struct firmware_priv *fw_priv; > + > + fw_priv = fw_create_instance(firmware, name, device, uevent, nowait); > + if (IS_ERR(fw_priv)) > + return PTR_ERR(fw_priv); > + > + fw_priv->buf = firmware->priv; > + return _request_firmware_load(fw_priv, uevent, timeout); > +} > + > +/* called from request_firmware() and request_firmware_work_func() */ > +static int > +_request_firmware(const struct firmware **firmware_p, const char *name, > + struct device *device, bool uevent, bool nowait) > +{ > + struct firmware *fw; > + long timeout; > + int ret; > + > + if (!firmware_p) > + return -EINVAL; > + > + ret = _request_firmware_prepare(&fw, name, device); > + if (ret <= 0) /* error or already assigned */ > + goto out; > + > + ret = 0; > + timeout = firmware_loading_timeout(); The above line may be moved below the line of 'if (nowait) '. > + if (nowait) { > + timeout = usermodehelper_read_lock_wait(timeout); > + if (!timeout) { > + dev_dbg(device, "firmware: %s loading timed out\n", > + name); > + ret = -EBUSY; > + } > + } else { > + ret = usermodehelper_read_trylock(); > + if (WARN_ON(ret)) { > + dev_err(device, "firmware: %s will not be loaded\n", > + name); > + } > + } > + > + if (!ret) { > + if (!fw_get_filesystem_firmware(device, fw->priv)) > + ret = fw_load_from_user_helper(fw, name, device, > + uevent, nowait, > + timeout); > + if (!ret) > + ret = assign_firmware_buf(fw, device); > + } > + > + usermodehelper_read_unlock(); The above should be move inside the above 'if (!ret) {...}' Except for the above two comments, looks the patch v4 is fine. Please CC greg-kh when you send v5, and you may add my ack for the patchset. Acked-by: Ming Lei Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/4] firmware: Make user-mode helper optional (v5)
On Fri, Feb 1, 2013 at 12:15 AM, Greg Kroah-Hartman wrote: > On Thu, Jan 31, 2013 at 11:13:53AM +0100, Takashi Iwai wrote: >> Hi, >> >> this is a series of revised patches for making the user-mode helper >> call in the firmware loader optional. Hopefully all things settled >> down now. >> >> v4->v5: fix invalid call of usermodehelper_unlock() > > Yeah! I was working on doing just this right as you sent this series > in. I'll take your patches instead of mine, you did all of the hard > work here, thanks :) > > Ming, any objection to me taking them? No objections, and I have reviewed the patch set and acked them all. Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Page allocation failure on v3.8-rc5
On Fri, Feb 1, 2013 at 7:43 AM, Andrew Morton wrote: > On Wed, 30 Jan 2013 19:53:22 +0800 > Ming Lei wrote: > >> The allocation failure is caused by the big sizeof(struct parsed_partitions), >> which is 64K in my 32bit box, > > Geeze. > > We could fix that nicely by making parsed_partitions.parts an array of > pointers to a single `struct parsed_partition' and allocating those > on-demand. > > But given the short-lived nature of this storage and the infrequency of > check_partition(), that isn't necessary. > >> could you test the blow patch to see >> if it can fix the allocation failure? > > (The patch is wordwrapped) Sorry for that, I send out it for test. > >> ... >> >> @@ -106,18 +107,43 @@ static int (*check_part[])(struct parsed_partitions *) >> = { >> NULL >> }; >> >> +struct parsed_partitions *allocate_partitions(int nr) >> +{ >> + struct parsed_partitions *state; >> + >> + state = kzalloc(sizeof(struct parsed_partitions), GFP_KERNEL); > > I personally prefer sizefo(*state) here. It means the reader doesn't > have to scroll back to check things. OK, will use sizeof(*state). >> + if (!state) >> + return NULL; >> + >> + state->parts = vzalloc(nr * sizeof(state->parts[0])); >> + if (!state->parts) { >> + kfree(state); >> + return NULL; >> + } > > It doesn't really need to be this complex - we could just vmalloc the > entire `struct parsed_partitions'. But I see that your change will The above approach can save one 32K allocation approximately. > cause us to allcoate much less memory in many situations, which is > good. It should be mentioned in the changelog! OK, I will add the changelog later. Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] block: partition: optimize memory allocation in check_partition
Currently, sizeof(struct parsed_partitions) may be 64KB in 32bit arch, so it is easy to trigger page allocation failure by check_partition, especially in hotplug block device situation(such as, USB mass storage, MMC card, ...), and Felipe Balbi has observed the failure. This patch does below optimizations on the allocation of struct parsed_partitions to try to address the issue: - make parsed_partitions.parts as pointer so that the pointed memory can fit in 32KB buffer, then approximate 32KB memory can be saved - vmalloc the buffer pointed by parsed_partitions.parts because 32KB is still a bit big for kmalloc - given that many devices have the partition count limit, so only allocate disk_max_parts() partitions instead of 256 partitions Cc: sta...@vger.kernel.org Signed-off-by: Ming Lei --- block/partition-generic.c |2 +- block/partitions/check.c | 35 ++- block/partitions/check.h |4 +++- 3 files changed, 34 insertions(+), 7 deletions(-) diff --git a/block/partition-generic.c b/block/partition-generic.c index f1d1451..043d0bd 100644 --- a/block/partition-generic.c +++ b/block/partition-generic.c @@ -525,7 +525,7 @@ rescan: md_autodetect_dev(part_to_dev(part)->devt); #endif } - kfree(state); + release_partitions(state); return 0; } diff --git a/block/partitions/check.c b/block/partitions/check.c index bc90867..86c9450 100644 --- a/block/partitions/check.c +++ b/block/partitions/check.c @@ -14,6 +14,7 @@ */ #include +#include #include #include @@ -106,18 +107,43 @@ static int (*check_part[])(struct parsed_partitions *) = { NULL }; +static struct parsed_partitions *allocate_partitions(int nr) +{ + struct parsed_partitions *state; + + state = kzalloc(sizeof(*state), GFP_KERNEL); + if (!state) + return NULL; + + state->parts = vzalloc(nr * sizeof(state->parts[0])); + if (!state->parts) { + kfree(state); + return NULL; + } + + return state; +} + +void release_partitions(struct parsed_partitions *state) +{ + vfree(state->parts); + kfree(state); +} + struct parsed_partitions * check_partition(struct gendisk *hd, struct block_device *bdev) { struct parsed_partitions *state; int i, res, err; - state = kzalloc(sizeof(struct parsed_partitions), GFP_KERNEL); + i = disk_max_parts(hd); + state = allocate_partitions(i); if (!state) return NULL; + state->limit = i; state->pp_buf = (char *)__get_free_page(GFP_KERNEL); if (!state->pp_buf) { - kfree(state); + release_partitions(state); return NULL; } state->pp_buf[0] = '\0'; @@ -128,10 +154,9 @@ check_partition(struct gendisk *hd, struct block_device *bdev) if (isdigit(state->name[strlen(state->name)-1])) sprintf(state->name, "p"); - state->limit = disk_max_parts(hd); i = res = err = 0; while (!res && check_part[i]) { - memset(&state->parts, 0, sizeof(state->parts)); + memset(state->parts, 0, state->limit * sizeof(state->parts[0])); res = check_part[i++](state); if (res < 0) { /* We have hit an I/O error which we don't report now. @@ -161,6 +186,6 @@ check_partition(struct gendisk *hd, struct block_device *bdev) printk(KERN_INFO "%s", state->pp_buf); free_page((unsigned long)state->pp_buf); - kfree(state); + release_partitions(state); return ERR_PTR(res); } diff --git a/block/partitions/check.h b/block/partitions/check.h index 52b1003..4fefdbb 100644 --- a/block/partitions/check.h +++ b/block/partitions/check.h @@ -15,13 +15,15 @@ struct parsed_partitions { int flags; bool has_info; struct partition_meta_info info; - } parts[DISK_MAX_PARTS]; + } *parts; int next; int limit; bool access_beyond_eod; char *pp_buf; }; +void release_partitions(struct parsed_partitions *state); + struct parsed_partitions * check_partition(struct gendisk *, struct block_device *); -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] block: partition: optimize memory allocation in check_partition
On Fri, Feb 1, 2013 at 11:20 AM, Joe Perches wrote: >> @@ -15,13 +15,15 @@ struct parsed_partitions { >> int flags; >> bool has_info; >> struct partition_meta_info info; >> - } parts[DISK_MAX_PARTS]; >> + } *parts; > > This is relatively unusual style. > Could you break out this struct instead? IMO, looks it isn't necessary since no one uses the type of the embedded structure directly, and the style isn't introduced by this patch, which only follows the previous one. Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] block: partition: optimize memory allocation in check_partition
On Fri, Feb 1, 2013 at 4:33 PM, Yasuaki Ishimatsu wrote: > At least, you should use release_partitions() instead of kfree() here. Good catch, thank you for pointing it out, and I will post v1 later with the update. Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v1] block: partition: optimize memory allocation in check_partition
Currently, sizeof(struct parsed_partitions) may be 64KB in 32bit arch, so it is easy to trigger page allocation failure by check_partition, especially in hotplug block device situation(such as, USB mass storage, MMC card, ...), and Felipe Balbi has observed the failure. This patch does below optimizations on the allocation of struct parsed_partitions to try to address the issue: - make parsed_partitions.parts as pointer so that the pointed memory can fit in 32KB buffer, then approximate 32KB memory can be saved - vmalloc the buffer pointed by parsed_partitions.parts because 32KB is still a bit big for kmalloc - given that many devices have the partition count limit, so only allocate disk_max_parts() partitions instead of 256 partitions Cc: sta...@vger.kernel.org Signed-off-by: Ming Lei --- v1: - add one missing free_partitions - rename release_partitions as free_partitions block/partition-generic.c |4 ++-- block/partitions/check.c | 35 ++- block/partitions/check.h |4 +++- 3 files changed, 35 insertions(+), 8 deletions(-) diff --git a/block/partition-generic.c b/block/partition-generic.c index f1d1451..17d44ea 100644 --- a/block/partition-generic.c +++ b/block/partition-generic.c @@ -418,7 +418,7 @@ int rescan_partitions(struct gendisk *disk, struct block_device *bdev) int p, highest, res; rescan: if (state && !IS_ERR(state)) { - kfree(state); + free_partitions(state); state = NULL; } @@ -525,7 +525,7 @@ rescan: md_autodetect_dev(part_to_dev(part)->devt); #endif } - kfree(state); + free_partitions(state); return 0; } diff --git a/block/partitions/check.c b/block/partitions/check.c index bc90867..944fe22 100644 --- a/block/partitions/check.c +++ b/block/partitions/check.c @@ -14,6 +14,7 @@ */ #include +#include #include #include @@ -106,18 +107,43 @@ static int (*check_part[])(struct parsed_partitions *) = { NULL }; +static struct parsed_partitions *allocate_partitions(int nr) +{ + struct parsed_partitions *state; + + state = kzalloc(sizeof(*state), GFP_KERNEL); + if (!state) + return NULL; + + state->parts = vzalloc(nr * sizeof(state->parts[0])); + if (!state->parts) { + kfree(state); + return NULL; + } + + return state; +} + +void free_partitions(struct parsed_partitions *state) +{ + vfree(state->parts); + kfree(state); +} + struct parsed_partitions * check_partition(struct gendisk *hd, struct block_device *bdev) { struct parsed_partitions *state; int i, res, err; - state = kzalloc(sizeof(struct parsed_partitions), GFP_KERNEL); + i = disk_max_parts(hd); + state = allocate_partitions(i); if (!state) return NULL; + state->limit = i; state->pp_buf = (char *)__get_free_page(GFP_KERNEL); if (!state->pp_buf) { - kfree(state); + free_partitions(state); return NULL; } state->pp_buf[0] = '\0'; @@ -128,10 +154,9 @@ check_partition(struct gendisk *hd, struct block_device *bdev) if (isdigit(state->name[strlen(state->name)-1])) sprintf(state->name, "p"); - state->limit = disk_max_parts(hd); i = res = err = 0; while (!res && check_part[i]) { - memset(&state->parts, 0, sizeof(state->parts)); + memset(state->parts, 0, state->limit * sizeof(state->parts[0])); res = check_part[i++](state); if (res < 0) { /* We have hit an I/O error which we don't report now. @@ -161,6 +186,6 @@ check_partition(struct gendisk *hd, struct block_device *bdev) printk(KERN_INFO "%s", state->pp_buf); free_page((unsigned long)state->pp_buf); - kfree(state); + free_partitions(state); return ERR_PTR(res); } diff --git a/block/partitions/check.h b/block/partitions/check.h index 52b1003..eade17e 100644 --- a/block/partitions/check.h +++ b/block/partitions/check.h @@ -15,13 +15,15 @@ struct parsed_partitions { int flags; bool has_info; struct partition_meta_info info; - } parts[DISK_MAX_PARTS]; + } *parts; int next; int limit; bool access_beyond_eod; char *pp_buf; }; +void free_partitions(struct parsed_partitions *state); + struct parsed_partitions * check_partition(struct gendisk *, struct block_device *); -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] firmware: Make user-mode helper optional
On Tue, Jan 22, 2013 at 7:51 PM, Takashi Iwai wrote: > Since 3.7 kernel, the firmware loader can read the firmware files > directly, and the traditional user-mode helper is invoked only as a > fallback. This seems working pretty well, and the next step would be > to reduce the redundant user-mode helper stuff in future. The idea is good. > > This patch is a preparation for that, to make the user-mode helper > invocation optional via kconfig. Other than that, there is no > functional change here. In fact, your patch does much code refactoring work, which isn't strictly related with what you claimed in the commit log, at least it should be separated from introducing FW_LOADER_USER_HELPER. So could you split the patch into several parts and let one part do one thing so that we can review them a bit easily? Also it is not good to introduce too many '#ifdef #endif', IMO. > > Signed-off-by: Takashi Iwai > --- > drivers/base/Kconfig | 11 ++ > drivers/base/firmware_class.c | 350 > +- > 2 files changed, 218 insertions(+), 143 deletions(-) > > diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig > index c8b4539..07abd9d 100644 > --- a/drivers/base/Kconfig > +++ b/drivers/base/Kconfig > @@ -145,6 +145,17 @@ config EXTRA_FIRMWARE_DIR > this option you can point it elsewhere, such as /lib/firmware/ or > some other directory containing the firmware files. > > +config FW_LOADER_USER_HELPER > + bool "Fallback user-helper invocation for firmware loading" > + depends on FW_LOADER > + default y > + help > + This option enables / disables the invocation of user-helper > + (e.g. udev) for loading firmware files as a fallback after the > + direct file loading in kernel fails. The user-mode helper is > + no longer required unless you have a special firmware file that > + resides in a non-standard path. > + > config DEBUG_DRIVER > bool "Driver Core verbose debug messages" > depends on DEBUG_KERNEL > diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c > index b392b35..6c6f714 100644 > --- a/drivers/base/firmware_class.c > +++ b/drivers/base/firmware_class.c > @@ -88,6 +88,7 @@ enum { > FW_STATUS_ABORT, > }; > > +#ifdef CONFIG_FW_LOADER_USER_HELPER > enum fw_buf_fmt { > VMALLOC_BUF,/* used in direct loading */ > PAGE_BUF, /* used in loading via userspace */ > @@ -99,6 +100,7 @@ static inline long firmware_loading_timeout(void) > { > return loading_timeout > 0 ? loading_timeout * HZ : > MAX_SCHEDULE_TIMEOUT; > } > +#endif /* CONFIG_FW_LOADER_USER_HELPER */ > > struct firmware_cache { > /* firmware_buf instance will be added into the below list */ > @@ -128,12 +130,14 @@ struct firmware_buf { > struct completion completion; > struct firmware_cache *fwc; > unsigned long status; > - enum fw_buf_fmt fmt; > void *data; > size_t size; > +#ifdef CONFIG_FW_LOADER_USER_HELPER > + enum fw_buf_fmt fmt; > struct page **pages; > int nr_pages; > int page_array_size; > +#endif > char fw_id[]; > }; > > @@ -142,6 +146,7 @@ struct fw_cache_entry { > char name[]; > }; > > +#ifdef CONFIG_FW_LOADER_USER_HELPER > struct firmware_priv { > struct delayed_work timeout_work; > bool nowait; > @@ -149,6 +154,7 @@ struct firmware_priv { > struct firmware_buf *buf; > struct firmware *fw; > }; > +#endif > > struct fw_name_devm { > unsigned long magic; > @@ -182,7 +188,9 @@ static struct firmware_buf *__allocate_fw_buf(const char > *fw_name, > strcpy(buf->fw_id, fw_name); > buf->fwc = fwc; > init_completion(&buf->completion); > +#ifdef CONFIG_FW_LOADER_USER_HELPER > buf->fmt = VMALLOC_BUF; > +#endif > > pr_debug("%s: fw-%s buf=%p\n", __func__, fw_name, buf); > > @@ -240,7 +248,6 @@ static void __fw_free_buf(struct kref *ref) > { > struct firmware_buf *buf = to_fwbuf(ref); > struct firmware_cache *fwc = buf->fwc; > - int i; > > pr_debug("%s: fw-%s buf=%p data=%p size=%u\n", > __func__, buf->fw_id, buf, buf->data, > @@ -250,12 +257,15 @@ static void __fw_free_buf(struct kref *ref) > spin_unlock(&fwc->lock); > > > +#ifdef CONFIG_FW_LOADER_USER_HELPER > if (buf->fmt == PAGE_BUF) { > + int i; > vunmap(buf->data); > for (i = 0; i < buf->nr_pages; i++) > __free_page(buf->pages[i]); > kfree(buf->pages); > } else > +#endif > vfree(buf->data); > kfree(buf); > } > @@ -319,7 +329,8 @@ static bool fw_read_file_contents(struct file *file, > struct firmware_buf *fw_buf > return true; > } > > -static bool fw_get_filesystem_firmware(struct firmware_buf *buf) > +static bool
lockdep: access percpu variable too early
Hi, The percpu variables of 'lockdep_stats' and 'cpu_lock_stat' may be accessed before percpu area is brought up in case of CONFIG_DEBUG_LOCKDEP/CONFIG_LOCK_STAT, so these variables in non-boot CPUs will be initialized incorrectly. As far as I think of, there are several solutions for the problem: 1, just enable 'debug_locks' until percpu area is built 2, define the two percpu variables in a simple/stupid percpu way inside lockdep 3, clear these two percpu variables of non-boot CPUs just after percpu area is built. 4, ignore the problem Looks each approache have its own disadvantage, so what is your opinion on the problem? or better approach? Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/2] block: partitions: mac: obey the state->limit constraint
It isn't necessary to read the information of partitions whose No. is equal and more than state->limit since only maximum state->limit partitions will be added inside rescan_partitions(). That is also what other kind of partitions are doing. Signed-off-by: Ming Lei --- block/partitions/mac.c |4 1 file changed, 4 insertions(+) diff --git a/block/partitions/mac.c b/block/partitions/mac.c index 11f688b..76d8ba6 100644 --- a/block/partitions/mac.c +++ b/block/partitions/mac.c @@ -63,6 +63,10 @@ int mac_partition(struct parsed_partitions *state) put_dev_sector(sect); return 0; } + + if (blocks_in_map >= state->limit) + blocks_in_map = state->limit - 1; + strlcat(state->pp_buf, " [mac]", PAGE_SIZE); for (slot = 1; slot <= blocks_in_map; ++slot) { int pos = slot * secsize; -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 2/2] block: partition: optimize memory allocation in check_partition
Currently, sizeof(struct parsed_partitions) may be 64KB in 32bit arch, so it is easy to trigger page allocation failure by check_partition, especially in hotplug block device situation(such as, USB mass storage, MMC card, ...), and Felipe Balbi has observed the failure. This patch does below optimizations on the allocation of struct parsed_partitions to try to address the issue: - make parsed_partitions.parts as pointer so that the pointed memory can fit in 32KB buffer, then approximate 32KB memory can be saved - vmalloc the buffer pointed by parsed_partitions.parts because 32KB is still a bit big for kmalloc - given that many devices have the partition count limit, so only allocate disk_max_parts() partitions instead of 256 partitions always Reported-by: Felipe Balbi Signed-off-by: Ming Lei --- block/partition-generic.c |4 ++-- block/partitions/check.c | 37 - block/partitions/check.h |4 +++- 3 files changed, 37 insertions(+), 8 deletions(-) diff --git a/block/partition-generic.c b/block/partition-generic.c index 1cb4dec..789cdea 100644 --- a/block/partition-generic.c +++ b/block/partition-generic.c @@ -418,7 +418,7 @@ int rescan_partitions(struct gendisk *disk, struct block_device *bdev) int p, highest, res; rescan: if (state && !IS_ERR(state)) { - kfree(state); + free_partitions(state); state = NULL; } @@ -525,7 +525,7 @@ rescan: md_autodetect_dev(part_to_dev(part)->devt); #endif } - kfree(state); + free_partitions(state); return 0; } diff --git a/block/partitions/check.c b/block/partitions/check.c index bc90867..19ba207 100644 --- a/block/partitions/check.c +++ b/block/partitions/check.c @@ -14,6 +14,7 @@ */ #include +#include #include #include @@ -106,18 +107,45 @@ static int (*check_part[])(struct parsed_partitions *) = { NULL }; +static struct parsed_partitions *allocate_partitions(struct gendisk *hd) +{ + struct parsed_partitions *state; + int nr; + + state = kzalloc(sizeof(*state), GFP_KERNEL); + if (!state) + return NULL; + + nr = disk_max_parts(hd); + state->parts = vzalloc(nr * sizeof(state->parts[0])); + if (!state->parts) { + kfree(state); + return NULL; + } + + state->limit = nr; + + return state; +} + +void free_partitions(struct parsed_partitions *state) +{ + vfree(state->parts); + kfree(state); +} + struct parsed_partitions * check_partition(struct gendisk *hd, struct block_device *bdev) { struct parsed_partitions *state; int i, res, err; - state = kzalloc(sizeof(struct parsed_partitions), GFP_KERNEL); + state = allocate_partitions(hd); if (!state) return NULL; state->pp_buf = (char *)__get_free_page(GFP_KERNEL); if (!state->pp_buf) { - kfree(state); + free_partitions(state); return NULL; } state->pp_buf[0] = '\0'; @@ -128,10 +156,9 @@ check_partition(struct gendisk *hd, struct block_device *bdev) if (isdigit(state->name[strlen(state->name)-1])) sprintf(state->name, "p"); - state->limit = disk_max_parts(hd); i = res = err = 0; while (!res && check_part[i]) { - memset(&state->parts, 0, sizeof(state->parts)); + memset(state->parts, 0, state->limit * sizeof(state->parts[0])); res = check_part[i++](state); if (res < 0) { /* We have hit an I/O error which we don't report now. @@ -161,6 +188,6 @@ check_partition(struct gendisk *hd, struct block_device *bdev) printk(KERN_INFO "%s", state->pp_buf); free_page((unsigned long)state->pp_buf); - kfree(state); + free_partitions(state); return ERR_PTR(res); } diff --git a/block/partitions/check.h b/block/partitions/check.h index 52b1003..eade17e 100644 --- a/block/partitions/check.h +++ b/block/partitions/check.h @@ -15,13 +15,15 @@ struct parsed_partitions { int flags; bool has_info; struct partition_meta_info info; - } parts[DISK_MAX_PARTS]; + } *parts; int next; int limit; bool access_beyond_eod; char *pp_buf; }; +void free_partitions(struct parsed_partitions *state); + struct parsed_partitions * check_partition(struct gendisk *, struct block_device *); -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v1] ARM: keep __my_cpu_offset consistent with generic one
On Tue, Mar 12, 2013 at 6:56 PM, Russell King - ARM Linux wrote: >> >> Looks no one objects the patch, so I has submitted it into Russell's >> patch system, and hope it can be pushed to linus tree soon and >> make LOCK_STAT/DEBUG_LOCKDEP usable on ARMv7. > > I'm not convinced it is correct. Is the percpu data as stored in the > kernel image (in other words, at offset zero) supposed to be read only? It should have been used after setup_per_cpu_areas(). > If so, the above means that we'll be accessing that rather than the > copy of the percpu data we should be accessing. I admit the patch is a work around for the problem, but it is harmless to make lockdep workable on arm at least. > The percpu data areas are allocated by setup_per_cpu_areas() - that's > where we should be initializing this, just like it's done on PowerPC. >From the entry of start_kernel to setup_per_cpu_areas, there are many locks which will be acquired/released, so the percpu variable in lock_release has to be used early now. Either disabling lockdep during the period or introducing stupid/simple percpu variable inside lockdep may fix the probem, but looks both aren't perfect. So the workaround is proposed in this patch... Ingo and Peter, what is your opinion on the problem? Thanks -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v1] ARM: keep __my_cpu_offset consistent with generic one
On Tue, Mar 12, 2013 at 7:30 PM, Russell King - ARM Linux wrote: >> >> Ingo and Peter, what is your opinion on the problem? > > Having discussed this with Ben Herrenschmidt, it seems that we do need > to have a more complex patch to sort this out - we need to setup our > private pointer inside setup_per_cpu_areas(). Suppose so, seems the patch is still needed to make CPU0 see static variables in '.data..percpu' section correctly. Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v1] ARM: keep __my_cpu_offset consistent with generic one
On Wed, Mar 13, 2013 at 1:25 AM, Tejun Heo wrote: > Hello, > > On Tue, Mar 12, 2013 at 07:44:55PM +0800, Ming Lei wrote: >> On Tue, Mar 12, 2013 at 7:30 PM, Russell King - ARM Linux >> wrote: >> >> >> >> Ingo and Peter, what is your opinion on the problem? >> > >> > Having discussed this with Ben Herrenschmidt, it seems that we do need >> > to have a more complex patch to sort this out - we need to setup our >> > private pointer inside setup_per_cpu_areas(). >> >> Suppose so, seems the patch is still needed to make CPU0 see >> static variables in '.data..percpu' section correctly. > > If my memory serves me right, x86 also has places where CPU0 accesses > its per-cpu data in .data.percpu. While those existed (not sure > they're still there but probably they're) mostly due to historical > reasons rather than by design, as long as the data is in consistent > state by and during percpu setup, nothing will break. Tejun, thanks for your input, yes, nothing will break. For lockdep, the percpu variables in non-boot-CPUs may be initialized again after percpu area is set up. Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: use after free in sysfs_find_dirent
On Thu, Mar 7, 2013 at 2:26 PM, Dave Jones wrote: > Could be some of those that caused these bugs. > > I just retried rerunning the test a few times. Every time I run for a while > I end up with different crashes. It's raining bugs over here. > (Here's another sysfs one below) > > Running 'trinity -c mkdirat -V /sys' doesn't seem to trigger it, so it's an > interaction with something else maybe. Maybe it is helpful for trinity to output some previous recorded operations on /sysfs when the crash is triggered. Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: + atomic-improve-atomic_inc_unless_negative-atomic_dec_unless_positive .patch added to -mm tree
On Fri, Mar 15, 2013 at 12:24 AM, Oleg Nesterov wrote: >> From: Ming Lei >> Subject: atomic: improve >> atomic_inc_unless_negative/atomic_dec_unless_positive >> >> Generally, both atomic_inc_unless_negative() and >> atomic_dec_unless_positive() need at least two atomic_cmpxchg() to >> complete the atomic operation. In fact, the 1st atomic_cmpxchg() is just >> used to read current value of the atomic variable at most times. > > Agreed, this looks ugly... > > But can't we make a simpler patch and keep the code simple/readable ? > > Oleg. > > --- x/include/linux/atomic.h > +++ x/include/linux/atomic.h > @@ -64,7 +64,7 @@ static inline int atomic_inc_not_zero_hi > static inline int atomic_inc_unless_negative(atomic_t *p) > { > int v, v1; > - for (v = 0; v >= 0; v = v1) { > + for (v = atomic_read(p); v >= 0; v = v1) { > v1 = atomic_cmpxchg(p, v, v + 1); Unfortunately, the above will exchange the current value even though it is negative, so it isn't correct. > if (likely(v1 == v)) > return 1; > @@ -77,7 +77,7 @@ static inline int atomic_inc_unless_nega > static inline int atomic_dec_unless_positive(atomic_t *p) > { > int v, v1; > - for (v = 0; v <= 0; v = v1) { > + for (v = atomic_read(p); v <= 0; v = v1) { > v1 = atomic_cmpxchg(p, v, v - 1); Similar with above. > if (likely(v1 == v)) > return 1; > Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: use after free in sysfs_find_dirent
Hi, On Fri, Mar 15, 2013 at 1:04 PM, Sasha Levin wrote: > On 03/15/2013 12:03 AM, Sasha Levin wrote: >> >> [ 350.140100] general protection fault: [#1] PREEMPT SMP >> DEBUG_PAGEALLOC >> [ 350.141468] Dumping ftrace buffer: >> [ 350.142048](ftrace buffer empty) >> [ 350.142619] Modules linked in: >> [ 350.143128] CPU 0 >> [ 350.143434] Pid: 25064, comm: trinity-child14 Tainted: GW >> 3.9.0-rc2-next-20130314-sasha-00046-g3897511 #295 >> [ 350.145415] RIP: 0010:[] [] >> rb_next+0x23/0x60 >> [ 350.146680] RSP: 0018:88007b9dde48 EFLAGS: 00010202 >> [ 350.147528] RAX: 6b6b6b6b6b6b6b6b RBX: 8800b8524b70 RCX: >> 8800b8524b70 >> [ 350.148738] RDX: 6b6b6b6b6b6b6b6b RSI: 8800b63b96e0 RDI: >> 8800b8524bb8 >> [ 350.149939] RBP: 88007b9dde48 R08: R09: >> >> [ 350.150035] R10: R11: R12: >> 88008c5cb180 >> [ 350.150035] R13: R14: R15: >> 0010 >> [ 350.150035] FS: 7fec4eae2700() GS:8800bb80() >> knlGS: >> [ 350.150035] CS: 0010 DS: ES: CR0: 80050033 >> [ 350.150035] CR2: 0001 CR3: 7c32d000 CR4: >> 000406f0 >> [ 350.150035] DR0: DR1: DR2: >> >> [ 350.150035] DR3: DR6: 0ff0 DR7: >> 0400 >> [ 350.150035] Process trinity-child14 (pid: 25064, threadinfo >> 88007b9dc000, task 880096413000) >> [ 350.150035] Stack: >> [ 350.150035] 88007b9ddeb8 812fa959 >> 0008 >> [ 350.150035] 293e 8128cca0 88007b9ddf28 >> 8800b63b96e0 >> [ 350.150035] 8800a14e9b78 88008c5cb180 88007b9ddf28 >> 8128cca0 >> [ 350.150035] Call Trace: >> [ 350.150035] [] sysfs_readdir+0x219/0x280 >> [ 350.150035] [] ? filldir+0x100/0x100 >> [ 350.150035] [] ? filldir+0x100/0x100 >> [ 350.150035] [] vfs_readdir+0x78/0xc0 >> [ 350.150035] [] ? trace_hardirqs_on+0xd/0x10 >> [ 350.150035] [] SyS_getdents64+0x90/0x120 >> [ 350.150035] [] tracesys+0xe1/0xe6 >> [ 350.150035] Code: 85 d2 75 f4 5d c3 66 90 55 31 c0 48 8b 17 48 89 e5 48 >> 39 d7 74 4a 48 8b 47 08 48 85 c0 75 0c eb 17 0f 1f 80 >> 00 00 00 00 48 89 d0 <48> 8b 50 10 48 85 d2 75 f4 eb 2a 66 90 48 89 d1 48 83 >> e1 fc 74 >> [ 350.150035] RIP [] rb_next+0x23/0x60 >> [ 350.150035] RSP >> [ 350.179705] ---[ end trace a39f58a515b594d5 ]--- > > And on the bright side, unlike in Dave's case, similar issues reproduce > rather easily > over here: Could you share how to reproduce that easily? Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: + atomic-improve-atomic_inc_unless_negative-atomic_dec_unless_positive .patch added to -mm tree
On Fri, Mar 15, 2013 at 9:46 PM, Oleg Nesterov wrote: > On 03/15, Ming Lei wrote: >> >> On Fri, Mar 15, 2013 at 12:24 AM, Oleg Nesterov wrote: >> > static inline int atomic_inc_unless_negative(atomic_t *p) >> > { >> > int v, v1; >> > - for (v = 0; v >= 0; v = v1) { >> > + for (v = atomic_read(p); v >= 0; v = v1) { >> > v1 = atomic_cmpxchg(p, v, v + 1); >> >> Unfortunately, the above will exchange the current value even though >> it is negative, so it isn't correct. > > Hmm, why? We always check "v >= 0" before we try to do > atomic_cmpxchg(old => v) ? Sorry, yes, you are right. But then your patch is basically same with the previous one, isn't it? And has same problem, see below discussion: http://marc.info/?t=13628436691&r=1&w=2 Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: use after free in sysfs_find_dirent
On Sat, Mar 16, 2013 at 8:39 PM, Hillf Danton wrote: > init rb node before use due to empty node checked by rb_next(). > > --- a/fs/sysfs/dir.cSat Mar 16 20:12:16 2013 > +++ b/fs/sysfs/dir.cSat Mar 16 20:37:10 2013 > @@ -396,6 +396,7 @@ struct sysfs_dirent *sysfs_new_dirent(co > > atomic_set(&sd->s_count, 1); > atomic_set(&sd->s_active, 0); > + rb_init_node(&sd->s_rb); Looks there is no the symbol in linus tree, and not necessary since rb_link_node() may initialize the node correctly. Also I can't reproduce the problem on my Pandaboard with trinity, maybe the below debug code can find the name of the affected node if anyone would like to test it. -- diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c index 2fbdff6..9aa11c7 100644 --- a/fs/sysfs/dir.c +++ b/fs/sysfs/dir.c @@ -281,6 +281,10 @@ void release_sysfs_dirent(struct sysfs_dirent * sd) */ parent_sd = sd->s_parent; + if(!(sd->s_flags & SYSFS_FLAG_REMOVED)) + printk("%s sysfs_dirent use after free: %s-%s\n", + __func__, parent_sd->s_name, sd->s_name); + if (sysfs_type(sd) == SYSFS_KOBJ_LINK) sysfs_put(sd->s_symlink.target_sd); if (sysfs_type(sd) & SYSFS_COPY_NAME) @@ -962,6 +966,9 @@ static struct sysfs_dirent *sysfs_dir_pos(const void *ns, int valid = !(pos->s_flags & SYSFS_FLAG_REMOVED) && pos->s_parent == parent_sd && hash == pos->s_hash; + if (valid && (atomic_read(&pos->s_count) <= 1)) + printk("%s sysfs_dirent use after free: %s-%s\n", + __func__, parent_sd->s_name, pos->s_name); sysfs_put(pos); if (!valid) pos = NULL; Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: use after free in sysfs_find_dirent
On Sat, Mar 16, 2013 at 11:07 PM, Sasha Levin wrote: > > Hi Ming, > > With your patch: > > > [ 1525.874312] release_sysfs_dirent sysfs_dirent use after free: ptysb-uevent Sasha, thanks for your test. So is the oops always triggered on this node of 'ptysb-uevent' or the node name is changed every time? Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] mm/fremap.c: fix another oops on error path
Since find_vma() may return NULL, so don't dereference the returned 'vma' until it is valid. The problem is introduced by the commit in linus tree: 6d7825b(mm/fremap.c: fix oops on error path). Also mark vm_flags as ninitialized_var() to avoid compile warning. Cc: Tommi Rantala Cc: Michel Lespinasse Cc: Andrew Morton Cc: Linus Torvalds Signed-off-by: Ming Lei --- mm/fremap.c |5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/mm/fremap.c b/mm/fremap.c index 6a8da7e..80088e9 100644 --- a/mm/fremap.c +++ b/mm/fremap.c @@ -129,7 +129,7 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size, struct vm_area_struct *vma; int err = -EINVAL; int has_write_lock = 0; - vm_flags_t vm_flags; + vm_flags_t uninitialized_var(vm_flags); if (prot) return err; @@ -163,8 +163,7 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size, * and that the remapped range is valid and fully within * the single existing vma. */ - vm_flags = vma->vm_flags; - if (!vma || !(vm_flags & VM_SHARED)) + if (!vma || !(vma->vm_flags & VM_SHARED)) goto out; if (!vma->vm_ops || !vma->vm_ops->remap_pages) -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: use after free in sysfs_find_dirent
On Sat, Mar 16, 2013 at 11:22 PM, Ming Lei wrote: > On Sat, Mar 16, 2013 at 11:07 PM, Sasha Levin wrote: >> >> Hi Ming, >> >> With your patch: >> >> >> [ 1525.874312] release_sysfs_dirent sysfs_dirent use after free: ptysb-uevent > > Sasha, thanks for your test. > > So is the oops always triggered on this node of 'ptysb-uevent' or the node > name > is changed every time? Also, we may dump stack on the release path with the below patch to see who did the ugly free. -- diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c index 2fbdff6..993671d 100644 --- a/fs/sysfs/dir.c +++ b/fs/sysfs/dir.c @@ -281,6 +281,12 @@ void release_sysfs_dirent(struct sysfs_dirent * sd) */ parent_sd = sd->s_parent; + if(!(sd->s_flags & SYSFS_FLAG_REMOVED)) { + printk("%s sysfs_dirent use after free: %s-%s\n", + __func__, parent_sd->s_name, sd->s_name); + dump_stack(); + } + if (sysfs_type(sd) == SYSFS_KOBJ_LINK) sysfs_put(sd->s_symlink.target_sd); if (sysfs_type(sd) & SYSFS_COPY_NAME) @@ -962,6 +968,9 @@ static struct sysfs_dirent *sysfs_dir_pos(const void *ns, int valid = !(pos->s_flags & SYSFS_FLAG_REMOVED) && pos->s_parent == parent_sd && hash == pos->s_hash; + if (valid && (atomic_read(&pos->s_count) <= 1)) + printk("%s sysfs_dirent use after free: %s-%s\n", + __func__, parent_sd->s_name, pos->s_name); sysfs_put(pos); if (!valid) pos = NULL; Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: use after free in sysfs_find_dirent
On Sun, Mar 17, 2013 at 2:33 AM, Sasha Levin wrote: > > I don't think it shows what we want it to show thought: > > [ 327.416905] Pid: 10504, comm: trinity-child98 Tainted: GW > 3.9.0-rc2-next-20130315-sasha-00046-gecde602-dirty #301 > [ 327.418815] Call Trace: > [ 327.419255] [] release_sysfs_dirent+0x4e/0x120 > [ 327.420595] [] sysfs_dir_pos+0x92/0x130 > [ 327.421608] [] sysfs_readdir+0x11d/0x280 > [ 327.422562] [] ? SyS_ioctl+0xa0/0xa0 > [ 327.423441] [] ? SyS_ioctl+0xa0/0xa0 > [ 327.424314] [] vfs_readdir+0x78/0xc0 > [ 327.425263] [] SyS_getdents+0x8c/0x110 > [ 327.426173] [] tracesys+0xe1/0xe6 > Sasha, looks there is a race when sys_readdir() is run concurrently on same directory, and the below patch may fix the race, could you test the attachment patch to see if the use after free can be fixed? Thanks, -- Ming Lei sysfs-fix-readdir.patch Description: Binary data
Re: [PATCH] mm/fremap.c: fix another oops on error path
On Sun, Mar 17, 2013 at 12:44 PM, Michel Lespinasse wrote: > On Sat, Mar 16, 2013 at 8:23 AM, Ming Lei wrote: >> Since find_vma() may return NULL, so don't dereference the >> returned 'vma' until it is valid. > > Agree this was an issue. This is fixed with commit a2362d24764a. It is fine if the issue has been fixed, but I didn't see the commit a2362d or other fix on both linus and next tree. Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: use after free in sysfs_find_dirent
On Sun, Mar 17, 2013 at 10:24 PM, Sasha Levin wrote: > > I still see it going on with the patch applied: Looks the previous patch still has the race problem, so could you just apply the attachment patch and cancel all previous patches for the test? If there is still the problem, please post out the log. BTW, the attachment patch is only for verifying if the current problem is caused by 'filp->private_data' race, and not for merge. Thanks, -- Ming Lei sysfs-fix-readdir-v1.patch Description: Binary data
Re: A workaround for request_firmware() stuck in module_init
On Tue, Sep 4, 2012 at 9:06 PM, Takashi Iwai wrote: > Hi, > > as I've got recently a few bug reports regarding the stuck with > request_firmware() in module_init of some sound drivers, I started > looking at the issue. Strangely, the problem doesn't happen on > openSUSE 12.2 although it has the same udev version with libkmod as > Fedora. So I installed Fedora 17, and indeed I could see a problem > there. It should be a bug in udev, as discussed in the below link: http://marc.info/?t=13455274512&r=1&w=2 > > Obviously a solution would be to rewrite the driver code to use > request_firmware_nowait() instead. But it'd need a lot of code > shuffling, and most of such drivers are old stuff I don't want to do a > serious surgery. > > So I tried an easier workaround by using the deferred probing. > An experimental patch is below. As you can see, from the driver side, > it's simple: just add two lines at the head of each probe function. In fact, the defer probe should only be applied in situations which driver is built in kernel and request_firmware is called in .probe(). > > Do you think this kind of hack is OK? If not, any better (IOW easier) > solution? Looks your solution is a bit complicated, and I have wrote a similar patch to address the problem, but Linus thought that it may hide the problem of drivers. http://marc.info/?t=13427879084&r=1&w=2 IMO, driver core needn't to be changed, and the defer probe can be supported just by changes in request_firmware() and its caller. Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: A workaround for request_firmware() stuck in module_init
On Wed, Sep 5, 2012 at 12:10 AM, Takashi Iwai wrote: > At Tue, 4 Sep 2012 23:52:15 +0800, > Ming Lei wrote: >> >> On Tue, Sep 4, 2012 at 9:06 PM, Takashi Iwai wrote: >> > Hi, >> > >> > as I've got recently a few bug reports regarding the stuck with >> > request_firmware() in module_init of some sound drivers, I started >> > looking at the issue. Strangely, the problem doesn't happen on >> > openSUSE 12.2 although it has the same udev version with libkmod as >> > Fedora. So I installed Fedora 17, and indeed I could see a problem >> > there. >> >> It should be a bug in udev, as discussed in the below link: >> >> http://marc.info/?t=13455274512&r=1&w=2 > > Yeah, if udev has a fix, I'm fine. I'll proactively ignore these bug > reports. But did it really happen...? Linus has expressed that it should be fixed by udev, maybe we can wait some time to see if it will happen, :-) There are more than 300 request_firmware called in probe(), even adding 2 line code in these drivers will involve much workload, since you need to find the matched probe() for one request_firmware and sometimes it is not easy. > >> > Obviously a solution would be to rewrite the driver code to use >> > request_firmware_nowait() instead. But it'd need a lot of code >> > shuffling, and most of such drivers are old stuff I don't want to do a >> > serious surgery. >> > >> > So I tried an easier workaround by using the deferred probing. >> > An experimental patch is below. As you can see, from the driver side, >> > it's simple: just add two lines at the head of each probe function. >> >> In fact, the defer probe should only be applied in situations which >> driver is built in kernel and request_firmware is called in .probe(). > > Is it? I thought the deferred probe is basically not for this problem > but rather for the dependency problem with other modules. That's the > reason it's triggered only upon the successful binding. Sorry, could you explain the dependency in a bit detail? >From your patch, I understand you just convert the caller of request_firmware from module_init into deferred_probe_work_func(), so the udev problem is avoided, isn't it? Looks making all .probe() run non-module_init context is still a solution, :-) > > And IMO, the deferred probe for the built-in kernel is also silly. > Did anyone really make it working for built-in kernel driver and > external firmware files? (For the resume, it's of course a different Yes, my original patch does work for the built-in situations. > issue. And I guess it's been solved by your fw cache patch, right?) > >> > Do you think this kind of hack is OK? If not, any better (IOW easier) >> > solution? >> >> Looks your solution is a bit complicated, and I have wrote a similar >> patch to address the problem, but Linus thought that it may hide the >> problem of drivers. >> >> http://marc.info/?t=13427879084&r=1&w=2 >> >> IMO, driver core needn't to be changed, and the defer probe can be >> supported just by changes in request_firmware() and its caller. > > Ideally, yes. But it won't work in some drivers like sound drivers, > that have its own device number counting changed at each probe call. > For such drivers, the deferred probe check must be done before > entering the normal probe procedure, and returning -EPROBE_DEFER would > be too late (or need more complex fallbacks). Simply, you can put the firmware loading at the start of probe to avoid the specific sound problem, :-) Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: A workaround for request_firmware() stuck in module_init
On Wed, Sep 5, 2012 at 1:53 PM, Takashi Iwai wrote: > At Wed, 5 Sep 2012 09:15:34 +0800, > Ming Lei wrote: >> >> On Wed, Sep 5, 2012 at 12:10 AM, Takashi Iwai wrote: >> > At Tue, 4 Sep 2012 23:52:15 +0800, >> > Ming Lei wrote: >> >> >> >> On Tue, Sep 4, 2012 at 9:06 PM, Takashi Iwai wrote: >> >> > Hi, >> >> > >> >> > as I've got recently a few bug reports regarding the stuck with >> >> > request_firmware() in module_init of some sound drivers, I started >> >> > looking at the issue. Strangely, the problem doesn't happen on >> >> > openSUSE 12.2 although it has the same udev version with libkmod as >> >> > Fedora. So I installed Fedora 17, and indeed I could see a problem >> >> > there. >> >> >> >> It should be a bug in udev, as discussed in the below link: >> >> >> >> http://marc.info/?t=13455274512&r=1&w=2 >> > >> > Yeah, if udev has a fix, I'm fine. I'll proactively ignore these bug >> > reports. But did it really happen...? >> >> Linus has expressed that it should be fixed by udev, maybe we can >> wait some time to see if it will happen, :-) > > Kay, what is the status? > >> There are more than 300 request_firmware called in probe(), even >> adding 2 line code in these drivers will involve much workload, since >> you need to find the matched probe() for one request_firmware and >> sometimes it is not easy. > > Well, this depends on from which perspective you look at the issue. > See below. > >> >> > Obviously a solution would be to rewrite the driver code to use >> >> > request_firmware_nowait() instead. But it'd need a lot of code >> >> > shuffling, and most of such drivers are old stuff I don't want to do a >> >> > serious surgery. >> >> > >> >> > So I tried an easier workaround by using the deferred probing. >> >> > An experimental patch is below. As you can see, from the driver side, >> >> > it's simple: just add two lines at the head of each probe function. >> >> >> >> In fact, the defer probe should only be applied in situations which >> >> driver is built in kernel and request_firmware is called in .probe(). >> > >> > Is it? I thought the deferred probe is basically not for this problem >> > but rather for the dependency problem with other modules. That's the >> > reason it's triggered only upon the successful binding. >> >> Sorry, could you explain the dependency in a bit detail? > > When a device has some implicit dependency on another hardware > component (e.g. SDHCI depends on I2C GPIO controller, as in comment in > drivers/base/dd.c), the driver needs to wait until another one gets > ready. -EPROBE_DEFER was introduced for such a purpose. > > So, using this mechanism for the firmware issue is a kind of abuse. > >> >From your patch, I understand you just convert the caller of >> request_firmware from module_init into deferred_probe_work_func(), >> so the udev problem is avoided, isn't it? > > Yes, that was a hack. The idea is just offloading the probe, and the > deferred probe is an existing simple way for that. > >> Looks making all .probe() run non-module_init context is still a solution, >> :-) > > Sounds interesting. > >> > And IMO, the deferred probe for the built-in kernel is also silly. >> > Did anyone really make it working for built-in kernel driver and >> > external firmware files? (For the resume, it's of course a different >> >> Yes, my original patch does work for the built-in situations. >> >> > issue. And I guess it's been solved by your fw cache patch, right?) >> > >> >> > Do you think this kind of hack is OK? If not, any better (IOW easier) >> >> > solution? >> >> >> >> Looks your solution is a bit complicated, and I have wrote a similar >> >> patch to address the problem, but Linus thought that it may hide the >> >> problem of drivers. >> >> >> >> http://marc.info/?t=13427879084&r=1&w=2 >> >> >> >> IMO, driver core needn't to be changed, and the defer probe can be >> >> supported just by changes in request_firmware() and its caller. >> > >> > Ideally, yes. But it won't work in some drivers like sound drivers, >> > that have its
Question on irq autoprobe
Hi, I see the below comments on probe_irq_off: * BUGS: When used in a module (which arguably shouldn't happen) * nothing prevents two IRQ probe callers from overlapping. The * results of this are non-optimal. */ But from the code of probe_irq_on and probe_irq_off, the mutex of probing_active is held during the whole irq probe procedure, so I don't understand why the above said that 'nothing prevents two IRQ probe callers from overlapping", and why isn't the mutex of probing_active enough to avoid the overlapping? Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] firmware loader: cancel uncache work before caching firmware
Under 'Opportunistic sleep' situation, system sleep might be triggered very frequently, so the uncahce work may not be completed before caching firmware during next suspend. This patch cancels the uncache work before caching firmware to fix the problem above. Also this patch optimizes the cacheing firmware mechanism a bit by only storing one firmware cache entry for one firmware image. So if the firmware is still cached during suspend, it doesn't need to be loaded from user space any more. Signed-off-by: Ming Lei --- drivers/base/firmware_class.c | 30 -- 1 file changed, 24 insertions(+), 6 deletions(-) diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c index 6e21080..ddc17bc 100644 --- a/drivers/base/firmware_class.c +++ b/drivers/base/firmware_class.c @@ -1068,17 +1068,27 @@ exit: return fce; } -static int fw_cache_piggyback_on_request(const char *name) +static int __fw_entry_found(const char *name) { struct firmware_cache *fwc = &fw_cache; struct fw_cache_entry *fce; - int ret = 0; - spin_lock(&fwc->name_lock); list_for_each_entry(fce, &fwc->fw_names, list) { if (!strcmp(fce->name, name)) - goto found; + return 1; } + return 0; +} + +static int fw_cache_piggyback_on_request(const char *name) +{ + struct firmware_cache *fwc = &fw_cache; + struct fw_cache_entry *fce; + int ret = 0; + + spin_lock(&fwc->name_lock); + if (__fw_entry_found(name)) + goto found; fce = alloc_fw_cache_entry(name); if (fce) { @@ -1155,8 +1165,13 @@ static void dev_cache_fw_image(struct device *dev, void *data) list_del(&fce->list); spin_lock(&fwc->name_lock); - fwc->cnt++; - list_add(&fce->list, &fwc->fw_names); + /* only one cache entry for one firmware */ + if (!__fw_entry_found(fce->name)) { + fwc->cnt++; + list_add(&fce->list, &fwc->fw_names); + } else { + free_fw_cache_entry(fce); + } spin_unlock(&fwc->name_lock); async_schedule(__async_dev_cache_fw_image, (void *)fce); @@ -1201,6 +1216,9 @@ static void device_cache_fw_images(void) pr_debug("%s\n", __func__); + /* cancel uncache work */ + cancel_delayed_work_sync(&fwc->work); + /* * use small loading timeout for caching devices' firmware * because all these firmware images have been loaded -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] firmware loader: cancel uncache work before caching firmware
On Tue, Oct 2, 2012 at 4:35 PM, Ming Lei wrote: > --- > drivers/base/firmware_class.c | 30 -- > 1 file changed, 24 insertions(+), 6 deletions(-) Please ignore the buggy patch, sorry for the noise. Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH v2 4/6] net/core: apply pm_runtime_set_memalloc_noio on network devices
On Tue, Oct 23, 2012 at 3:18 AM, Alan Stern wrote: > On Mon, 22 Oct 2012, Ming Lei wrote: > Is this really needed? Even with iSCSI, doesn't register_disk() have > to be called for the underlying block device? And given your 3/6 > patch, wouldn't that mark the network device? The problem is that network device is not one ancestor of the iSCSI disk device, which transfers data over tcp stack. Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH v2 6/6] USB: forbid memory allocation with I/O during bus reset
On Mon, Oct 22, 2012 at 10:37 PM, Alan Stern wrote: > On Mon, 22 Oct 2012, Ming Lei wrote: >> >> + /* >> + * Don't allocate memory with GFP_KERNEL in current >> + * context to avoid possible deadlock if usb mass >> + * storage interface or usbnet interface(iSCSI case) >> + * is included in current configuration. The easiest >> + * approach is to do it for all devices. >> + */ >> + memalloc_noio_save(noio_flag); > > Why not check dev->power.memalloc_noio_resume here too? Yes, we can use the flag here too even though it is introduced for rutime_resume case. Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH v2 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio()
On Mon, Oct 22, 2012 at 10:33 PM, Alan Stern wrote: > > Tail recursion should be implemented as a loop, not as an explicit > recursion. That is, the function should be: > > void pm_runtime_set_memalloc_noio(struct device *dev, bool enable) > { > do { > dev->power.memalloc_noio_resume = enable; > > if (!enable) { > /* > * Don't clear the parent's flag if any of the > * parent's children have their flag set. > */ > if (device_for_each_child(dev->parent, NULL, > dev_memalloc_noio)) > return; > } > dev = dev->parent; > } while (dev); > } OK, will take the non-recursion implementation for saving kernel stack space. > > except that you need to add locking, for two reasons: > > There's a race. What happens if another child sets the flag > between the time device_for_each_child() runs and the next loop > iteration? Yes, I know the race, and not adding a lock because the function is mostly called in .probe() or .remove() callback and its parent's device lock is held to avoid this race. Considered that it may be called in async probe() (scsi disk), one lock is needed, the simplest way is to add a global lock. Any suggestion? > > Even without a race, access to bitfields is not SMP-safe > without locking. You mean one ancestor device might not be in active when one of its descendants is being probed or removed? Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3 v2] firmware: Convert firmware path setup from an array to a list
On Tue, Oct 23, 2012 at 8:52 PM, Dimitris Papastamos wrote: > In preparation to support dynamic listing/updating of firmware > paths via procfs, this patch converts the firmware path configuration > from an array to a list. I remembered that I have questioned on dynamic listing/updating of firmware via procfs, looks you don't reply, :-( http://marc.info/?l=linux-kernel&m=134988208617179&w=2 IMO, it is not a good idea to update firmware path dynamically via proc because it may be too late to update a dynamic search path via /proc and drivers need request firmware just after rootfs is mounted. kernel parameter should be a good way to pass one customerized path. Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3 v2] firmware: Convert firmware path setup from an array to a list
On Tue, Oct 23, 2012 at 9:29 PM, Dimitris Papastamos wrote: > > Hi sorry for being unclear, I mentioned in the old thread that > I've not had time to fix this properly (or think about a proper > fix), Greg requested a rebase on top of linux-next, however, at > least for firmware path listing. > > Should I amend the commit description so it is not misleading and > remove the 'updating' part? In fact, I also questioned on path listing too in last thread too. Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH -next] printk: fix broken 'console' kernel parameter
If CONFIG_A11Y_BRAILLE_CONSOLE is not enabled, _braille_console_setup() should return NULL to parse the parameter further in console_setup(). This patch fixes the broken 'console' kernel parameter, which makes the Pandaboard not boot with 'console=ttyO2'. Cc: Andrew Morton Cc: Samuel Thibault Cc: Joe Perches Signed-off-by: Ming Lei --- kernel/printk/braille.h |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/printk/braille.h b/kernel/printk/braille.h index d2e6bc3..769d771 100644 --- a/kernel/printk/braille.h +++ b/kernel/printk/braille.h @@ -28,7 +28,7 @@ braille_set_options(struct console_cmdline *c, char *brl_options) static inline char * _braille_console_setup(char **str, char **brl_options) { - return *str; + return NULL; } static inline int -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3 v2] firmware: Convert firmware path setup from an array to a list
On Tue, Oct 23, 2012 at 9:44 PM, Dimitris Papastamos wrote: > > I don't currently have a use-case for this, so not sure how useful > it is to list the default fw paths. OK, got it, thanks for your effort. Greg, so could we just hold on until it is useful? Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH v2 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio()
On Tue, Oct 23, 2012 at 10:46 PM, Alan Stern wrote: > On Tue, 23 Oct 2012, Ming Lei wrote: > >> On Mon, Oct 22, 2012 at 10:33 PM, Alan Stern >> wrote: >> > >> > Tail recursion should be implemented as a loop, not as an explicit >> > recursion. That is, the function should be: >> > >> > void pm_runtime_set_memalloc_noio(struct device *dev, bool enable) >> > { >> > do { >> > dev->power.memalloc_noio_resume = enable; >> > >> > if (!enable) { >> > /* >> > * Don't clear the parent's flag if any of the >> > * parent's children have their flag set. >> > */ >> > if (device_for_each_child(dev->parent, NULL, >> > dev_memalloc_noio)) >> > return; >> > } >> > dev = dev->parent; >> > } while (dev); >> > } >> >> OK, will take the non-recursion implementation for saving kernel >> stack space. >> >> > >> > except that you need to add locking, for two reasons: >> > >> > There's a race. What happens if another child sets the flag >> > between the time device_for_each_child() runs and the next loop >> > iteration? >> >> Yes, I know the race, and not adding a lock because the function >> is mostly called in .probe() or .remove() callback and its parent's device >> lock is held to avoid this race. >> >> Considered that it may be called in async probe() (scsi disk), one lock >> is needed, the simplest way is to add a global lock. Any suggestion? > > No. Because of where you put the new flag, it must be protected by > dev->power.lock. And this means the iterative implementation shown > above can't be used as is. It will have to be more like this: > > void pm_runtime_set_memalloc_noio(struct device *dev, bool enable) > { > spin_lock_irq(&dev->power.lock); > dev->power.memalloc_noio_resume = enable; > > while (dev->parent) { > spin_unlock_irq(&dev->power.lock); > dev = dev->parent; > > spin_lock_irq(&dev->power.lock); > /* > * Don't clear the parent's flag if any of the > * parent's children have their flag set. > */ > if (!enable && device_for_each_child(dev->parent, NULL, s/dev->parent/dev > dev_memalloc_noio)) > break; > dev->power.memalloc_noio_resume = enable; > } > spin_unlock_irq(&dev->power.lock); > } With the problem of non-SMP-safe bitfields access, the power.lock should be held, but that is not enough to prevent children from being probed or disconnected. Looks another lock is still needed. I think a global lock is OK in the infrequent path. > >> > Even without a race, access to bitfields is not SMP-safe >> > without locking. >> >> You mean one ancestor device might not be in active when >> one of its descendants is being probed or removed? > > No. Consider this example: > > struct foo { > int a:1; > int b:1; > } x; > > Consider what happens if CPU 0 does "x.a = 1" at the same time as > another CPU 1 does "x.b = 1". The compiler might produce object code > looking like this for CPU 0: > > movex, reg1 > or 0x1, reg1 > movereg1, x > > and this for CPU 1: > > movex, reg2 > or 0x2, reg2 > movereg2, x > > With no locking, the two "or" instructions could execute > simultaneously. What will the final value of x be? > > The two CPUs will interfere, even though they are touching different > bitfields. Got it, thanks for your detailed explanation. Looks the problem is worse than above, not only bitfields are affected, the adjacent fields might be involved too, see: http://lwn.net/Articles/478657/ Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/2] firmware loader: document kernel direct loading
This patch adds description on recently introduced direct firmware loading by Linus. Cc: Linus Torvalds Signed-off-by: Ming Lei --- Documentation/firmware_class/README | 26 +- 1 file changed, 17 insertions(+), 9 deletions(-) diff --git a/Documentation/firmware_class/README b/Documentation/firmware_class/README index 7eceaff..815b711 100644 --- a/Documentation/firmware_class/README +++ b/Documentation/firmware_class/README @@ -18,32 +18,40 @@ High level behavior (mixed): - kernel(driver): calls request_firmware(&fw_entry, $FIRMWARE, device) - - userspace: + 1), kernel(driver): + - calls request_firmware(&fw_entry, $FIRMWARE, device) + - kernel searchs the fimware image with name $FIRMWARE directly + in the below search path of root filesystem: + "/lib/firmware/updates/" UTS_RELEASE, + "/lib/firmware/updates", + "/lib/firmware/" UTS_RELEASE, + "/lib/firmware" + - If found, goto 7), else goto 2) + + 2), userspace: - /sys/class/firmware/xxx/{loading,data} appear. - hotplug gets called with a firmware identifier in $FIRMWARE and the usual hotplug environment. - hotplug: echo 1 > /sys/class/firmware/xxx/loading - kernel: Discard any previous partial load. + 3), kernel: Discard any previous partial load. - userspace: + 4), userspace: - hotplug: cat appropriate_firmware_image > \ /sys/class/firmware/xxx/data - kernel: grows a buffer in PAGE_SIZE increments to hold the image as it + 5), kernel: grows a buffer in PAGE_SIZE increments to hold the image as it comes in. - userspace: + 6), userspace: - hotplug: echo 0 > /sys/class/firmware/xxx/loading - kernel: request_firmware() returns and the driver has the firmware + 7), kernel: request_firmware() returns and the driver has the firmware image in fw_entry->{data,size}. If something went wrong request_firmware() returns non-zero and fw_entry is set to NULL. - kernel(driver): Driver code calls release_firmware(fw_entry) releasing + 8), kernel(driver): Driver code calls release_firmware(fw_entry) releasing the firmware image and any related resource. High level behavior (driver code): -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 2/2] firmware loader: introduce kernel parameter to customize fw search path
This patch introduces one kernel parameter of 'fw_path' to support customizing firmware image search path, so that people can use its own firmware path if the default built-in paths can't meet their demand[1]. [1], https://lkml.org/lkml/2012/10/11/337 Cc: Linus Torvalds Signed-off-by: Ming Lei --- Documentation/firmware_class/README |1 + Documentation/kernel-parameters.txt |9 + drivers/base/firmware_class.c | 24 ++-- 3 files changed, 32 insertions(+), 2 deletions(-) diff --git a/Documentation/firmware_class/README b/Documentation/firmware_class/README index 815b711..af63ad3 100644 --- a/Documentation/firmware_class/README +++ b/Documentation/firmware_class/README @@ -22,6 +22,7 @@ - calls request_firmware(&fw_entry, $FIRMWARE, device) - kernel searchs the fimware image with name $FIRMWARE directly in the below search path of root filesystem: + User customized search path by kernel parameter of 'fw_path' "/lib/firmware/updates/" UTS_RELEASE, "/lib/firmware/updates", "/lib/firmware/" UTS_RELEASE, diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 9776f06..d1125c5 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -52,6 +52,7 @@ parameter is applicable: EVM Extended Verification Module FB The frame buffer device is enabled. FTRACE Function tracing enabled. + FW The firmware loader is enabled. GCOVGCOV profiling is enabled. HW Appropriate hardware is enabled. IA-64 IA-64 architecture is enabled. @@ -843,6 +844,14 @@ bytes respectively. Such letter suffixes can also be entirely omitted. Format: ,,, See also Documentation/fault-injection/. + fw_path=[FW] + Customized firmware image search path for kernel + direct loading, which has higher priority than + built-in default search path. + Format: , length < 256 + direcory path, such as /lib/firmware. + See Documentation/firmware_class/README. + floppy= [HW] See Documentation/blockdev/floppy.txt. diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c index 8945f4e..e7db931 100644 --- a/drivers/base/firmware_class.c +++ b/drivers/base/firmware_class.c @@ -274,6 +274,17 @@ static const char *fw_path[] = { "/lib/firmware" }; +static char fw_path_para[256]; +static int __init customized_path(char *str) +{ + if (strlen(str) < 256) + strlcpy(fw_path_para, str, sizeof(fw_path_para)); + return 1; +} +__setup("fw_path=", customized_path); +module_param_string(path, fw_path_para, sizeof(fw_path_para), 0644); +MODULE_PARM_DESC(path, "customized firmware image search path with a higher priority than default path"); + /* Don't inline this: 'struct kstat' is biggish */ static noinline long fw_file_size(struct file *file) { @@ -313,9 +324,18 @@ static bool fw_get_filesystem_firmware(struct firmware_buf *buf) bool success = false; char *path = __getname(); - for (i = 0; i < ARRAY_SIZE(fw_path); i++) { + for (i = -1; i < ARRAY_SIZE(fw_path); i++) { struct file *file; - snprintf(path, PATH_MAX, "%s/%s", fw_path[i], buf->fw_id); + + if (i < 0) { + if (!fw_path_para[0]) /* No customized path */ + continue; + snprintf(path, PATH_MAX, "%s/%s", fw_path_para, +buf->fw_id); + } else { + snprintf(path, PATH_MAX, "%s/%s", fw_path[i], +buf->fw_id); + } file = filp_open(path, O_RDONLY, 0); if (IS_ERR(file)) -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] firmware loader: introduce kernel parameter to customize fw search path
On Wed, Oct 24, 2012 at 11:18 AM, Greg Kroah-Hartman wrote: > > Ick, no, not another kernel boot parameter. I would prefer the /proc/ > file solution instead, just like /proc/sys/kernel/hotplug is. OK, got it, but we could use the module parameter(firmware_class.path via kernel command) to customize the search path too, couldn't we? > > But, that really wouldn't work for boot time, so, again just like > hotplug works, make it also a Kconfig option for those distros that > build their own kernels and keep their firmware in odd places, or don't > have a /lib/ directory, can override it easily. Sound like still a solution, :-) If you agree on module parameter, that might be more flexible, IMO Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] firmware loader: introduce kernel parameter to customize fw search path
On Wed, Oct 24, 2012 at 1:42 PM, Greg Kroah-Hartman wrote: > Yes, but why? Let's not overdesign this, get the basics working and > then, if someone really needs it, we can add it. OK, I will remove the kernel parameter and related documentation, and just keep it as module parameter, sorry about that. Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Q] "console=" command line ignored?
On Wed, Oct 24, 2012 at 6:49 PM, Guennadi Liakhovetski wrote: > Hi > > I cannot seem to force selection of a serial console by specifying > "console=ttySC0..." on the command line with modern (yesterday's "next") > kernels. The kernel insists on > > console [tty0] enabled Please see https://lkml.org/lkml/2012/10/23/307 > > The reason might also well be, that the respective "sh-sci" driver is > only registered later. I'll try to look at this in more detail - since > when this is broken and how the registration order looked before. This is > just a first heads-up in case someone has a solution immediately at hand. Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v1] firmware loader: introduce module parameter to customize fw search path
This patch introduces one module parameter of 'path' in firmware_class to support customizing firmware image search path, so that people can use its own firmware path if the default built-in paths can't meet their demand[1], and the typical usage is passing the below from kernel command parameter when 'firmware_class' is built in kernel: firmware_class.path=$CUSTOMIZED_PATH [1], https://lkml.org/lkml/2012/10/11/337 Cc: Linus Torvalds Signed-off-by: Ming Lei --- V1: - remove kernel boot parameter and only support the feature by module parameter as suggested by Greg Documentation/firmware_class/README |5 + drivers/base/firmware_class.c | 23 +-- 2 files changed, 26 insertions(+), 2 deletions(-) diff --git a/Documentation/firmware_class/README b/Documentation/firmware_class/README index 815b711..ce02744 100644 --- a/Documentation/firmware_class/README +++ b/Documentation/firmware_class/README @@ -22,12 +22,17 @@ - calls request_firmware(&fw_entry, $FIRMWARE, device) - kernel searchs the fimware image with name $FIRMWARE directly in the below search path of root filesystem: + User customized search path by module parameter 'path'[1] "/lib/firmware/updates/" UTS_RELEASE, "/lib/firmware/updates", "/lib/firmware/" UTS_RELEASE, "/lib/firmware" - If found, goto 7), else goto 2) + [1], the 'path' is a string parameter which length should be less + than 256, user should pass 'firmware.path=$CUSTOMIZED_PATH' if + firmware_class is built in kernel(the general situation) + 2), userspace: - /sys/class/firmware/xxx/{loading,data} appear. - hotplug gets called with a firmware identifier in $FIRMWARE diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c index 8945f4e..b363103 100644 --- a/drivers/base/firmware_class.c +++ b/drivers/base/firmware_class.c @@ -274,6 +274,16 @@ static const char *fw_path[] = { "/lib/firmware" }; +static char fw_path_para[256]; + +/* + * Typical usage is that passing 'firmware_class.path=$CUSTOMIZED_PATH' + * from kernel command because firmware_class is generally built in + * kernel instead of module. + */ +module_param_string(path, fw_path_para, sizeof(fw_path_para), 0644); +MODULE_PARM_DESC(path, "customized firmware image search path with a higher priority than default path"); + /* Don't inline this: 'struct kstat' is biggish */ static noinline long fw_file_size(struct file *file) { @@ -313,9 +323,18 @@ static bool fw_get_filesystem_firmware(struct firmware_buf *buf) bool success = false; char *path = __getname(); - for (i = 0; i < ARRAY_SIZE(fw_path); i++) { + for (i = -1; i < ARRAY_SIZE(fw_path); i++) { struct file *file; - snprintf(path, PATH_MAX, "%s/%s", fw_path[i], buf->fw_id); + + if (i < 0) { + if (!fw_path_para[0]) /* No customized path */ + continue; + snprintf(path, PATH_MAX, "%s/%s", fw_path_para, +buf->fw_id); + } else { + snprintf(path, PATH_MAX, "%s/%s", fw_path[i], +buf->fw_id); + } file = filp_open(path, O_RDONLY, 0); if (IS_ERR(file)) -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v1] firmware loader: introduce module parameter to customize fw search path
On Fri, Oct 26, 2012 at 10:32 AM, Linus Torvalds wrote: > > Please just make "fw_path[0]" just be the pointer to fw_path_para[] > (which sounds like the cleanest fix) and get rid of the negative 'i' > and conditional entirely. Yes, it should be the cleanest, I don't do it because I thought that might have caused one compile warning('const char *' points to memory without 'const', like below) static char fw_path_para[256]; static const char *fw_path[] = { fw_path_para, "/lib/firmware/updates/" UTS_RELEASE, "/lib/firmware/updates", "/lib/firmware/" UTS_RELEASE, "/lib/firmware" }; but in fact there isn't any warning with above change and it does work, still don't know why? :-( Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3 2/6] PM / Runtime: introduce pm_runtime_set[get]_memalloc_noio()
On Wed, Oct 31, 2012 at 11:41 PM, Alan Stern wrote: > > Sorry, I misread your message. You are setting the device's flag, not > the thread's flag. Never mind. > > This still doesn't help in this case where CONFIG_PM_RUNTIME is > disabled. I think it will be simpler to set the noio flag during every > device reset. Yes, it's better to set the flag during every device reset now. Also pppoe or network interface over serial port is a bit difficult to deal with, as Oliver pointed out. Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v4 0/6] solve deadlock caused by memory allocation with I/O
This patchset try to solve one deadlock problem which might be caused by memory allocation with block I/O during runtime PM and block device error handling path. Traditionly, the problem is addressed by passing GFP_NOIO statically to mm, but that is not a effective solution, see detailed description in patch 1's commit log. This patch set introduces one process flag and trys to fix the deadlock problem on block device/network device during runtime PM or usb bus reset. The 1st one is the change on include/sched.h and mm. The 2nd patch introduces the flag of memalloc_noio on 'dev_pm_info', and pm_runtime_set_memalloc_noio(), so that PM Core can teach mm to not allocate mm with GFP_IOFS during the runtime_resume callback only on device with the flag set. The following 2 patches apply the introduced pm_runtime_set_memalloc_noio() to mark all devices as memalloc_noio_resume in the path from the block or network device to the root device in device tree. The last 2 patches are applied again PM and USB subsystem to demonstrate how to use the introduced mechanism to fix the deadlock problem. Change logs: V4: - patches from the 2nd to the 6th changed - call pm_runtime_set_memalloc_noio() after device_add() as pointed by Alan - set PF_MEMALLOC_NOIO during runtime_suspend() V3: - patch 2/6 and 5/6 changed, see their commit log - remove RFC from title since several guys have expressed that it is a reasonable solution V2: - remove changes on 'may_writepage' and 'may_swap'(1/6) - unset GFP_IOFS in try_to_free_pages() path(1/6) - introduce pm_runtime_set_memalloc_noio() - only apply the meachnism on block/network device and its ancestors for runtime resume context V1: - take Minchan's change to avoid the check in alloc_page hot path - change the helpers' style into save/restore as suggested by Alan - memory allocation with no io in usb bus reset path for all devices as suggested by Greg and Oliver block/genhd.c|9 + drivers/base/power/runtime.c | 89 +- drivers/usb/core/hub.c | 13 ++ include/linux/pm.h |1 + include/linux/pm_runtime.h |3 ++ include/linux/sched.h| 10 + mm/page_alloc.c | 10 - mm/vmscan.c | 12 ++ net/core/net-sysfs.c |5 +++ Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v4 1/6] mm: teach mm by current context info to not do I/O during memory allocation
This patch introduces PF_MEMALLOC_NOIO on process flag('flags' field of 'struct task_struct'), so that the flag can be set by one task to avoid doing I/O inside memory allocation in the task's context. The patch trys to solve one deadlock problem caused by block device, and the problem may happen at least in the below situations: - during block device runtime resume, if memory allocation with GFP_KERNEL is called inside runtime resume callback of any one of its ancestors(or the block device itself), the deadlock may be triggered inside the memory allocation since it might not complete until the block device becomes active and the involed page I/O finishes. The situation is pointed out first by Alan Stern. It is not a good approach to convert all GFP_KERNEL[1] in the path into GFP_NOIO because several subsystems may be involved(for example, PCI, USB and SCSI may be involved for usb mass stoarage device, network devices involved too in the iSCSI case) - during block device runtime suspend, because runtime resume need to wait for completion of concurrent runtime suspend. - during error handling of usb mass storage deivce, USB bus reset will be put on the device, so there shouldn't have any memory allocation with GFP_KERNEL during USB bus reset, otherwise the deadlock similar with above may be triggered. Unfortunately, any usb device may include one mass storage interface in theory, so it requires all usb interface drivers to handle the situation. In fact, most usb drivers don't know how to handle bus reset on the device and don't provide .pre_set() and .post_reset() callback at all, so USB core has to unbind and bind driver for these devices. So it is still not practical to resort to GFP_NOIO for solving the problem. Also the introduced solution can be used by block subsystem or block drivers too, for example, set the PF_MEMALLOC_NOIO flag before doing actual I/O transfer. It is not a good idea to convert all these GFP_KERNEL in the affected path into GFP_NOIO because these functions doing that may be implemented as library and will be called in many other contexts. In fact, memalloc_noio() can convert some of current static GFP_NOIO allocation into GFP_KERNEL back in other non-affected contexts, at least almost all GFP_NOIO in USB subsystem can be converted into GFP_KERNEL after applying the approach and make allocation with GFP_IO only happen in runtime resume/bus reset/block I/O transfer contexts generally. [1], several GFP_KERNEL allocation examples in runtime resume path - pci subsystem acpi_os_allocate <-acpi_ut_allocate <-ACPI_ALLOCATE_ZEROED <-acpi_evaluate_object <-__acpi_bus_set_power <-acpi_bus_set_power <-acpi_pci_set_power_state <-platform_pci_set_power_state <-pci_platform_power_transition <-__pci_complete_power_transition <-pci_set_power_state <-pci_restore_standard_config <-pci_pm_runtime_resume - usb subsystem usb_get_status <-finish_port_resume <-usb_port_resume <-generic_resume <-usb_resume_device <-usb_resume_both <-usb_runtime_resume - some individual usb drivers usblp, uvc, gspca, most of dvb-usb-v2 media drivers, cpia2, az6007, That is just what I have found. Unfortunately, this allocation can only be found by human being now, and there should be many not found since any function in the resume path(call tree) may allocate memory with GFP_KERNEL. Cc: Alan Stern Cc: Oliver Neukum Cc: Jiri Kosina Cc: Andrew Morton Cc: Mel Gorman Cc: KAMEZAWA Hiroyuki Cc: Michal Hocko Cc: Ingo Molnar Cc: Peter Zijlstra Cc: "Rafael J. Wysocki" Signed-off-by: Minchan Kim Signed-off-by: Ming Lei --- v4: - fix comment v3: - no change v2: - remove changes on 'may_writepage' and 'may_swap' because that isn't related with the patchset, and can't introduce I/O in allocation path if GFP_IOFS is unset, so handing 'may_swap' and may_writepage on GFP_NOIO or GFP_NOFS should be a mm internal thing, and let mm guys deal with that, :-). Looks clearing the two may_XXX flag only excludes dirty
[PATCH v4 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio()
The patch introduces the flag of memalloc_noio in 'struct dev_pm_info' to help PM core to teach mm not allocating memory with GFP_KERNEL flag for avoiding probable deadlock. As explained in the comment, any GFP_KERNEL allocation inside runtime_resume() or runtime_suspend() on any one of device in the path from one block or network device to the root device in the device tree may cause deadlock, the introduced pm_runtime_set_memalloc_noio() sets or clears the flag on device in the path recursively. Cc: Alan Stern Cc: "Rafael J. Wysocki" Signed-off-by: Ming Lei --- v4: - rename memalloc_noio_resume as memalloc_noio - remove pm_runtime_get_memalloc_noio() - add comments on pm_runtime_set_memalloc_noio v3: - introduce pm_runtime_get_memalloc_noio() - hold one global lock on pm_runtime_set_memalloc_noio - hold device power lock when accessing memalloc_noio_resume flag suggested by Alan Stern - implement pm_runtime_set_memalloc_noio without recursion suggested by Alan Stern v2: - introduce pm_runtime_set_memalloc_noio() --- drivers/base/power/runtime.c | 57 ++ include/linux/pm.h |1 + include/linux/pm_runtime.h |3 +++ 3 files changed, 61 insertions(+) diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c index 3148b10..d477924 100644 --- a/drivers/base/power/runtime.c +++ b/drivers/base/power/runtime.c @@ -124,6 +124,63 @@ unsigned long pm_runtime_autosuspend_expiration(struct device *dev) } EXPORT_SYMBOL_GPL(pm_runtime_autosuspend_expiration); +static int dev_memalloc_noio(struct device *dev, void *data) +{ + return dev->power.memalloc_noio; +} + +/* + * pm_runtime_set_memalloc_noio - Set a device's memalloc_noio flag. + * @dev: Device to handle. + * @enable: True for setting the flag and False for clearing the flag. + * + * Set the flag for all devices in the path from the device to the + * root device in the device tree if @enable is true, otherwise clear + * the flag for devices in the path whose siblings don't set the flag. + * + * The function should only be called by block device, or network + * device driver for solving the deadlock problem during runtime + * resume/suspend: + * if memory allocation with GFP_KERNEL is called inside runtime + * resume/suspend callback of any one of its ancestors(or the + * block device itself), the deadlock may be triggered inside the + * memory allocation since it might not complete until the block + * device becomes active and the involed page I/O finishes. The + * situation is pointed out first by Alan Stern. Network device + * are involved in iSCSI kind of situation. + * + * The lock of dev_hotplug_mutex is held in the function for handling + * hotplug race because pm_runtime_set_memalloc_noio() may be called + * in async probe(). + * + * The function should be called between device_add() and device_del() + * on the affected device(block/network device). + */ +void pm_runtime_set_memalloc_noio(struct device *dev, bool enable) +{ + static DEFINE_MUTEX(dev_hotplug_mutex); + + mutex_lock(&dev_hotplug_mutex); + for(;;) { + /* hold power lock since bitfield is not SMP-safe. */ + spin_lock_irq(&dev->power.lock); + dev->power.memalloc_noio = enable; + spin_unlock_irq(&dev->power.lock); + + dev = dev->parent; + + /* only clear the flag for one device if all +* children of the device don't set the flag. +*/ + if (!dev || (!enable && +device_for_each_child(dev, NULL, + dev_memalloc_noio))) + break; + } + mutex_unlock(&dev_hotplug_mutex); +} +EXPORT_SYMBOL_GPL(pm_runtime_set_memalloc_noio); + /** * rpm_check_suspend_allowed - Test whether a device may be suspended. * @dev: Device to test. diff --git a/include/linux/pm.h b/include/linux/pm.h index 03d7bb1..1a8a69d 100644 --- a/include/linux/pm.h +++ b/include/linux/pm.h @@ -538,6 +538,7 @@ struct dev_pm_info { unsigned intirq_safe:1; unsigned intuse_autosuspend:1; unsigned inttimer_autosuspends:1; + unsigned intmemalloc_noio:1; enum rpm_requestrequest; enum rpm_status runtime_status; int runtime_error; diff --git a/include/linux/pm_runtime.h b/include/linux/pm_runtime.h index f271860..775e063 100644 --- a/include/linux/pm_runtime.h +++ b/include/linux/pm_runtime.h @@ -47,6 +47,7 @@ extern void pm_runtime_set_autosuspend_delay(struct device *dev, int delay); extern unsigned long pm_runtime_autosuspend_expiration(struct device *
[PATCH v4 3/6] block/genhd.c: apply pm_runtime_set_memalloc_noio on block devices
This patch applyes the introduced pm_runtime_set_memalloc_noio on block device so that PM core will teach mm to not allocate memory with GFP_IOFS when calling the runtime_resume and runtime_suspend callback for block devices and its ancestors. Cc: Jens Axboe Signed-off-by: Ming Lei --- v4: - call pm_runtime_set_memalloc_noio(ddev, true) after device_add --- block/genhd.c |9 + 1 file changed, 9 insertions(+) diff --git a/block/genhd.c b/block/genhd.c index 9e02cd6..f3fe3aa 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -18,6 +18,7 @@ #include #include #include +#include #include "blk.h" @@ -532,6 +533,13 @@ static void register_disk(struct gendisk *disk) return; } } + + /* avoid probable deadlock caused by allocating memory with +* GFP_KERNEL in runtime_resume callback of its all ancestor +* deivces +*/ + pm_runtime_set_memalloc_noio(ddev, true); + disk->part0.holder_dir = kobject_create_and_add("holders", &ddev->kobj); disk->slave_dir = kobject_create_and_add("slaves", &ddev->kobj); @@ -661,6 +669,7 @@ void del_gendisk(struct gendisk *disk) disk->driverfs_dev = NULL; if (!sysfs_deprecated) sysfs_remove_link(block_depr, dev_name(disk_to_dev(disk))); + pm_runtime_set_memalloc_noio(disk_to_dev(disk), false); device_del(disk_to_dev(disk)); } EXPORT_SYMBOL(del_gendisk); -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v4 4/6] net/core: apply pm_runtime_set_memalloc_noio on network devices
Deadlock might be caused by allocating memory with GFP_KERNEL in runtime_resume and runtime_suspend callback of network devices in iSCSI situation, so mark network devices and its ancestor as 'memalloc_noio' with the introduced pm_runtime_set_memalloc_noio(). Cc: "David S. Miller" Cc: Eric Dumazet Cc: David Decotigny Cc: Tom Herbert Cc: Ingo Molnar Signed-off-by: Ming Lei --- v4: - call pm_runtime_set_memalloc_noio(ddev, true) after device_add --- net/core/net-sysfs.c |5 + 1 file changed, 5 insertions(+) diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c index bcf02f6..a55d255 100644 --- a/net/core/net-sysfs.c +++ b/net/core/net-sysfs.c @@ -22,6 +22,7 @@ #include #include #include +#include #include #include "net-sysfs.h" @@ -1386,6 +1387,8 @@ void netdev_unregister_kobject(struct net_device * net) remove_queue_kobjects(net); + pm_runtime_set_memalloc_noio(dev, false); + device_del(dev); } @@ -1421,6 +1424,8 @@ int netdev_register_kobject(struct net_device *net) return error; } + pm_runtime_set_memalloc_noio(dev, true); + return error; } -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v4 5/6] PM / Runtime: force memory allocation with no I/O during Runtime PM callbcack
This patch applies the introduced memalloc_noio_save() and memalloc_noio_restore() to force memory allocation with no I/O during runtime_resume/runtime_suspend callback on device with the flag of 'memalloc_noio' set. Cc: Alan Stern Cc: Oliver Neukum Cc: Rafael J. Wysocki Signed-off-by: Ming Lei --- v4: - runtime_suspend need this too because rpm_resume may wait for completion of concurrent runtime_suspend, so deadlock still may be triggered in runtime_suspend path. --- drivers/base/power/runtime.c | 32 ++-- 1 file changed, 30 insertions(+), 2 deletions(-) diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c index d477924..7ed17a9 100644 --- a/drivers/base/power/runtime.c +++ b/drivers/base/power/runtime.c @@ -368,6 +368,7 @@ static int rpm_suspend(struct device *dev, int rpmflags) int (*callback)(struct device *); struct device *parent = NULL; int retval; + unsigned int noio_flag; trace_rpm_suspend(dev, rpmflags); @@ -477,7 +478,20 @@ static int rpm_suspend(struct device *dev, int rpmflags) if (!callback && dev->driver && dev->driver->pm) callback = dev->driver->pm->runtime_suspend; - retval = rpm_callback(callback, dev); + /* +* Deadlock might be caused if memory allocation with GFP_KERNEL +* happens inside runtime_suspend callback of one block device's +* ancestor or the block device itself. Network device might be +* thought as part of iSCSI block device, so network device and +* its ancestor should be marked as memalloc_noio. +*/ + if (dev->power.memalloc_noio) { + memalloc_noio_save(noio_flag); + retval = rpm_callback(callback, dev); + memalloc_noio_restore(noio_flag); + } else { + retval = rpm_callback(callback, dev); + } if (retval) goto fail; @@ -560,6 +574,7 @@ static int rpm_resume(struct device *dev, int rpmflags) int (*callback)(struct device *); struct device *parent = NULL; int retval = 0; + unsigned int noio_flag; trace_rpm_resume(dev, rpmflags); @@ -709,7 +724,20 @@ static int rpm_resume(struct device *dev, int rpmflags) if (!callback && dev->driver && dev->driver->pm) callback = dev->driver->pm->runtime_resume; - retval = rpm_callback(callback, dev); + /* +* Deadlock might be caused if memory allocation with GFP_KERNEL +* happens inside runtime_resume callback of one block device's +* ancestor or the block device itself. Network device might be +* thought as part of iSCSI block device, so network device and +* its ancestor should be marked as memalloc_noio. +*/ + if (dev->power.memalloc_noio) { + memalloc_noio_save(noio_flag); + retval = rpm_callback(callback, dev); + memalloc_noio_restore(noio_flag); + } else { + retval = rpm_callback(callback, dev); + } if (retval) { __update_runtime_status(dev, RPM_SUSPENDED); pm_runtime_cancel_pending(dev); -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v4 6/6] USB: forbid memory allocation with I/O during bus reset
If one storage interface or usb network interface(iSCSI case) exists in current configuration, memory allocation with GFP_KERNEL during usb_device_reset() might trigger I/O transfer on the storage interface itself and cause deadlock because the 'us->dev_mutex' is held in .pre_reset() and the storage interface can't do I/O transfer when the reset is triggered by other interface, or the error handling can't be completed if the reset is triggered by the storage itself(error handling path). Cc: Alan Stern Cc: Oliver Neukum Signed-off-by: Ming Lei --- v4: - mark current memalloc_noio for every usb device reset --- drivers/usb/core/hub.c | 13 + 1 file changed, 13 insertions(+) diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c index 5b131b6..788e652 100644 --- a/drivers/usb/core/hub.c +++ b/drivers/usb/core/hub.c @@ -5044,6 +5044,7 @@ int usb_reset_device(struct usb_device *udev) { int ret; int i; + unsigned int noio_flag; struct usb_host_config *config = udev->actconfig; if (udev->state == USB_STATE_NOTATTACHED || @@ -5053,6 +5054,17 @@ int usb_reset_device(struct usb_device *udev) return -EINVAL; } + /* +* Don't allocate memory with GFP_KERNEL in current +* context to avoid possible deadlock if usb mass +* storage interface or usbnet interface(iSCSI case) +* is included in current configuration. The easist +* approach is to do it for every device reset, +* because the device 'memalloc_noio' flag may have +* not been set before reseting the usb device. +*/ + memalloc_noio_save(noio_flag); + /* Prevent autosuspend during the reset */ usb_autoresume_device(udev); @@ -5097,6 +5109,7 @@ int usb_reset_device(struct usb_device *udev) } usb_autosuspend_device(udev); + memalloc_noio_restore(noio_flag); return ret; } EXPORT_SYMBOL_GPL(usb_reset_device); -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/2] firmware loader: introduce module parameter to customize(v4) fw search path
This patch introduces one module parameter of 'path' in firmware_class to support customizing firmware image search path, so that people can use its own firmware path if the default built-in paths can't meet their demand[1], and the typical usage is passing the below from kernel command parameter when 'firmware_class' is built in kernel: firmware_class.path=$CUSTOMIZED_PATH [1], https://lkml.org/lkml/2012/10/11/337 Cc: Linus Torvalds Signed-off-by: Ming Lei --- v4: - fix one comment and rebase on the latest next tree v3: - fix one mistake on checking unset firmware path v2: - take a cleaner approach suggested by Linus - mark the path array as const because it needn't be changed - fix one error in Document about the module name v1: - remove kernel boot parameter and only support the feature by module parameter as suggested by Greg --- Documentation/firmware_class/README |5 + drivers/base/firmware_class.c | 17 - 2 files changed, 21 insertions(+), 1 deletion(-) diff --git a/Documentation/firmware_class/README b/Documentation/firmware_class/README index 815b711..e9fce78 100644 --- a/Documentation/firmware_class/README +++ b/Documentation/firmware_class/README @@ -22,12 +22,17 @@ - calls request_firmware(&fw_entry, $FIRMWARE, device) - kernel searchs the fimware image with name $FIRMWARE directly in the below search path of root filesystem: + User customized search path by module parameter 'path'[1] "/lib/firmware/updates/" UTS_RELEASE, "/lib/firmware/updates", "/lib/firmware/" UTS_RELEASE, "/lib/firmware" - If found, goto 7), else goto 2) + [1], the 'path' is a string parameter which length should be less + than 256, user should pass 'firmware_class.path=$CUSTOMIZED_PATH' + if firmware_class is built in kernel(the general situation) + 2), userspace: - /sys/class/firmware/xxx/{loading,data} appear. - hotplug gets called with a firmware identifier in $FIRMWARE diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c index 620b876..2067514 100644 --- a/drivers/base/firmware_class.c +++ b/drivers/base/firmware_class.c @@ -267,13 +267,23 @@ static void fw_free_buf(struct firmware_buf *buf) } /* direct firmware loading support */ -static const char *fw_path[] = { +static char fw_path_para[256]; +static const char * const fw_path[] = { + fw_path_para, "/lib/firmware/updates/" UTS_RELEASE, "/lib/firmware/updates", "/lib/firmware/" UTS_RELEASE, "/lib/firmware" }; +/* + * Typical usage is that passing 'firmware_class.path=$CUSTOMIZED_PATH' + * from kernel command line because firmware_class is generally built in + * kernel instead of module. + */ +module_param_string(path, fw_path_para, sizeof(fw_path_para), 0644); +MODULE_PARM_DESC(path, "customized firmware image search path with a higher priority than default path"); + /* Don't inline this: 'struct kstat' is biggish */ static noinline_for_stack long fw_file_size(struct file *file) { @@ -315,6 +325,11 @@ static bool fw_get_filesystem_firmware(struct firmware_buf *buf) for (i = 0; i < ARRAY_SIZE(fw_path); i++) { struct file *file; + + /* skip the unset customized path */ + if (!fw_path[i][0]) + continue; + snprintf(path, PATH_MAX, "%s/%s", fw_path[i], buf->fw_id); file = filp_open(path, O_RDONLY, 0); -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 2/2] firmware loader: document firmware cache mechanism
From: Ming Lei This patch documents the firmware cache mechanism so that users of request_firmware() know that it can be called safely inside device's suspend and resume callback, and the device's firmware needn't be cached any more by individual driver itself to deal with firmware loss during system resume. Signed-off-by: Ming Lei --- Documentation/firmware_class/README |7 +++ drivers/base/firmware_class.c |3 +++ 2 files changed, 10 insertions(+) diff --git a/Documentation/firmware_class/README b/Documentation/firmware_class/README index e9fce78..43fada9 100644 --- a/Documentation/firmware_class/README +++ b/Documentation/firmware_class/README @@ -119,3 +119,10 @@ on the setup, so I think that the choice on what firmware to make persistent should be left to userspace. + about firmware cache: + + After firmware cache mechanism is introduced during system sleep, + request_firmware can be called safely inside device's suspend and + resume callback, and callers need't cache the firmware by + themselves any more for dealing with firmware loss during system + resume. diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c index 2067514..15f2822 100644 --- a/drivers/base/firmware_class.c +++ b/drivers/base/firmware_class.c @@ -978,6 +978,9 @@ err_put_dev: * firmware image for this or any other device. * * Caller must hold the reference count of @device. + * + * The function can be called safely inside device's suspend and + * resume callback. **/ int request_firmware(const struct firmware **firmware_p, const char *name, -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 09/13] driver core: firmware loader: store firmware name into devres list
On Thu, Jul 26, 2012 at 12:15 AM, Borislav Petkov wrote: > On Wed, Jul 25, 2012 at 01:00:09AM +0800, Ming Lei wrote: >> This patch will store firmware name into devres list of the device >> which is requesting firmware loading, so that we can implement >> auto cache firmware for devices in need. > > Stupid question: does this mean that once the firmware name is in the > devres list, it is being cached automatically and device drivers which > don't want that need to explicitly uncache it? Both the auto cache and auto uncache actions are not triggered by device driver, and will be triggered by some system state, for example, in 13/13, you will find the cache is done before system suspend and the uncache is done after system resume. If device drivers want to cache its firmware explicitly, they have to uncache it explicitly too, see cache_firmware/uncache_firmware in 7/13. Thanks -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 11/13] driver core: firmware: introduce devices_cache/uncache_firmwares
On Thu, Jul 26, 2012 at 12:52 AM, Borislav Petkov wrote: > On Wed, Jul 25, 2012 at 01:00:11AM +0800, Ming Lei wrote: >> This patches introduces the three helpers below: >> >> void device_cache_firmwares(void) >> void device_uncache_firmwares(void) >> void device_uncache_firmwares_delay(unsigned long) > > I kinda don't like the plural of firmware: "firmwares". Can we call > those > device_cache_fw_images > device_uncache_fw_images Looks fine. > > or > device_cache_fw_blobs > > or whatever? > >> so we can call device_cache_firmwares() to cache firmwares for >> all devices which need firmwares to work, and the device driver >> can get the firmware easily from kernel memory when system isn't >> readly for completing their requests of loading firmwares. >> >> When system is ready for completing firmware loading, driver core >> can call device_uncache_firmwares() or its delay version to free >> the cached firmwares. >> >> The above helpers should be used to cache device firmwares during >> system suspend/resume cycle in the following patches. >> >> Signed-off-by: Ming Lei >> --- >> drivers/base/firmware_class.c | 236 >> +++-- >> 1 file changed, 230 insertions(+), 6 deletions(-) >> >> diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c >> index c181e6d..7a96e75 100644 >> --- a/drivers/base/firmware_class.c >> +++ b/drivers/base/firmware_class.c >> @@ -22,6 +22,10 @@ >> #include >> #include >> #include >> +#include >> +#include >> + >> +#include "base.h" >> >> MODULE_AUTHOR("Manuel Estrada Sainz"); >> MODULE_DESCRIPTION("Multi purpose firmware loading support"); >> @@ -91,6 +95,17 @@ struct firmware_cache { >> /* firmware_buf instance will be added into the below list */ >> spinlock_t lock; >> struct list_head head; >> + >> + /* >> + * Name of firmware which has been cached successfully will be >> + * added into the below list so that device uncache helper can >> + * trace which firmware has been cached before. >> + */ > > Comment above the list_head and maybe call the list_head fw_names or so? > >> + spinlock_t name_lock; >> + struct list_head name_head; >> + wait_queue_head_t wait_queue; > > Stray \t > >> + int cnt; >> + struct delayed_work work; >> }; >> >> struct firmware_buf { >> @@ -107,6 +122,11 @@ struct firmware_buf { >> char fw_id[]; >> }; >> >> +struct fw_name_for_cache { >> + struct list_head list; >> + char name[]; >> +}; > > Maybe fw_cache_entry? Looks fine. > >> + >> struct firmware_priv { >> struct timer_list timeout; >> bool nowait; >> @@ -214,12 +234,6 @@ static void fw_free_buf(struct firmware_buf *buf) >> kref_put(&buf->ref, __fw_free_buf); >> } >> >> -static void fw_cache_init(void) >> -{ >> - spin_lock_init(&fw_cache.lock); >> - INIT_LIST_HEAD(&fw_cache.head); >> -} >> - >> static struct firmware_priv *to_firmware_priv(struct device *dev) >> { >> return container_of(dev, struct firmware_priv, dev); >> @@ -981,6 +995,216 @@ int uncache_firmware(const char *fw_name) >> return -EINVAL; >> } >> >> +static struct fw_name_for_cache *alloc_fw_name_cache(const char *name) >> +{ >> + struct fw_name_for_cache *nc; >> + >> + nc = kzalloc(sizeof(nc) + strlen(name) + 1, GFP_KERNEL); >> + if (!nc) >> + goto exit; >> + >> + strcpy(nc->name, name); >> +exit: >> + return nc; >> +} >> + >> +static void free_fw_name_cache(struct fw_name_for_cache *nc) >> +{ >> + kfree(nc); >> +} >> + >> +static void __async_dev_cache_firmware(void *fw_name, >> + async_cookie_t cookie) > > Arg alignment. I am wondering why checkpatch.pl doesn't add the check... > >> +{ >> + struct fw_name_for_cache *nc; >> + struct firmware_cache *fwc = &fw_cache; >> + char *curr_name; >> + int ret; >> + >> + /* 'fw_name' is stored in devres, and it may be released, >> + * so allocate buffer to store the firmware name >> + */ >> + curr_name = kstrdup((const char *)fw_name, GFP_KERNEL)
Re: [RFC PATCH 08/13] driver core: firmware loader: fix device lifetime
On Thu, Jul 26, 2012 at 8:20 PM, Borislav Petkov wrote: > > Ok, here's what I got from looking at the patch: > > Your commit message says: "Also request_firmware_nowait should be called > in atomic context now, so fix the obsolete comments." > > Atomic context in my book means you're not allowed to sleep at all. In fact, I mean the function can be called in atomic context now, and I know some time ago the function will create kthread to execute the request_firmware, and atomic context is not allowed. So I remove the obsolete comment. > > But the comment says that it is possible to sleep a little. This is very > wrongly formulated AFAICT. The function can be run in both contexts, and I don't see any words which says the function will sleep. > > But, since request_firmware_nowait receives a GFP mask as one of its > arguments and some of its callers don't supply GFP_ATOMIC then this > has nothing to do with atomic contexts at all. Then, you should simply > explain in the comment why exactly callers aren't allowed to be sleeping > for a long time. And using adjectives like "long" or "short" is very > misleading in such explanations so please be more specific as to why the It is the original one, and I don't think it is wrong. Also it shouldn't be covered by this patch. Maybe I shouldn't have fixed the comment in this patch. Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 12/13] driver core: firmware loader: use small timeout for cache device firmware
On Thu, Jul 26, 2012 at 8:36 PM, Borislav Petkov wrote: > On Wed, Jul 25, 2012 at 01:00:12AM +0800, Ming Lei wrote: >> Because device_cache_firmwares only cache the firmware which has been >> loaded sucessfully at leat once, using a small loading timeout should > > least > >> be OK. > > Your commit message doesn't explain why exactly we decrease the timeout: I have explained it. Because the firmware has been loaded successfully at least once, so it is very probably to not timeout. > you should probably say that this patch overrides the default 60s > timeout because we're in pre-suspend/-hibernate mode where we have > userspace and are able to load the firmware quickly. No, it is not what I was saying. Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 13/13] driver core: firmware loader: cache devices firmware during suspend/resume cycle
On Thu, Jul 26, 2012 at 8:43 PM, Borislav Petkov wrote: >> +#else >> +static int fw_pm_notify(struct notifier_block *notify_block, >> + unsigned long mode, void *unused) >> +{} > > static inline int fw_pm... Will add inline in -v1. Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 10/13] driver core: devres: introduce devres_for_each_res
On Thu, Jul 26, 2012 at 12:25 AM, Borislav Petkov wrote: > On Wed, Jul 25, 2012 at 01:00:10AM +0800, Ming Lei wrote: >> This patch introduces one devres API of devres_for_each_res >> so that the device's driver can iterate each resource it has >> interest in. >> >> The firmware loader will use the API to get each firmware name >> from the device instance. >> >> Signed-off-by: Ming Lei >> --- >> drivers/base/devres.c | 42 ++ >> include/linux/device.h |3 +++ >> 2 files changed, 45 insertions(+) >> >> diff --git a/drivers/base/devres.c b/drivers/base/devres.c >> index 2360adb..8273ba5 100644 >> --- a/drivers/base/devres.c >> +++ b/drivers/base/devres.c >> @@ -144,6 +144,48 @@ EXPORT_SYMBOL_GPL(devres_alloc); >> #endif >> >> /** >> + * devres_for_each_res - Resource iterator >> + * @dev: Device to iterate resource from >> + * @release: Look for resources associated with this release function >> + * @match: Match function (optional) >> + * @match_data: Data for the match function >> + * @fn: function to be called for each matched resource. >> + * >> + * Call @fn for each devres of @dev which is associated with @release >> + * and for which @match returns 1. >> + * >> + * RETURNS: >> + * void >> + */ >> +void devres_for_each_res(struct device *dev, dr_release_t release, >> + dr_match_t match, void *match_data, >> + void (*fn)(struct device *, void *)) >> +{ >> + struct devres_node *node; >> + struct devres_node *tmp; >> + unsigned long flags; >> + >> + if (!fn) >> + return; >> + >> + spin_lock_irqsave(&dev->devres_lock, flags); >> + list_for_each_entry_safe_reverse(node, tmp, >> + &dev->devres_head, entry) { > > Why break this line? > > list_for_each_entry_safe_reverse(node, tmp, &dev->devres_head, entry) > { > > is perfectly fine. > >> + struct devres *dr = container_of(node, struct devres, node); >> + >> + if (node->release != release) >> + continue; >> + if (match && !match(dev, dr->data, match_data)) >> + continue; >> + spin_unlock_irqrestore(&dev->devres_lock, flags); >> + fn(dev, dr->data); >> + spin_lock_irqsave(&dev->devres_lock, flags); >> + } >> + spin_unlock_irqrestore(&dev->devres_lock, flags); > > This looks strange. For the last node on the list, we're grabbing the > lock and releasing it right after. > > Probably check whether node is the last element and only re-grab the > lock if it isn't? It is not necessary since the lock isn't held in hot path. > > And btw, don't we need to hold the ->devres_lock during the whole search > like callers of find_dr do, for example? Because I don't want to put more constraint on the 'fn' about lock use, memory allocation flag(gfp)..., from the view of API's user. In fact, there is problem wrt. releasing lock since add_dr may add new node during releasing lock. So I plan to just hold the lock to cover calling 'fn' in -v1 instead of using rcu list helpers, which may introduce a lot change on devres. Anyway the callers can copy the resources interested into a temporary list in 'fn' and handle it later, which may simplify devres_for_each_res and other devres helpers a lot. Also I will introduce another parameter of 'void *data' to 'fn' in -v1. Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] kernel/watchdog.c : fix smp_processor_id() warning
On Fri, Jul 27, 2012 at 3:43 AM, Don Zickus wrote: > On Wed, Jul 25, 2012 at 12:39:45PM +0800, Ming Lei wrote: >> Use raw_smp_processor_id in lockup_detector_bootcpu_resume() >> because it is enough when non-boot CPUs are offline. >> >> This patch fixes the following warning when DEBUG_PREEMPT >> is enabled. > > Is this patched on top of linux-next? Yes. The warning is introduced by the commit below: 7fb860ff90ae970cf62cf676dfc1addcf8415674 (NMI watchdog: fix for lockup detector breakage on resume) > It seems right based on the code usage. Though it makes me sad the resume > code has to hack into the cpu notifiers like that. Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 08/13] driver core: firmware loader: fix device lifetime
On Fri, Jul 27, 2012 at 1:46 AM, Borislav Petkov wrote: > On Thu, Jul 26, 2012 at 11:44:48PM +0800, Ming Lei wrote: >> On Thu, Jul 26, 2012 at 8:20 PM, Borislav Petkov wrote: >> > >> > Ok, here's what I got from looking at the patch: >> > >> > Your commit message says: "Also request_firmware_nowait should be called >> > in atomic context now, so fix the obsolete comments." >> > >> > Atomic context in my book means you're not allowed to sleep at all. >> >> In fact, I mean the function can be called in atomic context now, and >> I know some time ago the function will create kthread to execute >> the request_firmware, and atomic context is not allowed. > > Right, but when called with GFP_KERNEL mask, it can sleep, right? > >> > But the comment says that it is possible to sleep a little. This is very >> > wrongly formulated AFAICT. >> >> The function can be run in both contexts, and I don't see any words which >> says the function will sleep. > > " > ... > * Asynchronous variant of request_firmware() for user contexts where > * it is not possible to sleep for long time. > **/ > " > > Not possible to sleep for a long time means the function still *can* > sleep... even for short time. For a certain definion of "short." In fact, many drivers like to use it in its probe() because too long sleep in probe will slow down kernel boot if driver is built in kernel. During kernel boot, rootfs is not mounted and user space is not ready, request_firmware will timeout to cause problem. Also after introducing work in this function, it is allowed to be called in atomic context if 'gfp' is passed as GFP_ATOMIC, so I removed the obsolete comments. > >> > But, since request_firmware_nowait receives a GFP mask as one of its >> > arguments and some of its callers don't supply GFP_ATOMIC then this >> > has nothing to do with atomic contexts at all. Then, you should simply >> > explain in the comment why exactly callers aren't allowed to be sleeping >> > for a long time. And using adjectives like "long" or "short" is very >> > misleading in such explanations so please be more specific as to why the >> >> It is the original one, and I don't think it is wrong. Also it >> shouldn't be covered >> by this patch. >> >> Maybe I shouldn't have fixed the comment in this patch. > > Why, simply fix the comment to adhere to what the function does. And > since it can sleep, maybe the easiest fix is to say simply that: > "function can sleep", right? No, the comment above is misleading and not useless, and I think the below is good: * Asynchronous variant of request_firmware() for user contexts where * it is not possible to sleep for long time or can't sleep at all, depends * on the @gfp flag passed. Anyway, the original part of 'It can't be called in atomic contexts.' is wrong and should be removed. Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 12/13] driver core: firmware loader: use small timeout for cache device firmware
On Fri, Jul 27, 2012 at 1:54 AM, Borislav Petkov wrote: >> No, it is not what I was saying. I just mean the point is not mentioned in my commit log, but I admit it should be a appropriate cause. > > Ok, maybe I'm not understanding this then. So explain to me this: why > do you need that timeout value of 10, how did we decide it to be 10 If one firmware image was loaded successfully before, the probability of loading it successfully at this time should be much higher than the 1st time because something crazy(for example, the firmware is deleted) happens with low probability. Choosing 10 secs is just a estimation for loading time because the maximum size of firmware in current distributions is about 2M bytes, since we know it has been loaded successfully before. > (and not 20 or 30 or whatever)? Generally, why do we need to reprogram > the timer to a smaller timeout instead of simply doing the completion > without a timeout? No, it should be crazy without a timeout, and it can be triggered in init call easily. Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 12/13] driver core: firmware loader: use small timeout for cache device firmware
On Fri, Jul 27, 2012 at 6:35 PM, Borislav Petkov wrote: > On Fri, Jul 27, 2012 at 09:54:25AM +0800, Ming Lei wrote: >> On Fri, Jul 27, 2012 at 1:54 AM, Borislav Petkov wrote: >> >> >> No, it is not what I was saying. >> >> I just mean the point is not mentioned in my commit log, but I admit it >> should >> be a appropriate cause. >> >> > >> > Ok, maybe I'm not understanding this then. So explain to me this: why >> > do you need that timeout value of 10, how did we decide it to be 10 >> >> If one firmware image was loaded successfully before, the probability of >> loading it successfully at this time should be much higher than the 1st time >> because something crazy(for example, the firmware is deleted) happens >> with low probability. > > Believe it or not, I'm addressing exactly the possibility of the > firmware disappearing from under us in the AMD microcode driver > currently :) (and some other annoyances, of course). Of course, it is possible since user can delete it anytime, but with very low probability. > >> Choosing 10 secs is just a estimation for loading time because the maximum >> size of firmware in current distributions is about 2M bytes, since we know >> it has been loaded successfully before. > > This is exactly the comment we want over the code to explain to others > why we're choosing 10 secs. Simply add that sentence above the 10s > assignment and we're perfect! :-) OK, will add the comments in -v1. Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 08/13] driver core: firmware loader: fix device lifetime
On Fri, Jul 27, 2012 at 6:32 PM, Borislav Petkov wrote: > > I still don't like too much the "not possible to sleep for long time" > expression. > > Maybe change it to "should sleep for as small periods as possible since > it increases boot time of device drivers requesting firmware in their > ->probe() methods." Fairly enough, will add this kind of description in -v1, and I will introduce one extra patch to fix this comment since it is nothing related with the device lifetime fix and its comments. Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v2] firmware loader: introduce module parameter to customize fw search path
This patch introduces one module parameter of 'path' in firmware_class to support customizing firmware image search path, so that people can use its own firmware path if the default built-in paths can't meet their demand[1], and the typical usage is passing the below from kernel command parameter when 'firmware_class' is built in kernel: firmware_class.path=$CUSTOMIZED_PATH [1], https://lkml.org/lkml/2012/10/11/337 Cc: Linus Torvalds Signed-off-by: Ming Lei --- v2 - take a cleaner approach suggested by Linus - mark the path array as const because it needn't be changed - fix one error in Document about the module name v1: - remove kernel boot parameter and only support the feature by module parameter as suggested by Greg --- Documentation/firmware_class/README |5 + drivers/base/firmware_class.c | 17 - 2 files changed, 21 insertions(+), 1 deletion(-) diff --git a/Documentation/firmware_class/README b/Documentation/firmware_class/README index 815b711..e9fce78 100644 --- a/Documentation/firmware_class/README +++ b/Documentation/firmware_class/README @@ -22,12 +22,17 @@ - calls request_firmware(&fw_entry, $FIRMWARE, device) - kernel searchs the fimware image with name $FIRMWARE directly in the below search path of root filesystem: + User customized search path by module parameter 'path'[1] "/lib/firmware/updates/" UTS_RELEASE, "/lib/firmware/updates", "/lib/firmware/" UTS_RELEASE, "/lib/firmware" - If found, goto 7), else goto 2) + [1], the 'path' is a string parameter which length should be less + than 256, user should pass 'firmware_class.path=$CUSTOMIZED_PATH' + if firmware_class is built in kernel(the general situation) + 2), userspace: - /sys/class/firmware/xxx/{loading,data} appear. - hotplug gets called with a firmware identifier in $FIRMWARE diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c index 8945f4e..f5a2c40 100644 --- a/drivers/base/firmware_class.c +++ b/drivers/base/firmware_class.c @@ -267,13 +267,23 @@ static void fw_free_buf(struct firmware_buf *buf) } /* direct firmware loading support */ -static const char *fw_path[] = { +static char fw_path_para[256]; +static const char * const fw_path[] = { + fw_path_para, "/lib/firmware/updates/" UTS_RELEASE, "/lib/firmware/updates", "/lib/firmware/" UTS_RELEASE, "/lib/firmware" }; +/* + * Typical usage is that passing 'firmware_class.path=$CUSTOMIZED_PATH' + * from kernel command because firmware_class is generally built in + * kernel instead of module. + */ +module_param_string(path, fw_path_para, sizeof(fw_path_para), 0644); +MODULE_PARM_DESC(path, "customized firmware image search path with a higher priority than default path"); + /* Don't inline this: 'struct kstat' is biggish */ static noinline long fw_file_size(struct file *file) { @@ -315,6 +325,11 @@ static bool fw_get_filesystem_firmware(struct firmware_buf *buf) for (i = 0; i < ARRAY_SIZE(fw_path); i++) { struct file *file; + + /* skip the unset customized path */ + if (!fw_path[0]) + continue; + snprintf(path, PATH_MAX, "%s/%s", fw_path[i], buf->fw_id); file = filp_open(path, O_RDONLY, 0); -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] firmware loader: introduce module parameter to customize fw search path
On Fri, Oct 26, 2012 at 10:51 PM, Ming Lei wrote: > + > + /* skip the unset customized path */ > + if (!fw_path[0]) There is one mistake above and should be below, sorry for the noise. if (!fw_path[i][0]) Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v3] firmware loader: introduce module parameter to customize fw search path
This patch introduces one module parameter of 'path' in firmware_class to support customizing firmware image search path, so that people can use its own firmware path if the default built-in paths can't meet their demand[1], and the typical usage is passing the below from kernel command parameter when 'firmware_class' is built in kernel: firmware_class.path=$CUSTOMIZED_PATH [1], https://lkml.org/lkml/2012/10/11/337 Cc: Linus Torvalds Signed-off-by: Ming Lei --- v3 - fix one mistake on checking unset firmware path v2 - take a cleaner approach suggested by Linus - mark the path array as const because it needn't be changed - fix one error in Document about the module name v1: - remove kernel boot parameter and only support the feature by module parameter as suggested by Greg --- Documentation/firmware_class/README |5 + drivers/base/firmware_class.c | 17 - 2 files changed, 21 insertions(+), 1 deletion(-) diff --git a/Documentation/firmware_class/README b/Documentation/firmware_class/README index 815b711..e9fce78 100644 --- a/Documentation/firmware_class/README +++ b/Documentation/firmware_class/README @@ -22,12 +22,17 @@ - calls request_firmware(&fw_entry, $FIRMWARE, device) - kernel searchs the fimware image with name $FIRMWARE directly in the below search path of root filesystem: + User customized search path by module parameter 'path'[1] "/lib/firmware/updates/" UTS_RELEASE, "/lib/firmware/updates", "/lib/firmware/" UTS_RELEASE, "/lib/firmware" - If found, goto 7), else goto 2) + [1], the 'path' is a string parameter which length should be less + than 256, user should pass 'firmware_class.path=$CUSTOMIZED_PATH' + if firmware_class is built in kernel(the general situation) + 2), userspace: - /sys/class/firmware/xxx/{loading,data} appear. - hotplug gets called with a firmware identifier in $FIRMWARE diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c index 8945f4e..62568c2 100644 --- a/drivers/base/firmware_class.c +++ b/drivers/base/firmware_class.c @@ -267,13 +267,23 @@ static void fw_free_buf(struct firmware_buf *buf) } /* direct firmware loading support */ -static const char *fw_path[] = { +static char fw_path_para[256]; +static const char * const fw_path[] = { + fw_path_para, "/lib/firmware/updates/" UTS_RELEASE, "/lib/firmware/updates", "/lib/firmware/" UTS_RELEASE, "/lib/firmware" }; +/* + * Typical usage is that passing 'firmware_class.path=$CUSTOMIZED_PATH' + * from kernel command because firmware_class is generally built in + * kernel instead of module. + */ +module_param_string(path, fw_path_para, sizeof(fw_path_para), 0644); +MODULE_PARM_DESC(path, "customized firmware image search path with a higher priority than default path"); + /* Don't inline this: 'struct kstat' is biggish */ static noinline long fw_file_size(struct file *file) { @@ -315,6 +325,11 @@ static bool fw_get_filesystem_firmware(struct firmware_buf *buf) for (i = 0; i < ARRAY_SIZE(fw_path); i++) { struct file *file; + + /* skip the unset customized path */ + if (!fw_path[i][0]) + continue; + snprintf(path, PATH_MAX, "%s/%s", fw_path[i], buf->fw_id); file = filp_open(path, O_RDONLY, 0); -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3] firmware loader: introduce module parameter to customize fw search path
On Sat, Oct 27, 2012 at 1:20 PM, anish kumar wrote: >> >> + [1], the 'path' is a string parameter which length should be less > whose length should be less... Maybe, but it is not a big deal, you may find some drivers can keep one module string parameter length as 4096. >> + * Typical usage is that passing 'firmware_class.path=$CUSTOMIZED_PATH' >> + * from kernel command because firmware_class is generally built in > do you mean kernel command line? Yes. Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] firmware: use noinline_for_stack
On Sun, Oct 28, 2012 at 6:37 AM, Cesar Eduardo Barros wrote: > The comment above fw_file_size() suggests it is noinline for stack size > reasons. Use noinline_for_stack to make this more clear. Acked-by: Ming Lei Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v3 0/6] solve deadlock caused by memory allocation with I/O
This patchset try to solve one deadlock problem which might be caused by memory allocation with block I/O during runtime resume and block device error handling path. Traditionly, the problem is addressed by passing GFP_NOIO statically to mm, but that is not a effective solution, see detailed description in patch 1's commit log. This patch set introduces one process flag and trys to fix one deadlock problem on block device/network device during runtime resume or usb bus reset. The 1st one is the change on include/sched.h and mm. The 2nd patch introduces the flag of memalloc_noio_resume on 'dev_pm_info', and pm_runtime_set_memalloc_noio(), so that PM Core can teach mm to not allocate mm with GFP_IOFS during the runtime_resume callback only on device with the flag set. The following 2 patches apply the introduced pm_runtime_set_memalloc_noio() to mark all devices as memalloc_noio_resume in the path from the block or network device to the root device in device tree. The last 2 patches are applied again PM and USB subsystem to demonstrate how to use the introduced mechanism to fix the deadlock problem. V3: - patch 2/6 and 5/6 changed, see their commit log - remove RFC from title since several guys have expressed that it is a reasonable solution V2: - remove changes on 'may_writepage' and 'may_swap'(1/6) - unset GFP_IOFS in try_to_free_pages() path(1/6) - introduce pm_runtime_set_memalloc_noio() - only apply the meachnism on block/network device and its ancestors for runtime resume context V1: - take Minchan's change to avoid the check in alloc_page hot path - change the helpers' style into save/restore as suggested by Alan - memory allocation with no io in usb bus reset path for all devices as suggested by Greg and Oliver block/genhd.c|8 drivers/base/power/runtime.c | 88 +- drivers/usb/core/hub.c | 15 +++ include/linux/pm.h |1 + include/linux/pm_runtime.h |5 +++ include/linux/sched.h| 10 + mm/page_alloc.c | 10 - mm/vmscan.c | 12 ++ net/core/net-sysfs.c |5 +++ 9 files changed, 152 insertions(+), 2 deletions(-) Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/