linux-next: build failure after merge of the fsl tree
Hi Scott,

After merging the fsl tree, today's linux-next build (powerpc
allyesconfig) failed like this:

arch/powerpc/kernel/epapr_paravirt.c: In function 'epapr_idle_init':
arch/powerpc/kernel/epapr_paravirt.c:77:23: error: 'epapr_ev_idle' undeclared (first use in this function)
   ppc_md.power_save = epapr_ev_idle;
                       ^

Caused by commit 7762b1ed7aae ("powerpc: move epapr paravirt init of
power_save to an initcall").

I have reverted that commit for today.
--
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au
Re: [RFC][PATCH] af_key: return error when meet errors on sendmsg() syscall
On 05/12/2014 01:11 PM, David Miller wrote:
>> So it makes sense to return errors for send() syscall.
>>
>> Signed-off-by: Xufeng Zhang
>
> I disagree.
>
> If pfkey_error() is successful, the error will be reported in the AF_KEY
> message that is broadcast, there is no reason for sendmsg to return an
> error. The message was successfully sent, there was no problem with its
> passage into the AF_KEY layer.
>
> Like netlink, operational responses come in packets, not error codes.
>
> However, if pfkey_error() fails, we must pass back the original error
> code because it's a last ditch effort to prevent information from being
> lost. That's why 'err' must be preserved when pfkey_error() returns zero.

I know what you mean, but isn't the kernel API meant to make user-space
implementation easier?
Since sending the message to the kernel and receiving the error report
message are asynchronous, I don't think it's easy to recover from the EINTR
error by parsing the error report message.

Thanks,
Xufeng

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [PATCH 1/2] hrtimer: reprogram event for expires=KTIME_MAX in hrtimer_force_reprogram()
On 10 May 2014 21:47, Preeti U Murthy wrote:
> On 05/09/2014 04:27 PM, Viresh Kumar wrote:
>> On 9 May 2014 16:04, Preeti U Murthy wrote:
>> Ideally, the device should have stopped events as we programmed it in
>> ONESHOT mode. And should have waited for kernel to set it again..
>>
>> But probably that device doesn't have a ONESHOT mode and is firing
>> again and again. Anyway the real problem I was trying to solve wasn't
>> infinite interrupts coming from event dev, but the first extra event that
>> we should have got rid of .. It just happened that we got more problems
>> on this particular board.
>
> So on a timer interrupt the tick device, irrespective of if it is in
> ONESHOT mode or not, is in an expired state. Thus it will continue to
> fire. What has ONESHOT mode got to do with this?

So, the arch specific timer handler must be clearing it, I suppose, and it
shouldn't have fired again after 5 ms as it is not reprogrammed. That is
probably implementation specific.. I have seen timers which have two
modes, periodic: they fire continuously, and oneshot: they get disabled
after firing and have to be reprogrammed.

>>> The reason this got exposed in NOHZ_FULL config is because in a normal
>>> NOHZ scenario when the cpu goes idle, and there are no pending timers in
>>> timer_list, even then tick_sched_timer gets cancelled. Precisely the
>>> scenario that you have described.
>>
>> I haven't tried but it looks like this problem will exist there as well..
>> Who is disabling the event device in that case when tick_sched timer
>> goes off ?
>> The same question that is applicable in this case as well..
>>
>>> But we don't get continuous interrupts then because the first time we
>>> get an interrupt, we queue the tick_sched_timer and program the tick
>>> device to the time of its expiry and therefore *push* the time at which
>>> your tick device should fire further.
>>
>> Probably not.. We don't get continuous interrupts because that's a special
>> case for my platform. But I am quite sure you would be getting one extra
>> interrupt after tick period, but because we didn't had anything to service
>
> Hmm? I didn't get this. Why would we? We ensure that if there are no
> pending timers in timer_list the tick_sched_timer is cancelled. We
> cannot get spurious interrupts when there are no pending timers in NOHZ
> mode.

Okay, there are no pending timers to fire and we have even disabled
tick_sched_timer as well.. But the event dev isn't SHUTDOWN or reprogrammed.
And so it must fire after the tick interval? Exactly the same issue we are
getting here in NO_HZ_FULL..

And the worst part is we aren't getting these interrupts in traces either.
Somebody probably needs to revisit the trace_irq_handler_entry part as well
to catch such problems.

> Hmm yeah looking at the problem that you are trying to solve, that being
> completely disabling timer interrupts on cpus that are running just one
> process, it appears to me that setting the tick device in SHUTDOWN mode
> is the only way to do so. And you are right. We use SHUTDOWN mode to
> imply that the device can be switched off. Its upto the arch to react to
> it appropriately.

So, from the mail where tglx blasted me off, we have a better solution to
implement now :)

> My concern is on powerpc today when we set the device to SHUTDOWN mode
> we set the decrementer to a MAX value. Which means we will get
> interrupts only spaced out more widely in time. But on NOHZ_FULL mode if
> you are looking at completely disabling tick_sched_timer as long as a
> single process runs then we might need to change the semantics here.

Let's see if we can do some nice stuff with the ONESHOT_STOPPED state..
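The periodic versus oneshot distinction described in this message can be modelled in a few lines of userspace C. This is only a sketch of the two behaviours, not kernel code: `struct timer_dev`, `timer_program()` and `timer_poll()` are hypothetical names invented for the illustration, and time is a plain counter.

```c
#include <stdbool.h>

/* Hypothetical model of a timer device; none of this is kernel API. */
enum timer_mode { TIMER_PERIODIC, TIMER_ONESHOT };

struct timer_dev {
	enum timer_mode mode;
	bool armed;
	unsigned long period;	/* used in periodic mode only */
	unsigned long expires;	/* absolute time of the next event */
};

/* Program the next event, as the core would on a reprogram. */
static void timer_program(struct timer_dev *d, unsigned long expires)
{
	d->expires = expires;
	d->armed = true;
}

/* Returns true if the device raises an interrupt at time 'now'. */
static bool timer_poll(struct timer_dev *d, unsigned long now)
{
	if (!d->armed || now < d->expires)
		return false;
	if (d->mode == TIMER_PERIODIC)
		d->expires += d->period;	/* re-arms itself */
	else
		d->armed = false;		/* quiet until reprogrammed */
	return true;
}
```

In this model a oneshot device that is never reprogrammed stays silent, which is the behaviour the author expected from the hardware; a device that keeps firing anyway behaves like the periodic case.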
Re: [PATCH] powerpc: Fix "attempt to move .org backwards" error (again)
On Sun, 2014-05-11 at 21:52 -0700, Guenter Roeck wrote:
> Oh well, it was worth a try. Can you give me an example for a failing
> configuration ?

My g5 config, which is close to g5_defconfig with PR KVM enabled.

In any case, see my other messages. I'm waiting for all my test builders
to come back and if it's clear I'll post a new patch.

Cheers,
Ben.
Re: bug: acpi ata_bay dock reminds undocked
On Monday 12 May 2014 02:00:29 Rafael J. Wysocki wrote:
> On Sunday, May 11, 2014 05:49:17 PM Pali Rohár wrote:
> > On Wednesday 30 April 2014 11:24:50 Pali Rohár wrote:
> > > On Tuesday 29 April 2014 23:35:42 Rafael J. Wysocki wrote:
> > > > On Tuesday, April 29, 2014 11:00:01 PM Pali Rohár wrote:
> > > > > On Tuesday 29 April 2014 22:55:07 Rafael J. Wysocki wrote:
> > > > > > Which kernel version(s) have you tried?
> > > > >
> > > > > 3.15-rc3
> > > >
> > > > Does it work with 3.14(.x) by chance?
> > >
> > > Tested with 3.14 and 3.8. Same problem, not working.
> >
> > BUMP!
> >
> > Rafael, do you need some other information?
>
> I'll take care of this when I have the time, OK?

Ok, I will wait.

--
Pali Rohár
pali.ro...@gmail.com
Re: [PATCH] pinctrl: Add i.MX1 pincontrol driver
On Mon, May 12, 2014 at 09:03:26AM +0400, Alexander Shiyan wrote:
> Mon, 12 May 2014 06:51:13 +0200 from Sascha Hauer:
> > On Fri, May 09, 2014 at 08:16:33PM +0400, Alexander Shiyan wrote:
> > > This patch adds pincontrol driver for Freescale i.MX1 SOCs.
> > >
> > > Signed-off-by: Alexander Shiyan
> > > ---
> > >  drivers/pinctrl/Kconfig        |   7 ++
> > >  drivers/pinctrl/Makefile       |   1 +
> > >  drivers/pinctrl/pinctrl-imx1.c | 279 +
> > >  3 files changed, 287 insertions(+)
> > >  create mode 100644 drivers/pinctrl/pinctrl-imx1.c
> >
> > Nice. I thought about adding devicetree support for i.MX1 aswell.
> >
> > Don't we need a imx1-pinfunc.h file to make use of this patch?
>
> It will be added along with the DTS template for that CPU architecture.

Ok.

Sascha

--
Pengutronix e.K.                           |                             |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-     |
Re: [PATCH] powerpc: Fix "attempt to move .org backwards" error (again)
On 05/11/2014 10:37 PM, Benjamin Herrenschmidt wrote:
> On Mon, 2014-05-12 at 14:12 +1000, Benjamin Herrenschmidt wrote:
>> On Fri, 2014-05-09 at 17:07 -0700, Guenter Roeck wrote:
>>> Commit 4e243b7 (powerpc: Fix "attempt to move .org backwards" error) fixes
>>> the allyesconfig build by moving machine_check_common to a different
>>> location. While this fixes most of the errors, both allmodconfig and
>>> allyesconfig still fail as follows.
>>>
>>> arch/powerpc/kernel/exceptions-64s.S:1315: Error: attempt to move .org backwards
>>>
>>> Fix by moving machine_check_common after the offending address.
>>
>> This suffers from the same problem as previous attempts, on some of my
>> test configs I get:
>>
>> arch/powerpc/kernel/head_64.o:(__ftr_alt_97+0xb0): relocation truncated to fit: R_PPC64_REL14 against `.text'+1c90
>> make[1]: *** [vmlinux] Error 1
>> make: *** [sub-make] Error 2
>>
>> IE, it breaks currently working configs.
>>
>> So we need to move more things around and I haven't had a chance to
>> sort it out.
>
> Ok, I think I sorted it out for now. It's a mess and likely to break
> again until we do something more drastic like moving everything that's
> after 0x8000 to a separate file but for now that will do.
>
> Patch on its way, I'll also shoot it to Linus today along with a few
> other things.

Great, thanks a lot!

Guenter
Re: [PATCH] powerpc: Fix "attempt to move .org backwards" error (again)
On Mon, 2014-05-12 at 14:12 +1000, Benjamin Herrenschmidt wrote:
> On Fri, 2014-05-09 at 17:07 -0700, Guenter Roeck wrote:
> > Commit 4e243b7 (powerpc: Fix "attempt to move .org backwards" error) fixes
> > the allyesconfig build by moving machine_check_common to a different
> > location. While this fixes most of the errors, both allmodconfig and
> > allyesconfig still fail as follows.
> >
> > arch/powerpc/kernel/exceptions-64s.S:1315: Error: attempt to move .org backwards
> >
> > Fix by moving machine_check_common after the offending address.
>
> This suffers from the same problem as previous attempts, on some of my
> test configs I get:
>
> arch/powerpc/kernel/head_64.o:(__ftr_alt_97+0xb0): relocation truncated to fit: R_PPC64_REL14 against `.text'+1c90
> make[1]: *** [vmlinux] Error 1
> make: *** [sub-make] Error 2
>
> IE, it breaks currently working configs.
>
> So we need to move more things around and I haven't had a chance to
> sort it out.

Ok, I think I sorted it out for now. It's a mess and likely to break
again until we do something more drastic like moving everything that's
after 0x8000 to a separate file but for now that will do.

Patch on its way, I'll also shoot it to Linus today along with a few
other things.

Cheers,
Ben.
Re: [PATCH 2/2] tick: SHUTDOWN event-dev if no events are required for KTIME_MAX
Thanks for blasting me off, it might be very helpful going forward :)

On 10 May 2014 01:39, Thomas Gleixner wrote:
> On Fri, 9 May 2014, Viresh Kumar wrote:
>> diff --git a/kernel/time/tick-oneshot.c b/kernel/time/tick-oneshot.c
>>  int tick_program_event(ktime_t expires, int force)
>>  {
>>  	struct clock_event_device *dev = __this_cpu_read(tick_cpu_device.evtdev);
>> +	int ret = 0;
>>
>> -	return clockevents_program_event(dev, expires, force);
>> +	/* Shut down event device if it is not required for long */
>> +	if (unlikely(expires.tv64 == KTIME_MAX)) {
>> +		dev->last_mode = dev->mode;
>> +		clockevents_set_mode(dev, CLOCK_EVT_MODE_SHUTDOWN);
>
> No, we are not doing a state change behind the scene and a magic
> restore. And I know at least one way to make this fall flat on its
> nose, because you are blindly doing dev->last_mode = dev->mode on
> every invocation. So if that gets called twice without a restore in
> between, the device is going to be in shutdown mode forever.

During my tests I had this as well:

 	if (unlikely(expires.tv64 == KTIME_MAX)) {
+		WARN_ON(dev->mode == CLOCK_EVT_MODE_SHUTDOWN);

But it never triggered and I thought it might never happen, so I removed it.
But yes, there should be some check here for that.

> It's moronic anyway as the clock event device has the state
> CLOCK_EVT_MODE_ONESHOT if its active, otherwise we would not be in
> that code path.

Yeah, missed that earlier.

> But what's even worse: you just define that it's the best way for all
> implementations of clockevents to handle this.
>
> It's definitely NOT. Some startup/shutdown implementations are rather
> complex, so that would burden them with rather big latencies and some
> of them will even outright break.
>
> There is a world outside of YOUR favourite subarch.

:)

> We do not hijack stuff just because we can and it works on some
> machines. We think about it proper.

Agreed..

> If we hijack some existing facility then we audit ALL implementation
> sites and document that we did so and why we are sure that it won't
> break stuff. It still might break some oddball case, but that's not a
> big issue.

Because SHUTDOWN was an existing old API, I thought it would work without
breaking stuff. Yes, I should have done some auditing, or at least made this
an RFC series, to get the discussion going..

> In the clockevents case we do not even need a new interface, but this
> must be made OPT-in and not a flagday change for all users.
>
> And no we are not going to abuse a feature flag for this. It's not a
> feature.

Okay.

> I'd rather have a new state for this, simply because it is NOT
> shutdown. It is in ONESHOT_STOPPED state. Whether a specific
> implementation will use the SHUTDOWN code for it or not does not
> matter.

Correct.

> That requires a full tree update of all implementations because most
> of them have a switch case for the mode. And adding a state will cause
> all of them which do not have a default clause to emit warnings
> because the mode is an enum for this very reason.
>
> And even if all of them would have a default clause, you'd need a way
> to OPT-In, because some of the defaults have a BUG() in there. Again,
> no feature flag exclusion. See above.

Okay..

> So the right thing to do this is:
>
> 1A) Change the prototype of the set_mode callback to return int and
>     fixup all users. Either add the missing default clause or remove
>     the existing BUG()/ pr_err()/whatever handling in the existing
>     default clause and return a UNIQUE error code.
>
>     I know I should have done that from the very beginning, but in
>     hindsight one could have done everything better.
>
>     coccinelle is your friend (if you need help ask me or Julia
>     Lawall). But it's going to be quite some manual work on top.

Sure.

> 1B) Audit the changes and look at the implementations. If the patch is
>     just adding the default clause or replacing some BUG/printk error
>     handling goto #1C
>
>     If it looks like it needs some preparatory care or if you find
>     bugs in a particular implementation, roll back the changes and do
>     the bug fixes and preparatory changes first as separate patches.
>
>     Go back to #1A until the coccinelle patches are just squeaky
>     clean.
>
> 1C) Add proper error handling for the various modes to the set_mode
>     callback call sites, only two AFAIK.
>
> 2A) Add a new mode ONESHOT_STOPPED. That's safe now as all error
>     handling will be done in the core code.
>
> 2B) Implement the ONESHOT_STOPPED logic and make sure all of the core
>     code is aware of it.

Okay..

> And don't tell me it can't be done.

No way :)

> I've done it I don't know how many
> times with interrupts, timers, locking and some more. It's hard work,
> but it's valuable and way better than the brainless "make it work for
> me" hackery.

I didn't mean that actually. I just pinpointed how badly things can
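Steps 1A and 1C of the plan quoted above can be sketched in userspace C. This is a simplified model under stated assumptions, not the kernel's clockevents code: the enum is a reduced stand-in for the real mode list, `legacy_set_mode()` is a hypothetical driver, and `-ENOSYS` is just one possible choice for the "unique error code" the plan asks for.

```c
#include <errno.h>

/* Reduced stand-in for the kernel's clock event modes. */
enum clock_event_mode {
	CLOCK_EVT_MODE_UNUSED,
	CLOCK_EVT_MODE_SHUTDOWN,
	CLOCK_EVT_MODE_PERIODIC,
	CLOCK_EVT_MODE_ONESHOT,
	CLOCK_EVT_MODE_ONESHOT_STOPPED,	/* the proposed new state (2A) */
};

struct clock_event_device {
	enum clock_event_mode mode;
	/* Step 1A: the callback returns int instead of void. */
	int (*set_mode)(enum clock_event_mode mode,
			struct clock_event_device *dev);
};

/* Hypothetical driver that has not opted in to the new state. */
static int legacy_set_mode(enum clock_event_mode mode,
			   struct clock_event_device *dev)
{
	(void)dev;
	switch (mode) {
	case CLOCK_EVT_MODE_UNUSED:
	case CLOCK_EVT_MODE_SHUTDOWN:
	case CLOCK_EVT_MODE_PERIODIC:
	case CLOCK_EVT_MODE_ONESHOT:
		return 0;	/* hardware programming would go here */
	default:
		return -ENOSYS;	/* assumed error code; no BUG() here */
	}
}

/* Step 1C: the core call site checks the result and keeps the old mode. */
static int clockevents_set_mode(struct clock_event_device *dev,
				enum clock_event_mode mode)
{
	int ret;

	if (dev->mode == mode)
		return 0;
	ret = dev->set_mode(mode, dev);
	if (ret)
		return ret;
	dev->mode = mode;
	return 0;
}
```

With the error handled in the core, a driver that does not know ONESHOT_STOPPED simply refuses the transition and stays in its current mode, which is what makes adding the new state opt-in.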
Re: [PATCH V4 2/2] fs/ext4/fsync.c: generic_file_fsync call based on barrier flag
On Mon, 12 May 2014 11:24:26 +0800 Ming Lei wrote:
> On Sun, May 11, 2014 at 1:06 AM, Fabian Frederick wrote:
> > generic_file_fsync has been updated to issue a flush for
> > older filesystems.
> >
> > This patch tests for barrier flag in ext4 mount flags
> > and calls the right function.
> >
> > Suggested-by: Jan Kara
> > Suggested-by: Christoph Hellwig
> > Cc: Jan Kara
> > Cc: Christoph Hellwig
> > Cc: Alexander Viro
> > Cc: "Theodore Ts'o"
> > Cc: Andrew Morton
> > Signed-off-by: Fabian Frederick
> > ---
> >  fs/ext4/fsync.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c
> > index a8bc47f..fa82c0a 100644
> > --- a/fs/ext4/fsync.c
> > +++ b/fs/ext4/fsync.c
> > @@ -108,6 +108,10 @@ int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
> >
> >  	if (!journal) {
> >  		ret = generic_file_fsync(file, start, end, datasync);
>
> Forget to remove above line?

Oops, of course ! Thanks a lot, I've sent a new version :)

Regards,
Fabian

>
> > +		if (test_opt(inode->i_sb, BARRIER))
> > +			ret = generic_file_fsync(file, start, end, datasync);
> > +		else
> > +			ret = __generic_file_fsync(file, start, end, datasync);
> >  		if (!ret && !hlist_empty(&inode->i_dentry))
> >  			ret = ext4_sync_parent(inode);
> >  		goto out;
> >
>
> Thanks,
> --
> Ming Lei
[PATCH V5 2/2] fs/ext4/fsync.c: generic_file_fsync call based on barrier flag
generic_file_fsync has been updated to issue a flush for
older filesystems.

This patch tests for barrier flag in ext4 mount flags
and calls the right function.

Suggested-by: Jan Kara
Suggested-by: Christoph Hellwig
Cc: Jan Kara
Cc: Christoph Hellwig
Cc: Alexander Viro
Cc: "Theodore Ts'o"
Cc: Andrew Morton
Signed-off-by: Fabian Frederick
---
 fs/ext4/fsync.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c
index a8bc47f..5b6e9f2 100644
--- a/fs/ext4/fsync.c
+++ b/fs/ext4/fsync.c
@@ -107,7 +107,10 @@ int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 	}

 	if (!journal) {
-		ret = generic_file_fsync(file, start, end, datasync);
+		if (test_opt(inode->i_sb, BARRIER))
+			ret = generic_file_fsync(file, start, end, datasync);
+		else
+			ret = __generic_file_fsync(file, start, end, datasync);
 		if (!ret && !hlist_empty(&inode->i_dentry))
 			ret = ext4_sync_parent(inode);
 		goto out;
--
1.8.4.5
[PATCH V5 1/2] FS: Add generic data flush to fsync
This patch issues a flush in generic_file_fsync.
(Modern filesystems already do it)

Behaviour can be reversed using /sys/devices/.../cache_type
or by calling __generic_file_fsync

Suggested-by: Jan Kara
Suggested-by: Christoph Hellwig
Cc: Jan Kara
Cc: Christoph Hellwig
Cc: Alexander Viro
Cc: "Theodore Ts'o"
Cc: Andrew Morton
Signed-off-by: Fabian Frederick
---
v5: patch2/2 ext4 patch fix (Thanks to Ming Lei)
V4: update description
V3: __generic_file_fsync = no flush
V2: No additional flag
V1: First version with MS_BARRIER flag

 fs/libfs.c         | 36 +++++++++++++++++++++++++++++++++---
 include/linux/fs.h |  1 +
 2 files changed, 34 insertions(+), 3 deletions(-)

diff --git a/fs/libfs.c b/fs/libfs.c
index a184424..4877906 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -3,6 +3,7 @@
  *	Library for filesystems writers.
  */

+#include
 #include
 #include
 #include
@@ -923,16 +924,19 @@ struct dentry *generic_fh_to_parent(struct super_block *sb, struct fid *fid,
 EXPORT_SYMBOL_GPL(generic_fh_to_parent);

 /**
- * generic_file_fsync - generic fsync implementation for simple filesystems
+ * __generic_file_fsync - generic fsync implementation for simple filesystems
+ *
  * @file:	file to synchronize
+ * @start:	start offset in bytes
+ * @end:	end offset in bytes (inclusive)
  * @datasync:	only synchronize essential metadata if true
  *
  * This is a generic implementation of the fsync method for simple
  * filesystems which track all non-inode metadata in the buffers list
  * hanging off the address_space structure.
  */
-int generic_file_fsync(struct file *file, loff_t start, loff_t end,
-		       int datasync)
+int __generic_file_fsync(struct file *file, loff_t start, loff_t end,
+			 int datasync)
 {
 	struct inode *inode = file->f_mapping->host;
 	int err;
@@ -952,10 +956,36 @@ int generic_file_fsync(struct file *file, loff_t start, loff_t end,
 	err = sync_inode_metadata(inode, 1);
 	if (ret == 0)
 		ret = err;
+
 out:
 	mutex_unlock(&inode->i_mutex);
 	return ret;
 }
+EXPORT_SYMBOL(__generic_file_fsync);
+
+/**
+ * generic_file_fsync - generic fsync implementation for simple filesystems
+ *			with flush
+ * @file:	file to synchronize
+ * @start:	start offset in bytes
+ * @end:	end offset in bytes (inclusive)
+ * @datasync:	only synchronize essential metadata if true
+ *
+ */
+
+int generic_file_fsync(struct file *file, loff_t start, loff_t end,
+		       int datasync)
+{
+	struct inode *inode = file->f_mapping->host;
+	int err;
+
+	err = __generic_file_fsync(file, start, end, datasync);
+	if (err)
+		return err;
+
+	return blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL);
+
+}
 EXPORT_SYMBOL(generic_file_fsync);

 /**
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 8780312..c3f46e4 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2590,6 +2590,7 @@ extern ssize_t simple_read_from_buffer(void __user *to, size_t count,
 extern ssize_t simple_write_to_buffer(void *to, size_t available, loff_t *ppos,
 		const void __user *from, size_t count);

+extern int __generic_file_fsync(struct file *, loff_t, loff_t, int);
 extern int generic_file_fsync(struct file *, loff_t, loff_t, int);

 extern int generic_check_addressable(unsigned, u64);
--
1.8.4.5
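The control flow of the two patches in this series can be followed with a small userspace model. This is a sketch only: the `model_*` functions are hypothetical stand-ins for `__generic_file_fsync()`, `blkdev_issue_flush()`, `generic_file_fsync()` and the ext4 no-journal path, and counters replace real I/O so the behaviour can be observed.

```c
#include <stdbool.h>

/* Counters standing in for real I/O. */
static int writeback_calls;
static int flush_calls;
static int writeback_error;	/* set non-zero to simulate a failed writeback */

/* Models __generic_file_fsync(): write back data and metadata, no flush. */
static int model_fsync_noflush(void)
{
	writeback_calls++;
	return writeback_error;
}

/* Models blkdev_issue_flush(). */
static int model_issue_flush(void)
{
	flush_calls++;
	return 0;
}

/* Models generic_file_fsync(): flush only if the writeback succeeded. */
static int model_fsync(void)
{
	int err = model_fsync_noflush();

	if (err)
		return err;
	return model_issue_flush();
}

/* Models the ext4 no-journal path: pick the variant from the barrier option. */
static int model_ext4_sync_nojournal(bool barrier)
{
	return barrier ? model_fsync() : model_fsync_noflush();
}
```

Exactly one of the two variants runs per call, so the writeback happens once; calling both in sequence would write back twice, which is the issue the V5 respin addresses.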
linux-next: manual merge of the gpio tree with the net-next tree
Hi Linus,

Today's linux-next merge of the gpio tree got a conflict in
Documentation/driver-model/devres.txt between commit 6d48f44b7b2a
("mdio_bus: implement devm_mdiobus_alloc/devm_mdiobus_free") from the
net-next tree and commit f9748ef13b6a ("gpio: Add missing device-managed
documentation") from the gpio tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

--
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au

diff --cc Documentation/driver-model/devres.txt
index d483f2cf221b,8ff1167cfedf..
--- a/Documentation/driver-model/devres.txt
+++ b/Documentation/driver-model/devres.txt
@@@ -310,7 -309,7 +310,12 @@@ SLAVE DMA ENGIN
  SPI
    devm_spi_register_master()
 +MDIO
 +  devm_mdiobus_alloc()
 +  devm_mdiobus_alloc_size()
 +  devm_mdiobus_free()
++
+ GPIO
+   devm_gpiod_get()
+   devm_gpiod_get_index()
+   devm_gpiod_put()
Re: [PATCHv2 0/2] remap_file_pages() decommission
On Mon, May 12, 2014 at 7:36 AM, Andi Kleen wrote:
> Armin Rigo writes:
>
>> Here is a note from the PyPy project (mentioned earlier in this
>> thread, and at https://lwn.net/Articles/587923/ ).
>
> Your use is completely bogus. remap_file_pages() pins everything
> and disables any swapping for the area.

Wait, what's wrong with swapping pages from non-linear vmas?
try_to_unmap() can handle them, though not very effectively.

Some time ago I was thinking about tracking rmap for non-linear vmas,
something like a second-level tree of sub-vmas stored in the non-linear vma.
This could be done using the existing vm_area_struct, and in the rmap tree
everything would look just as normal. We'd waste some kernel memory, but it
would also remove complexity from rmap and make non-linear vmas usable for
all filesystems, not just for shmem.

But it's not worth it. I ACK killing it.

Maybe we should keep a flag on the vma and hide/merge them in proc/maps.
Bloating files/dirs in proc might be a bigger problem than a non-existent
performance regression.

>
> -Andi
> --
> a...@linux.intel.com -- Speaking for myself only
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majord...@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: em...@kvack.org
Re: [PATCH] zram: remove global tb_lock by using lock-free CAS
On Sat, May 10, 2014 at 02:10:08PM +0800, Weijie Yang wrote:
> On Thu, May 8, 2014 at 2:24 PM, Minchan Kim wrote:
> > On Wed, May 07, 2014 at 11:52:59PM +0900, Joonsoo Kim wrote:
> >> >> Most popular use of zram is the in-memory swap for small embedded system
> >> >> so I don't want to increase memory footprint without good reason although
> >> >> it makes synthetic benchmark. Alhought it's 1M for 1G, it isn't small if we
> >> >> consider compression ratio and real free memory after boot
> >>
> >> We can use bit spin lock and this would not increase memory footprint for
> >> 32 bit platform.
> >
> > Sounds like a idea.
> > Weijie, Do you mind testing with bit spin lock?
>
> Yes, I re-test them.
> This time, I test each case 10 times, and take the average(KS/s).
> (the test machine and method are same like previous mail's)
>
> Iozone test result:
>
>       Test           BASE       CAS  spinlock    rwlock  bit_spinlock
> ---------------------------------------------------------------------
>  Initial write    1381094   1425435   1422860   1423075       1421521
>        Rewrite    1529479   1641199   1668762   1672855       1654910
>           Read    8468009  11324979  11305569      7273      10997202
>        Re-read    8467476  11260914  11248059  11145336      10906486
>   Reverse Read    6821393   8106334   8282174   8279195       8109186
>    Stride read    7191093   8994306   9153982   8961224       9004434
>    Random read    7156353   8957932   9167098   8980465       8940476
> Mixed workload    4172747   5680814   5927825   5489578       5972253
>   Random write    1483044   1605588   1594329   1600453       1596010
>         Pwrite    1276644   1303108   1311612   1314228       1300960
>          Pread    4324337   4632869   4618386   4457870       4500166
>
> Fio test result:
>
>     Test         base       CAS  spinlock    rwlock  bit_spinlock
> -----------------------------------------------------------------
>  seq-write     933789    999357   1003298    995961       1001958
>   seq-read    5634130   6577930   6380861   6243912       6230006
>     seq-rw    1405687   1638117   1640256   1633903       1634459
>    rand-rw    1386119   1614664   1617211   1609267       1612471
>
> The base is v3.15.0-rc3, the others are per-meta entry lock.
> Every optimization method shows higher performance than the base, however,
> it is hard to say which method is the most appropriate.

The difference between CAS and bit_spinlock isn't big, so I prefer the
general method.

>
> To bit_spinlock, the modified code is mainly like this:
>
> +#define ZRAM_FLAG_SHIFT 16
> +
>  enum zram_pageflags {
>  	/* Page consists entirely of zeros */
> -	ZRAM_ZERO,
> +	ZRAM_ZERO = ZRAM_FLAG_SHIFT + 1,
> +	ZRAM_ACCESS,
>
>  	__NR_ZRAM_PAGEFLAGS,
>  };
>
>  /* Allocated for each disk page */
>  struct table {
>  	unsigned long handle;
> -	u16 size;	/* object size (excluding header) */
> -	u8 flags;
> +	unsigned long value;

Why do we need to change flags and size into "unsigned long value"?
Couldn't we use the existing flags and just add a new ZRAM_TABLE_LOCK?

>  } __aligned(4);
>
> The lower ZRAM_FLAG_SHIFT bits of table.value is size, the higher bits
> is for zram_pageflags. By this means, it doesn't increase any memory
> overhead on both 32-bit and 64-bit system.
>
> Any complaint or suggestions are welcomed.

Anyway, I'd like to go this way.
Please resend a formal patch with numbers.

Thanks!

>
> >>
> >> Thanks.
> >
> > --
> > Kind regards,
> > Minchan Kim

--
Kind regards,
Minchan Kim
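The packing described in the quoted snippet (size in the low ZRAM_FLAG_SHIFT bits of `value`, page flags in the bits above) can be tried out in plain C. This is a layout sketch only: the `table_*` helpers are hypothetical names, the accesses are not atomic, and the actual locking on the ZRAM_ACCESS bit (for instance with the kernel's bit spinlock helpers) is left out.

```c
#include <stdbool.h>
#include <stddef.h>

#define ZRAM_FLAG_SHIFT 16
#define ZRAM_SIZE_MASK ((1UL << ZRAM_FLAG_SHIFT) - 1)

enum zram_pageflags {
	ZRAM_ZERO = ZRAM_FLAG_SHIFT + 1,	/* page consists entirely of zeros */
	ZRAM_ACCESS,				/* bit used as the per-entry lock */
	__NR_ZRAM_PAGEFLAGS,
};

struct table {
	unsigned long handle;
	unsigned long value;	/* low 16 bits: size, higher bits: flags */
};

/* Hypothetical helpers, for illustration only. */
static size_t table_get_size(const struct table *t)
{
	return t->value & ZRAM_SIZE_MASK;
}

static void table_set_size(struct table *t, size_t size)
{
	/* Replace the size bits and leave the flag bits untouched. */
	t->value = (t->value & ~ZRAM_SIZE_MASK) | (size & ZRAM_SIZE_MASK);
}

static bool table_test_flag(const struct table *t, enum zram_pageflags flag)
{
	return (t->value >> flag) & 1UL;
}

static void table_set_flag(struct table *t, enum zram_pageflags flag)
{
	t->value |= 1UL << flag;
}

static void table_clear_flag(struct table *t, enum zram_pageflags flag)
{
	t->value &= ~(1UL << flag);
}
```

Because both fields share one word, `struct table` stays at two `unsigned long` members, which is the "no extra memory on 32-bit and 64-bit" property claimed in the quoted text.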
Re: [RFC][PATCH] af_key: return error when meet errors on sendmsg() syscall
From: Xufeng Zhang
Date: Fri, 9 May 2014 13:47:35 +0800

> Current implementation for pfkey_sendmsg() always return success
> no matter whether or not error happens during this syscall,
> this is incompatible with the general send()/sendmsg() API:
>   man send
>     RETURN VALUE
>       On success, these calls return the number of characters sent.
>       On error, -1 is returned, and errno is set appropriately.
>
> One side effect this problem introduces is that we can't determine
> when to resend the message when the previous send() fails because
> it was interrupted by signals.
> We detect such a problem when racoon is sending SADBADD message to
> add SAD entry in the kernel, but sometimes kernel is responding with
> "Interrupted system call"(-EINTR) error.
>
> Check the send implementation of strongswan, it has below logic:
> pfkey_send_socket()
> {
> 	...
> 	while (TRUE)
> 	{
> 		len = send(socket, in, in_len, 0);
>
> 		if (len != in_len)
> 		{
> 			case EINTR:
> 				/* interrupted, try again */
> 				continue;
> 			...
> 		}
> 	}
> 	...
> }
> So it makes sense to return errors for send() syscall.
>
> Signed-off-by: Xufeng Zhang

I disagree.

If pfkey_error() is successful, the error will be reported in the AF_KEY
message that is broadcast, there is no reason for sendmsg to return an
error. The message was successfully sent, there was no problem with its
passage into the AF_KEY layer.

Like netlink, operational responses come in packets, not error codes.

However, if pfkey_error() fails, we must pass back the original error
code because it's a last ditch effort to prevent information from being
lost. That's why 'err' must be preserved when pfkey_error() returns zero.
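The strongswan loop quoted in this message is elided in the original. As a rough illustration of the pattern under discussion, here is a minimal userspace sketch that retries only when send() itself reports EINTR. The helper name `send_retry_eintr()` is hypothetical and is not strongswan's function; treating a short send as `EIO` is likewise an assumption made for the example.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: send one message and retry only when the
 * send() call itself was interrupted by a signal.
 * Returns 0 on success, -1 with errno set on any other failure.
 */
static int send_retry_eintr(int fd, const void *buf, size_t len)
{
	for (;;) {
		ssize_t n = send(fd, buf, len, 0);

		if (n == (ssize_t)len)
			return 0;
		if (n < 0 && errno == EINTR)
			continue;	/* interrupted, try again */
		if (n >= 0)
			errno = EIO;	/* short send: assumed to be an error */
		return -1;
	}
}
```

A loop like this only sees errors that the syscall returns. Under the behaviour defended in the reply above, an operational failure reported through a broadcast AF_KEY message has to be picked up by reading the response from the socket instead.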
Re: [PATCH net-next,v3] Add support for netvsc build without CONFIG_SYSFS flag
From: Haiyang Zhang
Date: Thu, 8 May 2014 15:14:10 -0700

> This change ensures the driver can be built successfully without the
> CONFIG_SYSFS flag.
> MS-TFS: 182270
>
> Signed-off-by: Haiyang Zhang
> Reviewed-by: K. Y. Srinivasan

Applied, thanks.
Re: [PATCH] pinctrl: Add i.MX1 pincontrol driver
Mon, 12 May 2014 06:51:13 +0200 from Sascha Hauer:
> On Fri, May 09, 2014 at 08:16:33PM +0400, Alexander Shiyan wrote:
> > This patch adds pincontrol driver for Freescale i.MX1 SOCs.
> >
> > Signed-off-by: Alexander Shiyan
> > ---
> > drivers/pinctrl/Kconfig | 7 ++
> > drivers/pinctrl/Makefile | 1 +
> > drivers/pinctrl/pinctrl-imx1.c | 279 +
> > 3 files changed, 287 insertions(+)
> > create mode 100644 drivers/pinctrl/pinctrl-imx1.c
>
> Nice. I thought about adding devicetree support for i.MX1 as well.
>
> Don't we need an imx1-pinfunc.h file to make use of this patch?

It will be added along with the DTS template for that CPU architecture.
Re: [PATCH 3.14 27/83] ARC: !PREEMPT: Ensure Return to kernel mode is IRQ safe
On Monday 12 May 2014 12:51 AM, Greg Kroah-Hartman wrote:
> 3.14-stable review patch. If anyone has any objections, please let me know.
>
> --
>
> From: Vineet Gupta
>
> commit 8aa9e85adac609588eeec356e5a85059b3b819ba upstream.

Hi Greg,

This one was also marked for stable 3.10. However, because the 2 pre-req patches were not in yet, applying it would have failed, and AFAIR I did describe the state of things in that failure report.

Anyhow, can you please queue this one up for the next 3.10 stable.

Thx,
-Vineet

>
> There was a very small race window where resume to kernel mode from an
> Exception Path (or pure kernel mode which is true for most of ARC
> exceptions anyways), was not disabling interrupts in restore_regs,
> clobbering the exception regs
>
> Anton found the culprit call flow (after many sleepless nights)
>
> | 1. we got a Trap from user land
> | 2. started to service it.
> | 3. While doing some stuff on user-land memory (I think it is padzero()),
> |    we got a DataTlbMiss
> | 4. On return from it we are taking "resume_kernel_mode" path
> | 5. NEED_RESCHED is not set, so we go to "return from exception" path in
> |    restore regs.
> | 6. there seems to be IRQ happening
>
> Signed-off-by: Vineet Gupta
> Cc: Anton Kolesov
> Cc: Francois Bedard
> Signed-off-by: Linus Torvalds
> Signed-off-by: Greg Kroah-Hartman
>
> ---
>  arch/arc/kernel/entry.S |    8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> --- a/arch/arc/kernel/entry.S
> +++ b/arch/arc/kernel/entry.S
> @@ -614,11 +614,13 @@ resume_user_mode_begin:
>
>  resume_kernel_mode:
>
> -#ifdef CONFIG_PREEMPT
> -
> -	; This is a must for preempt_schedule_irq()
> +	; Disable Interrupts from this point on
> +	; CONFIG_PREEMPT: This is a must for preempt_schedule_irq()
> +	; !CONFIG_PREEMPT: To ensure restore_regs is intr safe
>  	IRQ_DISABLE r9
>
> +#ifdef CONFIG_PREEMPT
> +
>  	; Can't preempt if preemption disabled
>  	GET_CURR_THR_INFO_FROM_SP r10
>  	ld r8, [r10, THREAD_INFO_PREEMPT_COUNT]
>
>
[PATCH v3 0/3] TI CPSW Cleanup
This series does some minimal cleanups.
 - Conversion of pr_*() to dev_*()
 - Convert kzalloc to devm_kzalloc.

No functional changes.

v1 -> v2: Address review comments.
v2 -> v3: Remove a stale commit comment.

George Cherian (3):
  driver net: cpsw: Convert pr_*() to dev_*() calls
  net: davinci_mdio: Convert pr_err() to dev_err() call
  drivers: net: davinci_cpdma: Convert kzalloc() to devm_kzalloc().

 drivers/net/ethernet/ti/cpsw.c          | 50 -
 drivers/net/ethernet/ti/davinci_cpdma.c | 35 ---
 drivers/net/ethernet/ti/davinci_mdio.c  |  2 +-
 3 files changed, 38 insertions(+), 49 deletions(-)

-- 
1.8.3.1
[PATCH v3 1/3] driver net: cpsw: Convert pr_*() to dev_*() calls
Convert all pr_*() calls to dev_*() calls. No functional changes.

Signed-off-by: George Cherian
Reviewed-by: Felipe Balbi
---
 drivers/net/ethernet/ti/cpsw.c | 50 +-
 1 file changed, 25 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index d14c8da..9512738 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -1808,25 +1808,25 @@ static int cpsw_probe_dt(struct cpsw_platform_data *data,
 		return -EINVAL;
 
 	if (of_property_read_u32(node, "slaves", &prop)) {
-		pr_err("Missing slaves property in the DT.\n");
+		dev_err(&pdev->dev, "Missing slaves property in the DT.\n");
 		return -EINVAL;
 	}
 	data->slaves = prop;
 
 	if (of_property_read_u32(node, "active_slave", &prop)) {
-		pr_err("Missing active_slave property in the DT.\n");
+		dev_err(&pdev->dev, "Missing active_slave property in the DT.\n");
 		return -EINVAL;
 	}
 	data->active_slave = prop;
 
 	if (of_property_read_u32(node, "cpts_clock_mult", &prop)) {
-		pr_err("Missing cpts_clock_mult property in the DT.\n");
+		dev_err(&pdev->dev, "Missing cpts_clock_mult property in the DT.\n");
 		return -EINVAL;
 	}
 	data->cpts_clock_mult = prop;
 
 	if (of_property_read_u32(node, "cpts_clock_shift", &prop)) {
-		pr_err("Missing cpts_clock_shift property in the DT.\n");
+		dev_err(&pdev->dev, "Missing cpts_clock_shift property in the DT.\n");
 		return -EINVAL;
 	}
 	data->cpts_clock_shift = prop;
@@ -1838,31 +1838,31 @@ static int cpsw_probe_dt(struct cpsw_platform_data *data,
 		return -ENOMEM;
 
 	if (of_property_read_u32(node, "cpdma_channels", &prop)) {
-		pr_err("Missing cpdma_channels property in the DT.\n");
+		dev_err(&pdev->dev, "Missing cpdma_channels property in the DT.\n");
 		return -EINVAL;
 	}
 	data->channels = prop;
 
 	if (of_property_read_u32(node, "ale_entries", &prop)) {
-		pr_err("Missing ale_entries property in the DT.\n");
+		dev_err(&pdev->dev, "Missing ale_entries property in the DT.\n");
 		return -EINVAL;
 	}
 	data->ale_entries = prop;
 
 	if (of_property_read_u32(node, "bd_ram_size", &prop)) {
-		pr_err("Missing bd_ram_size property in the DT.\n");
+		dev_err(&pdev->dev, "Missing bd_ram_size property in the DT.\n");
 		return -EINVAL;
 	}
 	data->bd_ram_size = prop;
 
 	if (of_property_read_u32(node, "rx_descs", &prop)) {
-		pr_err("Missing rx_descs property in the DT.\n");
+		dev_err(&pdev->dev, "Missing rx_descs property in the DT.\n");
 		return -EINVAL;
 	}
 	data->rx_descs = prop;
 
 	if (of_property_read_u32(node, "mac_control", &prop)) {
-		pr_err("Missing mac_control property in the DT.\n");
+		dev_err(&pdev->dev, "Missing mac_control property in the DT.\n");
 		return -EINVAL;
 	}
 	data->mac_control = prop;
@@ -1876,7 +1876,7 @@ static int cpsw_probe_dt(struct cpsw_platform_data *data,
 	ret = of_platform_populate(node, NULL, NULL, &pdev->dev);
 	/* We do not want to force this, as in some cases may not have child */
 	if (ret)
-		pr_warn("Doesn't have any child node\n");
+		dev_warn(&pdev->dev, "Doesn't have any child node\n");
 
 	for_each_child_of_node(node, slave_node) {
 		struct cpsw_slave_data *slave_data = data->slave_data + i;
@@ -1893,7 +1893,7 @@ static int cpsw_probe_dt(struct cpsw_platform_data *data,
 
 		parp = of_get_property(slave_node, "phy_id", &lenp);
 		if ((parp == NULL) || (lenp != (sizeof(void *) * 2))) {
-			pr_err("Missing slave[%d] phy_id property\n", i);
+			dev_err(&pdev->dev, "Missing slave[%d] phy_id property\n", i);
 			return -EINVAL;
 		}
 		mdio_node = of_find_node_by_phandle(be32_to_cpup(parp));
@@ -1918,18 +1918,18 @@ static int cpsw_probe_dt(struct cpsw_platform_data *data,
 
 		slave_data->phy_if = of_get_phy_mode(slave_node);
 		if (slave_data->phy_if < 0) {
-			pr_err("Missing or malformed slave[%d] phy-mode property\n",
-			       i);
+			dev_err(&pdev->dev, "Missing or malformed slave[%d] phy-mode property\n",
+				i);
 			return slave_data->phy_if;
 		}
 
 		if (data->dual_emac) {
 			if (of_property_read_u32(slave_node, "dual_emac_res_vlan",
 						 &prop)) {
-				pr_err("Missing dual_emac_res_vlan in DT.\n");
[PATCH v3 2/3] net: davinci_mdio: Convert pr_err() to dev_err() call
Convert the lone pr_err() to dev_err() call.

Signed-off-by: George Cherian
Reviewed-by: Felipe Balbi
---
 drivers/net/ethernet/ti/davinci_mdio.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ti/davinci_mdio.c b/drivers/net/ethernet/ti/davinci_mdio.c
index 34e97ec..735dc53 100644
--- a/drivers/net/ethernet/ti/davinci_mdio.c
+++ b/drivers/net/ethernet/ti/davinci_mdio.c
@@ -303,7 +303,7 @@ static int davinci_mdio_probe_dt(struct mdio_platform_data *data,
 		return -EINVAL;
 
 	if (of_property_read_u32(node, "bus_freq", &prop)) {
-		pr_err("Missing bus_freq property in the DT.\n");
+		dev_err(&pdev->dev, "Missing bus_freq property in the DT.\n");
 		return -EINVAL;
 	}
 	data->bus_freq = prop;
-- 
1.8.3.1
[PATCH v3 3/3] drivers: net: davinci_cpdma: Convert kzalloc() to devm_kzalloc().
Convert kzalloc() to devm_kzalloc().

Signed-off-by: George Cherian
Reviewed-by: Felipe Balbi
---
 drivers/net/ethernet/ti/davinci_cpdma.c | 35 +++--
 1 file changed, 12 insertions(+), 23 deletions(-)

diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c b/drivers/net/ethernet/ti/davinci_cpdma.c
index 88ef270..539dbde 100644
--- a/drivers/net/ethernet/ti/davinci_cpdma.c
+++ b/drivers/net/ethernet/ti/davinci_cpdma.c
@@ -158,9 +158,9 @@ cpdma_desc_pool_create(struct device *dev, u32 phys, u32 hw_addr,
 	int bitmap_size;
 	struct cpdma_desc_pool *pool;
 
-	pool = kzalloc(sizeof(*pool), GFP_KERNEL);
+	pool = devm_kzalloc(dev, sizeof(*pool), GFP_KERNEL);
 	if (!pool)
-		return NULL;
+		goto fail;
 
 	spin_lock_init(&pool->lock);
 
@@ -170,7 +170,7 @@ cpdma_desc_pool_create(struct device *dev, u32 phys, u32 hw_addr,
 	pool->num_desc = size / pool->desc_size;
 
 	bitmap_size = (pool->num_desc / BITS_PER_LONG) * sizeof(long);
-	pool->bitmap = kzalloc(bitmap_size, GFP_KERNEL);
+	pool->bitmap = devm_kzalloc(dev, bitmap_size, GFP_KERNEL);
 	if (!pool->bitmap)
 		goto fail;
 
@@ -187,10 +187,7 @@ cpdma_desc_pool_create(struct device *dev, u32 phys, u32 hw_addr,
 
 	if (pool->iomap)
 		return pool;
-
 fail:
-	kfree(pool->bitmap);
-	kfree(pool);
 	return NULL;
 }
 
@@ -203,7 +200,6 @@ static void cpdma_desc_pool_destroy(struct cpdma_desc_pool *pool)
 
 	spin_lock_irqsave(&pool->lock, flags);
 	WARN_ON(pool->used_desc);
-	kfree(pool->bitmap);
 	if (pool->cpumap) {
 		dma_free_coherent(pool->dev, pool->mem_size, pool->cpumap,
 				  pool->phys);
@@ -211,7 +207,6 @@ static void cpdma_desc_pool_destroy(struct cpdma_desc_pool *pool)
 		iounmap(pool->iomap);
 	}
 	spin_unlock_irqrestore(&pool->lock, flags);
-	kfree(pool);
 }
 
 static inline dma_addr_t desc_phys(struct cpdma_desc_pool *pool,
@@ -276,7 +271,7 @@ struct cpdma_ctlr *cpdma_ctlr_create(struct cpdma_params *params)
 {
 	struct cpdma_ctlr *ctlr;
 
-	ctlr = kzalloc(sizeof(*ctlr), GFP_KERNEL);
+	ctlr = devm_kzalloc(params->dev, sizeof(*ctlr), GFP_KERNEL);
 	if (!ctlr)
 		return NULL;
 
@@ -468,7 +463,6 @@ int cpdma_ctlr_destroy(struct cpdma_ctlr *ctlr)
 
 	cpdma_desc_pool_destroy(ctlr->pool);
 	spin_unlock_irqrestore(&ctlr->lock, flags);
-	kfree(ctlr);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(cpdma_ctlr_destroy);
@@ -507,21 +501,22 @@ struct cpdma_chan *cpdma_chan_create(struct cpdma_ctlr *ctlr, int chan_num,
 				     cpdma_handler_fn handler)
 {
 	struct cpdma_chan *chan;
-	int ret, offset = (chan_num % CPDMA_MAX_CHANNELS) * 4;
+	int offset = (chan_num % CPDMA_MAX_CHANNELS) * 4;
 	unsigned long flags;
 
 	if (__chan_linear(chan_num) >= ctlr->num_chan)
 		return NULL;
 
-	ret = -ENOMEM;
-	chan = kzalloc(sizeof(*chan), GFP_KERNEL);
+	chan = devm_kzalloc(ctlr->dev, sizeof(*chan), GFP_KERNEL);
 	if (!chan)
-		goto err_chan_alloc;
+		return ERR_PTR(-ENOMEM);
 
 	spin_lock_irqsave(&ctlr->lock, flags);
-	ret = -EBUSY;
-	if (ctlr->channels[chan_num])
-		goto err_chan_busy;
+	if (ctlr->channels[chan_num]) {
+		spin_unlock_irqrestore(&ctlr->lock, flags);
+		devm_kfree(ctlr->dev, chan);
+		return ERR_PTR(-EBUSY);
+	}
 
 	chan->ctlr = ctlr;
 	chan->state = CPDMA_STATE_IDLE;
@@ -551,12 +546,6 @@ struct cpdma_chan *cpdma_chan_create(struct cpdma_ctlr *ctlr, int chan_num,
 	ctlr->channels[chan_num] = chan;
 	spin_unlock_irqrestore(&ctlr->lock, flags);
 	return chan;
-
-err_chan_busy:
-	spin_unlock_irqrestore(&ctlr->lock, flags);
-	kfree(chan);
-err_chan_alloc:
-	return ERR_PTR(ret);
 }
 EXPORT_SYMBOL_GPL(cpdma_chan_create);
 
-- 
1.8.3.1
Re: [PATCH] powerpc: Fix "attempt to move .org backwards" error (again)
On 05/11/2014 09:12 PM, Benjamin Herrenschmidt wrote:
> On Fri, 2014-05-09 at 17:07 -0700, Guenter Roeck wrote:
>> Commit 4e243b7 (powerpc: Fix "attempt to move .org backwards" error) fixes the
>> allyesconfig build by moving machine_check_common to a different location.
>> While this fixes most of the errors, both allmodconfig and allyesconfig still
>> fail as follows.
>>
>> arch/powerpc/kernel/exceptions-64s.S:1315: Error: attempt to move .org backwards
>>
>> Fix by moving machine_check_common after the offending address.
>
> This suffers from the same problem as previous attempts, on some
> of my test configs I get:
>
> arch/powerpc/kernel/head_64.o:(__ftr_alt_97+0xb0): relocation truncated to fit: R_PPC64_REL14 against `.text'+1c90
> make[1]: *** [vmlinux] Error 1
> make: *** [sub-make] Error 2
>
> IE, it breaks currently working configs.

Oh well, it was worth a try. Can you give me an example for a failing configuration ?

Thanks,
Guenter
Re: [PATCH] pinctrl: Add i.MX1 pincontrol driver
On Fri, May 09, 2014 at 08:16:33PM +0400, Alexander Shiyan wrote: > This patch adds pincontrol driver for Freescale i.MX1 SOCs. > > Signed-off-by: Alexander Shiyan > --- > drivers/pinctrl/Kconfig| 7 ++ > drivers/pinctrl/Makefile | 1 + > drivers/pinctrl/pinctrl-imx1.c | 279 > + > 3 files changed, 287 insertions(+) > create mode 100644 drivers/pinctrl/pinctrl-imx1.c Nice. I thought about adding devicetree support for i.MX1 as well. Don't we need an imx1-pinfunc.h file to make use of this patch? Sascha -- Pengutronix e.K. | | Industrial Linux Solutions | http://www.pengutronix.de/ | Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0| Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917- |
Re: icmp: account for ICMP out errors because of socket limit
Hi, Eric && David

This patch is similar to the following patch.

commit 1f8438a853667d48055ad38384c63e94b32c6578
Author: Eric Dumazet
Date: Sat Apr 3 15:09:04 2010 -0700

    icmp: Account for ICMP out errors

    When ip_append() fails because of socket limit or memory shortage,
    increment ICMP_MIB_OUTERRORS counter, so that "netstat -s" can report
    these errors.

    LANG=C netstat -s | grep "ICMP messages failed"
        0 ICMP messages failed

    For IPV6, implement ICMP6_MIB_OUTERRORS counter as well.

    # grep Icmp6OutErrors /proc/net/dev_snmp6/*
    /proc/net/dev_snmp6/eth0:Icmp6OutErrors 0
    /proc/net/dev_snmp6/lo:Icmp6OutErrors 0

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

Best Regards!
Zhu Yanjun

On 05/12/2014 11:19 AM, zhuyj wrote:

Hi, Eric && David

 ____        ______________
|    |      |              |
| PC |<---->| MIPS 32 core |
|____|      |______________|

When ping from a PC to a board (MIPS 32 core), because of socket limit, ping echo will fail. But ICMP_MIB_OUTERRORS counter is not incremented. In this case, "netstat -s" can not report these errors.

This patch will fix this problem. Now it is in the attachment. Please check it.

Best Regards!
Zhu Yanjun
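As a side note for anyone checking these counters from a program rather than with grep, a rough sketch follows. It assumes the per-interface file contains one "name value" pair per line, as the grep output quoted above suggests; snmp6_counter() is a hypothetical helper name, not a kernel or libc API, and reading the file into a buffer is left to the caller.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: look up "name value" in text laid out like
 * /proc/net/dev_snmp6/<ifname> (assumed: one pair per line, name
 * first, separated by spaces or tabs).
 * Returns 0 and stores the value in *out, or -1 if not found.
 */
static int snmp6_counter(const char *text, const char *name,
			 unsigned long long *out)
{
	size_t nlen = strlen(name);
	const char *line = text;

	while (line && *line) {
		/* match the whole name, not just a prefix of a longer one */
		if (strncmp(line, name, nlen) == 0 &&
		    (line[nlen] == ' ' || line[nlen] == '\t'))
			return sscanf(line + nlen, "%llu", out) == 1 ? 0 : -1;

		/* advance to the start of the next line, if any */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}
```

Sampling the counter before and after a failing ping would show whether the out-error path is being accounted for.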
Re: [PATCH 1/4] clk: samsung: out: Add infrastructure to register CLKOUT
On 05/10/2014 09:21 AM, Pankaj Dubey wrote:
> On 05/09/2014 10:00 PM, Tushar Behera wrote:
>> All SoC in Exynos-series have a clock with name XCLKOUT to provide
>> debug information about various clocks available in the SoC. The register
>> controlling the MUX and GATE of this clock is provided within PMU domain.
>> Since PMU domain can't be dedicatedly mapped by every driver, the register
>> needs to be handled through a regmap handle provided by PMU syscon
>> controller. Right now, CCF doesn't allow regmap based MUX and GATE clocks,
>> hence a dedicated clock provider for XCLKOUT is added here.
>>
>> Signed-off-by: Tushar Behera
>> CC: Tomasz Figa
>> ---
>>  drivers/clk/samsung/Makefile  |   2 +-
>>  drivers/clk/samsung/clk-out.c | 181 +
>>  drivers/clk/samsung/clk.h     |  33
>>  3 files changed, 215 insertions(+), 1 deletion(-)
>>  create mode 100644 drivers/clk/samsung/clk-out.c
>>

[ ... ]

>> +/**
>> + * struct samsung_clkout_soc_data: SoC specific register details
>> + * @reg: Offset of CLKOUT register from PMU base
>
> how about naming this variable as "offset" instead of "reg".
>

Okay, I will change that.

[ ... ]

>> +u8 samsung_clkout_get_parent(struct clk_hw *hw)
>> +{
>> +	struct samsung_clkout *clkout = to_clk_out(hw);
>> +	const struct samsung_clkout_soc_data *soc_data = clkout->soc_data;
>> +	unsigned int parent_mask = BIT(soc_data->mux_width) - 1;
>> +	unsigned int val;
>> +	int ret;
>> +
>> +	ret = regmap_read(clkout->regmap, soc_data->reg, &val);
>
> Do we really need to keep return value in "ret" as I can't see you are
> using it anywhere?
>

Right, we are not using that and can be removed.

>> +
>> +	return (val >> soc_data->mux_shift) & parent_mask;
>> +}
>> +

[ ... ]

>> +/* All existing Exynos serial of SoCs have common values for this
>> offsets. */
> typo: serial/series/

Sure.

Thanks for your review.

-- 
Tushar Behera
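The mask-and-shift step in the quoted samsung_clkout_get_parent() can be tried out in user space with a small sketch like the one below. BIT_U32() and field_get() are hypothetical names used only for illustration (they stand in for the kernel's BIT() macro and the open-coded expression in the patch), and the register value is simply passed in rather than read through regmap.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's BIT() macro. */
#define BIT_U32(n) (UINT32_C(1) << (n))

/*
 * Extract a 'width'-bit field that starts at bit 'shift' of 'val',
 * mirroring (val >> mux_shift) & (BIT(mux_width) - 1) from the patch.
 * Assumes 0 < width < 32 and shift < 32.
 */
static inline uint32_t field_get(uint32_t val, unsigned int shift,
				 unsigned int width)
{
	uint32_t mask = BIT_U32(width) - 1;

	return (val >> shift) & mask;
}
```

For instance, a 5-bit field at bit 8 of 0x1300 yields 0x13. In the driver itself the value comes from regmap_read(), so dropping "ret" as discussed means a failed read would go unnoticed; whether that matters here is a judgment call for the patch author.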
Re: [PATCH 0/4] Add framework to support clkout
On 05/10/2014 09:09 AM, Pankaj Dubey wrote:
> Hi Tushar,
>

[ ... ]

>> Also we need to find a suitable place to call early_syscon_init(), after
>> the device tree has been unflattened and before clock initialization.
>>
>> While testing, I called this before of_clk_init() in arch/arm/kernel/time.c,
>> but that place is too generic. Calling anywhere from exynos.c is not
>> working ATM.
>
> IMO we do not need to, or if I am not wrong we should not change time.c.
>

The above solution is definitely a hack and just to test my stuff. The below solution looks good.

> It's possible if we have exynos specific init_time with following changes.
> FYI, In my patch series for Exynos PMU [1], currently I am handling this in
> exynos_dt_machine_init. But definitely it can be handled as below and it works
> without any side effect and I have tested it. The only reason I did not adopt this
> is that for Exynos PMU patch support I had other options. But if required and if
> following change is acceptable I can include this in my next version of Exynos
> PMU patch series.
>
> [1]: https://lkml.org/lkml/2014/4/30/18
>
>
> +static void __init exynos_init_time(void)
> +{
> +	/* Nothing to do timer specific
> +	 * as early_syscon_init requires DT to be unflattened and
> +	 * system should be able to allocate memory we need to
> +	 * postpone until init_time, but it should be done before
> +	 * init_machine. Because before init_machine, secondary
> +	 * core boot starts and it uses PMU registers.
> +	 */
> +
> +	exynos_map_pmu();
> +

Instead of calling early_syscon_init() from within exynos_map_pmu(), it would be good to call it explicitly here before exynos_map_pmu().

> +	of_clk_init(NULL);
> +	clocksource_of_init();
> +
> +}
> +

-- 
Tushar Behera
RE: [PATCH] ARM: dts: at91-sama5d3_xplained: add the regulator device node
> -----Original Message-----
> From: Ferre, Nicolas
> Sent: Friday, May 09, 2014 11:31 PM
> To: Yang, Wenyou; Alexandre Belloni
> Cc: devicet...@vger.kernel.org; linux-kernel@vger.kernel.org;
> robh...@kernel.org; broo...@kernel.org; linux-arm-ker...@lists.infradead.org
> Subject: Re: [PATCH] ARM: dts: at91-sama5d3_xplained: add the regulator
> device node
>
> On 22/04/2014 03:37, Yang, Wenyou :
> > Hi,
> >
> >> -----Original Message-----
> >> From: Alexandre Belloni [mailto:alexandre.bell...@free-electrons.com]
> >> Sent: Monday, April 21, 2014 8:22 PM
> >> To: Yang, Wenyou
> >> Cc: devicet...@vger.kernel.org; Ferre, Nicolas; linux-ker...@vger.kernel.org;
> >> robh...@kernel.org; broo...@kernel.org; linux-arm-ker...@lists.infradead.org
> >> Subject: Re: [PATCH] ARM: dts: at91-sama5d3_xplained: add the
> >> regulator device node
> >>
> >> On 21/04/2014 at 11:54:43 +0200, Alexandre Belloni wrote :
> >>> Hi,
> >>>
> >>> On 21/04/2014 at 12:29:07 +0800, Wenyou Yang wrote :
> >>>> +
> >>>> +vddana_reg: LDO_REG2 {
> >>>> +	regulator-name = "VDDANA";
> >>>> +	regulator-min-microvolt = <330>;
> >>>> +	regulator-max-microvolt = <330>;
> >>>> +	regulator-always-on;
> >>>
> >>> I'm pretty sure that one is not always on as you actually have to
> >>> configure it to get any voltage. Are you sure you want to set the
> >>> regulator-always-on property here ?
> >>>
> >>
> >> Just to clarify my thought, wouldn't it be better to make the ADC
> >> driver handle that regulator instead of using regulator-always-on ?
> > Yes, you are right.
> > It should not use regulator-always-on property for this regulator.
> > It is ADC driver and ISI driver to handle it (the ISI takes PCK for clock).
>
> Hi Wenyou and Alexandre,
>
> After talking to our system engineers, it is not usual to avoid powering
> the VDDANA rail. In fact it will prevent you from using all the pads that
> are powered by VDDANA: PD20-PD31. Moreover, even if you do not activate
> the ADC output on these lines you won't be able to use them as plain
> GPIO... (Cf. package and pinout section of the datasheet).
>
> As the ADVREF pin of the SoC is connected to the VDDANA on this board
> (even if this default configuration can be modified with a soldering
> iron), we have to note that we may consume a little bit more power.
>
> But still, I would recommend to keep the "regulator-always-on" property
> on this node. Do you agree and allow me to take your first revision of
> the patch?

I agree.

>
>
> Bye,
> --
> Nicolas Ferre

Best Regards,
Wenyou Yang
Re: [PATCH v2] cpufreq: powernow-k8: Suppress checkpatch warnings
On 11 May 2014 22:56, Stratos Karafotis wrote: > Suppress the following checkpatch.pl warnings: > > - WARNING: Prefer pr_err(... to printk(KERN_ERR ... > - WARNING: Prefer pr_info(... to printk(KERN_INFO ... > - WARNING: Prefer pr_warn(... to printk(KERN_WARNING ... > - WARNING: quoted string split across lines > - WARNING: please, no spaces at the start of a line > > Also, define the pr_fmt macro instead of PFX for the module name. > > Signed-off-by: Stratos Karafotis > --- > > Changes v1 -> v2 > - Use pr_err_once instead of printk_once > - Change missing_pss_msg to macro (because pr_err_once > doesn't compile otherwise) > - Put one pr_err message in a single line instead of two > - Ignore "line over 80 characters" warnings > - Change the word "Fix" in the subject of the patch to > "Suppress" as the patch doesn't really fix anything > > drivers/cpufreq/powernow-k8.c | 180 > +- > drivers/cpufreq/powernow-k8.h | 2 +- > 2 files changed, 74 insertions(+), 108 deletions(-) Acked-by: Viresh Kumar
Re: [PATCH] powerpc: Fix "attempt to move .org backwards" error (again)
On Fri, 2014-05-09 at 17:07 -0700, Guenter Roeck wrote: > Commit 4e243b7 (powerpc: Fix "attempt to move .org backwards" error) fixes the > allyesconfig build by moving machine_check_common to a different location. > While this fixes most of the errors, both allmodconfig and allyesconfig still > fail as follows. > > arch/powerpc/kernel/exceptions-64s.S:1315: Error: attempt to move .org > backwards > > Fix by moving machine_check_common after the offending address. This suffers from the same problem as previous attempts, on some of my test configs I get: arch/powerpc/kernel/head_64.o:(__ftr_alt_97+0xb0): relocation truncated to fit: R_PPC64_REL14 against `.text'+1c90 make[1]: *** [vmlinux] Error 1 make: *** [sub-make] Error 2 IE, it breaks currently working configs. So we need to move more things around and I haven't had a chance to sort it out. Cheers, Ben.
Re: [PATCH] fs: cifs: new helper: file_inode(file)
merged into cifs-2.6.git for-next

On Tue, Dec 10, 2013 at 9:02 PM, Libo Chen wrote:
>
> Signed-off-by: Libo Chen
> ---
>  fs/cifs/ioctl.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/cifs/ioctl.c b/fs/cifs/ioctl.c
> index 7749230..45cb59b 100644
> --- a/fs/cifs/ioctl.c
> +++ b/fs/cifs/ioctl.c
> @@ -85,7 +85,7 @@ static long cifs_ioctl_clone(unsigned int xid, struct file *dst_file,
>  		goto out_fput;
>  	}
>
> -	src_inode = src_file.file->f_dentry->d_inode;
> +	src_inode = file_inode(src_file.file);
>
>  	/*
>  	 * Note: cifs case is easier than btrfs since server responsible for
> --
> 1.8.2.2

-- 
Thanks,

Steve
3.14.3 i915 dead display under X11
Hi all, I rarely upgrade kernels these days -- so when updating to 3.14.3, I found the X display was blank -- switching to a text console appears to work, but I still have to type blind. Symptoms: Text mode and KMS works correctly to come up with the text console. Running X (whether through xdm or /usr/bin/Xorg) causes the display to go blank and apparently turn off. Switching to a text console via Control-Alt-F# leaves a mostly blank screen up, but there are brief flashes where it looks like the contents of the text console gets rendered once every 5 seconds or so, but so fast no words or letters can be recognized. System: * Thinkpad R61 * 00:02.0 VGA compatible controller: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller (primary) (rev 0c) * Debian/wheezy * xserver-xorg-video-intel 2:2.19.0-6 A diff of the Xorg.0.log (with timestamps removed, as it made almost every line show up in the diff) trimmed down shows: @@ -53,7 +53,7 @@ (==) |-->Input Device "" (==) The core keyboard device wasn't specified explicitly in the layout. Using the default keyboard configuration. 
- (II) Loader magic: 0x7f2813451ae0 + (II) Loader magic: 0x7f492c35aae0 (II) Module ABI versions: X.Org ANSI C Emulation: 0.4 X.Org Video Driver: 12.1 @@ -170,15 +170,17 @@ Sandybridge Server, Ivybridge Mobile (GT1), Ivybridge Mobile (GT2), Ivybridge Desktop (GT1), Ivybridge Desktop (GT2), Ivybridge Server, Ivybridge Server (GT2) - (--) using VT number 7 + (++) using VT number 7 + (WW) xf86OpenConsole: setpgid failed: Operation not permitted + (WW) xf86OpenConsole: setsid failed: Operation not permitted (WW) VGA arbiter: cannot open kernel arbiter, no multi-card support drmOpenDevice: node name is /dev/dri/card0 - drmOpenDevice: open result is 10, (OK) + drmOpenDevice: open result is 8, (OK) drmOpenByBusid: Searching for BusID pci::00:02.0 drmOpenDevice: node name is /dev/dri/card0 - drmOpenDevice: open result is 10, (OK) - drmOpenByBusid: drmOpenMinor returns 10 + drmOpenDevice: open result is 8, (OK) + drmOpenByBusid: drmOpenMinor returns 8 drmOpenByBusid: drmGetBusid reports pci::00:02.0 (**) intel(0): Depth 16, (--) framebuffer bpp 16 (==) intel(0): RGB weight 565 @@ -387,7 +389,17 @@ (II) AutoAddDevices is off - not adding device. (II) config/udev: Adding input device ThinkPad Extra Buttons (/dev/input/event5) (II) AutoAddDevices is off - not adding device. - (II) AIGLX: Suspending AIGLX clients for VT switch - (II) UnloadModule: "kbd" - (II) UnloadModule: "mouse" - Server terminated successfully (0). Closing log file. 
+ (II) intel(0): EDID vendor "LEN", prod id 16435 + (II) intel(0): Printing DDC gathered Modelines: + (II) intel(0): Modeline "1440x900"x0.0 97.78 1440 1488 1520 1760 900 903 909 926 -hsync -vsync (55.6 kHz eP) + (II) intel(0): Modeline "1440x900"x0.0 81.49 1440 1488 1520 1760 900 903 909 926 -hsync -vsync (46.3 kHz e) + (II) intel(0): Modeline "800x600"x0.0 40.00 800 840 968 1056 600 601 605 628 +hsync +vsync (37.9 kHz e) + (II) intel(0): Modeline "640x480"x0.0 25.18 640 656 752 800 480 490 492 525 -hsync -vsync (31.5 kHz e) + (II) intel(0): Modeline "1024x768"x0.0 65.00 1024 1048 1184 1344 768 771 777 806 -hsync -vsync (48.4 kHz e) + (II) intel(0): EDID vendor "LEN", prod id 16435 + (II) intel(0): Printing DDC gathered Modelines: + (II) intel(0): Modeline "1440x900"x0.0 97.78 1440 1488 1520 1760 900 903 909 926 -hsync -vsync (55.6 kHz eP) + (II) intel(0): Modeline "1440x900"x0.0 81.49 1440 1488 1520 1760 900 903 909 926 -hsync -vsync (46.3 kHz e) + (II) intel(0): Modeline "800x600"x0.0 40.00 800 840 968 1056 600 601 605 628 +hsync +vsync (37.9 kHz e) + (II) intel(0): Modeline "640x480"x0.0 25.18 640 656 752 800 480 490 492 525 -hsync -vsync (31.5 kHz e) + (II) intel(0): Modeline "1024x768"x0.0 65.00 1024 1048 1184 1344 768 771 777 806 -hsync -vsync (48.4 kHz e) Bisecting from 3.13.6 (good) to 3.14.3 (bad) ended up with... commit b35684b8fa94e04f55fd38bf672b737741d2f9e2 Author: Jani Nikula Date: Thu Nov 14 12:13:41 2013 +0200 drm/i915: do full backlight setup at enable time We should now have all the information we need to do a full initialization of the backlight registers. v2: Keep QUIRK_NO_PCH_PWM_ENABLE for now (Imre). Signed-off-by: Jani Nikula Reviewed-by: Imre Deak Signed-off-by: Daniel Vetter Which is in 3.12.0 I'm not sure how that came to be. Does that look right? What other information would be required to track this down? 
Thanks, -- DN Daniel Nobuto
Re: arch_random_refill
On 05/11/2014 08:36 PM, Stephan Mueller wrote: > > But in our current predicament, not everybody trusts a few potentially easily > manipulated gates that have no other purpose than produce white noise which > are developed by the biggest chip vendor in the US. Gates which have other > purposes may not be that easily manipulated. > Incidentally, I disagree with the "easily manipulated" bit. Yes, I have seen the paper which says that you can do it in such a way that it doesn't show up on *visual* examination. However, put an electrical probe on it and it shows up immediately. -hpa
Re: arch_random_refill
On 05/11/2014 08:36 PM, Stephan Mueller wrote: > > Ohh, ok, thanks for fixing that. :-) > > Though what makes me wonder is the following: why are some RNGs forced to use > the hw_random framework whereas some others are not? What is the driver for > that? > > The current state of random.c vs. drivers/char/hw_random and the strong in- > kernel separation between both makes me wonder. Isn't that all kind of > inconsistent? > The main differences are speed of access, trivial interface, and architectural guarantees. You also don't have to deal with enumeration, DMA engines, interrupts, indirect access, or bus drivers, which all are utterly unacceptable on a synchronous path. That being said, it is getting clear that we most likely would be better off with the kernel directly feeding from at least a subset of the hw_random drivers, rather than waiting for user space to come along and launch a daemon... after $DEITY knows how many other processes have already been launched. There are patches being worked on to make that happen, although there are a fair number of potential issues, including the fact that some of the hw_random drivers are believed to be dodgy -- for example, the TPM driver: some TPMs are believed to not contain any entropy element and simply rely on a factory-seeded nonvolatile counter (since the TPM has to have support for nonvolatile counters anyway, this hardware is already present.) -hpa
[PATCH] staging/lustre: fix sparse warnings in o2iblnd_cb.c
This patch fixes the following sparse warnings: drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:44:1: warning: symbol 'kiblnd_tx_done' was not declared. Should it be static? drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:102:10: warning: symbol 'kiblnd_get_idle_tx' was not declared. Should it be static? drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:131:1: warning: symbol 'kiblnd_drop_rx' was not declared. Should it be static? drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:212:10: warning: symbol 'kiblnd_find_waiting_tx_locked' was not declared. Should it be static? drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:238:1: warning: symbol 'kiblnd_handle_completion' was not declared. Should it be static? drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:277:1: warning: symbol 'kiblnd_send_completion' was not declared. Should it be static? drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:296:1: warning: symbol 'kiblnd_handle_rx' was not declared. Should it be static? drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:457:1: warning: symbol 'kiblnd_rx_complete' was not declared. Should it be static? drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:527:13: warning: symbol 'kiblnd_kvaddr_to_page' was not declared. Should it be static? drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:699:1: warning: symbol 'kiblnd_setup_rd_iov' was not declared. Should it be static? drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:752:1: warning: symbol 'kiblnd_setup_rd_kiov' was not declared. Should it be static? drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:792:1: warning: symbol 'kiblnd_post_tx_locked' was not declared. Should it be static? drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:996:1: warning: symbol 'kiblnd_tx_complete' was not declared. Should it be static? drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:1270:1: warning: symbol 'kiblnd_connect_peer' was not declared. 
Should it be static? drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:1627:1: warning: symbol 'kiblnd_reply' was not declared. Should it be static? drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:1814:1: warning: symbol 'kiblnd_thread_fini' was not declared. Should it be static? drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:1828:1: warning: symbol 'kiblnd_peer_notify' was not declared. Should it be static? drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:1934:1: warning: symbol 'kiblnd_handle_early_rxs' was not declared. Should it be static? drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:1957:1: warning: symbol 'kiblnd_abort_txs' was not declared. Should it be static? drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:1993:1: warning: symbol 'kiblnd_finalise_conn' was not declared. Should it be static? drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:2167:1: warning: symbol 'kiblnd_reject' was not declared. Should it be static? drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:2178:1: warning: symbol 'kiblnd_passive_connect' was not declared. Should it be static? drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:2452:1: warning: symbol 'kiblnd_reconnect' was not declared. Should it be static? drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:2516:1: warning: symbol 'kiblnd_rejected' was not declared. Should it be static? drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:2655:1: warning: symbol 'kiblnd_check_connreply' was not declared. Should it be static? drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:2754:1: warning: symbol 'kiblnd_active_connect' was not declared. Should it be static? drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:3025:1: warning: symbol 'kiblnd_check_conns' was not declared. Should it be static? drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:3108:1: warning: symbol 'kiblnd_disconnect_conn' was not declared. Should it be static? 
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:3247:1: warning: symbol 'kiblnd_complete' was not declared. Should it be static? drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:904:20: warning: context imbalance in 'kiblnd_post_tx_locked' - unexpected unlock Signed-off-by: Zi Shen Lim --- .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c | 60 +++--- 1 file changed, 31 insertions(+), 29 deletions(-) diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c index 9bf6c94..dfd16e7 100644 --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c @@ -40,7 +40,7 @@ #include "o2iblnd.h" -void +static void kiblnd_tx_done (lnet_ni_t *ni, kib_tx_t *tx) { lnet_msg_t *lntmsg[2]; @@ -99,7 +99,7 @@ kiblnd_txlist_done (lnet_ni_t *ni, struct list_head *txlist, int status) } } -kib_tx_t * +static kib_tx_t *
Re: arch_random_refill
On Sunday, 11 May 2014, 20:22:28, H. Peter Anvin wrote: Hi Peter, > > > Note, I do not see an issue with the patch that adds RDSEED as part of > > add_interrupt_randomness outlined in [2]. The reason is that this patch > > does not monopolizes the noise sources. > > > > I do not want to imply that Intel (or any other chip manufacturer that > > will > > hook into arch_random_refill) intentionally provides bad entropy (and this > > email shall not start a discussion about entropy again), but I would like > > to be able to only use noise sources that I can fully audit. As it is > > with hardware, I am not able to see what it is doing. > > I have to point out the irony in this given your previous proposals, > however... I guess that is the funny nature of entropy :-) But in our current predicament, not everybody trusts a few potentially easily manipulated gates that have no other purpose than to produce white noise and which are developed by the biggest chip vendor in the US. Gates which have other purposes may not be that easily manipulated. > > > Thus, may I ask that arch_random_refill is revised such that it will not > > monopolize the noise sources? If somebody wants that, he can easily use > > rngd. > Feel free to build the kernel without CONFIG_ARCH_RANDOM, or use the > "nordrand" option to the kernel. These options are there for a reason. > > Now when you mention it, though, the nordrand option should turn off > RDSEED as well as RDRAND. It currently doesn't; that is a bug, plain > and simple. Ohh, ok, thanks for fixing that. :-) Though what makes me wonder is the following: why are some RNGs forced to use the hw_random framework whereas some others are not? What is the driver for that? The current state of random.c vs. drivers/char/hw_random and the strong in-kernel separation between both makes me wonder. Isn't that all kind of inconsistent? Ciao Stephan -- | Cui bono? 
|
Re: [PATCHv2 0/2] remap_file_pages() decommission
Armin Rigo writes: > Here is a note from the PyPy project (mentioned earlier in this > thread, and at https://lwn.net/Articles/587923/ ). Your use is completely bogus. remap_file_pages() pins everything and disables any swapping for the area. -Andi -- a...@linux.intel.com -- Speaking for myself only
Re: kmemcheck: got WARNING when dynamicly adjust /proc/sys/kernel/kmemcheck to 0/1
On 2014/5/9 18:02, Vegard Nossum wrote: > On 05/09/2014 11:52 AM, Xishi Qiu wrote: >> On 2014/5/9 15:57, Xishi Qiu wrote: >> >>> OS boot with kmemcheck=0, then set 1, do something, set 0, do something, >>> set 1... >>> then I got the WARNING log. Does kmemcheck support dynamicly adjust? >>> >>> Thanks, >>> Xishi Qiu >>> >>> [ 20.200305] igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow >>> Control: RX >>> [ 20.208652] ADDRCONF(NETDEV_UP): eth0: link is not ready >>> [ 20.216504] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready >>> [ 22.647385] auditd (3116): /proc/3116/oom_adj is deprecated, please use >>> /proc/3116/oom_score_adj instead. >>> [ 24.845214] BIOS EDD facility v0.16 2004-Jun-25, 1 devices found >>> [ 30.434764] eth0: no IPv6 routers present >>> [ 340.154608] NOHZ: local_softirq_pending 01 >>> [ 340.154639] WARNING: kmemcheck: Caught 64-bit read from uninitialized >>> memory (88083f43a550) >>> [ 340.154644] >>> c20080ff5d0100c9400ed34e0888 >>> [ 340.154667] u u u u u u u u u u u u u u u u u u u u u u u u u u u u u u >>> u u >>> [ 340.154687] ^ >>> [ 340.154690] >>> [ 340.154694] Pid: 3, comm: ksoftirqd/0 Tainted: G C >>> 3.4.24-qiuxishi.19-0.1-default+ #2 Huawei Technologies Co., Ltd. 
Tecal >>> RH2285 V2-24S/BC11SRSC1 >>> [ 340.154702] RIP: 0010:[] [] >>> d_namespace_path+0x132/0x270 >>> [ 340.154714] RSP: 0018:8808515a1c88 EFLAGS: 00010202 >>> [ 340.154718] RAX: 88083f43a540 RBX: 880852e718f3 RCX: >>> 0001 >>> [ 340.154721] RDX: 8808515a1d28 RSI: RDI: >>> 881053855a60 >>> [ 340.154725] RBP: 8808515a1ce8 R08: 8808515a1c50 R09: >>> 880852e75800 >>> [ 340.154728] R10: 000156f0 R11: R12: >>> 0001 >>> [ 340.154731] R13: 0100 R14: 880852e71510 R15: >>> 880852e71800 >>> [ 340.154736] FS: () GS:88085f60() >>> knlGS: >>> [ 340.154740] CS: 0010 DS: ES: CR0: 8005003b >>> [ 340.154743] CR2: 880852e71570 CR3: 0008513f2000 CR4: >>> 000407f0 >>> [ 340.154746] DR0: DR1: DR2: >>> >>> [ 340.154750] DR3: DR6: 4ff0 DR7: >>> 0400 >>> [ 340.154753] [] aa_path_name+0x85/0x180 >>> [ 340.154758] [] apparmor_bprm_set_creds+0x126/0x520 >>> [ 340.154763] [] security_bprm_set_creds+0xe/0x10 >>> [ 340.154771] [] prepare_binprm+0xa5/0x100 >>> [ 340.154777] [] do_execve_common+0x232/0x430 >>> [ 340.154781] [] do_execve+0x3a/0x40 >>> [ 340.154785] [] sys_execve+0x49/0x70 >>> [ 340.154793] [] stub_execve+0x6c/0xc0 >>> [ 340.154801] [] 0x >>> [ 340.154813] WARNING: kmemcheck: Caught 64-bit read from uninitialized >>> memory (88083f43a570) >>> [ 340.154817] >>> 746f730078a5433f0888f86d433f0888746f7073 >>> [ 340.154839] u u u u u u u u u u u u u u u u u u u u u u u u u u u u u u >>> u u >>> [ 340.154858] ^ >>> [ 340.154861] >>> [ 340.154864] Pid: 3, comm: ksoftirqd/0 Tainted: G C >>> 3.4.24-qiuxishi.19-0.1-default+ #2 Huawei Technologies Co., Ltd. 
Tecal >>> RH2285 V2-24S/BC11SRSC1 >>> [ 340.154871] RIP: 0010:[] [] >>> rw_verify_area+0x24/0x100 >>> [ 340.154880] RSP: 0018:8808515a1dc8 EFLAGS: 00010202 >>> [ 340.154883] RAX: 88083f43a540 RBX: 0080 RCX: >>> 0080 >>> [ 340.154887] RDX: 8808515a1e30 RSI: 880852e71500 RDI: >>> >>> [ 340.154890] RBP: 8808515a1de8 R08: 880852e73200 R09: >>> 88085f004900 >>> [ 340.154894] R10: 880852e72600 R11: R12: >>> 880852e71500 >>> [ 340.154897] R13: R14: 880852e73200 R15: >>> 0001 >>> [ 340.154901] FS: () GS:88085f60() >>> knlGS: >>> [ 340.154905] CS: 0010 DS: ES: CR0: 8005003b >>> [ 340.154908] CR2: 880852e71570 CR3: 0008513f2000 CR4: >>> 000407f0 >>> [ 340.154911] DR0: DR1: DR2: >>> >>> [ 340.154914] DR3: DR6: 4ff0 DR7: >>> 0400 >>> [ 340.154917] [] vfs_read+0xa4/0x130 >>> [ 340.154922] [] kernel_read+0x44/0x60 >>> [ 340.154926] [] prepare_binprm+0xd0/0x100 >>> [ 340.154931] [] do_execve_common+0x232/0x430 >>> [ 340.154935] [] do_execve+0x3a/0x40 >>> [ 340.154939] [] sys_execve+0x49/0x70 >>> [ 340.154944] [] stub_execve+0x6c/0xc0 >>> [ 340.154950] [] 0x >>> [ 340.154955] WARNING: kmemcheck: Caught 32-bit read from uninitialized >>> memory (88083f43a540) >>> [ 340.154959] >>> c20080ff5d0100c9400ed34e0888 >>> [ 340.154981] u u u u u u u u u u u u u u u u i i i
[tip:x86/urgent] x86, rdrand: When nordrand is specified, disable RDSEED as well
Commit-ID: 7a5091d58419b4e5222abce58a40c072786ea1d6 Gitweb: http://git.kernel.org/tip/7a5091d58419b4e5222abce58a40c072786ea1d6 Author: H. Peter Anvin AuthorDate: Sun, 11 May 2014 20:25:20 -0700 Committer: H. Peter Anvin CommitDate: Sun, 11 May 2014 20:25:20 -0700 x86, rdrand: When nordrand is specified, disable RDSEED as well One can logically expect that when the user has specified "nordrand", the user doesn't want any use of the CPU random number generator, neither RDRAND nor RDSEED, so disable both. Reported-by: Stephan Mueller Cc: Theodore Ts'o Link: http://lkml.kernel.org/r/21542339.0lfnpsy...@myon.chronox.de Signed-off-by: H. Peter Anvin --- Documentation/kernel-parameters.txt | 8 arch/x86/kernel/cpu/rdrand.c| 1 + 2 files changed, 5 insertions(+), 4 deletions(-) diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 4384217..30a8ad0d 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -2218,10 +2218,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted. noreplace-smp [X86-32,SMP] Don't replace SMP instructions with UP alternatives - nordrand[X86] Disable the direct use of the RDRAND - instruction even if it is supported by the - processor. RDRAND is still available to user - space applications. + nordrand[X86] Disable kernel use of the RDRAND and + RDSEED instructions even if they are supported + by the processor. RDRAND and RDSEED are still + available to user space applications. noresume[SWSUSP] Disables resume and restores original swap space. 
diff --git a/arch/x86/kernel/cpu/rdrand.c b/arch/x86/kernel/cpu/rdrand.c index 384df51..136ac74 100644 --- a/arch/x86/kernel/cpu/rdrand.c +++ b/arch/x86/kernel/cpu/rdrand.c @@ -27,6 +27,7 @@ static int __init x86_rdrand_setup(char *s) { setup_clear_cpu_cap(X86_FEATURE_RDRAND); + setup_clear_cpu_cap(X86_FEATURE_RDSEED); return 1; } __setup("nordrand", x86_rdrand_setup);
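The effect of the one-line change above can be pictured with a small user-space model. This is only a sketch under stated assumptions: the bit values, the extra FEAT_OTHER bit, and the apply_boot_option() helper are hypothetical and are not the kernel's X86_FEATURE_* constants or its setup_clear_cpu_cap() implementation. It only illustrates that one boot option now clears both capability bits and leaves unrelated ones alone.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical feature bits for this sketch; not the kernel's values. */
#define FEAT_RDRAND (UINT32_C(1) << 0)
#define FEAT_RDSEED (UINT32_C(1) << 1)
#define FEAT_OTHER  (UINT32_C(1) << 2)

/*
 * Return the capability mask after applying one boot option.
 * "nordrand" clears both the RDRAND and the RDSEED bit, mirroring the
 * intent of the patch; any other option leaves the mask unchanged.
 */
static uint32_t apply_boot_option(uint32_t caps, const char *opt)
{
    if (strcmp(opt, "nordrand") == 0)
        caps &= ~(FEAT_RDRAND | FEAT_RDSEED);
    return caps;
}
```

Starting from a mask with all three bits set, applying "nordrand" in this model leaves only FEAT_OTHER; before the patch, the equivalent of the RDSEED bit would have stayed set.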
Re: [PATCH V4 2/2] fs/ext4/fsync.c: generic_file_fsync call based on barrier flag
On Sun, May 11, 2014 at 1:06 AM, Fabian Frederick wrote: > generic_file_fsync has been updated to issue a flush for > older filesystems. > > This patch tests for barrier flag in ext4 mount flags > and calls the right function. > > Suggested-by: Jan Kara > Suggested-by: Christoph Hellwig > Cc: Jan Kara > Cc: Christoph Hellwig > Cc: Alexander Viro > Cc: "Theodore Ts'o" > Cc: Andrew Morton > Signed-off-by: Fabian Frederick > --- > fs/ext4/fsync.c | 4 > 1 file changed, 4 insertions(+) > > diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c > index a8bc47f..fa82c0a 100644 > --- a/fs/ext4/fsync.c > +++ b/fs/ext4/fsync.c > @@ -108,6 +108,10 @@ int ext4_sync_file(struct file *file, loff_t start, > loff_t end, int datasync) > > if (!journal) { > ret = generic_file_fsync(file, start, end, datasync); Did you forget to remove the line above? > + if (test_opt(inode->i_sb, BARRIER)) > + ret = generic_file_fsync(file, start, end, datasync); > + else > + ret = __generic_file_fsync(file, start, end, > datasync); > if (!ret && !hlist_empty(&inode->i_dentry)) > ret = ext4_sync_parent(inode); > goto out; Thanks, -- Ming Lei
[RFC][PATCH 2/2] ARM: ioremap: Add IO mapping space reused support.
With IO mappings, the same physical address space may be mapped more than once. For example, in some SoCs the range 0x2000 ~ 0x20001000 is the global control IO physical map, and this range is used by many drivers. If each driver does the same ioremap operation, we waste too much vmalloc virtual space. This patch adds support for reusing the IO mapping space. Signed-off-by: Richard Lee --- arch/arm/mm/ioremap.c | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/arch/arm/mm/ioremap.c b/arch/arm/mm/ioremap.c index f9c32ba..26a3744 100644 --- a/arch/arm/mm/ioremap.c +++ b/arch/arm/mm/ioremap.c @@ -260,7 +260,7 @@ void __iomem * __arm_ioremap_pfn_caller(unsigned long pfn, { const struct mem_type *type; int err; - unsigned long addr; + unsigned long addr, off; struct vm_struct *area; phys_addr_t paddr = __pfn_to_phys(pfn); @@ -301,6 +301,12 @@ void __iomem * __arm_ioremap_pfn_caller(unsigned long pfn, if (WARN_ON(pfn_valid(pfn))) return NULL; + area = find_vm_area_paddr(paddr, size, &off, VM_IOREMAP); + if (area) { + addr = (unsigned long)area->addr; + return (void __iomem *)(offset + off + addr); + } + area = get_vm_area_caller(size, VM_IOREMAP, caller); if (!area) return NULL; @@ -410,6 +416,9 @@ void __iounmap(volatile void __iomem *io_addr) if (svm) return; + if (!vm_area_is_aready_to_free((unsigned long)addr)) + return; + #if !defined(CONFIG_SMP) && !defined(CONFIG_ARM_LPAE) { struct vm_struct *vm; -- 1.8.4
Re: arch_random_refill
On 05/11/2014 04:01 PM, Stephan Mueller wrote: > Hi Peter, > > some time back when the RDRAND instruction was debated, a patch was offered > for driver/char/random.c that in essence turned /dev/random into a frontend > for RDRAND in case that instruction was available. The patch kind of > monopolized the noise sources such that if a user space "random number hog" > pulled data from /dev/random endlessly, the (almost) only noise source was > RDRAND. As that patch treated RDRAND to provide entropy, the blocking > behavior > went away for /dev/random. > This is false in a number of ways. First of all... we NEVER pulled either /dev/random or /dev/urandom directly from RDRAND. We used RDRAND directly for kernel internal randomness uses. Users did object to this. > That patch did not sit well with some developers and it got finally changed > such that the output of RDRAND is now just XORed with the output of the > "classic" /dev/random behavior -- /dev/random is still blocking. Mixing in RDRAND into /dev/random and /dev/urandom is actually > With the current development cycle for 3.15, the function arch_random_refill > is added as presented in [1]. It now uses RDSEED instead of RDRAND. Yet, the > way this function is called in random_read seems (as I have no system with an > RDSEED, I cannot test) to show the very same behavior as the aforementioned > RDRAND patch: the blocking behavior of /dev/random will be gone and RDSEED > will monopolize the noise sources in case of a user space hog. There is a huge difference between this and what people objected to earlier: we filter everything through the kernel random number pool system, which would require a herculean mathematical effort to reverse even if the output of RDSEED was 100% predictable. > Note, I do not see an issue with the patch that adds RDSEED as part of > add_interrupt_randomness outlined in [2]. The reason is that this patch does > not monopolizes the noise sources. 
> > I do not want to imply that Intel (or any other chip manufacturer that will > hook into arch_random_refill) intentionally provides bad entropy (and this > email shall not start a discussion about entropy again), but I would like to > be able to only use noise sources that I can fully audit. As it is with > hardware, I am not able to see what it is doing. I have to point out the irony in this given your previous proposals, however... > Thus, may I ask that arch_random_refill is revised such that it will not > monopolize the noise sources? If somebody wants that, he can easily use rngd. Feel free to build the kernel without CONFIG_ARCH_RANDOM, or use the "nordrand" option to the kernel. These options are there for a reason. Now when you mention it, though, the nordrand option should turn off RDSEED as well as RDRAND. It currently doesn't; that is a bug, plain and simple. -hpa
[RFC][PATCH 0/2] Add IO mapping space reused support
Richard Lee (2): mm/vmalloc: Add IO mapping space reused interface. ARM: ioremap: Add IO mapping space reused support. arch/arm/mm/ioremap.c | 11 - include/linux/vmalloc.h | 5 mm/vmalloc.c| 63 + 3 files changed, 78 insertions(+), 1 deletion(-) -- 1.8.4
icmp: account for ICMP out errors because of socket limit
Hi, Eric && David [PC] <---> [MIPS 32 core] When pinging from a PC to a board (MIPS 32 core), the ping echo fails because of the socket limit, but the ICMP_MIB_OUTERRORS counter is not incremented. In this case, "netstat -s" cannot report these errors. This patch fixes the problem; it is attached. Please check it. Best Regards! Zhu Yanjun >From af800d0f123cf9c66a9ae167baa7dc1d25d0cd1f Mon Sep 17 00:00:00 2001 From: Zhu Yanjun Date: Mon, 12 May 2014 11:07:20 +0800 Subject: [PATCH 1/1] icmp: account for ICMP out errors because of socket limit When icmp_xmit_lock fails because of socket limit or memory shortage, increment ICMP_MIB_OUTERRORS counter, so that "netstat -s" can report these errors. netstat -s | grep "ICMP messages failed" 0 ICMP messages failed Signed-off-by: Zhu Yanjun --- net/ipv4/icmp.c |4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c index 0134663..9a0bd7c 100644 --- a/net/ipv4/icmp.c +++ b/net/ipv4/icmp.c @@ -342,8 +342,10 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb) return; sk = icmp_xmit_lock(net); - if (sk == NULL) + if (sk == NULL){ + ICMP_INC_STATS_BH(net, ICMP_MIB_OUTERRORS); return; + } inet = inet_sk(sk); icmp_param->data.icmph.checksum = 0; -- 1.7.9.5
Re: [PATCH 1/5] irq_work: Architecture support for remote irq work raise
On Mon, 2014-05-12 at 10:08 +1000, Benjamin Herrenschmidt wrote: > On Mon, 2014-05-12 at 01:33 +0200, Frederic Weisbecker wrote: > > We are going to extend irq work to support remote queuing. > > > > So lets add a cpu argument to arch_irq_work_raise(). The architectures > > willing to support that must then provide the backend to raise irq work > > IPIs remotely. > > > > Initial support is provided for x86 and ARM since they are easily > > extended. The other archs that overwrite arch_irq_work_raise() seem > > to use local clock interrupts and therefore need deeper rewrite of their > > irq work support to implement remote raising. > > Well, looks like it's time to turn it into an IPI... It gets a bit more > tricky because whether whacking the interrupt controller is safe to > do from an NMI is safe or not might depend on that irq controller > implementation... > > It looks like XICS and MPIC should be safe though, so at least we > should be able to cover ppc64, but I'll leave ppc32 alone. Correction... that's actually a bit more tricky. We might need an MMIO to trigger the IPI. That means potentially having to take a hash miss, and we certainly can't do that at NMI time at the moment. We *could* hard disable interrupts (which blocks our NMIs since they aren't real NMIs, they are just a way to bypass our soft-disable state for perf interrupts) for hash_page, but that still makes me somewhat nervous. Another option would be to add an ioremap flag of some description to be able to install bolted hash entries. (It already does so if called early enough during boot, so it might actually just work by accident but that's an undebuggable horror show waiting to happen if we ever change that). So needs a bit more thinking on our side. Cheers, Ben. 
[RFC][PATCH 1/2] mm/vmalloc: Add IO mapping space reused interface.
With IO mappings, the same physical address space may be mapped more than once. For example, in some SoCs the range 0x2000 ~ 0x20001000 is the global control IO physical map, and this range is used by many drivers. If each driver does the same ioremap operation, we waste too much vmalloc virtual space. This patch adds the IO mapping space reuse interface: - find_vm_area_paddr: used to find an existing vmalloc area by its IO physical address. - vm_area_is_aready_to_free: called before vfree of an IO mapped area to check whether the area is used by more than one consumer. Signed-off-by: Richard Lee --- include/linux/vmalloc.h | 5 mm/vmalloc.c| 63 + 2 files changed, 68 insertions(+) diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h index 4b8a891..2b811f6 100644 --- a/include/linux/vmalloc.h +++ b/include/linux/vmalloc.h @@ -34,6 +34,7 @@ struct vm_struct { struct page **pages; unsigned intnr_pages; phys_addr_t phys_addr; + unsigned intused; const void *caller; }; @@ -100,6 +101,10 @@ static inline size_t get_vm_area_size(const struct vm_struct *area) return area->size - PAGE_SIZE; } +extern int vm_area_is_aready_to_free(phys_addr_t addr); +struct vm_struct *find_vm_area_paddr(phys_addr_t paddr, size_t size, +unsigned long *offset, +unsigned long flags); extern struct vm_struct *get_vm_area(unsigned long size, unsigned long flags); extern struct vm_struct *get_vm_area_caller(unsigned long size, unsigned long flags, const void *caller); diff --git a/mm/vmalloc.c b/mm/vmalloc.c index bf233b2..f75b7b3 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -1293,6 +1293,7 @@ static void setup_vmalloc_vm(struct vm_struct *vm, struct vmap_area *va, vm->addr = (void *)va->va_start; vm->size = va->va_end - va->va_start; vm->caller = caller; + vm->used = 1; va->vm = vm; va->flags |= VM_VM_AREA; spin_unlock(&vmap_area_lock); @@ -1383,6 +1384,68 @@ struct vm_struct *get_vm_area_caller(unsigned long size, unsigned long flags, NUMA_NO_NODE, 
GFP_KERNEL, caller); } +int vm_area_is_aready_to_free(phys_addr_t addr) +{ + struct vmap_area *va; + + va = find_vmap_area((unsigned long)addr); + if (!va || !(va->flags & VM_VM_AREA) || !va->vm) + return 1; + + if (va->vm->used <= 1) + return 1; + + --va->vm->used; + + return 0; +} + +/** + * find_vm_area_paddr - find a continuous kernel virtual area using the + * physical address. + * @paddr: base physical address + * @size: size of the physical area range + * @offset:the start offset of the vm area + * @flags: %VM_IOREMAP for I/O mappings + * + * Search for the kernel VM area, whose physical address starting at @paddr, + * and if the existing VM area's size is large enough, then just return it, or + * return NULL. + */ +struct vm_struct *find_vm_area_paddr(phys_addr_t paddr, size_t size, +unsigned long *offset, +unsigned long flags) +{ + struct vmap_area *va; + + if (!(flags & VM_IOREMAP)) + return NULL; + + rcu_read_lock(); + list_for_each_entry_rcu(va, &vmap_area_list, list) { + phys_addr_t phys_addr; + + if (!va || !(va->flags & VM_VM_AREA) || !va->vm) + continue; + + phys_addr = va->vm->phys_addr; + + if (paddr < phys_addr || paddr + size > phys_addr + va->vm->size) + continue; + + *offset = paddr - phys_addr; + + if (va->vm->flags & VM_IOREMAP && va->vm->size >= size) { + va->vm->used++; + rcu_read_unlock(); + return va->vm; + } + } + rcu_read_unlock(); + + return NULL; +} + /** * find_vm_area - find a continuous kernel virtual area * @addr: base address -- 1.8.4
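The reuse scheme this patch proposes (look up an existing mapping that contains the requested physical range, bump a use count, and only really free on the last release) can be sketched in plain user-space C. This is a minimal model, not the kernel code: struct io_area, find_area_paddr() and area_put() are hypothetical names invented here, there is no locking or RCU, and the table is a flat array rather than the vmap area list.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a vm_struct that records an I/O mapping. */
struct io_area {
    uint64_t phys;   /* physical base of the mapping */
    size_t   size;   /* length of the mapping in bytes */
    unsigned used;   /* consumers sharing the mapping; 0 means slot unused */
};

/*
 * Look for an existing area that fully contains [paddr, paddr + size).
 * On a hit, store the offset into the area in *offset, take a reference,
 * and return the area; otherwise return NULL.
 */
static struct io_area *find_area_paddr(struct io_area *areas, size_t n,
                                       uint64_t paddr, size_t size,
                                       size_t *offset)
{
    for (size_t i = 0; i < n; i++) {
        struct io_area *a = &areas[i];

        if (a->used == 0)
            continue;
        if (paddr < a->phys || paddr + size > a->phys + a->size)
            continue;

        *offset = (size_t)(paddr - a->phys);
        a->used++;
        return a;
    }
    return NULL;
}

/*
 * Drop one reference. Returns 1 when the caller held the last reference
 * and the area should really be unmapped, 0 when other users remain.
 */
static int area_put(struct io_area *a)
{
    if (a->used <= 1) {
        a->used = 0;
        return 1;
    }
    a->used--;
    return 0;
}
```

In this model a second request for a sub-range of an already mapped area returns the same area plus an offset, and the first of the two releases reports that the area must stay mapped. A request that extends past the end of the existing area misses and would need a fresh mapping.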
Re: [PATCH 0/6] ipc/sem.c: Fix semctl(,,{GETNCNT,GETZCNT})
On Sat, 2014-05-10 at 12:03 +0200, Manfred Spraul wrote: > Hi all, > > According to the man page of semop(), semzcnt or semncnt are increased > exactly for the operation that couldn't proceed. > > The Linux implementation always tried to be clever and to increase the > counters > for all operations that might be the reason why a task sleeps. ... and I hate the fact that we do so on demand, instead of modifying the values when semop is called. This makes the whole semctl calls less accurate, and in fact it's mentioned in the code. > The following patches fix that and make the code conform to the > documentation. > > The series got fairly long, because I also noticed that semzcnt was calculated > incorrectly. > > What do you think? I'm still going through the changes, sems make my brain hurt. But conceptually they do make sense... and hey, if semctl(GETNCNT,GETZCNT) calls are currently incomplete, then yeah, we should fix it. > I ran a few test cases, and the semncnt and semzcnt counts now match > the expectation. > > Is anyone aware of an application that uses GETNCNT or GETZCNT? Given how Oracle uses sysv semaphores I wouldn't be surprised if they make use of these, especially GETNCNT, for something like "get the amount of waiters" as opposed to "are there waiters"... but I'm just speculating here. I did find that LTP does some calls to GETZCNT, GETNCNT, and these patches do not break those tests. However, they are pretty bogus since they always test for zero. That reminds me, it might be worthwhile adding some more tests in the selftests/ipc dir, we only have some trivial msgq program, for the rest I pretty much rely on LTP for correctness runs. Thanks, Davidlohr
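The documented rule quoted above (only the one operation that could not proceed is counted) can be illustrated with a small user-space simulation. This is a sketch of the man-page semantics only, not the ipc/sem.c implementation or the patches under review; struct sim_op, enum wait_kind, SIM_MAX_SEMS and first_blocking_op() are hypothetical names. It assumes operations are tried in array order against a scratch copy of the values, as the semop() documentation describes for an atomic set.

```c
#include <stddef.h>

#define SIM_MAX_SEMS 16  /* arbitrary limit for this sketch */

enum wait_kind { WAIT_NONE, WAIT_NCNT, WAIT_ZCNT };

/* One operation of a simulated semop() call, modelled on struct sembuf. */
struct sim_op {
    unsigned short num;  /* semaphore index */
    short          op;   /* <0 decrease, 0 wait-for-zero, >0 increase */
};

/*
 * Apply ops in array order to a scratch copy of the semaphore values and
 * report the first operation that cannot proceed. Returns its index, -1 if
 * the whole set would succeed, or -2 on invalid input. *kind tells which
 * counter that single operation would be accounted to.
 */
static int first_blocking_op(const int *semval, size_t nsems,
                             const struct sim_op *ops, size_t nops,
                             enum wait_kind *kind)
{
    int scratch[SIM_MAX_SEMS];

    *kind = WAIT_NONE;
    if (nsems > SIM_MAX_SEMS)
        return -2;
    for (size_t i = 0; i < nsems; i++)
        scratch[i] = semval[i];

    for (size_t i = 0; i < nops; i++) {
        if (ops[i].num >= nsems)
            return -2;

        int *v = &scratch[ops[i].num];

        if (ops[i].op == 0) {
            if (*v != 0) {              /* waits for the value to reach zero */
                *kind = WAIT_ZCNT;
                return (int)i;
            }
        } else if (ops[i].op < 0 && *v < -ops[i].op) {
            *kind = WAIT_NCNT;          /* waits for the value to increase */
            return (int)i;
        } else {
            *v += ops[i].op;            /* operation proceeds tentatively */
        }
    }
    return -1;
}
```

With values {0, 1} and the operations "decrease semaphore 1 by one, then decrease semaphore 0 by one", only the second operation blocks, so under the documented rule only semaphore 0 would gain a semncnt waiter.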
[ 000/143] 2.6.32.62-longterm review
This is the start of the longterm review cycle for the 2.6.32.62 release. All patches will be posted as a response to this one. If anyone has any issue with these being applied, please let me know. If anyone is a maintainer of the proper subsystem, and wants to add a Signed-off-by: line to the patch, please respond with it. Responses should be made before Friday 16th 8PM UTC. Anything received after that time might be too late. If someone wants a bit more time for a deeper review, please let me know. The whole patch series can be found in one patch at : kernel.org/pub/linux/kernel/v2.6/longterm-review/patch-2.6.32.62-rc1.gz The shortlog and diffstat are appended below. -- Andreas Henriksson (1): net: Fix "ip rule delete table 256" Andy Honig (2): KVM: Improve create VCPU parameter (CVE-2013-4587) KVM: x86: Fix potential divide by 0 in lapic (CVE-2013-6367) Ben Greear (1): Fix lockup related to stop_machine being stuck in __do_softirq. Changli Gao (2): net: Swap ver and type in pppoe_hdr net: drop_monitor: fix the value of maxattr Chris Healy (1): resubmit bridge: fix message_age_timer calculation Dan Carpenter (13): cciss: fix info leak in cciss_ioctl32_passthru() cpqarray: fix info leak in ida_locked_ioctl() net: heap overflow in __audit_sockaddr() arcnet: cleanup sizeof parameter af_key: more info leaks in pfkey messages net_sched: info leak in atm_tc_dump_class() isdnloop: use strlcpy() instead of strcpy() net: clamp ->msg_namelen instead of returning an error isdnloop: several buffer overflows libertas: potential oops in debugfs uml: check length in exitcode_proc_write() xfs: underflow bug in xfs_attrlist_by_handle() aacraid: missing capable() check in compat ioctl Daniel Borkmann (8): net: sctp: fix NULL pointer dereference in socket destruction packet: packet_getname_spkt: make sure string is always 0-terminated random32: fix off-by-one in seeding requirement net: llc: fix use after free in llc_ui_recvmsg net: sctp: fix sctp_connectx abi for ia32 
emulation/compat mode net: sctp: fix sctp_sf_do_5_1D_ce to verify if we/peer is AUTH capable net: sctp: fix skb leakage in COOKIE ECHO path of chunk->auth_chunk netfilter: nf_conntrack_dccp: fix skb_header_pointer API usages Dave Kleikamp (1): sunvnet: vnet_port_remove must call unregister_netdev David S. Miller (1): net_sched: Fix stack info leak in cbq_dump_wrr(). Ding Tianhong (1): bridge: flush br's address entry in fdb when remove the bridge dev Duan Jiong (1): ipv6: use rt6_get_dflt_router to get default router in rt6_route_rcv Eric Dumazet (12): ipv6: ip6_sk_dst_check() must not assume ipv6 dst ipv6: tcp: fix panic in SYN processing tcp: must unclone packets before mangling them net: do not call sock_put() on TIMEWAIT sockets tcp: fix tcp_md5_hash_skb_data() ipv6: fix possible crashes in ip6_cork_release() ip_tunnel: fix kernel panic with icmp_dest_unreach neighbour: fix a race in neigh_destroy() vlan: fix a race in egress prio management tcp: cubic: fix bug in bictcp_acked() ipv4: fix possible seqlock deadlock inet: fix possible seqlock deadlocks Fan Du (1): sctp: Use software crc32 checksum when xfrm transform will happen. 
Florian Westphal (1): net: rose: restore old recvmsg behavior Hannes Frederic Sowa (12): ipv6: don't stop backtracking in fib6_lookup_1 if subtree does not match ipv6: remove max_addresses check from ipv6_create_tempaddr ipv6: drop packets with multiple fragmentation headers inet: prevent leakage of uninitialized memory to user in recv syscalls net: rework recvmsg handler msg_name and msg_namelen logic net: add BUG_ON if kernel advertises msg_namelen > sizeof(struct sockaddr_storage) inet: fix addr_len/msg->msg_namelen assignment in recv_error and rxpmtu functions ipv6: fix leaking uninitialized port number of offender sockaddr ipv6: fix possible seqlock deadlock in ip6_finish_output2 ipv6: udp packets following an UFO enqueued packet need also be handled by UFO inet: fix possible memory corruption with UDP_CORK and UFO ipv6: call udp_push_pending_frames when uncorking a socket with AF_INET pending data Ian Abbott (1): staging: comedi: ni_65xx: (bug fix) confine insn_bits to one subdevice Jason Wang (1): virtio-net: alloc big buffers also when guest can receive UFO Jiri Bohac (2): ICMPv6: treat dest unreachable codes 5 and 6 as EACCES, not EPROTO bonding: 802.3ad: make aggregator_identifier bond-private Jitendra Bhivare (1): intel-iommu: Flush unmaps at domain_exit Jonathan Salwan (1): drivers/cdrom/cdrom.c: use kzalloc() for failing hardware Julian Anastasov (1): ipvs: fix CHECKSUM_PARTIAL for TCP, UDP
[ 030/143] proc connector: fix info leaks
2.6.32-longterm review patch. If anyone has any objections, please let me know. -- From: Mathias Krause [ Upstream commit e727ca82e0e9616ab4844301e6bae60ca7327682 ] Initialize event_data for all possible message types to prevent leaking kernel stack contents to userland (up to 20 bytes). Also set the flags member of the connector message to 0 to prevent leaking two more stack bytes this way. Cc: sta...@vger.kernel.org Signed-off-by: Mathias Krause Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau --- drivers/connector/cn_proc.c | 16 1 file changed, 16 insertions(+) diff --git a/drivers/connector/cn_proc.c b/drivers/connector/cn_proc.c index 6069790..3a2587a 100644 --- a/drivers/connector/cn_proc.c +++ b/drivers/connector/cn_proc.c @@ -59,6 +59,7 @@ void proc_fork_connector(struct task_struct *task) msg = (struct cn_msg*)buffer; ev = (struct proc_event*)msg->data; + memset(&ev->event_data, 0, sizeof(ev->event_data)); get_seq(&msg->seq, &ev->cpu); ktime_get_ts(&ts); /* get high res monotonic timestamp */ put_unaligned(timespec_to_ns(&ts), (__u64 *)&ev->timestamp_ns); @@ -71,6 +72,7 @@ void proc_fork_connector(struct task_struct *task) memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id)); msg->ack = 0; /* not used */ msg->len = sizeof(*ev); + msg->flags = 0; /* not used */ /* If cn_netlink_send() failed, the data is not sent */ cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL); } @@ -87,6 +89,7 @@ void proc_exec_connector(struct task_struct *task) msg = (struct cn_msg*)buffer; ev = (struct proc_event*)msg->data; + memset(&ev->event_data, 0, sizeof(ev->event_data)); get_seq(&msg->seq, &ev->cpu); ktime_get_ts(&ts); /* get high res monotonic timestamp */ put_unaligned(timespec_to_ns(&ts), (__u64 *)&ev->timestamp_ns); @@ -97,6 +100,7 @@ void proc_exec_connector(struct task_struct *task) memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id)); msg->ack = 0; /* not used */ msg->len = sizeof(*ev); + msg->flags = 0; /* not used */ cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL); } @@ -113,6 +117,7 @@ void proc_id_connector(struct
task_struct *task, int which_id) msg = (struct cn_msg*)buffer; ev = (struct proc_event*)msg->data; + memset(&ev->event_data, 0, sizeof(ev->event_data)); ev->what = which_id; ev->event_data.id.process_pid = task->pid; ev->event_data.id.process_tgid = task->tgid; @@ -136,6 +141,7 @@ void proc_id_connector(struct task_struct *task, int which_id) memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id)); msg->ack = 0; /* not used */ msg->len = sizeof(*ev); + msg->flags = 0; /* not used */ cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL); } @@ -151,6 +157,7 @@ void proc_sid_connector(struct task_struct *task) msg = (struct cn_msg *)buffer; ev = (struct proc_event *)msg->data; + memset(&ev->event_data, 0, sizeof(ev->event_data)); get_seq(&msg->seq, &ev->cpu); ktime_get_ts(&ts); /* get high res monotonic timestamp */ put_unaligned(timespec_to_ns(&ts), (__u64 *)&ev->timestamp_ns); @@ -161,6 +168,7 @@ void proc_sid_connector(struct task_struct *task) memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id)); msg->ack = 0; /* not used */ msg->len = sizeof(*ev); + msg->flags = 0; /* not used */ cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL); } @@ -176,8 +184,10 @@ void proc_exit_connector(struct task_struct *task) msg = (struct cn_msg*)buffer; ev = (struct proc_event*)msg->data; + memset(&ev->event_data, 0, sizeof(ev->event_data)); get_seq(&msg->seq, &ev->cpu); ktime_get_ts(&ts); /* get high res monotonic timestamp */ + memset(&ev->event_data, 0, sizeof(ev->event_data)); put_unaligned(timespec_to_ns(&ts), (__u64 *)&ev->timestamp_ns); ev->what = PROC_EVENT_EXIT; ev->event_data.exit.process_pid = task->pid; @@ -188,6 +198,7 @@ void proc_exit_connector(struct task_struct *task) memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id)); msg->ack = 0; /* not used */ msg->len = sizeof(*ev); + msg->flags = 0; /* not used */ cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL); } @@ -211,6 +222,7 @@ static void cn_proc_ack(int err, int rcvd_seq, int rcvd_ack) msg = (struct cn_msg*)buffer; ev = (struct proc_event*)msg->data; + memset(&ev->event_data, 0, sizeof(ev->event_data)); msg->seq =
rcvd_seq; ktime_get_ts(&ts); /* get high res monotonic timestamp */ put_unaligned(timespec_to_ns(&ts), (__u64 *)&ev->timestamp_ns); @@ -220,6 +232,7 @@ static void cn_proc_ack(int err, int rcvd_seq, int rcvd_ack) memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id)); msg->ack = rcvd_ack + 1; msg->len = sizeof(*ev); + msg->flags = 0; /* not used */ cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL); } @@ -249,6
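The kind of leak this patch closes can be reproduced in plain userspace C. The sketch below is only an illustration: struct demo_event and both helpers are hypothetical stand-ins, not the real struct proc_event or any connector API. It fills a small union member in a buffer that still holds old bytes, and shows that zeroing the whole union first leaves nothing stale behind.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an event with a union of differently sized
 * payloads; small payloads leave the rest of the union untouched. */
struct demo_event {
	uint32_t what;
	union {
		struct { uint32_t pid; } small;                        /* 4 bytes  */
		struct { uint32_t pid, tgid, code, sig, a, b; } large; /* 24 bytes */
	} event_data;
};

/* Fill a "small" event into a caller-provided (possibly dirty) buffer. */
static void fill_small_event(struct demo_event *ev, uint32_t pid)
{
	/* Zero the whole union first so bytes beyond 'small' are defined. */
	memset(&ev->event_data, 0, sizeof(ev->event_data));
	ev->what = 1;
	ev->event_data.small.pid = pid;
}

/* Count non-zero bytes of the union that lie outside 'small'. */
static size_t stale_bytes(const struct demo_event *ev)
{
	const unsigned char *p = (const unsigned char *)&ev->event_data;
	size_t i, n = 0;

	for (i = sizeof(ev->event_data.small); i < sizeof(ev->event_data); i++)
		n += (p[i] != 0);
	return n;
}
```

Without the memset() in fill_small_event(), a buffer pre-filled with a pattern such as 0xAA would report 20 stale bytes, which in this made-up layout happens to match the "up to 20 bytes" order of magnitude mentioned above.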
[ 005/143] ipvs: fix CHECKSUM_PARTIAL for TCP, UDP
2.6.32-longterm review patch. If anyone has any objections, please let me know. -- From: Julian Anastasov Fix CHECKSUM_PARTIAL handling. Tested for IPv4 TCP; UDP was not tested because it needs a network card with HW CSUM support. This may fix the problem where IPVS cannot be used in virtual boxes. The problem appears with DNAT to a local address when the local stack sends the reply in CHECKSUM_PARTIAL mode. Fix tcp_dnat_handler and udp_dnat_handler to provide vaddr and daddr in the right order (old and new IP) when calling tcp_partial_csum_update/udp_partial_csum_update (CHECKSUM_PARTIAL). Signed-off-by: Julian Anastasov Signed-off-by: Simon Horman (cherry picked from commit 5bc9068e9d962ca6b8bec3f0eb6f60ab4dee1d04) Signed-off-by: Willy Tarreau --- net/netfilter/ipvs/ip_vs_proto_tcp.c | 10 +- net/netfilter/ipvs/ip_vs_proto_udp.c | 10 +- 2 files changed, 10 insertions(+), 10 deletions(-) diff --git a/net/netfilter/ipvs/ip_vs_proto_tcp.c b/net/netfilter/ipvs/ip_vs_proto_tcp.c index 91d28e0..d462b0d 100644 --- a/net/netfilter/ipvs/ip_vs_proto_tcp.c +++ b/net/netfilter/ipvs/ip_vs_proto_tcp.c @@ -147,15 +147,15 @@ tcp_partial_csum_update(int af, struct tcphdr *tcph, #ifdef CONFIG_IP_VS_IPV6 if (af == AF_INET6) tcph->check = - csum_fold(ip_vs_check_diff16(oldip->ip6, newip->ip6, + ~csum_fold(ip_vs_check_diff16(oldip->ip6, newip->ip6, ip_vs_check_diff2(oldlen, newlen, - ~csum_unfold(tcph->check)))); + csum_unfold(tcph->check)))); else #endif tcph->check = - csum_fold(ip_vs_check_diff4(oldip->ip, newip->ip, + ~csum_fold(ip_vs_check_diff4(oldip->ip, newip->ip, ip_vs_check_diff2(oldlen, newlen, - ~csum_unfold(tcph->check)))); + csum_unfold(tcph->check)))); } @@ -269,7 +269,7 @@ tcp_dnat_handler(struct sk_buff *skb, * Adjust TCP checksums */ if (skb->ip_summed == CHECKSUM_PARTIAL) { - tcp_partial_csum_update(cp->af, tcph, &cp->daddr, &cp->vaddr, + tcp_partial_csum_update(cp->af, tcph, &cp->vaddr, &cp->daddr, htons(oldlen), htons(skb->len - tcphoff)); } else if (!cp->app) { diff --git a/net/netfilter/ipvs/ip_vs_proto_udp.c b/net/netfilter/ipvs/ip_vs_proto_udp.c index e7a6885..c1781f5 100644 --- a/net/netfilter/ipvs/ip_vs_proto_udp.c +++ b/net/netfilter/ipvs/ip_vs_proto_udp.c @@ -154,15 +154,15 @@ udp_partial_csum_update(int af, struct udphdr *uhdr, #ifdef CONFIG_IP_VS_IPV6 if (af == AF_INET6) uhdr->check = - csum_fold(ip_vs_check_diff16(oldip->ip6, newip->ip6, + ~csum_fold(ip_vs_check_diff16(oldip->ip6, newip->ip6, ip_vs_check_diff2(oldlen, newlen, - ~csum_unfold(uhdr->check)))); + csum_unfold(uhdr->check)))); else #endif uhdr->check = - csum_fold(ip_vs_check_diff4(oldip->ip, newip->ip, + ~csum_fold(ip_vs_check_diff4(oldip->ip, newip->ip, ip_vs_check_diff2(oldlen, newlen, - ~csum_unfold(uhdr->check)))); + csum_unfold(uhdr->check)))); } @@ -205,7 +205,7 @@ udp_snat_handler(struct sk_buff *skb, * Adjust UDP checksums */ if (skb->ip_summed == CHECKSUM_PARTIAL) { - udp_partial_csum_update(cp->af, udph, &cp->daddr, &cp->vaddr, + udp_partial_csum_update(cp->af, udph, &cp->vaddr, &cp->daddr, htons(oldlen), htons(skb->len - udphoff)); } else if (!cp->app && (udph->check != 0)) { -- 1.7.12.2.21.g234cd45.dirty
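Why the order of the old and new address matters can be seen from the arithmetic of an incremental checksum update. The sketch below follows the RFC 1624 form HC' = ~(~HC + ~m + m') on 16-bit words; fold16(), csum_full() and csum_update() are hypothetical helper names, not the IPVS or kernel csum functions, and the sketch deals with a complete (complemented) checksum rather than the CHECKSUM_PARTIAL case handled by the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit one's-complement sum. */
static uint16_t fold16(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Full checksum: complement of the one's-complement sum of all words. */
static uint16_t csum_full(const uint16_t *words, size_t n)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += words[i];
	return (uint16_t)~fold16(sum);
}

/* Incremental update: take old_word out of the sum, put new_word in. */
static uint16_t csum_update(uint16_t check, uint16_t old_word, uint16_t new_word)
{
	uint32_t sum = (uint16_t)~check;

	sum += (uint16_t)~old_word;
	sum += new_word;
	return (uint16_t)~fold16(sum);
}
```

Passing the two words in the wrong order subtracts the new value and adds the old one again, so the result no longer matches a full recomputation; that is the same class of mistake as swapping vaddr and daddr.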
[ 007/143] staging: comedi: ni_65xx: (bug fix) confine insn_bits to one subdevice
2.6.32-longterm review patch. If anyone has any objections, please let me know. -- From: Ian Abbott Commit 677a31565692d596ef42ea589b53ba289abf4713 upstream. The `insn_bits` handler `ni_65xx_dio_insn_bits()` has a `for` loop that currently writes (optionally) and reads back up to 5 "ports" consisting of 8 channels each. It reads up to 32 1-bit channels but can only read and write a whole port at once - it needs to handle up to 5 ports as the first channel it reads might not be aligned on a port boundary. It breaks out of the loop early if the next port it handles is beyond the final port on the card. It also breaks out early on the 5th port in the loop if the first channel was aligned. Unfortunately, it doesn't check that the current port it is dealing with belongs to the comedi subdevice the `insn_bits` handler is acting on. That's a bug. Redo the `for` loop to terminate after the final port belonging to the subdevice, changing the loop variable in the process to simplify things a bit. The `for` loop could now try and handle more than 5 ports if the subdevice has more than 40 channels, but the test `if (bitshift >= 32)` ensures it will break out early after 4 or 5 ports (depending on whether the first channel is aligned on a port boundary). (`bitshift` will be between -7 and 7 inclusive on the first iteration, increasing by 8 for each subsequent operation.) 
Signed-off-by: Ian Abbott Signed-off-by: Willy Tarreau --- drivers/staging/comedi/drivers/ni_65xx.c | 25 +++-- 1 file changed, 11 insertions(+), 14 deletions(-) diff --git a/drivers/staging/comedi/drivers/ni_65xx.c b/drivers/staging/comedi/drivers/ni_65xx.c index bbf75eb..bb23291 100644 --- a/drivers/staging/comedi/drivers/ni_65xx.c +++ b/drivers/staging/comedi/drivers/ni_65xx.c @@ -410,28 +410,25 @@ static int ni_65xx_dio_insn_bits(struct comedi_device *dev, struct comedi_subdevice *s, struct comedi_insn *insn, unsigned int *data) { - unsigned base_bitfield_channel; - const unsigned max_ports_per_bitfield = 5; + int base_bitfield_channel; unsigned read_bits = 0; - unsigned j; + int last_port_offset = ni_65xx_port_by_channel(s->n_chan - 1); + int port_offset; + if (insn->n != 2) return -EINVAL; base_bitfield_channel = CR_CHAN(insn->chanspec); - for (j = 0; j < max_ports_per_bitfield; ++j) { - const unsigned port_offset = ni_65xx_port_by_channel(base_bitfield_channel) + j; - const unsigned port = - sprivate(s)->base_port + port_offset; - unsigned base_port_channel; + for (port_offset = ni_65xx_port_by_channel(base_bitfield_channel); +port_offset <= last_port_offset; port_offset++) { + unsigned port = sprivate(s)->base_port + port_offset; + int base_port_channel = port_offset * ni_65xx_channels_per_port; unsigned port_mask, port_data, port_read_bits; - int bitshift; - if (port >= ni_65xx_total_num_ports(board(dev))) + int bitshift = base_port_channel - base_bitfield_channel; + + if (bitshift >= 32) break; - base_port_channel = port_offset * ni_65xx_channels_per_port; port_mask = data[0]; port_data = data[1]; - bitshift = base_port_channel - base_bitfield_channel; - if (bitshift >= 32 || bitshift <= -32) - break; if (bitshift > 0) { port_mask >>= bitshift; port_data >>= bitshift; -- 1.7.12.2.21.g234cd45.dirty
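The new loop bounds are easier to check with the port arithmetic pulled out of the driver. The sketch below is a standalone model, not driver code: ports_touched() is a hypothetical helper, and it assumes that the port of a channel is simply the channel number divided by 8, following the "8 channels each" description above.

```c
#define CHANNELS_PER_PORT 8

/* Count the ports an insn_bits-style loop would handle for a 32-bit
 * bitfield starting at base_channel on a subdevice with n_chan channels. */
static int ports_touched(int base_channel, int n_chan)
{
	int last_port_offset = (n_chan - 1) / CHANNELS_PER_PORT;
	int port_offset;
	int count = 0;

	for (port_offset = base_channel / CHANNELS_PER_PORT;
	     port_offset <= last_port_offset; port_offset++) {
		/* Position of this port's first channel within the bitfield. */
		int bitshift = port_offset * CHANNELS_PER_PORT - base_channel;

		if (bitshift >= 32)
			break;	/* beyond the 32 bits the caller can see */
		count++;
	}
	return count;
}
```

With a 48-channel subdevice this handles 4 ports for an aligned start (channel 0) and 5 ports for an unaligned one (channel 4); with a 16-channel subdevice it stops at the subdevice's last port whatever the start is.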
[ 008/143] kernel/kmod.c: check for NULL in call_usermodehelper_exec()
2.6.32-longterm review patch. If anyone has any objections, please let me know. -- From: Tetsuo Handa If /proc/sys/kernel/core_pattern contains only "|", a NULL pointer dereference happens upon core dump because argv_split("") returns argv[0] == NULL. This bug was once fixed by commit 264b83c07a84 ("usermodehelper: check subprocess_info->path != NULL") but was mistakenly reintroduced by commit 7f57cfa4e2aa ("usermodehelper: kill the sub_info->path[0] check"). This bug seems to have existed since 2.6.19 (the version in which core dump to pipe was added). Depending on kernel version and config, some side effect might happen immediately after this oops (e.g. kernel panic with 2.6.32-358.18.1.el6). Signed-off-by: Tetsuo Handa Acked-by: Oleg Nesterov Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds (cherry picked from commit 4c1c7be95c345cf2ad537a0c48e9aeadc7304527) Signed-off-by: Willy Tarreau --- kernel/kmod.c | 4 1 file changed, 4 insertions(+) diff --git a/kernel/kmod.c b/kernel/kmod.c index 8ecc509..3da09a9 100644 --- a/kernel/kmod.c +++ b/kernel/kmod.c @@ -560,6 +560,10 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info, BUG_ON(atomic_read(&sub_info->cred->usage) != 1); validate_creds(sub_info->cred); + if (!sub_info->path) { + call_usermodehelper_freeinfo(sub_info); + return -EINVAL; + } helper_lock(); if (sub_info->path[0] == '\0') goto out; -- 1.7.12.2.21.g234cd45.dirty
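The ordering of the two checks is the whole fix, and it can be shown in a few lines of userspace C. The struct and function below are hypothetical, trimmed-down stand-ins for struct subprocess_info and call_usermodehelper_exec(); the return values are chosen only for this sketch.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the helper description. */
struct demo_helper_info {
	const char *path;	/* may be NULL when argv was split from "" */
};

/* Reject a NULL path first, and only then look at path[0]. */
static int demo_helper_check(const struct demo_helper_info *info)
{
	if (!info->path)
		return -EINVAL;	/* indexing path[0] here would dereference NULL */
	if (info->path[0] == '\0')
		return 0;	/* empty path: nothing to run */
	return 1;		/* there is something to execute */
}
```

Dropping the first test reproduces the reported oops in miniature: the second test then reads through a NULL pointer.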
[ 024/143] HID: provide a helper for validating hid reports
2.6.32-longterm review patch. If anyone has any objections, please let me know. -- From: Kees Cook commit 331415ff16a12147d57d5c953f3a961b7ede348b upstream Many drivers need to validate the characteristics of their HID report during initialization to avoid misusing the reports. This adds a common helper to perform validation of the report existing, the field existing, and the expected number of values within the field. Signed-off-by: Kees Cook Cc: sta...@vger.kernel.org Reviewed-by: Benjamin Tissoires Signed-off-by: Jiri Kosina [jmm: backported to 2.6.32] [wt: dev_err() in 2.6.32 instead of hid_err()] Signed-off-by: Willy Tarreau --- drivers/hid/hid-core.c | 58 ++ include/linux/hid.h| 4 2 files changed, 62 insertions(+) diff --git a/drivers/hid/hid-core.c b/drivers/hid/hid-core.c index a222cbb..e7e28b5 100644 --- a/drivers/hid/hid-core.c +++ b/drivers/hid/hid-core.c @@ -808,6 +808,64 @@ static __inline__ int search(__s32 *array, __s32 value, unsigned n) return -1; } +static const char * const hid_report_names[] = { + "HID_INPUT_REPORT", + "HID_OUTPUT_REPORT", + "HID_FEATURE_REPORT", +}; +/** + * hid_validate_values - validate existing device report's value indexes + * + * @device: hid device + * @type: which report type to examine + * @id: which report ID to examine (0 for first) + * @field_index: which report field to examine + * @report_counts: expected number of values + * + * Validate the number of values in a given field of a given report, after + * parsing. 
+ */ +struct hid_report *hid_validate_values(struct hid_device *hid, + unsigned int type, unsigned int id, + unsigned int field_index, + unsigned int report_counts) +{ + struct hid_report *report; + + if (type > HID_FEATURE_REPORT) { + dev_err(&hid->dev, "invalid HID report type %u\n", type); + return NULL; + } + + if (id >= HID_MAX_IDS) { + dev_err(&hid->dev, "invalid HID report id %u\n", id); + return NULL; + } + + /* +* Explicitly not using hid_get_report() here since it depends on +* ->numbered being checked, which may not always be the case when +* drivers go to access report values. +*/ + report = hid->report_enum[type].report_id_hash[id]; + if (!report) { + dev_err(&hid->dev, "missing %s %u\n", hid_report_names[type], id); + return NULL; + } + if (report->maxfield <= field_index) { + dev_err(&hid->dev, "not enough fields in %s %u\n", + hid_report_names[type], id); + return NULL; + } + if (report->field[field_index]->report_count < report_counts) { + dev_err(&hid->dev, "not enough values in %s %u field %u\n", + hid_report_names[type], id, field_index); + return NULL; + } + return report; +} +EXPORT_SYMBOL_GPL(hid_validate_values); + /** * hid_match_report - check if driver's raw_event should be called * diff --git a/include/linux/hid.h b/include/linux/hid.h index 481080d..e5db8e5 100644 --- a/include/linux/hid.h +++ b/include/linux/hid.h @@ -693,6 +693,10 @@ int hidinput_find_field(struct hid_device *hid, unsigned int type, unsigned int void hid_output_report(struct hid_report *report, __u8 *data); struct hid_device *hid_allocate_device(void); int hid_parse_report(struct hid_device *hid, __u8 *start, unsigned size); +struct hid_report *hid_validate_values(struct hid_device *hid, + unsigned int type, unsigned int id, + unsigned int field_index, + unsigned int report_counts); int hid_check_keys_pressed(struct hid_device *hid); int hid_connect(struct hid_device *hid, unsigned int connect_mask); void hid_disconnect(struct hid_device *hid); -- 1.7.12.2.21.g234cd45.dirty
Re: [PATCH 4/5] intel_pstate: Remove C0 tracking
Hi, On 12/05/2014 05:14 AM, Stratos Karafotis wrote: > From: Dirk Brandewie > > Commit fcb6a15c intel_pstate: Take core C0 time into account for core busy > introduced a regression referenced below. The issue with "lockup" > after suspend that this commit was addressing is now dealt with in the > suspend path. > > References: >https://bugzilla.kernel.org/show_bug.cgi?id=66581 >https://bugzilla.kernel.org/show_bug.cgi?id=75121 > > Reported-by: Doug Smythies > Signed-off-by: Dirk Brandewie > --- > drivers/cpufreq/intel_pstate.c | 13 + > 1 file changed, 1 insertion(+), 12 deletions(-) > > diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c > index bb20881..4c26faf 100644 > --- a/drivers/cpufreq/intel_pstate.c > +++ b/drivers/cpufreq/intel_pstate.c > @@ -59,7 +59,6 @@ struct sample { > int32_t core_pct_busy; > u64 aperf; > u64 mperf; > - unsigned long long tsc; > int freq; > }; > > @@ -100,7 +99,6 @@ struct cpudata { > > u64 prev_aperf; > u64 prev_mperf; > - unsigned long long prev_tsc; > struct sample sample; > }; > > @@ -561,46 +559,37 @@ static inline void intel_pstate_calc_busy(struct > cpudata *cpu, > struct sample *sample) > { > int32_t core_pct; > - int32_t c0_pct; > > core_pct = div_fp(int_tofp((sample->aperf)), > int_tofp((sample->mperf))); > core_pct = mul_fp(core_pct, int_tofp(100)); > FP_ROUNDUP(core_pct); > > - c0_pct = div_fp(int_tofp(sample->mperf), int_tofp(sample->tsc)); > - > sample->freq = fp_toint( > mul_fp(int_tofp(cpu->pstate.max_pstate * 1000), core_pct)); > > - sample->core_pct_busy = mul_fp(core_pct, c0_pct); > + sample->core_pct_busy = core_pct; > } > > static inline void intel_pstate_sample(struct cpudata *cpu) > { > u64 aperf, mperf; > - unsigned long long tsc; > > rdmsrl(MSR_IA32_APERF, aperf); > rdmsrl(MSR_IA32_MPERF, mperf); > - tsc = native_read_tsc(); > > aperf = aperf >> FRAC_BITS; > mperf = mperf >> FRAC_BITS; > - tsc = tsc >> FRAC_BITS; > > cpu->sample.aperf = aperf; > cpu->sample.mperf = mperf; > - cpu->sample.tsc = tsc; > cpu->sample.aperf -= cpu->prev_aperf; > cpu->sample.mperf -= cpu->prev_mperf; > - cpu->sample.tsc -= cpu->prev_tsc; > > intel_pstate_calc_busy(cpu, &cpu->sample); > > cpu->prev_aperf = aperf; > cpu->prev_mperf = mperf; > - cpu->prev_tsc = tsc; > } > > static inline void intel_pstate_set_sample_time(struct cpudata *cpu) > -- > 1.9.0 > With this patch, my CPU (Core i7-3770 @ 3.90GHz) never seems to use its lowest frequencies. Even on an idle system I always get ~2GHz; normally an idle system used to run at 1.6GHz. On very small loads (mp3 decoding) the CPU goes up to 2.7GHz (it used to be 1.6GHz). After reverting this patch on my local build, the problem is resolved. Thanks, Stratos Karafotis
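To make the difference between the two calculations in the quoted patch concrete, here is a minimal sketch in plain integer arithmetic. It does not use the driver's fixed-point helpers (div_fp, mul_fp), the function names are hypothetical, and the counter deltas in the usage note are invented, not measured. Whether this accounts for the behaviour reported above is only a guess.

```c
#include <stdint.h>

/* Busy percentage from the APERF/MPERF deltas alone (what the patch keeps). */
static uint64_t busy_pct(uint64_t aperf, uint64_t mperf)
{
	return aperf * 100 / mperf;
}

/* The removed variant: additionally scaled by the MPERF/TSC ratio,
 * i.e. by the share of the sample period spent in C0. */
static uint64_t busy_pct_c0_scaled(uint64_t aperf, uint64_t mperf, uint64_t tsc)
{
	return aperf * 100 / mperf * mperf / tsc;
}
```

For a core that was in C0 for a tenth of the sample period (aperf = mperf = 1000, tsc = 10000 in made-up units), the first function yields 100 and the second 10, so the unscaled value can look fully busy on a mostly idle core.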
[ 035/143] net: dst: provide accessor function to dst->xfrm
2.6.32-longterm review patch. If anyone has any objections, please let me know. -- From: Vlad Yasevich [ Upstream commit e87b3998d795123b4139bc3f25490dd236f68212 ] dst->xfrm is conditionally defined. Provide accessor function that is always available. Signed-off-by: Vlad Yasevich Acked-by: Neil Horman Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau --- include/net/dst.h | 11 +++ 1 file changed, 11 insertions(+) diff --git a/include/net/dst.h b/include/net/dst.h index 5a900dd..49f443b 100644 --- a/include/net/dst.h +++ b/include/net/dst.h @@ -286,11 +286,22 @@ static inline int __xfrm_lookup(struct net *net, struct dst_entry **dst_p, { return 0; } +static inline struct xfrm_state *dst_xfrm(const struct dst_entry *dst) +{ + return NULL; +} + #else extern int xfrm_lookup(struct net *net, struct dst_entry **dst_p, struct flowi *fl, struct sock *sk, int flags); extern int __xfrm_lookup(struct net *net, struct dst_entry **dst_p, struct flowi *fl, struct sock *sk, int flags); + +/* skb attached with this dst needs transformation if dst->xfrm is valid */ +static inline struct xfrm_state *dst_xfrm(const struct dst_entry *dst) +{ + return dst->xfrm; +} #endif #endif -- 1.7.12.2.21.g234cd45.dirty
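The pattern used here, an accessor that exists in both configurations so callers need no #ifdef of their own, looks roughly like the following outside the kernel. All demo_ names and the DEMO_CONFIG_XFRM switch are hypothetical; they only mirror the shape of dst_xfrm().

```c
#include <stddef.h>

struct demo_state { int id; };

struct demo_dst {
	int dummy;
#ifdef DEMO_CONFIG_XFRM
	struct demo_state *xfrm;	/* only present when the feature is built in */
#endif
};

#ifdef DEMO_CONFIG_XFRM
static inline struct demo_state *demo_dst_xfrm(const struct demo_dst *dst)
{
	return dst->xfrm;
}
#else
static inline struct demo_state *demo_dst_xfrm(const struct demo_dst *dst)
{
	(void)dst;
	return NULL;	/* the member does not exist in this configuration */
}
#endif

/* A caller that compiles unchanged in both configurations. */
static int demo_needs_transform(const struct demo_dst *dst)
{
	return demo_dst_xfrm(dst) != NULL;
}
```

A caller that touched dst->xfrm directly would fail to build whenever the member is compiled out, which is presumably what the accessor is meant to avoid for later patches in the series.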
[ 020/143] HID: zeroplus: validate output report details
2.6.32-longterm review patch. If anyone has any objections, please let me know. -- From: Kees Cook commit 78214e81a1bf43740ce89bb5efda78eac2f8ef83 upstream The zeroplus HID driver was not checking the size of allocated values in fields it used. A HID device could send a malicious output report that would cause the driver to write beyond the output report allocation during initialization, causing a heap overflow: [ 1442.728680] usb 1-1: New USB device found, idVendor=0c12, idProduct=0005 ... [ 1466.243173] BUG kmalloc-192 (Tainted: GW ): Redzone overwritten CVE-2013-2889 Signed-off-by: Kees Cook Cc: sta...@vger.kernel.org Reviewed-by: Benjamin Tissoires Signed-off-by: Jiri Kosina [jmm: backport to 2.6.32] Signed-off-by: Willy Tarreau --- drivers/hid/hid-zpff.c | 18 +- 1 file changed, 5 insertions(+), 13 deletions(-) diff --git a/drivers/hid/hid-zpff.c b/drivers/hid/hid-zpff.c index a79f0d7..5617ea9 100644 --- a/drivers/hid/hid-zpff.c +++ b/drivers/hid/hid-zpff.c @@ -68,21 +68,13 @@ static int zpff_init(struct hid_device *hid) struct hid_report *report; struct hid_input *hidinput = list_entry(hid->inputs.next, struct hid_input, list); - struct list_head *report_list = - &hid->report_enum[HID_OUTPUT_REPORT].report_list; struct input_dev *dev = hidinput->input; - int error; + int i, error; - if (list_empty(report_list)) { - dev_err(&hid->dev, "no output report found\n"); - return -ENODEV; - } - - report = list_entry(report_list->next, struct hid_report, list); - - if (report->maxfield < 4) { - dev_err(&hid->dev, "not enough fields in report\n"); - return -ENODEV; + for (i = 0; i < 4; i++) { + report = hid_validate_values(hid, HID_OUTPUT_REPORT, 0, i, 1); + if (!report) + return -ENODEV; } zpff = kzalloc(sizeof(struct zpff_device), GFP_KERNEL); -- 1.7.12.2.21.g234cd45.dirty
[ 015/143] af_key: fix info leaks in notify messages
2.6.32-longterm review patch. If anyone has any objections, please let me know. -- From: Mathias Krause commit a5cc68f3d63306d0d288f31edfc2ae6ef8ecd887 upstream key_notify_sa_flush() and key_notify_policy_flush() fail to initialize the sadb_msg_reserved member of the broadcast message and thereby leak 2 bytes of heap memory to listeners. Fix that. Signed-off-by: Mathias Krause Cc: Steffen Klassert Cc: "David S. Miller" Cc: Herbert Xu Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau --- net/key/af_key.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/key/af_key.c b/net/key/af_key.c index 4e98193..03d626f 100644 --- a/net/key/af_key.c +++ b/net/key/af_key.c @@ -1726,6 +1726,7 @@ static int key_notify_sa_flush(struct km_event *c) hdr->sadb_msg_version = PF_KEY_V2; hdr->sadb_msg_errno = (uint8_t) 0; hdr->sadb_msg_len = (sizeof(struct sadb_msg) / sizeof(uint64_t)); + hdr->sadb_msg_reserved = 0; pfkey_broadcast(skb, GFP_ATOMIC, BROADCAST_ALL, NULL, c->net); @@ -2694,6 +2695,7 @@ static int key_notify_policy_flush(struct km_event *c) hdr->sadb_msg_version = PF_KEY_V2; hdr->sadb_msg_errno = (uint8_t) 0; hdr->sadb_msg_len = (sizeof(struct sadb_msg) / sizeof(uint64_t)); + hdr->sadb_msg_reserved = 0; pfkey_broadcast(skb_out, GFP_ATOMIC, BROADCAST_ALL, NULL, c->net); return 0; -- 1.7.12.2.21.g234cd45.dirty
[ 039/143] davinci_emac.c: Fix IFF_ALLMULTI setup
2.6.32-longterm review patch. If anyone has any objections, please let me know. -- From: Mariusz Ceier [ Upstream commit d69e0f7ea95fef8059251325a79c004bac01f018 ] When IFF_ALLMULTI flag is set on interface and IFF_PROMISC isn't, emac_dev_mcast_set should only enable RX of multicasts and reset MACHASH registers. It does this, but afterwards it either sets up multicast MACs filtering or disables RX of multicasts and resets MACHASH registers again, rendering IFF_ALLMULTI flag useless. This patch fixes emac_dev_mcast_set, so that multicast MACs filtering and disabling of RX of multicasts are skipped when IFF_ALLMULTI flag is set. Tested with kernel 2.6.37. Signed-off-by: Mariusz Ceier Acked-by: Mugunthan V N Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau --- drivers/net/davinci_emac.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/davinci_emac.c b/drivers/net/davinci_emac.c index e347831..eafd1e4 100644 --- a/drivers/net/davinci_emac.c +++ b/drivers/net/davinci_emac.c @@ -960,7 +960,7 @@ static void emac_dev_mcast_set(struct net_device *ndev) mbp_enable = (mbp_enable | EMAC_MBP_RXMCAST); emac_add_mcast(priv, EMAC_ALL_MULTI_SET, NULL); } - if (ndev->mc_count > 0) { + else if (ndev->mc_count > 0) { struct dev_mc_list *mc_ptr; mbp_enable = (mbp_enable | EMAC_MBP_RXMCAST); emac_add_mcast(priv, EMAC_ALL_MULTI_CLR, NULL); -- 1.7.12.2.21.g234cd45.dirty
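The effect of turning the second `if` into an `else if` can be modelled as a small decision function. This is a simplified sketch, not the driver logic: the flag values, the enum and demo_mcast_mode() are hypothetical, and the real function takes further conditions into account.

```c
#define DEMO_IFF_PROMISC  0x1
#define DEMO_IFF_ALLMULTI 0x2

enum demo_mcast_mode { DEMO_PROMISC, DEMO_ALL_MULTI, DEMO_FILTERED, DEMO_NONE };

/* Pick exactly one multicast mode; later branches never override earlier ones. */
static enum demo_mcast_mode demo_mcast_mode(unsigned int flags, int mc_count)
{
	if (flags & DEMO_IFF_PROMISC)
		return DEMO_PROMISC;
	if (flags & DEMO_IFF_ALLMULTI)
		return DEMO_ALL_MULTI;	/* must not fall through to the filter setup */
	if (mc_count > 0)
		return DEMO_FILTERED;
	return DEMO_NONE;
}
```

Before the fix, the equivalent of the DEMO_FILTERED or DEMO_NONE branch also ran after the all-multicast branch and undid its work.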
[ 025/143] crypto: api - Fix race condition in larval lookup
2.6.32-longterm review patch. If anyone has any objections, please let me know. -- From: Nikola Pajkovsky https://bugzilla.redhat.com/1016108 64z is missing rhel6 commit 3af031a395c0 ("[crypto] algboss: Hold ref count on larval"), which causes cosmetic fuzz because crypto_alg_get was moved from crypto/api.c to crypto/internal.h. From: Herbert Xu [ upstream commit 77dbd7a95e4a4f15264c333a9e9ab97ee27dc2aa ] crypto_larval_lookup should only return a larval if it created one. Any larval created by another entity must be processed through crypto_larval_wait before being returned. Otherwise this will lead to a larval being killed twice, which will most likely lead to a crash. Cc: sta...@vger.kernel.org Reported-by: Kees Cook Tested-by: Kees Cook Signed-off-by: Herbert Xu Signed-off-by: Nikola Pajkovsky Signed-off-by: Willy Tarreau --- crypto/api.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/crypto/api.c b/crypto/api.c index 798526d..f4be65f 100644 --- a/crypto/api.c +++ b/crypto/api.c @@ -40,6 +40,8 @@ static inline struct crypto_alg *crypto_alg_get(struct crypto_alg *alg) return alg; } +static struct crypto_alg *crypto_larval_wait(struct crypto_alg *alg); + struct crypto_alg *crypto_mod_get(struct crypto_alg *alg) { return try_module_get(alg->cra_module) ? crypto_alg_get(alg) : NULL; @@ -150,8 +152,11 @@ static struct crypto_alg *crypto_larval_add(const char *name, u32 type, } up_write(&crypto_alg_sem); - if (alg != &larval->alg) + if (alg != &larval->alg) { kfree(larval); + if (crypto_is_larval(alg)) + alg = crypto_larval_wait(alg); + } return alg; } -- 1.7.12.2.21.g234cd45.dirty
[RFC PATCH 04/12 v2] CPU ConCurrency tracking
CC can only change when a task is enqueued on or dequeued from the CPU's rq. We also track it at the scheduler tick and on idle enter/exit, in case no enqueue or dequeue happens for a long time. Therefore, we track CC at these four points and only these:

1. dequeue
2. enqueue
3. scheduler tick
4. idle enter and exit

TODO: use existing load tracking framework

Signed-off-by: Yuyang Du
---
 kernel/sched/core.c |    3 +++
 kernel/sched/fair.c |    2 ++
 2 files changed, 5 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7958a47..0236455 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -767,6 +767,7 @@ static void enqueue_task(struct rq *rq, struct task_struct *p, int flags)
 	update_rq_clock(rq);
 	sched_info_queued(rq, p);
 	p->sched_class->enqueue_task(rq, p, flags);
+	update_cpu_concurrency(rq);
 }
 
 static void dequeue_task(struct rq *rq, struct task_struct *p, int flags)
@@ -774,6 +775,7 @@ static void dequeue_task(struct rq *rq, struct task_struct *p, int flags)
 	update_rq_clock(rq);
 	sched_info_dequeued(rq, p);
 	p->sched_class->dequeue_task(rq, p, flags);
+	update_cpu_concurrency(rq);
 }
 
 void activate_task(struct rq *rq, struct task_struct *p, int flags)
@@ -2428,6 +2430,7 @@ void scheduler_tick(void)
 	update_rq_clock(rq);
 	curr->sched_class->task_tick(rq, curr, 0);
 	update_cpu_load_active(rq);
+	update_cpu_concurrency(rq);
 	raw_spin_unlock(&rq->lock);
 
 	perf_event_task_tick();
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7570dd9..e7153ff 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2563,6 +2563,7 @@ static inline void dequeue_entity_load_avg(struct cfs_rq *cfs_rq,
 void idle_enter_fair(struct rq *this_rq)
 {
 	update_rq_runnable_avg(this_rq, 1);
+	update_cpu_concurrency(this_rq);
 }
 
 /*
@@ -2573,6 +2574,7 @@ void idle_enter_fair(struct rq *this_rq)
 void idle_exit_fair(struct rq *this_rq)
 {
 	update_rq_runnable_avg(this_rq, 0);
+	update_cpu_concurrency(this_rq);
 }
 
 static int idle_balance(struct rq *this_rq);
--
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
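Why hooking only these four points is enough can be seen in a small user-space sketch. This is not the patch's update_cpu_concurrency(); the struct and function names below are hypothetical, and the sketch only assumes that each hook charges the time elapsed since the previous hook to the number of tasks that were running during that interval.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-rq concurrency state. */
struct demo_cc {
	uint64_t contrib;        /* running sum of nr_running * elapsed time */
	uint64_t last_update;    /* timestamp of the previous tracking point */
	unsigned int nr_running; /* tasks running since last_update */
};

/*
 * Called at every tracking point (enqueue, dequeue, tick, idle enter/exit).
 * The interval since the last call is weighted by the old nr_running,
 * then the new nr_running takes effect from 'now' onwards.
 */
static void demo_cc_update(struct demo_cc *cc, uint64_t now,
			   unsigned int nr_running_now)
{
	cc->contrib += (uint64_t)cc->nr_running * (now - cc->last_update);
	cc->last_update = now;
	cc->nr_running = nr_running_now;
}
```

The tick and idle hooks do not change nr_running; in this sketch they simply bring contrib up to date when no enqueue or dequeue has happened for a while.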
[RFC PATCH 01/12 v2] CONFIG for CPU ConCurrency
Add CONFIG_CPU_CONCURRENCY in arch/x86/Kconfig. This option enables or disables tracking of the CPU ConCurrency load metric.

Signed-off-by: Yuyang Du
---
 arch/x86/Kconfig | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 25d2c6f..9bfac8d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -797,6 +797,17 @@ config SCHED_MC
 	  making when dealing with multi-core CPU chips at a cost of slightly
 	  increased overhead in some places. If unsure say N here.
 
+config CPU_CONCURRENCY
+	bool "CPU ConCurrency (CC)"
+	default n
+	depends on SMP
+	---help---
+	  CPU ConCurrency (CC) is a new CPU load metric that measures the CPU
+	  load by averaging the number of running tasks. Using CC, the scheduler
+	  can evaluate the load of CPUs and may consolidate workloads on CPUs in
+	  load balancing for power efficiency without sacrificing performance.
+	  If unsure say N here.
+
 source "kernel/Kconfig.preempt"
 
 config X86_UP_APIC
--
1.7.9.5
[RFC PATCH 07/12 v2] CPU ConCurrency API for Workload Consolidation
Currently, CC is per CPU. To consolidate, the formula is based on a heuristic. Suppose we have 2 CPUs, their task concurrency over time is ('-' means no task, 'x' having tasks): 1) CPU0: ----- (CC[0]) CPU1: - (CC[1]) 2) CPU0: ----- (CC[0]) CPU1: ----- (CC[1]) If we consolidate CPU0 and CPU1, the consolidated CC will be: CC' = CC[0] + CC[1] for case 1 and CC'' = (CC[0] + CC[1]) * 2 for case 2. For the cases in between case 1 and 2 in terms of how xxx overlaps, the CC should be between CC' and CC''. So, we uniformly use this condition for consolidation (suppose we consolidate m CPUs to n CPUs, m > n): (CC[0] + CC[1] + ... + CC[m-2] + CC[m-1]) * (n + log(m-n)) >= --- kernel/sched/concurrency.c | 562 kernel/sched/sched.h | 13 + 2 files changed, 575 insertions(+) diff --git a/kernel/sched/concurrency.c b/kernel/sched/concurrency.c index da26dd7..21e5631 100644 --- a/kernel/sched/concurrency.c +++ b/kernel/sched/concurrency.c @@ -28,6 +28,25 @@ unsigned int sysctl_concurrency_decay_rate = 1UL; */ static unsigned int cc_contrib_period = 10UL; +#ifdef CONFIG_WORKLOAD_CONSOLIDATION +/* + * whether we use concurrency to select cpu to run + * the woken up task + */ +static unsigned int wc_wakeup = 1UL; + +/* + * concurrency lower than percentage of this number + * is capable of running wakee + */ +static unsigned int wc_wakeup_threshold = 80UL; + +/* + * aggressively push the task even it is hot + */ +static unsigned int wc_push_hot_task = 1UL; +#endif + /* * the concurrency is scaled up for decaying, * thus, concurrency 1 is effectively 2^cc_resolution (1024), @@ -343,6 +362,9 @@ void init_cpu_concurrency(struct rq *rq) rq->concurrency.nr_running = 0; rq->concurrency.sum_timestamp = ULLONG_MAX; rq->concurrency.contrib_timestamp = ULLONG_MAX; +#ifdef CONFIG_WORKLOAD_CONSOLIDATION + rq->concurrency.unload = 0; +#endif } /* @@ -364,3 +386,543 @@ void update_cpu_concurrency(struct rq *rq) } #endif + +#ifdef CONFIG_WORKLOAD_CONSOLIDATION +/* + * whether cpu is capable of having 
more concurrency + */ +static int cpu_cc_capable(int cpu) +{ + u64 sum = cpu_rq(cpu)->concurrency.sum_now; + u64 threshold = cc_weight(1); + + sum *= 100; + sum *= cpu_rq(cpu)->cpu_power; + + threshold *= wc_wakeup_threshold; + threshold <<= SCHED_POWER_SHIFT; + + if (sum <= threshold) + return 1; + + return 0; +} + +/* + * we do not select idle, if the cc of the + * wakee and waker (in this order) is capable + * of handling the wakee task + */ +int workload_consolidation_wakeup(int prev, int target) +{ + if (!wc_wakeup) { + if (idle_cpu(target)) + return target; + + return nr_cpu_ids; + } + + if (idle_cpu(prev) || cpu_cc_capable(prev)) + return prev; + + if (prev != target && (idle_cpu(target) || cpu_cc_capable(target))) + return target; + + return nr_cpu_ids; +} + +static inline u64 sched_group_cc(struct sched_group *sg) +{ + u64 sg_cc = 0; + int i; + + for_each_cpu(i, sched_group_cpus(sg)) + sg_cc += cpu_rq(i)->concurrency.sum_now * + cpu_rq(i)->cpu_power; + + return sg_cc; +} + +static inline u64 sched_domain_cc(struct sched_domain *sd) +{ + struct sched_group *sg = sd->groups; + u64 sd_cc = 0; + + do { + sd_cc += sched_group_cc(sg); + sg = sg->next; + } while (sg != sd->groups); + + return sd_cc; +} + +static inline struct sched_group * +find_lowest_cc_group(struct sched_group *sg, int span) +{ + u64 grp_cc, min = ULLONG_MAX; + struct sched_group *lowest = NULL; + int i; + + for (i = 0; i < span; ++i) { + grp_cc = sched_group_cc(sg); + + if (grp_cc < min) { + min = grp_cc; + lowest = sg; + } + + sg = sg->next; + } + + return lowest; +} + +static inline u64 __calc_cc_thr(int cpus, unsigned int asym_cc) +{ + u64 thr = cpus; + + thr *= cc_weight(1); + thr *= asym_cc; + thr <<= SCHED_POWER_SHIFT; + + return thr; +} + +/* + * can @src_cc of @src_nr cpus be consolidated + * to @dst_cc of @dst_nr cpus + */ +static inline int +__can_consolidate_cc(u64 src_cc, int src_nr, u64 dst_cc, int dst_nr) +{ + dst_cc *= dst_nr; + src_nr -= dst_nr; + + if (unlikely(src_nr <= 0)) 
+ return 0; + + src_nr = ilog2(src_nr); + src_nr += dst_nr; + src_cc *= src_nr; + + if (src_cc > dst_cc) + return 0; + + return 1; +} + +/* + * find the group for asymmetric concurrency + * problem to address: traverse sd from top to down + */ +struct sched_group * +workload_consolidation_find_group(struct
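The comparison made by __can_consolidate_cc() above can be tried out in user space. The following is a minimal sketch, not the patch's code: floor_log2() is a hypothetical stand-in for the kernel's ilog2(), and the demo_ names are made up for illustration. It keeps the same test, src_cc * (n + log2(m - n)) <= dst_cc * n, where m CPUs are consolidated to n CPUs.

```c
#include <stdint.h>

/* floor(log2(v)) for v >= 1; stand-in for the kernel's ilog2(). */
static int floor_log2(unsigned int v)
{
	int r = 0;

	while (v >>= 1)
		r++;
	return r;
}

/*
 * Can the concurrency src_cc, measured over src_nr CPUs, be consolidated
 * onto dst_nr CPUs whose per-CPU threshold sum is dst_cc?
 * Returns 1 if yes, 0 otherwise (including when nothing would be removed).
 */
static int demo_can_consolidate(uint64_t src_cc, int src_nr,
				uint64_t dst_cc, int dst_nr)
{
	int removed = src_nr - dst_nr;

	if (removed <= 0)
		return 0;

	/* penalty factor grows with log2 of the number of CPUs removed */
	src_cc *= (uint64_t)(dst_nr + floor_log2((unsigned int)removed));
	dst_cc *= (uint64_t)dst_nr;

	return src_cc <= dst_cc;
}
```

For example, going from 4 CPUs to 2 gives a factor of 2 + log2(2) = 3 on the source side, so a source CC of 100 fits under a threshold of 200 (300 <= 400), while 150 does not (450 > 400).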
[RFC PATCH 10/12 v2] Intercept periodic nohz idle balancing
We intercept load balancing to contain the load and load balancing in the consolidated CPUs according to our consolidating mechanism. In periodic nohz idle balance, we skip the idle but non-consolidated CPUs from load balancing. Signed-off-by: Yuyang Du --- kernel/sched/fair.c | 50 +++--- 1 file changed, 43 insertions(+), 7 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 94c7a6a..9bb1304 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -6867,10 +6867,46 @@ static struct { static inline int find_new_ilb(void) { +#ifdef CONFIG_WORKLOAD_CONSOLIDATION + struct cpumask *nonshielded = __get_cpu_var(local_cpu_mask); + int ilb, weight; + int this_cpu = smp_processor_id(); + + /* +* Optimize for the case when we have no idle CPUs or only one +* idle CPU. Don't walk the sched_domain hierarchy in such cases +*/ + if (cpumask_weight(nohz.idle_cpus_mask) < 2) + return nr_cpu_ids; + + ilb = cpumask_first(nohz.idle_cpus_mask); + + if (ilb < nr_cpu_ids && idle_cpu(ilb)) { + + cpumask_copy(nonshielded, nohz.idle_cpus_mask); + + rcu_read_lock(); + workload_consolidation_nonshielded_mask(this_cpu, nonshielded); + rcu_read_unlock(); + + weight = cpumask_weight(nonshielded); + + if (weight < 2) + return nr_cpu_ids; + + /* +* get idle load balancer again +*/ + ilb = cpumask_first(nonshielded); + if (ilb < nr_cpu_ids && idle_cpu(ilb)) + return ilb; + } +#else int ilb = cpumask_first(nohz.idle_cpus_mask); if (ilb < nr_cpu_ids && idle_cpu(ilb)) return ilb; +#endif return nr_cpu_ids; } @@ -7107,7 +7143,7 @@ out: * In CONFIG_NO_HZ_COMMON case, the idle balance kickee will do the * rebalancing for all the cpus for whom scheduler ticks are stopped. 
*/ -static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle) +static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle, struct cpumask *mask) { int this_cpu = this_rq->cpu; struct rq *rq; @@ -7117,7 +7153,7 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle) !test_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu))) goto end; - for_each_cpu(balance_cpu, nohz.idle_cpus_mask) { + for_each_cpu(balance_cpu, mask) { if (balance_cpu == this_cpu || !idle_cpu(balance_cpu)) continue; @@ -7165,10 +7201,10 @@ static inline int nohz_kick_needed(struct rq *rq) if (unlikely(rq->idle_balance)) return 0; - /* - * We may be recently in ticked or tickless idle mode. At the first - * busy tick after returning from idle, we will update the busy stats. - */ + /* +* We may be recently in ticked or tickless idle mode. At the first +* busy tick after returning from idle, we will update the busy stats. +*/ set_cpu_sd_state_busy(); nohz_balance_exit_idle(cpu); @@ -7211,7 +7247,7 @@ need_kick: return 1; } #else -static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle) { } +static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle, struct cpumask *mask) { } #endif /* -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFC PATCH 08/12 v2] Intercept wakeup/fork/exec load balancing
We intercept load balancing to contain the load and load balancing in the consolidated CPUs according to our consolidating mechanism. In wakeup load balancing, we do not select an idle CPU if the CC of the wakee's or the waker's CPU (checked in this order if SD_WAKE_AFFINE) is capable of handling the wakee task. In fork/exec load balancing, when finding the sched_group, we find the consolidated group.

Signed-off-by: Yuyang Du
---
 kernel/sched/fair.c | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e7153ff..c7a6347 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4365,9 +4365,16 @@ static int select_idle_sibling(struct task_struct *p, int target)
 	struct sched_domain *sd;
 	struct sched_group *sg;
 	int i = task_cpu(p);
+#ifdef CONFIG_WORKLOAD_CONSOLIDATION
+	int ret;
 
+	ret = workload_consolidation_wakeup(i, target);
+	if (ret < nr_cpu_ids)
+		return ret;
+#else
 	if (idle_cpu(target))
 		return target;
+#endif
 
 	/*
 	 * If the prevous cpu is cache affine and idle, don't be stupid.
@@ -4460,7 +4467,7 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
 	}
 
 	while (sd) {
-		struct sched_group *group;
+		struct sched_group *group = NULL;
 		int weight;
 
 		if (!(sd->flags & sd_flag)) {
@@ -4468,6 +4475,12 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
 			continue;
 		}
 
+#ifdef CONFIG_WORKLOAD_CONSOLIDATION
+		if (sd->flags & SD_WORKLOAD_CONSOLIDATION)
+			group = workload_consolidation_find_group(sd, p, cpu);
+
+		if (!group)
+#endif
 		group = find_idlest_group(sd, p, cpu, sd_flag);
 		if (!group) {
 			sd = sd->child;
--
1.7.9.5
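The wakeup decision described here (try the wakee's previous CPU first, then the target, otherwise fall back to the normal idle-sibling search) can be sketched in user space as follows. This is only an illustration under stated assumptions: cc_weight(1) is assumed to be 1024 and SCHED_POWER_SHIFT to be 10, the 80% threshold is the wc_wakeup_threshold default from patch 07, and struct demo_cpu with the demo_ functions are hypothetical names, not part of the series.

```c
#include <stdint.h>

#define DEMO_CC_WEIGHT_1      1024u /* assumed value of cc_weight(1) */
#define DEMO_POWER_SHIFT      10    /* assumed SCHED_POWER_SHIFT */
#define DEMO_WAKEUP_THRESHOLD 80u   /* wc_wakeup_threshold default, percent */

struct demo_cpu {
	int idle;        /* nonzero if the CPU is currently idle */
	uint64_t cc_sum; /* current CC, scaled so that one task == 1024 */
	uint64_t power;  /* cpu_power, 1024 for a nominal CPU */
};

/* CC below the threshold percentage means the CPU can take the wakee. */
static int demo_cc_capable(const struct demo_cpu *c)
{
	uint64_t sum = c->cc_sum * 100 * c->power;
	uint64_t threshold =
		((uint64_t)DEMO_CC_WEIGHT_1 * DEMO_WAKEUP_THRESHOLD)
		<< DEMO_POWER_SHIFT;

	return sum <= threshold;
}

/* Returns the chosen CPU, or nr_cpus when the caller should keep searching. */
static int demo_pick_wakeup_cpu(const struct demo_cpu *cpus, int nr_cpus,
				int prev, int target)
{
	if (cpus[prev].idle || demo_cc_capable(&cpus[prev]))
		return prev;

	if (prev != target &&
	    (cpus[target].idle || demo_cc_capable(&cpus[target])))
		return target;

	return nr_cpus;
}
```

With these assumed constants, a nominal-power CPU counts as capable while its scaled CC is at most 819, that is, just under 80% of one always-running task.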
[RFC PATCH 09/12 v2] Intercept idle balancing
We intercept load balancing to contain the load and load balancing in the consolidated CPUs according to our consolidating mechanism. In idle balancing, we do two things:

1) Skip pulling tasks to idle non-consolidated CPUs.

2) In addition, for a consolidated idle CPU, we aggressively pull tasks from non-consolidated CPUs.

Signed-off-by: Yuyang Du
---
 kernel/sched/fair.c | 31 +++
 1 file changed, 31 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c7a6347..94c7a6a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6655,6 +6655,10 @@ out:
 	return ld_moved;
 }
 
+#ifdef CONFIG_WORKLOAD_CONSOLIDATION
+static DEFINE_PER_CPU(cpumask_var_t, local_cpu_mask);
+#endif
+
 /*
  * idle_balance is called by schedule() if this_cpu is about to become
  * idle. Attempts to pull tasks from other CPUs.
@@ -6666,6 +6670,9 @@ static int idle_balance(struct rq *this_rq)
 	unsigned long next_balance = jiffies + HZ;
 	u64 curr_cost = 0;
 	int this_cpu = this_rq->cpu;
+#ifdef CONFIG_WORKLOAD_CONSOLIDATION
+	struct cpumask *nonshielded = __get_cpu_var(local_cpu_mask);
+#endif
 
 	idle_enter_fair(this_rq);
 	/*
@@ -6684,6 +6691,19 @@ static int idle_balance(struct rq *this_rq)
 
 	update_blocked_averages(this_cpu);
 	rcu_read_lock();
+
+#ifdef CONFIG_WORKLOAD_CONSOLIDATION
+	cpumask_copy(nonshielded, cpu_active_mask);
+
+	/*
+	 * if we encounter shielded cpus here, don't do balance on them
+	 */
+	workload_consolidation_nonshielded_mask(this_cpu, nonshielded);
+	if (!cpumask_test_cpu(this_cpu, nonshielded))
+		goto unlock;
+	workload_consolidation_unload(nonshielded);
+#endif
+
 	for_each_domain(this_cpu, sd) {
 		unsigned long interval;
 		int continue_balancing = 1;
@@ -6716,6 +6736,9 @@ static int idle_balance(struct rq *this_rq)
 		if (pulled_task)
 			break;
 	}
+#ifdef CONFIG_WORKLOAD_CONSOLIDATION
+unlock:
+#endif
 	rcu_read_unlock();
 
 	raw_spin_lock(&this_rq->lock);
@@ -7709,6 +7732,14 @@ void print_cfs_stats(struct seq_file *m, int cpu)
 __init void init_sched_fair_class(void)
 {
 #ifdef CONFIG_SMP
+#ifdef CONFIG_WORKLOAD_CONSOLIDATION
+	unsigned int i;
+	for_each_possible_cpu(i) {
+		zalloc_cpumask_var_node(&per_cpu(local_cpu_mask, i),
+					GFP_KERNEL, cpu_to_node(i));
+	}
+#endif
+
 	open_softirq(SCHED_SOFTIRQ, run_rebalance_domains);
 
 #ifdef CONFIG_NO_HZ_COMMON
--
1.7.9.5
[RFC PATCH 05/12 v2] CONFIG for Workload Consolidation
Add CONFIG_WORKLOAD_CONSOLIDATION in arch/x86/Kconfig. This option enables or disables CPU workload consolidation in the scheduler's load balancing.

Signed-off-by: Yuyang Du
---
 arch/x86/Kconfig | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 9bfac8d..0999c16 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -808,6 +808,16 @@ config CPU_CONCURRENCY
 	  load balancing for power efficiency without sacrificing performance.
 	  If unsure say N here.
 
+config WORKLOAD_CONSOLIDATION
+	bool "CPU Workload Consolidation"
+	default n
+	depends on CPU_CONCURRENCY
+	---help---
+	  CPU Workload Consolidation is a new CPU PM module, which uses the CPU
+	  concurrency of the CPU, and allows asymmetric concurrency across CPUs to
+	  reduce the SW and HW overhead to increase load balance efficiency and
+	  conserve energy. If unsure say N here.
+
 source "kernel/Kconfig.preempt"
 
 config X86_UP_APIC
--
1.7.9.5
[RFC PATCH 12/12 v2] Intercept RT scheduler
We intercept load balancing to contain the load and load balancing in the consolidated CPUs according to our consolidating mechanism. In the RT scheduler, we also skip idle non-consolidated CPUs when pulling tasks or selecting a CPU for a task. This is pretty provocative.

Signed-off-by: Yuyang Du
---
 kernel/sched/rt.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index bd2267a..f8141fb 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1217,6 +1217,9 @@ select_task_rq_rt(struct task_struct *p, int cpu, int sd_flag, int flags)
 {
 	struct task_struct *curr;
 	struct rq *rq;
+#ifdef CONFIG_WORKLOAD_CONSOLIDATION
+	int do_find = 0;
+#endif
 
 	if (p->nr_cpus_allowed == 1)
 		goto out;
@@ -1230,6 +1233,11 @@ select_task_rq_rt(struct task_struct *p, int cpu, int sd_flag, int flags)
 	rcu_read_lock();
 	curr = ACCESS_ONCE(rq->curr); /* unlocked access */
 
+#ifdef CONFIG_WORKLOAD_CONSOLIDATION
+	if (workload_consolidation_cpu_shielded(cpu))
+		do_find = 1;
+#endif
+
 	/*
 	 * If the current task on @p's runqueue is an RT task, then
 	 * try to see if we can wake this RT task up on another
@@ -1252,9 +1260,15 @@ select_task_rq_rt(struct task_struct *p, int cpu, int sd_flag, int flags)
 	 * This test is optimistic, if we get it wrong the load-balancer
 	 * will have to sort it out.
 	 */
+#ifdef CONFIG_WORKLOAD_CONSOLIDATION
+	if (do_find || (curr && unlikely(rt_task(curr)) &&
+	    (curr->nr_cpus_allowed < 2 ||
+	     curr->prio <= p->prio))) {
+#else
 	if (curr && unlikely(rt_task(curr)) &&
 	    (curr->nr_cpus_allowed < 2 ||
 	     curr->prio <= p->prio)) {
+#endif
 		int target = find_lowest_rq(p);
 
 		if (target != -1)
@@ -1460,6 +1474,12 @@ static int find_lowest_rq(struct task_struct *task)
 	if (!cpupri_find(&task_rq(task)->rd->cpupri, task, lowest_mask))
 		return -1; /* No targets found */
 
+#ifdef CONFIG_WORKLOAD_CONSOLIDATION
+	workload_consolidation_nonshielded_mask(this_cpu, lowest_mask);
+	if (!cpumask_weight(lowest_mask))
+		return -1;
+#endif
+
 	/*
 	 * At this point we have built a mask of cpus representing the
 	 * lowest priority tasks in the system. Now we want to elect
@@ -1687,6 +1707,11 @@ static int pull_rt_task(struct rq *this_rq)
 	if (likely(!rt_overloaded(this_rq)))
 		return 0;
 
+#ifdef CONFIG_WORKLOAD_CONSOLIDATION
+	if (workload_consolidation_cpu_shielded(this_cpu))
+		return 0;
+#endif
+
 	/*
 	 * Match the barrier from rt_set_overloaded; this guarantees that if we
 	 * see overloaded we must also see the rto_mask bit.
--
1.7.9.5
[RFC PATCH 06/12 v2] Attach CPU topology to specify each sched_domain's workload consolidation
Defined SD_WORKLOAD_CONSOLIDATION flag in sched_domain. When this flag is set, the workload consolidation applies to that domain. In addition, a consolidating_coeff is defined in sched_domain to specify the degree of consolidation in that domain. Signed-off-by: Yuyang Du --- include/linux/sched.h| 13 + include/linux/topology.h | 16 kernel/sched/core.c | 41 + 3 files changed, 70 insertions(+) diff --git a/include/linux/sched.h b/include/linux/sched.h index 25f54c7..f3f7d4a 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -877,6 +877,12 @@ enum cpu_idle_type { #define SD_OVERLAP 0x2000 /* sched_domains of this level overlap */ #define SD_NUMA0x4000 /* cross-node balancing */ +#ifdef CONFIG_WORKLOAD_CONSOLIDATION +#define SD_WORKLOAD_CONSOLIDATION 0x8000 /* Higher concurrency in front */ +#else +#define SD_WORKLOAD_CONSOLIDATION 0 +#endif + extern int __weak arch_sd_sibiling_asym_packing(void); struct sched_domain_attr { @@ -960,6 +966,13 @@ struct sched_domain { struct rcu_head rcu;/* used during destruction */ }; +#ifdef CONFIG_WORKLOAD_CONSOLIDATION + unsigned int total_groups; /* total groups number */ + unsigned int group_number; /* this CPU's group sequence */ + unsigned int consolidating_coeff; /* consolidating coefficient */ + struct sched_group *first_group;/* ordered by CPU number */ +#endif + unsigned int span_weight; /* * Span of all CPUs in this domain. 
diff --git a/include/linux/topology.h b/include/linux/topology.h index 7062330..334f83e 100644 --- a/include/linux/topology.h +++ b/include/linux/topology.h @@ -66,6 +66,16 @@ int arch_update_cpu_topology(void); #define PENALTY_FOR_NODE_WITH_CPUS (1) #endif +#ifdef CONFIG_WORKLOAD_CONSOLIDATION +#ifndef WORKLOAD_CONSOLIDATION_INIT +#define WORKLOAD_CONSOLIDATION_INIT(n) .consolidating_coeff = (n), +#endif +#else +#ifndef WORKLOAD_CONSOLIDATION_INIT +#define WORKLOAD_CONSOLIDATION_INIT(n) +#endif +#endif + /* * Below are the 3 major initializers used in building sched_domains: * SD_SIBLING_INIT, for SMT domains @@ -102,12 +112,14 @@ int arch_update_cpu_topology(void); | 0*SD_SERIALIZE\ | 0*SD_PREFER_SIBLING \ | arch_sd_sibling_asym_packing()\ + | 0*SD_WORKLOAD_CONSOLIDATION \ , \ .last_balance = jiffies, \ .balance_interval = 1,\ .smt_gain = 1178, /* 15% */ \ .max_newidle_lb_cost= 0,\ .next_decay_max_lb_cost = jiffies, \ + WORKLOAD_CONSOLIDATION_INIT(0) \ } #endif #endif /* CONFIG_SCHED_SMT */ @@ -134,11 +146,13 @@ int arch_update_cpu_topology(void); | 0*SD_SHARE_CPUPOWER \ | 1*SD_SHARE_PKG_RESOURCES \ | 0*SD_SERIALIZE\ + | 1*SD_WORKLOAD_CONSOLIDATION \ , \ .last_balance = jiffies, \ .balance_interval = 1,\ .max_newidle_lb_cost= 0,\ .next_decay_max_lb_cost = jiffies, \ + WORKLOAD_CONSOLIDATION_INIT(180)\ } #endif #endif /* CONFIG_SCHED_MC */ @@ -167,11 +181,13 @@ int arch_update_cpu_topology(void); | 0*SD_SHARE_PKG_RESOURCES \ | 0*SD_SERIALIZE\ | 1*SD_PREFER_SIBLING \ + | 1*SD_WORKLOAD_CONSOLIDATION \ , \ .last_balance = jiffies, \ .balance_interval = 1,\ .max_newidle_lb_cost= 0,\ .next_decay_max_lb_cost = jiffies, \ + WORKLOAD_CONSOLIDATION_INIT(180)\ } #endif diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 0236455..cd92f2d 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -4843,7 +4843,11 @@
[RFC PATCH 11/12 v2] Intercept periodic load balancing
We intercept load balancing to contain the load and load balancing in the consolidated CPUs according to our consolidating mechanism. In periodic load balancing, we do two things: 1) Skip pulling task to the non-consolidated CPUs. 2) In addition, for consolidated Idle CPU, we aggressively pull tasks from non-consolidated CPUs. Signed-off-by: Yuyang Du --- kernel/sched/fair.c | 33 - 1 file changed, 32 insertions(+), 1 deletion(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 9bb1304..1c7a3d7 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -7260,6 +7260,36 @@ static void run_rebalance_domains(struct softirq_action *h) enum cpu_idle_type idle = this_rq->idle_balance ? CPU_IDLE : CPU_NOT_IDLE; +#ifdef CONFIG_WORKLOAD_CONSOLIDATION + struct cpumask *nonshielded = __get_cpu_var(local_cpu_mask); + int this_cpu = cpu_of(this_rq); + + /* +* if we encounter shielded cpus here, don't do balance on them +*/ + cpumask_copy(nonshielded, cpu_active_mask); + + rcu_read_lock(); + workload_consolidation_nonshielded_mask(this_cpu, nonshielded); + rcu_read_unlock(); + + /* +* aggressively unload the shielded cpus to unshielded cpus +*/ + workload_consolidation_unload(nonshielded); + + if (cpumask_test_cpu(this_cpu, nonshielded)) { + rebalance_domains(this_rq, idle); + + /* +* If this cpu has a pending nohz_balance_kick, then do the +* balancing on behalf of the other idle cpus whose ticks are +* stopped. +*/ + cpumask_and(nonshielded, nonshielded, nohz.idle_cpus_mask); + nohz_idle_balance(this_rq, idle, nonshielded); + } +#else rebalance_domains(this_rq, idle); /* @@ -7267,7 +7297,8 @@ static void run_rebalance_domains(struct softirq_action *h) * balancing on behalf of the other idle cpus whose ticks are * stopped. 
*/ - nohz_idle_balance(this_rq, idle); + nohz_idle_balance(this_rq, idle, nohz.idle_cpus_mask); +#endif } /* -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFC PATCH 02/12 v2] Init CPU ConCurrency
CPU ConCurrency (CC) is added as a member of each CPU's rq and initialized at the same time as the rq. Updates to CC are protected by the rq's lock.

Signed-off-by: Yuyang Du
---
 kernel/sched/Makefile      |    1 +
 kernel/sched/concurrency.c |   22 ++
 kernel/sched/core.c        |    2 ++
 kernel/sched/sched.h       |   21 +
 4 files changed, 46 insertions(+)
 create mode 100644 kernel/sched/concurrency.c

diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile
index ab32b7b..e67f7e3 100644
--- a/kernel/sched/Makefile
+++ b/kernel/sched/Makefile
@@ -19,3 +19,4 @@ obj-$(CONFIG_SCHED_AUTOGROUP) += auto_group.o
 obj-$(CONFIG_SCHEDSTATS) += stats.o
 obj-$(CONFIG_SCHED_DEBUG) += debug.o
 obj-$(CONFIG_CGROUP_CPUACCT) += cpuacct.o
+obj-$(CONFIG_CPU_CONCURRENCY) += concurrency.o
diff --git a/kernel/sched/concurrency.c b/kernel/sched/concurrency.c
new file mode 100644
index 0000000..50e08a2
--- /dev/null
+++ b/kernel/sched/concurrency.c
@@ -0,0 +1,22 @@
+/*
+ * CPU ConCurrency (CC) measures the CPU load by averaging
+ * the number of running tasks. Using CC, the scheduler can
+ * evaluate the load of CPUs to improve load balance for power
+ * efficiency without sacrificing performance.
+ *
+ */
+
+#ifdef CONFIG_CPU_CONCURRENCY
+
+#include "sched.h"
+
+void init_cpu_concurrency(struct rq *rq)
+{
+	rq->concurrency.sum = 0;
+	rq->concurrency.sum_now = 0;
+	rq->concurrency.contrib = 0;
+	rq->concurrency.nr_running = 0;
+	rq->concurrency.sum_timestamp = ULLONG_MAX;
+	rq->concurrency.contrib_timestamp = ULLONG_MAX;
+}
+#endif
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 268a45e..7958a47 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6884,6 +6884,8 @@ void __init sched_init(void)
 #endif
 		init_rq_hrtick(rq);
 		atomic_set(&rq->nr_iowait, 0);
+
+		init_cpu_concurrency(rq);
 	}
 
 	set_load_weight(&init_task);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 456e492..f1c9235 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -508,6 +508,17 @@ extern struct root_domain def_root_domain;
 
 #endif /* CONFIG_SMP */
 
+#ifdef CONFIG_CPU_CONCURRENCY
+struct cpu_concurrency_t {
+	u64 sum;
+	u64 sum_now;
+	u64 contrib;
+	u64 sum_timestamp;
+	u64 contrib_timestamp;
+	unsigned int nr_running;
+};
+#endif
+
 /*
  * This is the main, per-CPU runqueue data structure.
  *
@@ -643,6 +654,10 @@ struct rq {
 #ifdef CONFIG_SMP
 	struct llist_head wake_list;
 #endif
+
+#ifdef CONFIG_CPU_CONCURRENCY
+	struct cpu_concurrency_t concurrency;
+#endif
 };
 
 static inline int cpu_of(struct rq *rq)
@@ -1203,6 +1218,12 @@ extern void init_sched_dl_class(void);
 extern void resched_task(struct task_struct *p);
 extern void resched_cpu(int cpu);
 
+#ifdef CONFIG_CPU_CONCURRENCY
+extern void init_cpu_concurrency(struct rq *rq);
+#else
+static inline void init_cpu_concurrency(struct rq *rq) {}
+#endif
+
 extern struct rt_bandwidth def_rt_bandwidth;
 extern void init_rt_bandwidth(struct rt_bandwidth *rt_b, u64 period, u64 runtime);
 
--
1.7.9.5
[RFC PATCH 03/12 v2] CPU ConCurrency calculation
It is natural to use task concurrency (running tasks in the rq) as load indicator. We calculate CC from task concurrency by two steps: 1) Divide continuous time into periods of time, and average task concurrency in period, for tolerating the transient bursts: a = sum(concurrency * time) / period 2) Exponentially decay past periods, and synthesize them all, for hysteresis to load drops or resilience to load rises (let f be decaying factor, and a_x the xth period average since period 0): s = a_n + f^1 * a_n-1 + f^2 * a_n-2 +, ..., + f^(n-1) * a_1 + f^n * a_0 Signed-off-by: Yuyang Du --- include/linux/sched/sysctl.h |8 + kernel/sched/concurrency.c | 344 ++ kernel/sched/sched.h |2 + kernel/sysctl.c | 16 ++ 4 files changed, 370 insertions(+) diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h index 8045a55..ec52b3f 100644 --- a/include/linux/sched/sysctl.h +++ b/include/linux/sched/sysctl.h @@ -36,6 +36,14 @@ extern unsigned int sysctl_sched_min_granularity; extern unsigned int sysctl_sched_wakeup_granularity; extern unsigned int sysctl_sched_child_runs_first; +#ifdef CONFIG_CPU_CONCURRENCY +extern unsigned int sysctl_concurrency_sum_period; +extern unsigned int sysctl_concurrency_decay_rate; +extern int concurrency_decay_rate_handler(struct ctl_table *table, int write, + void __user *buffer, + size_t *lenp, loff_t *ppos); +#endif + enum sched_tunable_scaling { SCHED_TUNABLESCALING_NONE, SCHED_TUNABLESCALING_LOG, diff --git a/kernel/sched/concurrency.c b/kernel/sched/concurrency.c index 50e08a2..da26dd7 100644 --- a/kernel/sched/concurrency.c +++ b/kernel/sched/concurrency.c @@ -10,6 +10,331 @@ #include "sched.h" +/* + * the sum period of time is 2^26 ns (~64) by default + */ +unsigned int sysctl_concurrency_sum_period = 26UL; + +/* + * the number of sum periods, after which the original + * will be reduced/decayed to half + */ +unsigned int sysctl_concurrency_decay_rate = 1UL; + +/* + * the contrib period of time is 2^10 (~1us) by default, + * us 
has better precision than ms, and + * 1024 makes use of faster shift than div + */ +static unsigned int cc_contrib_period = 10UL; + +/* + * the concurrency is scaled up for decaying, + * thus, concurrency 1 is effectively 2^cc_resolution (1024), + * which can be halved by 10 half-life periods + */ +static unsigned int cc_resolution = 10UL; + +/* + * after this number of half-life periods, even + * (1<<32)-1 (which is sufficiently large) is less than 1 + */ +static unsigned int cc_decay_max_pds = 32UL; + +static inline u32 cc_scale_up(unsigned int c) +{ + return c << cc_resolution; +} + +static inline u32 cc_scale_down(unsigned int c) +{ + return c >> cc_resolution; +} + +/* from nanoseconds to sum periods */ +static inline u64 cc_sum_pds(u64 n) +{ + return n >> sysctl_concurrency_sum_period; +} + +/* from sum period to timestamp in ns */ +static inline u64 cc_timestamp(u64 p) +{ + return p << sysctl_concurrency_sum_period; +} + +/* + * from nanoseconds to contrib periods, because + * ns is so fine-grained that it can overflow cc->contrib + */ +static inline u64 cc_contrib_pds(u64 n) +{ + return n >> cc_contrib_period; +} + +/* + * cc_decay_factor only works for 32bit integer, + * cc_decay_factor_x, x indicates the number of periods + * as half-life (sysctl_concurrency_decay_rate) + */ +static const u32 cc_decay_factor_1[] = { + 0xFFFFFFFF, +}; + +static const u32 cc_decay_factor_2[] = { + 0xFFFFFFFF, 0xB504F333, +}; + +static const u32 cc_decay_factor_4[] = { + 0xFFFFFFFF, 0xD744FCCA, 0xB504F333, 0x9837F051, +}; + +static const u32 cc_decay_factor_8[] = { + 0xFFFFFFFF, 0xEAC0C6E7, 0xD744FCCA, 0xC5672A11, + 0xB504F333, 0xA5FED6A9, 0x9837F051, 0x8B95C1E3, +}; + +/* by default sysctl_concurrency_decay_rate */ +static const u32 *cc_decay_factor = + cc_decay_factor_1; + +/* + * cc_decayed_sum depends on cc_resolution (fixed 10), + * cc_decayed_sum_x, x indicates the number of periods + * as half-life (sysctl_concurrency_decay_rate) + */ +static const u32 cc_decayed_sum_1[] = { + 0, 512, 768, 896, 960, 992, + 1008,
1016, 1020, 1022, 1023, +}; + +static const u32 cc_decayed_sum_2[] = { + 0, 724, 1235, 1597, 1853, 2034, 2162, 2252, + 2316, 2361, 2393, 2416, 2432, 2443, 2451, + 2457, 2461, 2464, 2466, 2467, 2468, 2469, +}; + +static const u32 cc_decayed_sum_4[] = { + 0, 861, 1585, 2193, 2705, 3135, 3497, 3801, 4057, + 4272, 4453, 4605, 4733, 4840, 4930, 5006, 5070, + 5124, 5169, 5207, 5239, 5266, 5289, 5308, 5324, + 5337, 5348, 5358, 5366, 5373, 5379, 5384, 5388, + 5391, 5394, 5396, 5398, 5400, 5401, 5402, 5403, + 5404, 5405, 5406, +}; + +static const u32 cc_decayed_sum_8[] = { + 0, 939, 1800, 2589, 3313, 3977, 4585, 5143, + 5655, 6124, 6554,
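The decayed sum s = a_n + f * a_n-1 + ... + f^n * a_0 from the description does not need the whole history: it satisfies the recurrence s_n = a_n + f * s_n-1. The sketch below shows that recurrence for the default half-life of one period (f = 1/2), where the decay is a single right shift. It is a simplified illustration with hypothetical demo_ names, not the table-driven code of the patch, and integer shifts truncate where the exact formula would not.

```c
#include <stdint.h>

/*
 * Fold one finished period into the running sum.
 * With a half-life of one period, the old sum is simply halved.
 */
static uint64_t demo_cc_fold(uint64_t prev_sum, uint64_t period_avg)
{
	return period_avg + (prev_sum >> 1);
}

/* Decayed sum over the period averages a[0..n-1], oldest first. */
static uint64_t demo_cc_sum(const uint64_t *a, int n)
{
	uint64_t s = 0;
	int i;

	for (i = 0; i < n; i++)
		s = demo_cc_fold(s, a[i]);
	return s;
}
```

With averages scaled so that one task is 1024, three busy periods in a row give 1024, 1536, 1792, approaching 2048, while a single busy period followed by idle ones decays as 1024, 512, 256.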
[RFC PATCH 00/12 v2] A new CPU load metric for power-efficient scheduler: CPU ConCurrency
Hi Ingo, PeterZ, Rafael, and others, The current scheduler’s load balancing is completely work-conserving. In some workload, generally low CPU utilization but immersed with CPU bursts of transient tasks, migrating task to engage all available CPUs for work-conserving can lead to significant overhead: cache locality loss, idle/active HW state transitional latency and power, shallower idle state, etc, which are both power and performance inefficient especially for today’s low power processors in mobile. This RFC introduces a sense of idleness-conserving into work-conserving (by all means, we really don’t want to be overwhelming in only one way). But to what extent the idleness-conserving should be, bearing in mind that we don’t want to sacrifice performance? We first need a load/idleness indicator to that end. Thanks to CFS’s “model an ideal, precise multi-tasking CPU”, tasks can be seen as concurrently running (the tasks in the runqueue). So it is natural to use task concurrency as load indicator. Having said that, we do two things: 1) Divide continuous time into periods of time, and average task concurrency in period, for tolerating the transient bursts: a = sum(concurrency * time) / period 2) Exponentially decay past periods, and synthesize them all, for hysteresis to load drops or resilience to load rises (let f be decaying factor, and a_x the xth period average since period 0): s = a_n + f^1 * a_n-1 + f^2 * a_n-2 +, ..., + f^(n-1) * a_1 + f^n * a_0 We name this load indicator as CPU ConCurrency (CC): task concurrency determines how many CPUs are needed to be running concurrently. Another two ways of how to interpret CC: 1) the current work-conserving load balance also uses CC, but instantaneous CC. 2) CC vs. CPU utilization. CC is runqueue-length-weighted CPU utilization. If we change: "a = sum(concurrency * time) / period" to "a' = sum(1 * time) / period". Then a' is just about the CPU utilization. 
And the way we weight runqueue-length is the simplest one (excluding the exponential decays, and you may have other ways).

To track CC, we intercept the scheduler in 1) enqueue, 2) dequeue, 3) scheduler tick, and 4) enter/exit idle.

After CC, in the consolidation part, we do 1) attach the CPU topology to be adaptive beyond our experimental platforms, and 2) intercept the current load balance for load and load balancing containment.

Currently, CC is per CPU. To consolidate, the formula is based on a heuristic. Suppose we have 2 CPUs, their task concurrency over time is ('-' means no task, 'x' having tasks):

1)
CPU0: ----- (CC[0])
CPU1: - (CC[1])

2)
CPU0: ----- (CC[0])
CPU1: ----- (CC[1])

If we consolidate CPU0 and CPU1, the consolidated CC will be: CC' = CC[0] + CC[1] for case 1 and CC'' = (CC[0] + CC[1]) * 2 for case 2. For the cases in between case 1 and 2 in terms of how xxx overlaps, the CC should be between CC' and CC''. So, we uniformly use this condition for consolidation (suppose we consolidate m CPUs to n CPUs, m > n):

(CC[0] + CC[1] + ... + CC[m-2] + CC[m-1]) * (n + log(m-n)) >=
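The two-step definition of CC in this mail (a per-period average, then an exponential decay across periods) can be sketched in plain C. This is only an illustration under assumptions: the names cc_sample, cc_period_avg and cc_decay_sum, the floating-point arithmetic and the array-based interface are hypothetical and not taken from the patch set, which would presumably use fixed-point arithmetic inside the scheduler.

```c
#include <stddef.h>

/* One interval inside a period: how many tasks were runnable, and for how long. */
struct cc_sample {
	unsigned int concurrency;	/* runqueue length during the interval */
	double time;			/* interval length, same unit as the period */
};

/* a = sum(concurrency * time) / period */
double cc_period_avg(const struct cc_sample *s, size_t n, double period)
{
	double sum = 0.0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += s[i].concurrency * s[i].time;
	return sum / period;
}

/*
 * s = a_n + f * a_(n-1) + ... + f^n * a_0, evaluated with Horner's rule.
 * a[0] is the oldest period average, a[n_periods - 1] the newest one.
 */
double cc_decay_sum(const double *a, size_t n_periods, double f)
{
	double s = 0.0;
	size_t i;

	for (i = 0; i < n_periods; i++)
		s = s * f + a[i];
	return s;
}
```

Replacing the concurrency factor with 1 for every busy interval turns cc_period_avg() into plain CPU utilization, which is the comparison the mail draws.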
[ 013/143] sctp: Use correct sideffect command in duplicate cookie handling
2.6.32-longterm review patch. If anyone has any objections, please let me know. -- From: Vlad Yasevich commit f2815633504b442ca0b0605c16bf3d88a3a0fcea upstream When SCTP is done processing a duplicate cookie chunk, it tries to delete a newly created association. For that, it has to set the right association for the side-effect processing to work. However, when it uses the SCTP_CMD_NEW_ASOC command, that performs more work then really needed (like hashing the associationa and assigning it an id) and there is no point to do that only to delete the association as a next step. In fact, it also creates an impossible condition where an association may be found by the getsockopt() call, and that association is empty. This causes a crash in some sctp getsockopts. The solution is rather simple. We simply use SCTP_CMD_SET_ASOC command that doesn't have all the overhead and does exactly what we need. Reported-by: Karl Heiss Tested-by: Karl Heiss CC: Neil Horman Signed-off-by: Vlad Yasevich Acked-by: Neil Horman Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau --- net/sctp/sm_statefuns.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c index 9e4e846..486df56 100644 --- a/net/sctp/sm_statefuns.c +++ b/net/sctp/sm_statefuns.c @@ -2045,7 +2045,7 @@ sctp_disposition_t sctp_sf_do_5_2_4_dupcook(const struct sctp_endpoint *ep, } /* Delete the tempory new association. */ - sctp_add_cmd_sf(commands, SCTP_CMD_NEW_ASOC, SCTP_ASOC(new_asoc)); + sctp_add_cmd_sf(commands, SCTP_CMD_SET_ASOC, SCTP_ASOC(new_asoc)); sctp_add_cmd_sf(commands, SCTP_CMD_DELETE_TCB, SCTP_NULL()); /* Restore association pointer to provide SCTP command interpeter -- 1.7.12.2.21.g234cd45.dirty -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[ 036/143] sctp: Use software crc32 checksum when xfrm transform will happen.
2.6.32-longterm review patch. If anyone has any objections, please let me know. -- From: Fan Du [ Upstream commit 27127a82561a2a3ed955ce207048e1b066a80a2a ] igb/ixgbe have hardware sctp checksum support, when this feature is enabled and also IPsec is armed to protect sctp traffic, ugly things happened as xfrm_output checks CHECKSUM_PARTIAL to do checksum operation(sum every thing up and pack the 16bits result in the checksum field). The result is fail establishment of sctp communication. Cc: Neil Horman Cc: Steffen Klassert Signed-off-by: Fan Du Signed-off-by: Vlad Yasevich Acked-by: Neil Horman Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau --- net/sctp/output.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/sctp/output.c b/net/sctp/output.c index d494100..8d4eacf 100644 --- a/net/sctp/output.c +++ b/net/sctp/output.c @@ -506,7 +506,8 @@ int sctp_packet_transmit(struct sctp_packet *packet) * by CRC32-C as described in . */ if (!sctp_checksum_disable && - !(dst->dev->features & (NETIF_F_NO_CSUM | NETIF_F_SCTP_CSUM))) { + (!(dst->dev->features & (NETIF_F_NO_CSUM | NETIF_F_SCTP_CSUM)) || +(dst_xfrm(dst) != NULL))) { __u32 crc32 = sctp_start_cksum((__u8 *)sh, cksum_buf_len); /* 3) Put the resultant value into the checksum field in the -- 1.7.12.2.21.g234cd45.dirty -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [lm-sensors] [PATCH] drivers/hwmon/emc1403.c: add support for emc1412
On 05/11/2014 03:40 PM, Guenter Roeck wrote:
[ ... ]

 	id = i2c_smbus_read_byte_data(client, THERMAL_REVISION_REG);
-	if (id != 0x01)
+	if (id != 0x01 && id != 0x04) {
 		return -ENODEV;

This should be a separate patch, as it applies to emc1403/emc1404 as well, so we can backport it into -stable.

Also, the chip datasheet suggests that chip revision 3 exists as well. Given that, I would suggest to replace the revision number check with something like

	if (id < 0x01 || id > 0x04)
		return -ENODEV;

Guenter
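The range check suggested above could be isolated as follows. This is a minimal sketch: the helper name emc14xx_check_revision and the two macros are hypothetical, and the bounds only restate what the mail proposes (revisions 1 through 4). Because i2c_smbus_read_byte_data() returns a negative errno on bus failure, the lower bound also rejects a failed read.

```c
#include <errno.h>

/* Hypothetical bounds taken from the suggestion in the mail. */
#define EMC14XX_REV_MIN 0x01
#define EMC14XX_REV_MAX 0x04

/*
 * id is the value returned by the revision register read: negative on
 * bus error, otherwise the revision byte. Returns 0 if it is acceptable.
 */
int emc14xx_check_revision(int id)
{
	if (id < EMC14XX_REV_MIN || id > EMC14XX_REV_MAX)
		return -ENODEV;
	return 0;
}
```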
[ 031/143] can: dev: fix nlmsg size calculation in can_get_size()
2.6.32-longterm review patch. If anyone has any objections, please let me know. -- From: Marc Kleine-Budde [ Upstream commit fe119a05f8ca481623a8d02efcc984332e612528 ] This patch fixes the calculation of the nlmsg size, by adding the missing nla_total_size(). Signed-off-by: Marc Kleine-Budde Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau --- drivers/net/can/dev.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/net/can/dev.c b/drivers/net/can/dev.c index 2868fe8..ea2749f9 100644 --- a/drivers/net/can/dev.c +++ b/drivers/net/can/dev.c @@ -595,12 +595,12 @@ static size_t can_get_size(const struct net_device *dev) size_t size; size = nla_total_size(sizeof(u32)); /* IFLA_CAN_STATE */ - size += sizeof(struct can_ctrlmode); /* IFLA_CAN_CTRLMODE */ + size += nla_total_size(sizeof(struct can_ctrlmode)); /* IFLA_CAN_CTRLMODE */ size += nla_total_size(sizeof(u32)); /* IFLA_CAN_RESTART_MS */ - size += sizeof(struct can_bittiming); /* IFLA_CAN_BITTIMING */ - size += sizeof(struct can_clock); /* IFLA_CAN_CLOCK */ + size += nla_total_size(sizeof(struct can_bittiming)); /* IFLA_CAN_BITTIMING */ + size += nla_total_size(sizeof(struct can_clock)); /* IFLA_CAN_CLOCK */ if (priv->bittiming_const)/* IFLA_CAN_BITTIMING_CONST */ - size += sizeof(struct can_bittiming_const); + size += nla_total_size(sizeof(struct can_bittiming_const)); return size; } -- 1.7.12.2.21.g234cd45.dirty -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
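Why a bare sizeof() undercounts can be shown with a small userspace sketch of the attribute size arithmetic. The names attr_align and attr_total_size are hypothetical; the constants mirror the netlink attribute layout (a 4-byte header and 4-byte alignment), and the kernel's own nla_total_size() remains the authoritative helper.

```c
#include <stddef.h>

#define ATTR_ALIGNTO	((size_t)4)	/* attributes are padded to 4 bytes */
#define ATTR_HDRLEN	((size_t)4)	/* 2-byte length + 2-byte type */

/* Round len up to the next multiple of ATTR_ALIGNTO. */
static size_t attr_align(size_t len)
{
	return (len + ATTR_ALIGNTO - 1) & ~(ATTR_ALIGNTO - 1);
}

/* Space one attribute occupies in a message: header + payload + padding. */
size_t attr_total_size(size_t payload)
{
	return attr_align(ATTR_HDRLEN + payload);
}
```

For a 5-byte payload the attribute occupies 12 bytes, so a size estimate built from sizeof() alone is short by the header and the padding of every attribute.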
[ 059/143] sysctl net: Keep tcp_syn_retries inside the boundary
2.6.32-longterm review patch. If anyone has any objections, please let me know.

--

From: Michal Tesar

[ Upstream commit 651e92716aaae60fc41b9652f54cb6803896e0da ]

Limit the min/max value passed to the /proc/sys/net/ipv4/tcp_syn_retries.

Signed-off-by: Michal Tesar
Signed-off-by: David S. Miller
Signed-off-by: Willy Tarreau
---
 net/ipv4/sysctl_net_ipv4.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 2dcf04d..910fa54 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -23,6 +23,8 @@
 
 static int zero;
 static int tcp_retr1_max = 255;
+static int tcp_syn_retries_min = 1;
+static int tcp_syn_retries_max = MAX_TCP_SYNCNT;
 static int ip_local_port_range_min[] = { 1, 1 };
 static int ip_local_port_range_max[] = { 65535, 65535 };
 
@@ -237,7 +239,9 @@ static struct ctl_table ipv4_table[] = {
 		.data		= &ipv4_config.no_pmtu_disc,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &tcp_syn_retries_min,
+		.extra2		= &tcp_syn_retries_max
 	},
 	{
 		.ctl_name	= NET_IPV4_NONLOCAL_BIND,
--
1.7.12.2.21.g234cd45.dirty
[ 038/143] wanxl: fix info leak in ioctl
2.6.32-longterm review patch. If anyone has any objections, please let me know.

--

From: Salva Peiró

[ Upstream commit 2b13d06c9584b4eb773f1e80bbaedab9a1c344e1 ]

The wanxl_ioctl() code fails to initialize the two padding bytes of struct sync_serial_settings after the ->loopback member. Add an explicit memset(0) before filling the structure to avoid the info leak.

Signed-off-by: Salva Peiró
Signed-off-by: David S. Miller
Signed-off-by: Willy Tarreau
---
 drivers/net/wan/wanxl.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/wan/wanxl.c b/drivers/net/wan/wanxl.c
index daee8a0..b52b378 100644
--- a/drivers/net/wan/wanxl.c
+++ b/drivers/net/wan/wanxl.c
@@ -354,6 +354,7 @@ static int wanxl_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 			ifr->ifr_settings.size = size; /* data size wanted */
 			return -ENOBUFS;
 		}
+		memset(&line, 0, sizeof(line));
 		line.clock_type = get_status(port)->clocking;
 		line.clock_rate = 0;
 		line.loopback = 0;
--
1.7.12.2.21.g234cd45.dirty
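The padding problem described in this patch can be reproduced in userspace. The sketch below is an assumption-laden illustration: struct line_settings is a hypothetical stand-in for sync_serial_settings, and fill_line_settings is an invented helper. Assigning every member does not touch the padding bytes, so only the memset() guarantees that nothing stale is copied out with the structure.

```c
#include <string.h>

/*
 * Hypothetical stand-in for sync_serial_settings: on common ABIs two
 * padding bytes follow `loopback` so that the size is a multiple of 4.
 */
struct line_settings {
	unsigned int clock_rate;
	unsigned int clock_type;
	unsigned short loopback;
};

void fill_line_settings(struct line_settings *line, unsigned int clock_type)
{
	memset(line, 0, sizeof(*line));	/* clears members and padding alike */
	line->clock_type = clock_type;
	line->clock_rate = 0;
	line->loopback = 0;
}
```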
[ 040/143] resubmit bridge: fix message_age_timer calculation
2.6.32-longterm review patch. If anyone has any objections, please let me know.

--

From: Chris Healy

[ Upstream commit 9a0620133ccce9dd35c00a96405c8d80938c2cc0 ]

This changes the message_age_timer calculation to use the BPDU's max age as opposed to the local bridge's max age. This is in accordance with section 8.6.2.3.2 Step 2 of the 802.1D-1998 specification.

With the current implementation, when running with very large bridge diameters, convergence will not always occur even if a root bridge is configured to have a longer max age.

Tested successfully on bridge diameters of ~200.

Signed-off-by: Chris Healy
Signed-off-by: David S. Miller
Signed-off-by: Willy Tarreau
---
 net/bridge/br_stp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c
index c7d6bfc..a67e6ce 100644
--- a/net/bridge/br_stp.c
+++ b/net/bridge/br_stp.c
@@ -192,7 +192,7 @@ static inline void br_record_config_information(struct net_bridge_port *p,
 	p->designated_age = jiffies + bpdu->message_age;
 
 	mod_timer(&p->message_age_timer, jiffies
-		  + (p->br->max_age - bpdu->message_age));
+		  + (bpdu->max_age - bpdu->message_age));
 }
 
 /* called under bridge lock */
--
1.7.12.2.21.g234cd45.dirty
[ 047/143] ipv6: fix possible crashes in ip6_cork_release()
2.6.32-longterm review patch. If anyone has any objections, please let me know. -- From: Eric Dumazet [ Upstream commit 284041ef21fdf2e0d216ab6b787bc9072b4eb58a ] commit 0178b695fd6b4 ("ipv6: Copy cork options in ip6_append_data") added some code duplication and bad error recovery, leading to potential crash in ip6_cork_release() as kfree() could be called with garbage. use kzalloc() to make sure this wont happen. Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Cc: Herbert Xu Cc: Hideaki YOSHIFUJI Cc: Neal Cardwell Signed-off-by: Willy Tarreau --- net/ipv6/ip6_output.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index bba91a1..bb63ffc 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -1174,7 +1174,7 @@ int ip6_append_data(struct sock *sk, int getfrag(void *from, char *to, if (WARN_ON(np->cork.opt)) return -EINVAL; - np->cork.opt = kmalloc(opt->tot_len, sk->sk_allocation); + np->cork.opt = kzalloc(opt->tot_len, sk->sk_allocation); if (unlikely(np->cork.opt == NULL)) return -ENOBUFS; -- 1.7.12.2.21.g234cd45.dirty -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
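The failure mode fixed here (an error path that frees members of a block that was never zeroed) has a direct userspace analogue. In the sketch below, cork_opts, cork_setup and cork_release are hypothetical names, and calloc() plays the role of kzalloc(); it is an illustration of the idea, not the kernel code.

```c
#include <stdlib.h>
#include <string.h>

struct cork_opts {		/* hypothetical stand-in for the corked option block */
	char *hopopt;
	char *dstopt;
};

/* Release everything the block owns; safe on partially filled blocks. */
void cork_release(struct cork_opts *opt)
{
	if (!opt)
		return;
	free(opt->hopopt);	/* free(NULL) is a no-op */
	free(opt->dstopt);
	free(opt);
}

/* Returns NULL on failure; fail_after_first simulates an allocation error. */
struct cork_opts *cork_setup(const char *hop, const char *dst, int fail_after_first)
{
	struct cork_opts *opt = calloc(1, sizeof(*opt));	/* zeroed, like kzalloc() */

	if (!opt)
		return NULL;

	opt->hopopt = malloc(strlen(hop) + 1);
	if (!opt->hopopt || fail_after_first) {
		cork_release(opt);	/* dstopt is still NULL, so this is safe */
		return NULL;
	}
	strcpy(opt->hopopt, hop);

	opt->dstopt = malloc(strlen(dst) + 1);
	if (!opt->dstopt) {
		cork_release(opt);
		return NULL;
	}
	strcpy(opt->dstopt, dst);
	return opt;
}
```

With malloc() in place of calloc(), the early cork_release() call would pass an indeterminate dstopt pointer to free(), which is the userspace equivalent of kfree() being called with garbage.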
[ 089/143] ipv6: fix possible seqlock deadlock in ip6_finish_output2
2.6.32-longterm review patch. If anyone has any objections, please let me know. -- From: Hannes Frederic Sowa [ Upstream commit 7f88c6b23afbd31545c676dea77ba9593a1a14bf ] IPv6 stats are 64 bits and thus are protected with a seqlock. By not disabling bottom-half we could deadlock here if we don't disable bh and a softirq reentrantly updates the same mib. Cc: Eric Dumazet Signed-off-by: Hannes Frederic Sowa Acked-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau --- net/ipv6/ip6_output.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index bb63ffc..6ff4d07 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -91,8 +91,8 @@ static int ip6_output_finish(struct sk_buff *skb) else if (dst->neighbour) return dst->neighbour->output(skb); - IP6_INC_STATS_BH(dev_net(dst->dev), -ip6_dst_idev(dst), IPSTATS_MIB_OUTNOROUTES); + IP6_INC_STATS(dev_net(dst->dev), + ip6_dst_idev(dst), IPSTATS_MIB_OUTNOROUTES); kfree_skb(skb); return -EINVAL; -- 1.7.12.2.21.g234cd45.dirty -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[ 002/143] Fix lockup related to stop_machine being stuck in __do_softirq.
2.6.32-longterm review patch. If anyone has any objections, please let me know. -- From: Ben Greear The stop machine logic can lock up if all but one of the migration threads make it through the disable-irq step and the one remaining thread gets stuck in __do_softirq. The reason __do_softirq can hang is that it has a bail-out based on jiffies timeout, but in the lockup case, jiffies itself is not incremented. To work around this, re-add the max_restart counter in __do_irq and stop processing irqs after 10 restarts. Thanks to Tejun Heo and Rusty Russell and others for helping me track this down. This was introduced in 3.9 by commit c10d73671ad3 ("softirq: reduce latencies"). It may be worth looking into ath9k to see if it has issues with its irq handler at a later date. The hang stack traces look something like this: [ cut here ] WARNING: at kernel/watchdog.c:245 watchdog_overflow_callback+0x9c/0xa7() Watchdog detected hard LOCKUP on cpu 2 Modules linked in: ath9k ath9k_common ath9k_hw ath mac80211 cfg80211 nfsv4 auth_rpcgss nfs fscache nf_nat_ipv4 nf_nat veth 8021q garp stp mrp llc pktgen lockd sunrpc] Pid: 23, comm: migration/2 Tainted: G C 3.9.4+ #11 Call Trace: warn_slowpath_common+0x85/0x9f warn_slowpath_fmt+0x46/0x48 watchdog_overflow_callback+0x9c/0xa7 __perf_event_overflow+0x137/0x1cb perf_event_overflow+0x14/0x16 intel_pmu_handle_irq+0x2dc/0x359 perf_event_nmi_handler+0x19/0x1b nmi_handle+0x7f/0xc2 do_nmi+0xbc/0x304 end_repeat_nmi+0x1e/0x2e <> cpu_stopper_thread+0xae/0x162 smpboot_thread_fn+0x258/0x260 kthread+0xc7/0xcf ret_from_fork+0x7c/0xb0 ---[ end trace 4947dfa9b0a4cec3 ]--- BUG: soft lockup - CPU#1 stuck for 22s! 
[migration/1:17] Modules linked in: ath9k ath9k_common ath9k_hw ath mac80211 cfg80211 nfsv4 auth_rpcgss nfs fscache nf_nat_ipv4 nf_nat veth 8021q garp stp mrp llc pktgen lockd sunrpc] irq event stamp: 835637905 hardirqs last enabled at (835637904): __do_softirq+0x9f/0x257 hardirqs last disabled at (835637905): apic_timer_interrupt+0x6d/0x80 softirqs last enabled at (5654720): __do_softirq+0x1ff/0x257 softirqs last disabled at (5654725): irq_exit+0x5f/0xbb CPU 1 Pid: 17, comm: migration/1 Tainted: GWC 3.9.4+ #11 To be filled by O.E.M. To be filled by O.E.M./To be filled by O.E.M. RIP: tasklet_hi_action+0xf0/0xf0 Process migration/1 Call Trace: __do_softirq+0x117/0x257 irq_exit+0x5f/0xbb smp_apic_timer_interrupt+0x8a/0x98 apic_timer_interrupt+0x72/0x80 printk+0x4d/0x4f stop_machine_cpu_stop+0x22c/0x274 cpu_stopper_thread+0xae/0x162 smpboot_thread_fn+0x258/0x260 kthread+0xc7/0xcf ret_from_fork+0x7c/0xb0 Signed-off-by: Ben Greear Acked-by: Tejun Heo Acked-by: Pekka Riikonen Cc: Eric Dumazet Cc: sta...@kernel.org Cc: Ben Hutchings Signed-off-by: Linus Torvalds (cherry picked from commit 34376a50fb1fa095b9d0636fa41ed2e73125f214) Signed-off-by: Willy Tarreau --- kernel/softirq.c | 13 ++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/kernel/softirq.c b/kernel/softirq.c index d75c136..e4d5d8c 100644 --- a/kernel/softirq.c +++ b/kernel/softirq.c @@ -194,8 +194,12 @@ void local_bh_enable_ip(unsigned long ip) EXPORT_SYMBOL(local_bh_enable_ip); /* - * We restart softirq processing for at most 2 ms, - * and if need_resched() is not set. + * We restart softirq processing for at most MAX_SOFTIRQ_RESTART times, + * but break the loop if need_resched() is set or after 2 ms. + * The MAX_SOFTIRQ_TIME provides a nice upper bound in most cases, but in + * certain cases, such as stop_machine(), jiffies may cease to + * increment and so we need the MAX_SOFTIRQ_RESTART limit as + * well to make sure we eventually return from this method. 
* * These limits have been established via experimentation. * The two things to balance is latency against fairness - @@ -203,6 +207,7 @@ EXPORT_SYMBOL(local_bh_enable_ip); * should not be able to lock up the box. */ #define MAX_SOFTIRQ_TIME msecs_to_jiffies(2) +#define MAX_SOFTIRQ_RESTART 10 asmlinkage void __do_softirq(void) { @@ -210,6 +215,7 @@ asmlinkage void __do_softirq(void) __u32 pending; unsigned long end = jiffies + MAX_SOFTIRQ_TIME; int cpu; + int max_restart = MAX_SOFTIRQ_RESTART; pending = local_softirq_pending(); account_system_vtime(current); @@ -254,7 +260,8 @@ restart: pending = local_softirq_pending(); if (pending) { - if (time_before(jiffies, end) && !need_resched()) + if (time_before(jiffies, end) && !need_resched() && + --max_restart) goto restart; wakeup_softirqd(); -- 1.7.12.2.21.g234cd45.dirty -- To unsubscribe from this list:
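The interaction between the time bound and the restart bound can be seen in a small simulation. Everything here is hypothetical (struct sim, run_softirq_sim, MAX_RESTART); the field now stands in for jiffies, and clock_runs = 0 models the case from the commit message where jiffies stops incrementing. The work queue is assumed never to drain, which is the worst case.

```c
#define MAX_RESTART 10	/* same idea as MAX_SOFTIRQ_RESTART in the patch */

struct sim {
	unsigned long now;	/* simulated clock, stands in for jiffies */
	int clock_runs;		/* 0 models a clock that no longer advances */
	int passes;		/* number of processing passes executed */
};

/* Run processing passes until either bound is hit; return the pass count. */
int run_softirq_sim(struct sim *s, unsigned long budget)
{
	unsigned long end = s->now + budget;
	int max_restart = MAX_RESTART;

	do {
		s->passes++;		/* one pass over the pending work */
		if (s->clock_runs)
			s->now++;	/* time only advances if the clock runs */
	} while (s->now < end && --max_restart);

	return s->passes;
}
```

With a running clock the time budget ends the loop; with a stuck clock only the restart counter does, after MAX_RESTART passes.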
[ 113/143] aacraid: prevent invalid pointer dereference
2.6.32-longterm review patch. If anyone has any objections, please let me know.

--

From: Mahesh Rajashekhara

It appears that driver runs into a problem here if fibsize is too small because we allocate user_srbcmd with fibsize size only but later we access it until user_srbcmd->sg.count to copy it over to srbcmd.

It is not correct to test (fibsize < sizeof(*user_srbcmd)) because this structure already includes one sg element and this is not needed for commands without data. So, we would recommend to add the following (instead of test for fibsize == 0).

Signed-off-by: Mahesh Rajashekhara
Reported-by: Nico Golde
Reported-by: Fabian Yamaguchi
Signed-off-by: Linus Torvalds
(cherry picked from commit b4789b8e6be3151a955ade74872822f30e8cd914)

CVE-2013-6380

Signed-off-by: Willy Tarreau
---
 drivers/scsi/aacraid/commctrl.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/aacraid/commctrl.c b/drivers/scsi/aacraid/commctrl.c
index a5b8e7b..c895174 100644
--- a/drivers/scsi/aacraid/commctrl.c
+++ b/drivers/scsi/aacraid/commctrl.c
@@ -507,7 +507,8 @@ static int aac_send_raw_srb(struct aac_dev* dev, void __user * arg)
 		goto cleanup;
 	}
 
-	if (fibsize > (dev->max_fib_size - sizeof(struct aac_fibhdr))) {
+	if ((fibsize < (sizeof(struct user_aac_srb) - sizeof(struct user_sgentry))) ||
+	    (fibsize > (dev->max_fib_size - sizeof(struct aac_fibhdr)))) {
 		rcode = -EINVAL;
 		goto cleanup;
 	}
--
1.7.12.2.21.g234cd45.dirty
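The lower bound used in this patch (structure size minus the one embedded scatter-gather element) can be sketched as a standalone check. The types sg_entry and srb_cmd and the helper check_fibsize are hypothetical stand-ins for user_sgentry, user_aac_srb and the inline test in aac_send_raw_srb(); their layouts are assumptions made only for illustration.

```c
#include <errno.h>
#include <stddef.h>

struct sg_entry {		/* hypothetical scatter-gather element */
	unsigned int addr;
	unsigned int count;
};

struct srb_cmd {		/* hypothetical command header */
	unsigned int function;
	unsigned int sg_count;
	struct sg_entry sg[1];	/* one element is embedded in the struct */
};

/*
 * A command without data needs the header but not the embedded sg element,
 * so the minimum is sizeof(struct srb_cmd) - sizeof(struct sg_entry).
 */
int check_fibsize(size_t fibsize, size_t max_fib_size, size_t hdr_size)
{
	size_t min = sizeof(struct srb_cmd) - sizeof(struct sg_entry);

	if (fibsize < min || fibsize > max_fib_size - hdr_size)
		return -EINVAL;
	return 0;
}
```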
[ 100/143] tg3: Fix deadlock in tg3_change_mtu()
2.6.32-longterm review patch. If anyone has any objections, please let me know. -- From: Nithin Sujir [ Upstream commit c6993dfd7db9b0c6b7ca7503a56fda9236a4710f ] Quoting David Vrabel - "5780 cards cannot have jumbo frames and TSO enabled together. When jumbo frames are enabled by setting the MTU, the TSO feature must be cleared. This is done indirectly by calling netdev_update_features() which will call tg3_fix_features() to actually clear the flags. netdev_update_features() will also trigger a new netlink message for the feature change event which will result in a call to tg3_get_stats64() which deadlocks on the tg3 lock." tg3_set_mtu() does not need to be under the tg3 lock since converting the flags to use set_bit(). Move it out to after tg3_netif_stop(). Reported-by: David Vrabel Tested-by: David Vrabel Signed-off-by: Michael Chan Signed-off-by: Nithin Nayak Sujir Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau --- drivers/net/tg3.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c index 89aa69c..56648b4 100644 --- a/drivers/net/tg3.c +++ b/drivers/net/tg3.c @@ -5583,12 +5583,12 @@ static int tg3_change_mtu(struct net_device *dev, int new_mtu) tg3_netif_stop(tp); + tg3_set_mtu(dev, tp, new_mtu); + tg3_full_lock(tp, 1); tg3_halt(tp, RESET_KIND_SHUTDOWN, 1); - tg3_set_mtu(dev, tp, new_mtu); - err = tg3_restart_hw(tp, 0); if (!err) -- 1.7.12.2.21.g234cd45.dirty -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[ 065/143] net: check net.core.somaxconn sysctl values
2.6.32-longterm review patch. If anyone has any objections, please let me know.

--

From: Roman Gushchin

[ Upstream commit 5f671d6b4ec3e6d66c2a868738af2cdea09e7509 ]

It's possible to assign an invalid value to the net.core.somaxconn sysctl variable, because there are no checks at all.

The sk_max_ack_backlog field of the sock structure is defined as unsigned short. Therefore, the backlog argument in inet_listen() shouldn't exceed USHRT_MAX. The backlog argument in the listen() syscall is truncated to the somaxconn value. So, the somaxconn value shouldn't exceed 65535 (USHRT_MAX). Also, negative values of somaxconn are meaningless.

before:
$ sysctl -w net.core.somaxconn=256
net.core.somaxconn = 256
$ sysctl -w net.core.somaxconn=65536
net.core.somaxconn = 65536
$ sysctl -w net.core.somaxconn=-100
net.core.somaxconn = -100

after:
$ sysctl -w net.core.somaxconn=256
net.core.somaxconn = 256
$ sysctl -w net.core.somaxconn=65536
error: "Invalid argument" setting key "net.core.somaxconn"
$ sysctl -w net.core.somaxconn=-100
error: "Invalid argument" setting key "net.core.somaxconn"

Based on a prior patch from Changli Gao.

Signed-off-by: Roman Gushchin
Reported-by: Changli Gao
Suggested-by: Eric Dumazet
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
Signed-off-by: Willy Tarreau
---
 net/core/sysctl_net_core.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index 7db1de0..e2eaf29 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -14,6 +14,9 @@
 #include
 #include
 
+static int zero = 0;
+static int ushort_max = 65535;
+
 static struct ctl_table net_core_table[] = {
 #ifdef CONFIG_NET
 	{
@@ -116,7 +119,9 @@ static struct ctl_table netns_core_table[] = {
 		.data		= &init_net.core.sysctl_somaxconn,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec
+		.extra1		= &zero,
+		.extra2		= &ushort_max,
+		.proc_handler	= proc_dointvec_minmax
 	},
 	{ .ctl_name = 0 }
 };
--
1.7.12.2.21.g234cd45.dirty
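What the range check buys can be shown with a userspace analogue. The helpers set_somaxconn and backlog_as_stored below are hypothetical; the first mimics a min/max-checked integer setting that rejects out-of-range writes and leaves the old value in place, the second shows what an unchecked value becomes once it lands in an unsigned short field, assuming the usual 16-bit unsigned short.

```c
#include <errno.h>
#include <limits.h>

/* Hypothetical analogue of a sysctl guarded by lower and upper bounds. */
int set_somaxconn(int *somaxconn, long requested)
{
	if (requested < 0 || requested > USHRT_MAX)
		return -EINVAL;		/* the stored value is left unchanged */
	*somaxconn = (int)requested;
	return 0;
}

/* What a backlog value turns into when stored in an unsigned short. */
unsigned short backlog_as_stored(int backlog)
{
	return (unsigned short)backlog;	/* conversion wraps modulo USHRT_MAX + 1 */
}
```

Without the check, a setting of 65536 would be accepted and then silently wrap when narrowed, which is why the patch rejects it at write time.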
[ 037/143] sctp: Perform software checksum if packet has to be fragmented.
2.6.32-longterm review patch. If anyone has any objections, please let me know. -- From: Vlad Yasevich [ Upstream commit d2dbbba77e95dff4b4f901fee236fef6d9552072 ] IP/IPv6 fragmentation knows how to compute only TCP/UDP checksum. This causes problems if SCTP packets has to be fragmented and ipsummed has been set to PARTIAL due to checksum offload support. This condition can happen when retransmitting after MTU discover, or when INIT or other control chunks are larger then MTU. Check for the rare fragmentation condition in SCTP and use software checksum calculation in this case. CC: Fan Du Signed-off-by: Vlad Yasevich Acked-by: Neil Horman Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau --- net/sctp/output.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/sctp/output.c b/net/sctp/output.c index 8d4eacf..54bc011 100644 --- a/net/sctp/output.c +++ b/net/sctp/output.c @@ -507,7 +507,7 @@ int sctp_packet_transmit(struct sctp_packet *packet) */ if (!sctp_checksum_disable && (!(dst->dev->features & (NETIF_F_NO_CSUM | NETIF_F_SCTP_CSUM)) || -(dst_xfrm(dst) != NULL))) { +(dst_xfrm(dst) != NULL) || packet->ipfragok)) { __u32 crc32 = sctp_start_cksum((__u8 *)sh, cksum_buf_len); /* 3) Put the resultant value into the checksum field in the -- 1.7.12.2.21.g234cd45.dirty -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[ 106/143] net: sctp: fix skb leakage in COOKIE ECHO path of chunk->auth_chunk
2.6.32-longterm review patch. If anyone has any objections, please let me know. -- From: Daniel Borkmann [ Upstream commit c485658bae87faccd7aed540fd2ca3ab37992310 ] While working on ec0223ec48a9 ("net: sctp: fix sctp_sf_do_5_1D_ce to verify if we/peer is AUTH capable"), we noticed that there's a skb memory leakage in the error path. Running the same reproducer as in ec0223ec48a9 and by unconditionally jumping to the error label (to simulate an error condition) in sctp_sf_do_5_1D_ce() receive path lets kmemleak detector bark about the unfreed chunk->auth_chunk skb clone: Unreferenced object 0x8800b8f3a000 (size 256): comm "softirq", pid 0, jiffies 4294769856 (age 110.757s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 89 ab 75 5e d4 01 58 13 00 00 00 00 00 00 00 00 ..u^..X. backtrace: [] kmemleak_alloc+0x4e/0xb0 [] kmem_cache_alloc+0xc8/0x210 [] skb_clone+0x49/0xb0 [] sctp_endpoint_bh_rcv+0x1d9/0x230 [sctp] [] sctp_inq_push+0x4c/0x70 [sctp] [] sctp_rcv+0x82e/0x9a0 [sctp] [] ip_local_deliver_finish+0xa8/0x210 [] nf_reinject+0xbf/0x180 [] nfqnl_recv_verdict+0x1d2/0x2b0 [nfnetlink_queue] [] nfnetlink_rcv_msg+0x14b/0x250 [nfnetlink] [] netlink_rcv_skb+0xa9/0xc0 [] nfnetlink_rcv+0x23f/0x408 [nfnetlink] [] netlink_unicast+0x168/0x250 [] netlink_sendmsg+0x2e1/0x3f0 [] sock_sendmsg+0x8b/0xc0 [] ___sys_sendmsg+0x369/0x380 What happens is that commit bbd0d59809f9 clones the skb containing the AUTH chunk in sctp_endpoint_bh_rcv() when having the edge case that an endpoint requires COOKIE-ECHO chunks to be authenticated: -- INIT[RANDOM; CHUNKS; HMAC-ALGO] --> <--- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] - -- AUTH; COOKIE-ECHO > < COOKIE-ACK - When we enter sctp_sf_do_5_1D_ce() and before we actually get to the point where we process (and subsequently free) a non-NULL chunk->auth_chunk, we could hit the "goto nomem_init" path from an error condition and thus leave the cloned skb around w/o freeing it. 
The fix is to centrally free such clones in sctp_chunk_destroy() handler that is invoked from sctp_chunk_free() after all refs have dropped; and also move both kfree_skb(chunk->auth_chunk) there, so that chunk->auth_chunk is either NULL (since sctp_chunkify() allocs new chunks through kmem_cache_zalloc()) or non-NULL with a valid skb pointer. chunk->skb and chunk->auth_chunk are the only skbs in the sctp_chunk structure that need to be handled.

While at it, we should use consume_skb() for both. It is the same as dev_kfree_skb() but more appropriately named as we are not a device but a protocol. Also, this effectively replaces the kfree_skb() from both invocations into consume_skb(). Functions are the same only that kfree_skb() assumes that the frame was being dropped after a failure (e.g. for tools like drop monitor), usage of consume_skb() seems more appropriate in function sctp_chunk_destroy() though.

Fixes: bbd0d59809f9 ("[SCTP]: Implement the receive and verification of AUTH chunk")
Signed-off-by: Daniel Borkmann
Cc: Vlad Yasevich
Cc: Neil Horman
Acked-by: Vlad Yasevich
Acked-by: Neil Horman
Signed-off-by: David S. Miller
Signed-off-by: Willy Tarreau
---
 net/sctp/sm_make_chunk.c | 4 ++--
 net/sctp/sm_statefuns.c | 4
 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index feedee7..22d4ed8 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -1356,8 +1356,8 @@ static void sctp_chunk_destroy(struct sctp_chunk *chunk)
 	BUG_ON(!list_empty(&chunk->list));
 	list_del_init(&chunk->transmitted_list);
 
-	/* Free the chunk skb data and the SCTP_chunk stub itself. */
-	dev_kfree_skb(chunk->skb);
+	consume_skb(chunk->skb);
+	consume_skb(chunk->auth_chunk);
 
 	SCTP_DBG_OBJCNT_DEC(chunk);
 	kmem_cache_free(sctp_chunk_cachep, chunk);
diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
index d43002b..6da0171 100644
--- a/net/sctp/sm_statefuns.c
+++ b/net/sctp/sm_statefuns.c
@@ -762,10 +762,6 @@ sctp_disposition_t sctp_sf_do_5_1D_ce(const struct sctp_endpoint *ep,
 		auth.transport = chunk->transport;
 
 		ret = sctp_sf_authenticate(ep, new_asoc, type, &auth);
-
-		/* We can now safely free the auth_chunk clone */
-		kfree_skb(chunk->auth_chunk);
-
 		if (ret != SCTP_IERROR_NO_ERROR) {
 			sctp_association_free(new_asoc);
 			return sctp_sf_pdiscard(ep, asoc, type, arg, commands);
--
1.7.12.2.21.g234cd45.dirty
[ 091/143] net: drop_monitor: fix the value of maxattr
2.6.32-longterm review patch. If anyone has any objections, please let me know. -- From: Changli Gao [ Upstream commit d323e92cc3f4edd943610557c9ea1bb4bb5056e8 ] maxattr in genl_family should be used to save the max attribute type, but not the max command type. Drop monitor doesn't support any attributes, so we should leave it as zero. Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau --- net/core/drop_monitor.c | 1 - 1 file changed, 1 deletion(-) diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c index 0a113f2..e65fa2f 100644 --- a/net/core/drop_monitor.c +++ b/net/core/drop_monitor.c @@ -63,7 +63,6 @@ static struct genl_family net_drop_monitor_family = { .hdrsize= 0, .name = "NET_DM", .version= 2, - .maxattr= NET_DM_CMD_MAX, }; static DEFINE_PER_CPU(struct per_cpu_dm_data, dm_cpu_data); -- 1.7.12.2.21.g234cd45.dirty -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[ 028/143] net: do not call sock_put() on TIMEWAIT sockets
2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet

[ Upstream commit 80ad1d61e72d626e30ebe8529a0455e660ca4693 ]

commit 3ab5aee7fe84 ("net: Convert TCP & DCCP hash tables to use RCU /
hlist_nulls") incorrectly used sock_put() on TIMEWAIT sockets.

We should instead use inet_twsk_put()

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
Signed-off-by: Willy Tarreau
---
 net/ipv4/inet_hashtables.c  | 2 +-
 net/ipv6/inet6_hashtables.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index d717267..03fd04a 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -247,7 +247,7 @@ begintw:
 			}
 			if (unlikely(!INET_TW_MATCH(sk, net, hash, acookie,
 				 saddr, daddr, ports, dif))) {
-				sock_put(sk);
+				inet_twsk_put(inet_twsk(sk));
 				goto begintw;
 			}
 			goto out;
diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c
index 093e9b2..93765577 100644
--- a/net/ipv6/inet6_hashtables.c
+++ b/net/ipv6/inet6_hashtables.c
@@ -104,7 +104,7 @@ begintw:
 				goto out;
 			}
 			if (!INET6_TW_MATCH(sk, net, hash, saddr, daddr, ports, dif)) {
-				sock_put(sk);
+				inet_twsk_put(inet_twsk(sk));
 				goto begintw;
 			}
 			goto out;
-- 
1.7.12.2.21.g234cd45.dirty
[ 019/143] HID: validate HID report id size
2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Kees Cook

commit 43622021d2e2b82ea03d883926605bdd0525e1d1 upstream

The "Report ID" field of a HID report is used to build indexes of
reports. The kernel's index of these is limited to 256 entries, so any
malicious device that sets a Report ID greater than 255 will trigger
memory corruption on the host:

[ 1347.156239] BUG: unable to handle kernel paging request at 88094958a878
[ 1347.156261] IP: [] hid_register_report+0x2a/0x8b

CVE-2013-2888

Signed-off-by: Kees Cook
Cc: sta...@kernel.org
Signed-off-by: Jiri Kosina
[jmm: backport to 2.6.32]
Signed-off-by: Willy Tarreau
---
 drivers/hid/hid-core.c | 10 +++---
 include/linux/hid.h    |  4 +++-
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/hid/hid-core.c b/drivers/hid/hid-core.c
index 11f8069..e40e3c4 100644
--- a/drivers/hid/hid-core.c
+++ b/drivers/hid/hid-core.c
@@ -58,6 +58,8 @@ static struct hid_report *hid_register_report(struct hid_device *device, unsigne
 	struct hid_report_enum *report_enum = device->report_enum + type;
 	struct hid_report *report;

+	if (id >= HID_MAX_IDS)
+		return NULL;
 	if (report_enum->report_id_hash[id])
 		return report_enum->report_id_hash[id];

@@ -368,8 +370,10 @@ static int hid_parser_global(struct hid_parser *parser, struct hid_item *item)

 	case HID_GLOBAL_ITEM_TAG_REPORT_ID:
 		parser->global.report_id = item_udata(item);
-		if (parser->global.report_id == 0) {
-			dbg_hid("report_id 0 is invalid\n");
+		if (parser->global.report_id == 0 ||
+		    parser->global.report_id >= HID_MAX_IDS) {
+			dbg_hid("report_id %u is invalid\n",
+				parser->global.report_id);
 			return -1;
 		}
 		return 0;
@@ -545,7 +549,7 @@ static void hid_device_release(struct device *dev)
 	for (i = 0; i < HID_REPORT_TYPES; i++) {
 		struct hid_report_enum *report_enum = device->report_enum + i;

-		for (j = 0; j < 256; j++) {
+		for (j = 0; j < HID_MAX_IDS; j++) {
 			struct hid_report *report = report_enum->report_id_hash[j];
 			if (report)
 				hid_free_report(report);
diff --git a/include/linux/hid.h b/include/linux/hid.h
index 8709365..481080d 100644
--- a/include/linux/hid.h
+++ b/include/linux/hid.h
@@ -410,10 +410,12 @@ struct hid_report {
 	struct hid_device *device;			/* associated device */
 };

+#define HID_MAX_IDS 256
+
 struct hid_report_enum {
 	unsigned numbered;
 	struct list_head report_list;
-	struct hid_report *report_id_hash[256];
+	struct hid_report *report_id_hash[HID_MAX_IDS];
 };

 #define HID_REPORT_TYPES 3
-- 
1.7.12.2.21.g234cd45.dirty
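For anyone reading this fix without the HID core at hand, the idea fits in a few lines of user-space C. This is only a sketch under stated assumptions: the demo_* names are made up for illustration, and only HID_MAX_IDS and its value of 256 are taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define HID_MAX_IDS 256		/* same bound the patch introduces */

/* Hypothetical stand-ins for the kernel structures, illustration only. */
struct demo_report {
	unsigned int id;
};

struct demo_report_enum {
	struct demo_report *report_id_hash[HID_MAX_IDS];
};

/*
 * Return the table slot for a report ID. IDs at or beyond HID_MAX_IDS are
 * rejected before they are used as an index, which is the kind of check
 * the patch adds to hid_register_report().
 */
static struct demo_report **demo_slot(struct demo_report_enum *e,
				      unsigned int id)
{
	if (id >= HID_MAX_IDS)
		return NULL;
	return &e->report_id_hash[id];
}

/* Self-check: in-range IDs get a slot, out-of-range IDs do not. */
static void demo_slot_check(void)
{
	static struct demo_report_enum e;

	assert(demo_slot(&e, 0) == &e.report_id_hash[0]);
	assert(demo_slot(&e, 255) == &e.report_id_hash[255]);
	assert(demo_slot(&e, 256) == NULL);
}
```

Using one named constant for both the array size and the check keeps the two from drifting apart, which seems to be the reason the patch replaces the literal 256 everywhere.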
[ 139/143] SELinux: Fix kernel BUG on empty security contexts.
2.6.32-longterm review patch. If anyone has any objections, please let me know. -- From: Stephen Smalley commit 2172fa709ab32ca60e86179dc67d0857be8e2c98 upstream Setting an empty security context (length=0) on a file will lead to incorrectly dereferencing the type and other fields of the security context structure, yielding a kernel BUG. As a zero-length security context is never valid, just reject all such security contexts whether coming from userspace via setxattr or coming from the filesystem upon a getxattr request by SELinux. Setting a security context value (empty or otherwise) unknown to SELinux in the first place is only possible for a root process (CAP_MAC_ADMIN), and, if running SELinux in enforcing mode, only if the corresponding SELinux mac_admin permission is also granted to the domain by policy. In Fedora policies, this is only allowed for specific domains such as livecd for setting down security contexts that are not defined in the build host policy. Reproducer: su setenforce 0 touch foo setfattr -n security.selinux foo Caveat: Relabeling or removing foo after doing the above may not be possible without booting with SELinux disabled. Any subsequent access to foo after doing the above will also trigger the BUG. BUG output from Matthew Thode: [ 473.893141] [ cut here ] [ 473.962110] kernel BUG at security/selinux/ss/services.c:654! 
[ 473.995314] invalid opcode: [#6] SMP [ 474.027196] Modules linked in: [ 474.058118] CPU: 0 PID: 8138 Comm: ls Tainted: G D I 3.13.0-grsec #1 [ 474.116637] Hardware name: Supermicro X8ST3/X8ST3, BIOS 2.0 07/29/10 [ 474.149768] task: 8805f50cd010 ti: 8805f50cd488 task.ti: 8805f50cd488 [ 474.183707] RIP: 0010:[] [] context_struct_compute_av+0xce/0x308 [ 474.219954] RSP: 0018:8805c0ac3c38 EFLAGS: 00010246 [ 474.252253] RAX: RBX: 8805c0ac3d94 RCX: 0100 [ 474.287018] RDX: 8805e8aac000 RSI: RDI: 8805e8aaa000 [ 474.321199] RBP: 8805c0ac3cb8 R08: 0010 R09: 0006 [ 474.357446] R10: R11: 8805c567a000 R12: 0006 [ 474.419191] R13: 8805c2b74e88 R14: 01da R15: [ 474.453816] FS: 7f2e75220800() GS:88061fc0() knlGS: [ 474.489254] CS: 0010 DS: ES: CR0: 80050033 [ 474.522215] CR2: 7f2e74716090 CR3: 0005c085e000 CR4: 000207f0 [ 474.556058] Stack: [ 474.584325] 8805c0ac3c98 811b549b 8805c0ac3c98 8805f1190a40 [ 474.618913] 8805a6202f08 8805c2b74e88 00068800d0464990 8805e8aac860 [ 474.653955] 8805c0ac3cb8 000700068113833a 880606c75060 8805c0ac3d94 [ 474.690461] Call Trace: [ 474.723779] [] ? lookup_fast+0x1cd/0x22a [ 474.778049] [] security_compute_av+0xf4/0x20b [ 474.811398] [] avc_compute_av+0x2a/0x179 [ 474.843813] [] avc_has_perm+0x45/0xf4 [ 474.875694] [] inode_has_perm+0x2a/0x31 [ 474.907370] [] selinux_inode_getattr+0x3c/0x3e [ 474.938726] [] security_inode_getattr+0x1b/0x22 [ 474.970036] [] vfs_getattr+0x19/0x2d [ 475.000618] [] vfs_fstatat+0x54/0x91 [ 475.030402] [] vfs_lstat+0x19/0x1b [ 475.061097] [] SyS_newlstat+0x15/0x30 [ 475.094595] [] ? 
__audit_syscall_entry+0xa1/0xc3 [ 475.148405] [] system_call_fastpath+0x16/0x1b [ 475.179201] Code: 00 48 85 c0 48 89 45 b8 75 02 0f 0b 48 8b 45 a0 48 8b 3d 45 d0 b6 00 8b 40 08 89 c6 ff ce e8 d1 b0 06 00 48 85 c0 49 89 c7 75 02 <0f> 0b 48 8b 45 b8 4c 8b 28 eb 1e 49 8d 7d 08 be 80 01 00 00 e8 [ 475.255884] RIP [] context_struct_compute_av+0xce/0x308 [ 475.296120] RSP [ 475.328734] ---[ end trace f076482e9d754adc ]--- Reported-by: Matthew Thode Signed-off-by: Stephen Smalley Cc: sta...@vger.kernel.org Signed-off-by: Paul Moore Signed-off-by: Willy Tarreau --- security/selinux/ss/services.c | 4 1 file changed, 4 insertions(+) diff --git a/security/selinux/ss/services.c b/security/selinux/ss/services.c index ff17820..dee7177 100644 --- a/security/selinux/ss/services.c +++ b/security/selinux/ss/services.c @@ -1074,6 +1074,10 @@ static int security_context_to_sid_core(const char *scontext, u32 scontext_len, struct context context; int rc = 0; + /* An empty security context is never valid. */ + if (!scontext_len) + return -EINVAL; + if (!ss_initialized) { int i; -- 1.7.12.2.21.g234cd45.dirty -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[ 142/143] floppy: ignore kernel-only members in FDRAWCMD ioctl input
2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Matthew Daley

Always clear out these floppy_raw_cmd struct members after copying the
entire structure from userspace so that the in-kernel version is always
valid and never left in an indeterminate state.

Signed-off-by: Matthew Daley
Signed-off-by: Linus Torvalds
(cherry picked from commit ef87dbe7614341c2e7bfe8d32fcb7028cc97442c)
[wt: be careful in 2.6.32 we still have the ugly macros everywhere]
Signed-off-by: Willy Tarreau
---
 drivers/block/floppy.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
index 5c01f74..19d45e6 100644
--- a/drivers/block/floppy.c
+++ b/drivers/block/floppy.c
@@ -3209,9 +3209,12 @@ static inline int raw_cmd_copyin(int cmd, char __user *param,
 		if (!ptr)
 			return -ENOMEM;
 		*rcmd = ptr;
-		COPYIN(*ptr);
+		ret = copy_from_user(ptr, (void __user *)param, sizeof(*ptr));
 		ptr->next = NULL;
 		ptr->buffer_length = 0;
+		ptr->kernel_data = NULL;
+		if (ret)
+			return -EFAULT;
 		param += sizeof(struct floppy_raw_cmd);
 		if (ptr->cmd_count > 33)
 			/* the command may now also take up the space
-- 
1.7.12.2.21.g234cd45.dirty
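The ordering in this fix (copy, scrub the kernel-only members, and only then act on the copy result) can be modelled in user space. The following is a rough sketch, not the driver code: struct demo_raw_cmd is a made-up, trimmed-down stand-in for struct floppy_raw_cmd, memcpy() stands in for copy_from_user(), and the 'fail' flag is a hypothetical way to simulate a faulting copy.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced stand-in for struct floppy_raw_cmd. */
struct demo_raw_cmd {
	unsigned int flags;		/* caller-controlled */
	void *kernel_data;		/* kernel-only */
	struct demo_raw_cmd *next;	/* kernel-only */
	long buffer_length;		/* kernel-only */
};

/*
 * Copy a request from an untrusted source, then reset the kernel-only
 * members unconditionally, before the copy result is examined.
 */
static int demo_copyin(struct demo_raw_cmd *dst,
		       const struct demo_raw_cmd *untrusted, int fail)
{
	if (!fail)
		memcpy(dst, untrusted, sizeof(*dst));

	dst->next = NULL;
	dst->buffer_length = 0;
	dst->kernel_data = NULL;

	return fail ? -EFAULT : 0;
}

/* Self-check: caller-supplied values for kernel-only members never survive. */
static void demo_copyin_check(void)
{
	struct demo_raw_cmd in = { 0 }, out;

	in.flags = 7;
	in.buffer_length = 4096;	/* bogus value from the caller */
	in.next = &in;
	assert(demo_copyin(&out, &in, 0) == 0);
	assert(out.flags == 7 && out.next == NULL && out.buffer_length == 0);
}
```

Scrubbing before the error return matters because the structure is already linked into the caller's list via *rcmd at that point, so a later cleanup pass would otherwise see whatever the failed copy left behind.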
[ 043/143] dm9601: fix IFF_ALLMULTI handling
2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Peter Korsgaard

[ Upstream commit bf0ea6380724beb64f27a722dfc4b0edabff816e ]

Pass-all-multicast is controlled by bit 3 in RX control, not bit 2
(pass undersized frames).

Reported-by: Joseph Chang
Signed-off-by: Peter Korsgaard
Signed-off-by: David S. Miller
Signed-off-by: Willy Tarreau
---
 drivers/net/usb/dm9601.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/usb/dm9601.c b/drivers/net/usb/dm9601.c
index 9a6eede..498681a 100644
--- a/drivers/net/usb/dm9601.c
+++ b/drivers/net/usb/dm9601.c
@@ -382,7 +382,7 @@ static void dm9601_set_multicast(struct net_device *net)
 	if (net->flags & IFF_PROMISC) {
 		rx_ctl |= 0x02;
 	} else if (net->flags & IFF_ALLMULTI || net->mc_count > DM_MAX_MCAST) {
-		rx_ctl |= 0x04;
+		rx_ctl |= 0x08;
 	} else if (net->mc_count) {
 		struct dev_mc_list *mc_list = net->mc_list;
 		int i;
-- 
1.7.12.2.21.g234cd45.dirty
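Since the whole fix is one hex digit, it may help to see the bit positions spelled out. A small sketch follows; the DEMO_* macro names are invented here (the driver uses bare constants), and the bit meanings are the ones given in the commit message plus the 0x02 promiscuous value visible in the diff context.

```c
#include <assert.h>

/* Names are made up for this sketch; bit positions follow the commit text. */
#define DEMO_RX_PROMISC		(1u << 1)	/* 0x02, as used in the driver */
#define DEMO_RX_PASS_RUNT	(1u << 2)	/* 0x04, pass undersized frames */
#define DEMO_RX_PASS_ALLMULTI	(1u << 3)	/* 0x08, pass all multicast */

/* Pick the RX control mode bits, mirroring the first two fixed branches. */
static unsigned int demo_rx_ctl(int promisc, int allmulti)
{
	unsigned int rx_ctl = 0;

	if (promisc)
		rx_ctl |= DEMO_RX_PROMISC;
	else if (allmulti)
		rx_ctl |= DEMO_RX_PASS_ALLMULTI;

	return rx_ctl;
}

/* Self-check: all-multicast must set bit 3 and leave bit 2 alone. */
static void demo_rx_ctl_check(void)
{
	assert(demo_rx_ctl(0, 1) == 0x08);
	assert((demo_rx_ctl(0, 1) & DEMO_RX_PASS_RUNT) == 0);
}
```

Writing the masks as shifts makes an off-by-one in the bit number easier to spot than a bare 0x04 versus 0x08.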
[ 136/143] qeth: avoid buffer overflow in snmp ioctl
2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Ursula Braun

commit 6fb392b1a63ae36c31f62bc3fc8630b49d602b62 upstream

Check user-defined length in snmp ioctl request and allow request
only if it fits into a qeth command buffer.

Signed-off-by: Ursula Braun
Signed-off-by: Frank Blaschka
Reviewed-by: Heiko Carstens
Reported-by: Nico Golde
Reported-by: Fabian Yamaguchi
Cc:
Signed-off-by: David S. Miller
[jmm: backport 2.6.32]
Signed-off-by: Willy Tarreau
---
 drivers/s390/net/qeth_core_main.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/s390/net/qeth_core_main.c b/drivers/s390/net/qeth_core_main.c
index c4a42d9..29afd6c 100644
--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -3557,7 +3557,7 @@ int qeth_snmp_command(struct qeth_card *card, char __user *udata)
 	struct qeth_cmd_buffer *iob;
 	struct qeth_ipa_cmd *cmd;
 	struct qeth_snmp_ureq *ureq;
-	int req_len;
+	unsigned int req_len;
 	struct qeth_arp_query_info qinfo = {0, };
 	int rc = 0;

@@ -3573,6 +3573,10 @@ int qeth_snmp_command(struct qeth_card *card, char __user *udata)
 	/* skip 4 bytes (data_len struct member) to get req_len */
 	if (copy_from_user(&req_len, udata + sizeof(int), sizeof(int)))
 		return -EFAULT;
+	if (req_len > (QETH_BUFSIZE - IPA_PDU_HEADER_SIZE -
+		       sizeof(struct qeth_ipacmd_hdr) -
+		       sizeof(struct qeth_ipacmd_setadpparms_hdr)))
+		return -EINVAL;
 	ureq = kmalloc(req_len+sizeof(struct qeth_snmp_ureq_hdr), GFP_KERNEL);
 	if (!ureq) {
 		QETH_DBF_TEXT(TRACE, 2, "snmpnome");
-- 
1.7.12.2.21.g234cd45.dirty
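The two halves of this fix (making req_len unsigned and bounding it by the space left in the command buffer) work together, which a user-space sketch can show. The sizes below are hypothetical placeholders, not the real QETH_BUFSIZE or header sizes, and demo_check_req_len() is a name invented for the example.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical sizes for the sketch; the real values live in the qeth headers. */
#define DEMO_BUFSIZE	4096u
#define DEMO_HDR_BYTES	128u

/*
 * Return 0 if a caller-supplied request length fits into the command
 * buffer after the headers, -EINVAL otherwise. Because the length is
 * unsigned, a value that would be negative as an int appears as a very
 * large number and fails the same comparison.
 */
static int demo_check_req_len(unsigned int req_len)
{
	if (req_len > DEMO_BUFSIZE - DEMO_HDR_BYTES)
		return -EINVAL;
	return 0;
}

/* Self-check: the boundary value passes, one byte more does not. */
static void demo_req_len_check(void)
{
	assert(demo_check_req_len(DEMO_BUFSIZE - DEMO_HDR_BYTES) == 0);
	assert(demo_check_req_len(DEMO_BUFSIZE - DEMO_HDR_BYTES + 1) == -EINVAL);
}
```

With a signed req_len, a single upper-bound comparison would not be enough on its own, since a negative length would pass it and still reach the allocation and copy.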
[ 072/143] tipc: fix lockdep warning during bearer initialization
2.6.32-longterm review patch. If anyone has any objections, please let me know. -- From: Ying Xue [ Upstream commit 4225a398c1352a7a5c14dc07277cb5cc4473983b ] When the lockdep validator is enabled, it will report the below warning when we enable a TIPC bearer: [ INFO: possible irq lock inversion dependency detected ] - Possible interrupt unsafe locking scenario: CPU0CPU1 lock(ptype_lock); local_irq_disable(); lock(tipc_net_lock); lock(ptype_lock); lock(tipc_net_lock); *** DEADLOCK *** the shortest dependencies between 2nd lock and 1st lock: -> (ptype_lock){+.+...} ops: 10 { [...] SOFTIRQ-ON-W at: [] __lock_acquire+0x528/0x13e0 [] lock_acquire+0x90/0x100 [] _raw_spin_lock+0x38/0x50 [] dev_add_pack+0x3a/0x60 [] arp_init+0x1a/0x48 [] inet_init+0x181/0x27e [] do_one_initcall+0x34/0x170 [] kernel_init+0x110/0x1b2 [] kernel_thread_helper+0x6/0x10 [...] ... key at: [] ptype_lock+0x10/0x20 ... acquired at: [] lock_acquire+0x90/0x100 [] _raw_spin_lock+0x38/0x50 [] dev_add_pack+0x3a/0x60 [] enable_bearer+0xf2/0x140 [tipc] [] tipc_enable_bearer+0x1ba/0x450 [tipc] [] tipc_cfg_do_cmd+0x5c4/0x830 [tipc] [] handle_cmd+0x42/0xd0 [tipc] [] genl_rcv_msg+0x232/0x280 [] netlink_rcv_skb+0x86/0xb0 [] genl_rcv+0x1c/0x30 [] netlink_unicast+0x174/0x1f0 [] netlink_sendmsg+0x1eb/0x2d0 [] sock_aio_write+0x161/0x170 [] do_sync_write+0xac/0xf0 [] vfs_write+0x156/0x170 [] sys_write+0x42/0x70 [] sysenter_do_call+0x12/0x38 [...] } -> (tipc_net_lock){+..-..} ops: 4 { [...] IN-SOFTIRQ-R at: [] __lock_acquire+0x64a/0x13e0 [] lock_acquire+0x90/0x100 [] _raw_read_lock_bh+0x3d/0x50 [] tipc_recv_msg+0x1d/0x830 [tipc] [] recv_msg+0x3f/0x50 [tipc] [] __netif_receive_skb+0x22a/0x590 [] netif_receive_skb+0x2b/0xf0 [] pcnet32_poll+0x292/0x780 [] net_rx_action+0xfa/0x1e0 [] __do_softirq+0xae/0x1e0 [...] 
}

From the log, we can see three different call chains between
CPU0 and CPU1:

Time 0 on CPU0:

  kernel_init()->inet_init()->dev_add_pack()

At time 0, the ptype_lock is held by CPU0 in dev_add_pack();

Time 1 on CPU1:

  tipc_enable_bearer()->enable_bearer()->dev_add_pack()

At time 1, tipc_enable_bearer() first holds tipc_net_lock, and then
wants to take ptype_lock to register TIPC protocol handler into the
networking stack. But the ptype_lock has been taken by dev_add_pack()
on CPU0, so at this time the dev_add_pack() running on CPU1 has to be
busy looping.

Time 2 on CPU0:

  netif_receive_skb()->recv_msg()->tipc_recv_msg()

At time 2, an incoming TIPC packet arrives at CPU0, hence
tipc_recv_msg() will be invoked. In tipc_recv_msg(), it first wants
to hold tipc_net_lock.

At the moment, below scenario happens:

On CPU0, below is our sequence of taking locks:

  lock(ptype_lock)->lock(tipc_net_lock)

On CPU1, our sequence of taking locks looks like:

  lock(tipc_net_lock)->lock(ptype_lock)

Obviously deadlock may happen in this case.

But please note the deadlock possibly doesn't occur at all when the
first TIPC bearer is enabled. Before enable_bearer() -- running on
CPU1 does not hold ptype_lock, so the TIPC receive handler (i.e.
recv_msg()) is not registered successfully via dev_add_pack(), so
the tipc_recv_msg() cannot be called by recv_msg() even if a TIPC
message comes to CPU0. But when the second TIPC bearer is
registered, the deadlock can perhaps really happen.

To fix it, we will push the work of registering TIPC protocol
handler into workqueue context. After the change, both paths taking
ptype_lock are always in process contexts, thus, the deadlock should
never occur.

Signed-off-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: Paul Gortmaker
Signed-off-by: David S. Miller
Signed-off-by: Willy Tarreau
---
 net/tipc/eth_media.c | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/net/tipc/eth_media.c b/net/tipc/eth_media.c
index 524ba56..22453a8 100644
--- a/net/tipc/eth_media.c
+++ b/net/tipc/eth_media.c
@@ -56,6 +56,7 @@ struct eth_bearer {
 	struct tipc_bearer *bearer;
 	struct net_device *dev;
 	struct packet_type tipc_packet_type;
+	struct work_struct setup;
 };

 static struct eth_bearer eth_bearers[MAX_ETH_BEARERS];
@@ -122,6 +123,17 @@ static int recv_msg(struct sk_buff *buf, struct net_device *dev,
 }

 /**
+ * setup_bearer -
[ 022/143] HID: LG: validate HID output report details
2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Kees Cook

commit 0fb6bd06e06792469acc15bbe427361b56ada528 upstream

A HID device could send a malicious output report that would cause the
lg, lg3, and lg4 HID drivers to write beyond the output report allocation
during an event, causing a heap overflow:

[ 325.245240] usb 1-1: New USB device found, idVendor=046d, idProduct=c287
...
[ 414.518960] BUG kmalloc-4096 (Not tainted): Redzone overwritten

Additionally, while lg2 did correctly validate the report details, it was
cleaned up and shortened.

CVE-2013-2893

Signed-off-by: Kees Cook
Cc: sta...@vger.kernel.org
Reviewed-by: Benjamin Tissoires
Signed-off-by: Jiri Kosina
[jmm: backported to 2.6.32]
Signed-off-by: Willy Tarreau
---
 drivers/hid/hid-lg2ff.c | 19 +++
 drivers/hid/hid-lgff.c  | 17 ++---
 2 files changed, 5 insertions(+), 31 deletions(-)

diff --git a/drivers/hid/hid-lg2ff.c b/drivers/hid/hid-lg2ff.c
index 4e6dc6e..a260a8c 100644
--- a/drivers/hid/hid-lg2ff.c
+++ b/drivers/hid/hid-lg2ff.c
@@ -65,26 +65,13 @@ int lg2ff_init(struct hid_device *hid)
 	struct hid_report *report;
 	struct hid_input *hidinput = list_entry(hid->inputs.next,
 						struct hid_input, list);
-	struct list_head *report_list =
-			&hid->report_enum[HID_OUTPUT_REPORT].report_list;
 	struct input_dev *dev = hidinput->input;
 	int error;

-	if (list_empty(report_list)) {
-		dev_err(&hid->dev, "no output report found\n");
+	/* Check that the report looks ok */
+	report = hid_validate_values(hid, HID_OUTPUT_REPORT, 0, 0, 7);
+	if (!report)
 		return -ENODEV;
-	}
-
-	report = list_entry(report_list->next, struct hid_report, list);
-
-	if (report->maxfield < 1) {
-		dev_err(&hid->dev, "output report is empty\n");
-		return -ENODEV;
-	}
-	if (report->field[0]->report_count < 7) {
-		dev_err(&hid->dev, "not enough values in the field\n");
-		return -ENODEV;
-	}

 	lg2ff = kmalloc(sizeof(struct lg2ff_device), GFP_KERNEL);
 	if (!lg2ff)
diff --git a/drivers/hid/hid-lgff.c b/drivers/hid/hid-lgff.c
index 987abeb..df26abb 100644
--- a/drivers/hid/hid-lgff.c
+++ b/drivers/hid/hid-lgff.c
@@ -135,27 +135,14 @@ static void hid_lgff_set_autocenter(struct input_dev *dev, u16 magnitude)
 int lgff_init(struct hid_device* hid)
 {
 	struct hid_input *hidinput = list_entry(hid->inputs.next, struct hid_input, list);
-	struct list_head *report_list = &hid->report_enum[HID_OUTPUT_REPORT].report_list;
 	struct input_dev *dev = hidinput->input;
-	struct hid_report *report;
-	struct hid_field *field;
 	const signed short *ff_bits = ff_joystick;
 	int error;
 	int i;

-	/* Find the report to use */
-	if (list_empty(report_list)) {
-		err_hid("No output report found");
-		return -1;
-	}
-
 	/* Check that the report looks ok */
-	report = list_entry(report_list->next, struct hid_report, list);
-	field = report->field[0];
-	if (!field) {
-		err_hid("NULL field");
-		return -1;
-	}
+	if (!hid_validate_values(hid, HID_OUTPUT_REPORT, 0, 0, 7))
+		return -ENODEV;

 	for (i = 0; i < ARRAY_SIZE(devices); i++) {
 		if (dev->id.vendor == devices[i].idVendor &&
-- 
1.7.12.2.21.g234cd45.dirty
[ 041/143] ipv6 mcast: use in6_dev_put in timer handlers instead of __in6_dev_put
2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Salam Noureddine

[ Upstream commit 9260d3e1013701aa814d10c8fc6a9f92bd17d643 ]

It is possible for the timer handlers to run after the call to
ipv6_mc_down so use in6_dev_put instead of __in6_dev_put in the
handler function in order to do proper cleanup when the refcnt
reaches 0. Otherwise, the refcnt can reach zero without the
inet6_dev being destroyed and we end up leaking a reference to
the net_device and see messages like the following,

  unregister_netdevice: waiting for eth0 to become free. Usage count = 1

Tested on linux-3.4.43.

Signed-off-by: Salam Noureddine
Signed-off-by: David S. Miller
Signed-off-by: Willy Tarreau
---
 net/ipv6/mcast.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c
index f9fcf69..99ae9e3 100644
--- a/net/ipv6/mcast.c
+++ b/net/ipv6/mcast.c
@@ -2208,7 +2208,7 @@ static void mld_gq_timer_expire(unsigned long data)

 	idev->mc_gq_running = 0;
 	mld_send_report(idev, NULL);
-	__in6_dev_put(idev);
+	in6_dev_put(idev);
 }

 static void mld_ifc_timer_expire(unsigned long data)
@@ -2221,7 +2221,7 @@ static void mld_ifc_timer_expire(unsigned long data)
 		if (idev->mc_ifc_count)
 			mld_ifc_start_timer(idev, idev->mc_maxdelay);
 	}
-	__in6_dev_put(idev);
+	in6_dev_put(idev);
 }

 static void mld_ifc_event(struct inet6_dev *idev)
-- 
1.7.12.2.21.g234cd45.dirty
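The difference between the two put helpers is easy to miss from the diff alone, so here is a toy model. It is a sketch with invented names (struct demo_dev, demo_put, demo_put_nocheck) and a plain integer in place of the kernel's atomic counter; the 'destroyed' flag stands in for the real destructor.

```c
#include <assert.h>

/* Hypothetical object with a plain (non-atomic) reference count. */
struct demo_dev {
	int refcnt;
	int destroyed;
};

/* Drop a reference without checking for zero, like __in6_dev_put(). */
static void demo_put_nocheck(struct demo_dev *d)
{
	d->refcnt--;
}

/* Drop a reference and clean up on the last one, like in6_dev_put(). */
static void demo_put(struct demo_dev *d)
{
	if (--d->refcnt == 0)
		d->destroyed = 1;	/* stands in for the real destructor */
}

/* Self-check: only the checking variant destroys on the last reference. */
static void demo_put_check(void)
{
	struct demo_dev a = { 1, 0 };
	struct demo_dev b = { 1, 0 };

	demo_put_nocheck(&a);
	assert(a.refcnt == 0 && a.destroyed == 0);	/* leaked */

	demo_put(&b);
	assert(b.refcnt == 0 && b.destroyed == 1);	/* cleaned up */
}
```

The unchecked variant is only safe when the caller knows it cannot be holding the last reference, which, per the commit message, is not guaranteed once the timer handlers can run after ipv6_mc_down.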
[ 070/143] ipv6: Don't depend on per socket memory for neighbour discovery messages
2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Thomas Graf

[ Upstream commit 25a6e6b84fba601eff7c28d30da8ad7cfbef0d43 ]

Allocating skbs when sending out neighbour discovery messages
currently uses sock_alloc_send_skb() based on a per net namespace
socket and thus shares a socket wmem buffer space.

If a netdevice is temporarily unable to transmit due to carrier
loss or for other reasons, the queued up ndisc messages will consume
all of the wmem space and will thus prevent any more skbs from being
allocated, even for netdevices that are able to transmit packets.

The number of neighbour discovery messages sent is very limited,
use of alloc_skb() bypasses the socket wmem buffer size enforcement
while the manual call to skb_set_owner_w() maintains the socket
reference needed for the IPv6 output path.

This patch was originally posted by Eric Dumazet in a modified
form.

Signed-off-by: Thomas Graf
Cc: Eric Dumazet
Cc: Hannes Frederic Sowa
Cc: Stephen Warren
Cc: Fabio Estevam
Tested-by: Fabio Estevam
Tested-by: Stephen Warren
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller
Signed-off-by: Willy Tarreau
---
 net/ipv6/ndisc.c | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index f74e4e2..752da21 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -449,7 +449,6 @@ struct sk_buff *ndisc_build_skb(struct net_device *dev,
 	struct sk_buff *skb;
 	struct icmp6hdr *hdr;
 	int len;
-	int err;
 	u8 *opt;

 	if (!dev->addr_len)
@@ -459,14 +458,12 @@ struct sk_buff *ndisc_build_skb(struct net_device *dev,
 	if (llinfo)
 		len += ndisc_opt_addr_space(dev);

-	skb = sock_alloc_send_skb(sk,
-				  (MAX_HEADER + sizeof(struct ipv6hdr) +
-				   len + LL_ALLOCATED_SPACE(dev)),
-				  1, &err);
+	skb = alloc_skb((MAX_HEADER + sizeof(struct ipv6hdr) +
+			 len + LL_ALLOCATED_SPACE(dev)), GFP_ATOMIC);
 	if (!skb) {
 		ND_PRINTK0(KERN_ERR
-			   "ICMPv6 ND: %s() failed to allocate an skb, err=%d.\n",
-			   __func__, err);
+			   "ICMPv6 ND: %s() failed to allocate an skb.\n",
+			   __func__);
 		return NULL;
 	}

@@ -494,6 +491,11 @@ struct sk_buff *ndisc_build_skb(struct net_device *dev,
 					   csum_partial(hdr,
 							len, 0));

+	/* Manually assign socket ownership as we avoid calling
+	 * sock_alloc_send_pskb() to bypass wmem buffer limits
+	 */
+	skb_set_owner_w(skb, sk);
+
 	return skb;
 }

-- 
1.7.12.2.21.g234cd45.dirty
[ 131/143] dm snapshot: fix data corruption
2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Mikulas Patocka

CVE-2013-4299

BugLink: http://bugs.launchpad.net/bugs/1241769

This patch fixes a particular type of data corruption that has been
encountered when loading a snapshot's metadata from disk.

When we allocate a new chunk in persistent_prepare, we increment
ps->next_free and we make sure that it doesn't point to a metadata area
by further incrementing it if necessary.

When we load metadata from disk on device activation, ps->next_free is
positioned after the last used data chunk. However, if this last used
data chunk is followed by a metadata area, ps->next_free is positioned
erroneously to the metadata area. A newly-allocated chunk is placed at
the same location as the metadata area, resulting in data or metadata
corruption.

This patch changes the code so that ps->next_free skips the metadata
area when metadata are loaded in function read_exceptions.

The patch also moves a piece of code from persistent_prepare_exception
to a separate function skip_metadata to avoid code duplication.

CVE-2013-4299

Signed-off-by: Mikulas Patocka
Cc: sta...@vger.kernel.org
Cc: Mike Snitzer
Signed-off-by: Alasdair G Kergon
(back ported from commit e9c6a182649f4259db704ae15a91ac820e63b0ca)
Signed-off-by: Luis Henriques
Acked-by: Stefan Bader
Signed-off-by: Tim Gardner
Signed-off-by: Willy Tarreau
---
 drivers/md/dm-snap-persistent.c | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/md/dm-snap-persistent.c b/drivers/md/dm-snap-persistent.c
index 0c74642..97c3f06 100644
--- a/drivers/md/dm-snap-persistent.c
+++ b/drivers/md/dm-snap-persistent.c
@@ -252,6 +252,14 @@ static chunk_t area_location(struct pstore *ps, chunk_t area)
 	return 1 + ((ps->exceptions_per_area + 1) * area);
 }

+static void skip_metadata(struct pstore *ps)
+{
+	uint32_t stride = ps->exceptions_per_area + 1;
+	chunk_t next_free = ps->next_free;
+	if (sector_div(next_free, stride) == 1)
+		ps->next_free++;
+}
+
 /*
  * Read or write a metadata area. Remembering to skip the first
  * chunk which holds the header.
@@ -481,6 +489,8 @@ static int read_exceptions(struct pstore *ps,

 	ps->current_area--;

+	skip_metadata(ps);
+
 	return 0;
 }

@@ -587,8 +597,6 @@ static int persistent_prepare_exception(struct dm_exception_store *store,
 					struct dm_snap_exception *e)
 {
 	struct pstore *ps = get_info(store);
-	uint32_t stride;
-	chunk_t next_free;
 	sector_t size = get_dev_size(store->cow->bdev);

 	/* Is there enough room ? */
@@ -601,10 +609,8 @@ static int persistent_prepare_exception(struct dm_exception_store *store,
 	 * Move onto the next free pending, making sure to take
 	 * into account the location of the metadata chunks.
 	 */
-	stride = (ps->exceptions_per_area + 1);
-	next_free = ++ps->next_free;
-	if (sector_div(next_free, stride) == 1)
-		ps->next_free++;
+	ps->next_free++;
+	skip_metadata(ps);

 	atomic_inc(&ps->pending_count);
 	return 0;
-- 
1.7.12.2.21.g234cd45.dirty
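The arithmetic behind skip_metadata() can be checked outside the kernel. Below is a sketch, assuming the layout implied by area_location() in the diff: chunk 0 is the header and each area starts with one metadata chunk followed by exceptions_per_area data chunks. struct demo_pstore and the demo_* functions are made up for the example, and a plain '%' replaces sector_div(), which in the kernel divides its first argument in place and returns the remainder.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, reduced version of struct pstore for the sketch. */
struct demo_pstore {
	uint32_t exceptions_per_area;
	uint64_t next_free;
};

/* Chunk that holds the metadata for a given area (chunk 0 is the header). */
static uint64_t demo_area_location(const struct demo_pstore *ps, uint64_t area)
{
	return 1 + ((uint64_t)(ps->exceptions_per_area + 1) * area);
}

/* If next_free landed on a metadata chunk, step past it. */
static void demo_skip_metadata(struct demo_pstore *ps)
{
	uint32_t stride = ps->exceptions_per_area + 1;

	if (ps->next_free % stride == 1)
		ps->next_free++;
}

/*
 * Self-check with 3 exceptions per area: stride is 4, so metadata sits in
 * chunks 1, 5, 9, ... and a next_free of 5 must be bumped to 6.
 */
static void demo_skip_check(void)
{
	struct demo_pstore ps = { 3, 5 };

	assert(demo_area_location(&ps, 1) == 5);
	demo_skip_metadata(&ps);
	assert(ps.next_free == 6);
}
```

The case the patch fixes corresponds to next_free being left at 5 after loading metadata: without the extra call in read_exceptions(), the next allocation would hand out the metadata chunk itself.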