Re: [PATCH 4/5] futex: Avoid taking hb lock if nothing to wakeup
On Fri, 2013-11-22 at 19:19 -0800, Davidlohr Bueso wrote:
> On Fri, 2013-11-22 at 17:25 -0800, Linus Torvalds wrote:
> > On Fri, Nov 22, 2013 at 4:56 PM, Davidlohr Bueso wrote:
> > > In futex_wake() there is clearly no point in taking the hb->lock if
> > > we know beforehand that there are no tasks to be woken. This comes
> > > at the smaller cost of doing some atomic operations to keep track of
> > > the list's size.
> >
> > Hmm. Why? Afaik, you only care about "empty or not". And if you don't
> > need the serialization from locking, then afaik you can just do a
> > "plist_head_empty()" without holding the lock.
>
> I remember this being the original approach, but after noticing some
> strange behavior we quickly decided it wasn't the path. And sure enough,
> I just double checked and tried the patch without atomic ops and can see
> things being off: one of the futextest performance cases is stuck
> blocked on a futex and I couldn't reboot the machine either -- nothing
> apparent in dmesg, just not 100% functional. The thing is, we can only
> avoid taking the lock only if nobody else is trying to add itself to the
> list.

In your usage, the worst case scenario is that you detect 0 when locking
may have blocked and found a waiter. Correct?

In this case, you return 0, instead of 1 (or more). This suggests to me
a bug in the futextest testcase. Which test specifically hung up
waiting?

Futex hangs are almost always bad userspace code (my bad userspace code
in this case ;-)

--
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 4/5] futex: Avoid taking hb lock if nothing to wakeup
On Fri, 2013-11-22 at 16:56 -0800, Davidlohr Bueso wrote:
> In futex_wake() there is clearly no point in taking the hb->lock if
> we know beforehand that there are no tasks to be woken. This comes
> at the smaller cost of doing some atomic operations to keep track of
> the list's size. Specifically, increment the counter when an element is
> added to the list, and decrement when it is removed. Of course, if the
> counter is 0, then there are no tasks blocked on a futex. Some special
> considerations:
>
> - increment the counter at queue_lock() as we always end up calling
>   queue_me() which adds the element to the list. Upon any error,
>   queue_unlock() is called for housekeeping, for which we decrement
>   to mach the increment done in queue_lock().

       ^match

>
> - decrement the counter at __unqueue_me() to reflect when an element is
>   removed from the queue for wakeup related purposes.

__unqueue_futex (not __unqueue_me)

> @@ -999,6 +1001,10 @@ futex_wake(u32 __user *uaddr, unsigned int flags, int nr_wake, u32 bitset)
>  		goto out;
>
>  	hb = hash_futex(&key);
> +	/* make sure we really have tasks to wakeup */

Nit, but please use proper sentence formatting for consistency with the
rest of the comments in futex.c (most of them anyway).

	/* Make sure we really have tasks to wake up. */

Now... I'm not thrilled with adding atomics if we don't need to,
especially for an optimization since the atomics themselves cause enough
problems, especially across a large number of CPUs... I'll respond to
Linus's thread though.

--
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel
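The counter scheme described in the quoted changelog can be sketched in userspace C. This is only an illustration under stated assumptions, not the kernel patch: `struct bucket`, `bucket_init()`, `bucket_enqueue()`, `bucket_wake()` and the `wake_lock_acquisitions` field are hypothetical names, a pthread mutex stands in for hb->lock, and the sketch does not settle the ordering question raised in this thread about a waiter adding itself while a waker reads the counter.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical userspace stand-in for a futex hash bucket. */
struct bucket {
    pthread_mutex_t lock;        /* plays the role of hb->lock */
    atomic_int waiters;          /* number of queued waiters */
    int wake_lock_acquisitions;  /* instrumentation for the example only */
};

void bucket_init(struct bucket *b)
{
    pthread_mutex_init(&b->lock, NULL);
    atomic_init(&b->waiters, 0);
    b->wake_lock_acquisitions = 0;
}

/* Account for the waiter first, then take the lock to queue it. */
void bucket_enqueue(struct bucket *b)
{
    atomic_fetch_add(&b->waiters, 1);
    pthread_mutex_lock(&b->lock);
    /* A real implementation would add the waiter to a list here. */
    pthread_mutex_unlock(&b->lock);
}

/* Wake up to nr_wake waiters; returns how many were woken. */
int bucket_wake(struct bucket *b, int nr_wake)
{
    int woken;

    /* Fast path: nobody is queued, so skip the lock entirely. */
    if (atomic_load(&b->waiters) == 0)
        return 0;

    pthread_mutex_lock(&b->lock);
    b->wake_lock_acquisitions++;
    woken = atomic_load(&b->waiters);
    if (woken > nr_wake)
        woken = nr_wake;
    /* Decrement once per waiter removed from the (imaginary) list. */
    atomic_fetch_sub(&b->waiters, woken);
    pthread_mutex_unlock(&b->lock);
    return woken;
}
```

The point of the sketch is the early return: a wake call on an empty bucket never touches the mutex, at the price of two atomic operations per queued waiter.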
Re: [PATCH 3/5] futex: Larger hash table
On Fri, 2013-11-22 at 16:56 -0800, Davidlohr Bueso wrote:
> Currently, the futex global hash table suffers from its fixed, smallish
> (for today's standards) size of 256 entries, as well as its lack of NUMA
> awareness. Large systems, using many futexes, can be prone to high amounts
> of collisions; where these futexes hash to the same bucket and lead to
> extra contention on the same hb->lock. Furthermore, cacheline bouncing is a
> reality when we have multiple hb->locks residing on the same cacheline and
> different futexes hash to adjacent buckets.
>
> This patch keeps the current static size of 16 entries for small systems,
> or otherwise, 256 * ncpus (or larger as we need to round the number to a
> power of 2). Note that this number of CPUs accounts for all CPUs that can
> ever be available in the system, taking into consideration things like
> hotplugging. While we do impose extra overhead at bootup by making the hash
> table larger, this is a one time thing, and does not shadow the benefits
> of this patch.
>
> Also, similar to other core kernel components (pid, dcache, tcp), by using
> alloc_large_system_hash() we benefit from its NUMA awareness and thus the
> table is distributed among the nodes instead of in a single one. We impose
> this function's minimum limit of 256 entries, so that in worst case scenarios
> or issues, we still end up using the current amount anyways.
>
> For a custom microbenchmark that pounds on the uaddr hashing -- making the wait
> path fail at futex_wait_setup() returning -EWOULDBLOCK for large amounts of futexes,
> we can see the following benefits on a 80-core, 8-socket 1Tb server:
>
> +---------+----------------------------------+------------------------------------------+----------+
> | threads | baseline (ops/sec) [insns/cycle] | large hash table (ops/sec) [insns/cycle] | increase |
> +---------+----------------------------------+------------------------------------------+----------+
> |     512 | 34429   [0.07]                   | 255274  [0.48]                           | +641.45% |
> |     256 | 65452   [0.07]                   | 443563  [0.41]                           | +577.69% |
> |     128 | 125111  [0.07]                   | 742613  [0.33]                           | +493.56% |
> |      80 | 203642  [0.09]                   | 1028147 [0.29]                           | +404.87% |
> |      64 | 262944  [0.09]                   | 997300  [0.28]                           | +279.28% |
> |      32 | 642390  [0.24]                   | 965996  [0.27]                           | +50.37%  |
> +---------+----------------------------------+------------------------------------------+----------+
>
> Cc: Ingo Molnar
> Cc: Darren Hart
> Cc: Peter Zijlstra
> Cc: Thomas Gleixner
> Cc: Mike Galbraith
> Cc: Jeff Mahoney
> Cc: Linus Torvalds
> Cc: Scott Norton
> Cc: Tom Vaden
> Cc: Aswin Chandramouleeswaran
> Signed-off-by: Waiman Long
> Signed-off-by: Jason Low
> Signed-off-by: Davidlohr Bueso
> ---
>  kernel/futex.c | 26 +-
>  1 file changed, 21 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/futex.c b/kernel/futex.c
> index 0768c68..5fa9eb0 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -63,6 +63,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  #include
>
> @@ -70,7 +71,11 @@
>
>  int __read_mostly futex_cmpxchg_enabled;
>
> -#define FUTEX_HASHBITS (CONFIG_BASE_SMALL ? 4 : 8)
> +#if CONFIG_BASE_SMALL
> +static unsigned long futex_hashsize = 16;
> +#else
> +static unsigned long futex_hashsize;
> +#endif
>
>  /*
>   * Futex flags used to encode options to functions and preserve them across
> @@ -151,7 +156,11 @@ struct futex_hash_bucket {
>  	struct plist_head chain;
>  };
>
> -static struct futex_hash_bucket futex_queues[1<<FUTEX_HASHBITS];
> +#if CONFIG_BASE_SMALL
> +static struct futex_hash_bucket futex_queues[futex_hashsize];
> +#else
> +static struct futex_hash_bucket *futex_queues;
> +#endif

Something in me squirms at the #if/#else here, but I'll leave that to
the tip maintainers to call out if they are so inclined. Same below.

Otherwise, looks reasonable to me.

Reviewed-by: Darren Hart

--
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel
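The sizing rule in the quoted changelog (16 entries for small configurations, otherwise 256 * ncpus rounded up to a power of two, never fewer than 256) can be written out as a small standalone function. This is a sketch of the arithmetic only: `futex_hash_entries()` and `roundup_pow2()` are hypothetical names for this example, and the real patch hands the allocation to alloc_large_system_hash() rather than computing the size this way.

```c
#include <stdbool.h>

/* Hypothetical helper: smallest power of two that is >= n (1 for n == 0). */
static unsigned long roundup_pow2(unsigned long n)
{
    unsigned long p = 1;

    while (p < n)
        p <<= 1;
    return p;
}

/*
 * Number of hash buckets under the rule quoted above.
 * possible_cpus counts every CPU that could ever come online.
 */
unsigned long futex_hash_entries(unsigned int possible_cpus, bool base_small)
{
    unsigned long n;

    if (base_small)
        return 16;

    n = roundup_pow2(256UL * possible_cpus);
    return n < 256 ? 256 : n;   /* keep the old size as a floor */
}
```

Under these assumptions, 80 possible CPUs give 256 * 80 = 20480, which rounds up to 32768 buckets.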
Re: [PATCH 1/5] futex: Misc cleanups
On Fri, 2013-11-22 at 16:56 -0800, Davidlohr Bueso wrote:
> From: Jason Low
>
> - Remove unnecessary head variables.
> - Delete unused parameter in queue_unlock().
>
> Cc: Ingo Molnar
> Cc: Darren Hart
> Cc: Peter Zijlstra
> Cc: Thomas Gleixner
> Cc: Mike Galbraith
> Cc: Jeff Mahoney
> Cc: Linus Torvalds
> Cc: Scott Norton
> Cc: Tom Vaden
> Cc: Aswin Chandramouleeswaran
> Cc: Waiman Long
> Signed-off-by: Jason Low
> Signed-off-by: Davidlohr Bueso

My concern was that we saved off head for a reason, but the hb lock is
held in these cases, and the call is safe against removal, so I don't
see a problem.

Thanks.

Reviewed-by: Darren Hart

--
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel
Re: [PATCH 0/5] futex: Wakeup optimizations
Hi Darren,

On Fri, 2013-11-22 at 21:55 -0800, Darren Hart wrote:
> On Fri, 2013-11-22 at 16:56 -0800, Davidlohr Bueso wrote:
> > We have been dealing with a customer database workload on large
> > 12Tb, 240 core 16 socket NUMA system that exhibits high amounts
> > of contention on some of the locks that serialize internal futex
> > data structures. This workload specially suffers in the wakeup
> > paths, where waiting on the corresponding hb->lock can account for
> > up to ~60% of the time. The result of such calls can mostly be
> > classified as (i) nothing to wake up and (ii) wakeup large amount
> > of tasks.
>
> With as many cores as you have, have you done any analysis of how
> effective the hashing algorithm is, and would more buckets relieve some
> of the contention ah, I see below that you did. Nice work.
>
> > Before these patches are applied, we can see this pathological behavior:
> >
> >  37.12%  826174  xxx  [kernel.kallsyms]  [k] _raw_spin_lock
> >          --- _raw_spin_lock
> >           |
> >           |--97.14%-- futex_wake
> >           |           do_futex
> >           |           sys_futex
> >           |           system_call_fastpath
> >           |           |
> >           |           |--99.70%-- 0x7f383fbdea1f
> >           |           |           yyy
> >
> >  43.71%  762296  xxx  [kernel.kallsyms]  [k] _raw_spin_lock
> >          --- _raw_spin_lock
> >           |
> >           |--53.74%-- futex_wake
> >           |           do_futex
> >           |           sys_futex
> >           |           system_call_fastpath
> >           |           |
> >           |           |--99.40%-- 0x7fe7d44a4c05
> >           |           |           zzz
> >           |--45.90%-- futex_wait_setup
> >           |           futex_wait
> >           |           do_futex
> >           |           sys_futex
> >           |           system_call_fastpath
> >           |           0x7fe7ba315789
> >           |           syscall
> >
>
> Sorry to be dense, can you spell out how 60% falls out of these numbers?

By adding the respective percentages of futex_wake()*_raw_spin_lock
calls.

> >
> > With these patches, contention is practically non existent:
> >
> >  0.10%  49  xxx  [kernel.kallsyms]  [k] _raw_spin_lock
> >          --- _raw_spin_lock
> >           |
> >           |--76.06%-- futex_wait_setup
> >           |           futex_wait
> >           |           do_futex
> >           |           sys_futex
> >           |           system_call_fastpath
> >           |           |
> >           |           |--99.90%-- 0x7f3165e63789
> >           |           |           syscall|
> >          ...
> >           |--6.27%-- futex_wake
> >           |          do_futex
> >           |          sys_futex
> >           |          system_call_fastpath
> >           |          |
> >           |          |--54.56%-- 0x7f317fff2c05
> >          ...
> >
> > Patches 1 & 2 are cleanups and micro optimizations.
> >
> > Patch 3 addresses the well known issue of the global hash table.
> > By creating a larger and NUMA aware table, we can reduce the false
> > sharing and collisions, thus reducing the chance of different futexes
> > using hb->lock.
> >
> > Patch 4 reduces contention on the corresponding hb->lock by not trying to
> > acquire it if there are no blocked tasks in the waitqueue.
> > This particularly deals with point (i) above, where we see that it is not
> > uncommon for up to 90% of wakeup calls end up returning 0, indicating that no
> > tasks were woken.
>
> Can you determine how much benefit comes from 3 and how much additional
> benefit comes from 4?

While I don't have specific per-patch data, there are indications that
the workload mostly deals with a handful of futexes. So it's pretty safe
to assume that patch 4 is the one with the most benefit for _this_
particular workload.

> >
> > Patch 5 resurrects a two year old idea from Peter Zijlstra to delay
> > the waking of the blocked tasks to be done without holding the hb->lock:
> > https://lkml.org/lkml/2011/9/14/118
> >
> > This is useful for locking primitives that can effect multiple wakeups
> > per operation and want to avoid the futex's internal spinlock contention by
> > delaying the wakeups until we've released the hb->lock.
> > This particularly deals with point (ii) above, where we can observe that
> > in occasions the wake calls end up waking 125 to 200 waiters in what we believe
> > are RW locks in the application.
> >
> > This patchset has also been tested on smaller systems for a variety of
> > benchmarks, including java workloads, kernel builds and custom bang-the-hell-out-of
> > hb locks programs. So far, no functional or performance regressions have been seen.
> > Furthermore, no issues were found when running the
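One reading of the "~60%" answer in the reply above is that each _raw_spin_lock record's overall share is multiplied by the fraction of its call chains that come from futex_wake(), and the products are summed. The sketch below works that arithmetic out; `struct perf_entry` and `combined_share()` are hypothetical names introduced for the example, and whether the two records may be added this way depends on how the profiles were taken, which the thread does not spell out.

```c
#include <stddef.h>

/* One perf record, both values in percent. */
struct perf_entry {
    double symbol_pct;  /* share of all samples spent in the symbol */
    double caller_pct;  /* share of those samples reached via the caller */
};

/* Sum of symbol_pct * caller_pct over all records, in percent. */
double combined_share(const struct perf_entry *e, size_t n)
{
    double total = 0.0;

    for (size_t i = 0; i < n; i++)
        total += e[i].symbol_pct * e[i].caller_pct / 100.0;
    return total;
}
```

With the two records quoted above, (37.12, 97.14) and (43.71, 53.74), this comes to roughly 36.06 + 23.49 = 59.55, in line with the "~60%" figure.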
Re: [PATCH 0/5] futex: Wakeup optimizations
On Fri, 2013-11-22 at 21:55 -0800, Darren Hart wrote:
> On Fri, 2013-11-22 at 16:56 -0800, Davidlohr Bueso wrote:
> > This patchset has also been tested on smaller systems for a variety of
> > benchmarks, including java workloads, kernel builds and custom bang-the-hell-out-of
> > hb locks programs. So far, no functional or performance regressions have been seen.
> > Furthermore, no issues were found when running the different tests in the futextest
> > suite: http://git.kernel.org/cgit/linux/kernel/git/dvhart/futextest.git/
>
> Excellent. Would you be able to contribute any of these (C only please)
> to the stress test group?

FWIW, I plugged this series into an rt kernel (extra raciness) and beat
it up a bit on a 64 core box too. Nothing fell out, nor did futextest
numbers change outside variance (poor box has 8 whole gig ram, single
numa node, so kinda crippled/wimpy, and not good box for benchmarking).

What concerned me most about the series was 5/5.. looks like a great
idea to me, but the original thread did not have a happy ending.

-Mike
Re: [PATCH 2/5] futex: Check for pi futex_q only once
On Fri, 2013-11-22 at 16:56 -0800, Davidlohr Bueso wrote:
> All wake_futex() callers already verify that we are not dealing with
> a pi futex_q, so we can remove the redundant WARN() check, as this is never
> triggered anyway.
>
> Cc: Ingo Molnar
> Cc: Darren Hart
> Cc: Peter Zijlstra
> Cc: Thomas Gleixner
> Cc: Mike Galbraith
> Cc: Jeff Mahoney
> Cc: Linus Torvalds
> Cc: Scott Norton
> Cc: Tom Vaden
> Cc: Aswin Chandramouleeswaran
> Cc: Waiman Long
> Cc: Jason Low
> Signed-off-by: Davidlohr Bueso
> ---
>  kernel/futex.c | 3 ---
>  1 file changed, 3 deletions(-)
>
> diff --git a/kernel/futex.c b/kernel/futex.c
> index e6ffe73..0768c68 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -844,9 +844,6 @@ static void wake_futex(struct futex_q *q)
>  {
>  	struct task_struct *p = q->task;
>
> -	if (WARN(q->pi_state || q->rt_waiter, "refusing to wake PI futex\n"))
> -		return;
> -

This was added deliberately after adding said checks to the callers...
admittedly after a very long debug session I didn't ever want to repeat.
Sometimes warnings are added to make sure we caught everything and later
removed; sometimes they are added to make sure nothing new ever breaks
this again. Since the failure scenario is non-obvious, unless this is
causing some significant performance issues for you, I'd prefer this
stays.

See commit aa10990e028cac3d5e255711fb9fb47e00700e35 for details.

--
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel
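The kind of guard being defended above can be illustrated outside the kernel. The sketch below is a userspace stand-in, not the futex code: `struct waiter_q` and `wake_checked()` are hypothetical, and a message on stderr takes the place of WARN(). It only shows the shape of the argument: the callee re-checks a condition its callers are already supposed to guarantee, so that a future caller that forgets the check is refused loudly instead of corrupting state quietly.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the fields of interest in struct futex_q. */
struct waiter_q {
    void *pi_state;   /* non-NULL means priority-inheritance state */
    void *rt_waiter;  /* non-NULL means an rt-mutex waiter is attached */
    bool woken;
};

/*
 * Wake a plain waiter. Callers are expected to filter out PI waiters,
 * but the check is repeated here as a safety net.
 */
bool wake_checked(struct waiter_q *q)
{
    if (q->pi_state || q->rt_waiter) {
        fprintf(stderr, "refusing to wake PI futex\n");
        return false;
    }

    q->woken = true;
    return true;
}
```

The cost of the guard is two pointer tests per wakeup, which is the trade-off the review asks to keep unless it shows up as a measurable performance problem.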
Re: [GIT] Security subsystem updates for 3.13
On Thu, 21 Nov 2013, Linus Torvalds wrote:
> I do see *some* minimal comments on it from George Spelvin on lkml.

I'd like to apologize for dropping the ball on that. I started working
on it seriously, but with various emergencies, I've been AFK from lkml
for the last month.

I'm not really thrilled with it; I think the fanout of 16 is low for
something with its scale ambitions, and the properties expected of the
chunked key access method are not documented as clearly as they should
be. The way the key is fiddled to put keyring objects in a contiguous
range of the trie is a particularly egregious layering violation.

But I am convinced that it's been tested and works; my complaints are in
the areas of ugliness and efficiency. And it's layered well enough that
it can be fixed later without radical surgery.
[PATCH] usb: chipidea: add support for USB OTG controller on TI-NSPIRE
From: Daniel Tang

The USB controller in TI-NSPIRE calculators is based off either
Freescale's USB OTG controller or the USB controller found in the IMX233,
both of which are Chipidea compatible.

This patch adds a device tree binding for the controller.

Signed-off-by: Daniel Tang
---
 .../devicetree/bindings/usb/ci13xxx-nspire.txt | 17 +
 drivers/usb/chipidea/Makefile                  |  1 +
 drivers/usb/chipidea/ci_hdrc_nspire.c          | 74 ++
 3 files changed, 92 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/usb/ci13xxx-nspire.txt
 create mode 100644 drivers/usb/chipidea/ci_hdrc_nspire.c

diff --git a/Documentation/devicetree/bindings/usb/ci13xxx-nspire.txt b/Documentation/devicetree/bindings/usb/ci13xxx-nspire.txt
new file mode 100644
index 000..5ba8e90
--- /dev/null
+++ b/Documentation/devicetree/bindings/usb/ci13xxx-nspire.txt
@@ -0,0 +1,17 @@
+* TI-Nspire USB OTG Controller
+
+Required properties:
+- compatible: Should be "zevio,nspire-usb"
+- reg: Should contain registers location and length
+- interrupts: Should contain controller interrupt
+
+Recommended properties:
+- vbus-supply: regulator for vbus
+
+Examples:
+	usb0: usb@B000 {
+		reg = <0xB000 0x1000>;
+		compatible = "zevio,nspire-usb";
+		interrupts = <8>;
+		vbus-supply = <_reg>;
+	};
diff --git a/drivers/usb/chipidea/Makefile b/drivers/usb/chipidea/Makefile
index a99d980..245ea4d 100644
--- a/drivers/usb/chipidea/Makefile
+++ b/drivers/usb/chipidea/Makefile
@@ -10,6 +10,7 @@ ci_hdrc-$(CONFIG_USB_CHIPIDEA_DEBUG) += debug.o
 
 # Glue/Bridge layers go here
 obj-$(CONFIG_USB_CHIPIDEA) += ci_hdrc_msm.o
+obj-$(CONFIG_USB_CHIPIDEA) += ci_hdrc_nspire.o
 
 # PCI doesn't provide stubs, need to check
 ifneq ($(CONFIG_PCI),)
diff --git a/drivers/usb/chipidea/ci_hdrc_nspire.c b/drivers/usb/chipidea/ci_hdrc_nspire.c
new file mode 100644
index 000..318761d
--- /dev/null
+++ b/drivers/usb/chipidea/ci_hdrc_nspire.c
@@ -0,0 +1,74 @@
+/*
+ * Copyright (C) 2013 Daniel Tang
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ *
+ * Based off drivers/usb/chipidea/ci_hdrc_msm.c
+ *
+ */
+
+#include
+#include
+#include
+#include
+
+#include "ci.h"
+
+static struct ci_hdrc_platform_data ci_hdrc_nspire_platdata = {
+	.name = "ci_hdrc_nspire",
+	.flags = CI_HDRC_REGS_SHARED,
+	.capoffset = DEF_CAPOFFSET,
+};
+
+static int ci_hdrc_nspire_probe(struct platform_device *pdev)
+{
+	struct platform_device *ci_pdev;
+
+	dev_dbg(&pdev->dev, "ci_hdrc_nspire_probe\n");
+
+	ci_pdev = ci_hdrc_add_device(&pdev->dev,
+				pdev->resource, pdev->num_resources,
+				&ci_hdrc_nspire_platdata);
+
+	if (IS_ERR(ci_pdev)) {
+		dev_err(&pdev->dev, "ci_hdrc_add_device failed!\n");
+		return PTR_ERR(ci_pdev);
+	}
+
+	platform_set_drvdata(pdev, ci_pdev);
+
+	return 0;
+}
+
+static int ci_hdrc_nspire_remove(struct platform_device *pdev)
+{
+	struct platform_device *ci_pdev = platform_get_drvdata(pdev);
+
+	ci_hdrc_remove_device(ci_pdev);
+
+	return 0;
+}
+
+static const struct of_device_id ci_hdrc_nspire_dt_ids[] = {
+	{ .compatible = "zevio,nspire-usb", },
+	{ /* sentinel */ }
+};
+
+static struct platform_driver ci_hdrc_nspire_driver = {
+	.probe = ci_hdrc_nspire_probe,
+	.remove = ci_hdrc_nspire_remove,
+	.driver = {
+		.name = "nspire_usb",
+		.owner = THIS_MODULE,
+		.of_match_table = ci_hdrc_nspire_dt_ids,
+	},
+};
+
+MODULE_DEVICE_TABLE(of, ci_hdrc_nspire_dt_ids);
+module_platform_driver(ci_hdrc_nspire_driver);
+
+MODULE_ALIAS("platform:nspire_usb");
+MODULE_ALIAS("platform:ci13xxx_nspire");
+MODULE_LICENSE("GPL v2");
--
1.8.1.3
[PATCH] gpiolib: fix lookup of platform-mapped GPIOs
A typo resulted in GPIO lookup failing unconditionally.

Signed-off-by: Alexandre Courbot
Reported-by: Stephen Warren
---
 drivers/gpio/gpiolib.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
index 980b08d..86caf17 100644
--- a/drivers/gpio/gpiolib.c
+++ b/drivers/gpio/gpiolib.c
@@ -2367,7 +2367,7 @@ static struct gpio_desc *gpiod_find(struct device *dev, const char *con_id,
 			continue;
 		}
 
-		if (chip->ngpio >= p->chip_hwnum) {
+		if (chip->ngpio <= p->chip_hwnum) {
 			dev_warn(dev, "GPIO chip %s has %d GPIOs\n",
 				 chip->label, chip->ngpio);
 			continue;
--
1.8.4.2
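The one-character fix above is easier to see with the two conditions side by side. The helpers below are hypothetical (they are not gpiolib functions) and assume, as the diff implies, that hardware offsets are zero-based, so a chip with ngpio lines accepts offsets 0 through ngpio - 1.

```c
#include <stdbool.h>

/* Condition before the fix: true means the lookup entry is skipped. */
bool rejected_before_fix(unsigned int ngpio, unsigned int hwnum)
{
    return ngpio >= hwnum;
}

/* Condition after the fix: skip only offsets the chip does not have. */
bool rejected_after_fix(unsigned int ngpio, unsigned int hwnum)
{
    return ngpio <= hwnum;
}
```

For a chip with 8 GPIOs, the old test skips every offset from 0 to 8, which covers all valid ones, while the new test skips only offsets 8 and above.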
Re: [PATCH 0/5] futex: Wakeup optimizations
On Fri, 2013-11-22 at 16:56 -0800, Davidlohr Bueso wrote:
> We have been dealing with a customer database workload on large
> 12Tb, 240 core 16 socket NUMA system that exhibits high amounts
> of contention on some of the locks that serialize internal futex
> data structures. This workload specially suffers in the wakeup
> paths, where waiting on the corresponding hb->lock can account for
> up to ~60% of the time. The result of such calls can mostly be
> classified as (i) nothing to wake up and (ii) wakeup large amount
> of tasks.

With as many cores as you have, have you done any analysis of how
effective the hashing algorithm is, and would more buckets relieve some
of the contention ah, I see below that you did. Nice work.

> Before these patches are applied, we can see this pathological behavior:
>
>  37.12%  826174  xxx  [kernel.kallsyms]  [k] _raw_spin_lock
>          --- _raw_spin_lock
>           |
>           |--97.14%-- futex_wake
>           |           do_futex
>           |           sys_futex
>           |           system_call_fastpath
>           |           |
>           |           |--99.70%-- 0x7f383fbdea1f
>           |           |           yyy
>
>  43.71%  762296  xxx  [kernel.kallsyms]  [k] _raw_spin_lock
>          --- _raw_spin_lock
>           |
>           |--53.74%-- futex_wake
>           |           do_futex
>           |           sys_futex
>           |           system_call_fastpath
>           |           |
>           |           |--99.40%-- 0x7fe7d44a4c05
>           |           |           zzz
>           |--45.90%-- futex_wait_setup
>           |           futex_wait
>           |           do_futex
>           |           sys_futex
>           |           system_call_fastpath
>           |           0x7fe7ba315789
>           |           syscall
>

Sorry to be dense, can you spell out how 60% falls out of these numbers?

>
> With these patches, contention is practically non existent:
>
>  0.10%  49  xxx  [kernel.kallsyms]  [k] _raw_spin_lock
>          --- _raw_spin_lock
>           |
>           |--76.06%-- futex_wait_setup
>           |           futex_wait
>           |           do_futex
>           |           sys_futex
>           |           system_call_fastpath
>           |           |
>           |           |--99.90%-- 0x7f3165e63789
>           |           |           syscall|
>          ...
>           |--6.27%-- futex_wake
>           |          do_futex
>           |          sys_futex
>           |          system_call_fastpath
>           |          |
>           |          |--54.56%-- 0x7f317fff2c05
>          ...
>
> Patches 1 & 2 are cleanups and micro optimizations.
>
> Patch 3 addresses the well known issue of the global hash table.
> By creating a larger and NUMA aware table, we can reduce the false
> sharing and collisions, thus reducing the chance of different futexes
> using hb->lock.
>
> Patch 4 reduces contention on the corresponding hb->lock by not trying to
> acquire it if there are no blocked tasks in the waitqueue.
> This particularly deals with point (i) above, where we see that it is not
> uncommon for up to 90% of wakeup calls end up returning 0, indicating that no
> tasks were woken.

Can you determine how much benefit comes from 3 and how much additional
benefit comes from 4?

>
> Patch 5 resurrects a two year old idea from Peter Zijlstra to delay
> the waking of the blocked tasks to be done without holding the hb->lock:
> https://lkml.org/lkml/2011/9/14/118
>
> This is useful for locking primitives that can effect multiple wakeups
> per operation and want to avoid the futex's internal spinlock contention by
> delaying the wakeups until we've released the hb->lock.
> This particularly deals with point (ii) above, where we can observe that
> in occasions the wake calls end up waking 125 to 200 waiters in what we believe
> are RW locks in the application.
>
> This patchset has also been tested on smaller systems for a variety of
> benchmarks, including java workloads, kernel builds and custom bang-the-hell-out-of
> hb locks programs. So far, no functional or performance regressions have been seen.
> Furthermore, no issues were found when running the different tests in the futextest
> suite: http://git.kernel.org/cgit/linux/kernel/git/dvhart/futextest.git/

Excellent. Would you be able to contribute any of these (C only please)
to the stress test group?

>
> This patchset applies on top of Linus' tree as of v3.13-rc1.
>
> Special thanks to Scott Norton, Tom Vaden and Mark Ray for help presenting,
> debugging and analyzing the data.
> > futex: Misc cleanups > futex: Check for pi futex_q only once > futex: Larger hash table > futex: Avoid taking hb lock if nothing to wakeup > sched,futex: Provide delayed wakeup list >
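The delayed-wakeup idea described for patch 5 (pick the waiters while holding the bucket lock, wake them only after dropping it) can be sketched in userspace as follows. This is an illustration under loud assumptions, not the sched/futex patch: `struct waiter`, `struct bucket`, `wake_one()` and `wake_deferred()` are hypothetical names, a pthread mutex stands in for hb->lock, setting a flag stands in for the real wakeup, and the sketch ignores the lifetime problem of a waiter going away once it has been taken off the list.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct waiter {
    struct waiter *next;
    bool woken;
};

struct bucket {
    pthread_mutex_t lock;   /* plays the role of hb->lock */
    struct waiter *head;    /* singly linked list of queued waiters */
};

/* Stand-in for the real wakeup; runs with the bucket lock already dropped. */
static void wake_one(struct waiter *w)
{
    w->woken = true;
}

/* Wake up to nr_wake waiters; returns how many were woken. */
int wake_deferred(struct bucket *b, int nr_wake)
{
    struct waiter *pending = NULL, **tail = &pending;
    int n = 0;

    /* Phase 1: under the lock, only move waiters to a private list. */
    pthread_mutex_lock(&b->lock);
    while (b->head && n < nr_wake) {
        struct waiter *w = b->head;

        b->head = w->next;
        w->next = NULL;
        *tail = w;
        tail = &w->next;
        n++;
    }
    pthread_mutex_unlock(&b->lock);

    /* Phase 2: the potentially slow wakeups happen without the lock. */
    while (pending) {
        struct waiter *w = pending;

        pending = w->next;
        wake_one(w);
    }
    return n;
}
```

The lock hold time now depends only on list manipulation, which is the property that matters when a single call wakes the 125 to 200 waiters mentioned in the cover letter.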
[PATCH] gpiolib: add missing declarations
Add missing declarations and include files to avoid warnings during
compilation of include/linux/gpio/driver.h.

Signed-off-by: Alexandre Courbot
Reported-by: Stephen Warren
---
 include/linux/gpio/driver.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/linux/gpio/driver.h b/include/linux/gpio/driver.h
index 82eac61..3ea2cf6 100644
--- a/include/linux/gpio/driver.h
+++ b/include/linux/gpio/driver.h
@@ -2,9 +2,12 @@
 #define __LINUX_GPIO_DRIVER_H
 
 #include
+#include
 
 struct device;
 struct gpio_desc;
+struct of_phandle_args;
+struct device_node;
 struct seq_file;
 
 /**
--
1.8.4.2
[PATCH 1/2] drm/radeon/dpm: Convert to use devm_hwmon_register_with_groups
Simplify the code and fix race condition seen because attribute files were created after hwmon device registration. Signed-off-by: Guenter Roeck --- Compile tested only; unfortunately I don't have the the necessary hardware. drivers/gpu/drm/radeon/radeon_pm.c | 49 +--- 1 file changed, 12 insertions(+), 37 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_pm.c b/drivers/gpu/drm/radeon/radeon_pm.c index d1385cc..dc75bb6 100644 --- a/drivers/gpu/drm/radeon/radeon_pm.c +++ b/drivers/gpu/drm/radeon/radeon_pm.c @@ -537,8 +537,7 @@ static ssize_t radeon_hwmon_show_temp(struct device *dev, struct device_attribute *attr, char *buf) { - struct drm_device *ddev = dev_get_drvdata(dev); - struct radeon_device *rdev = ddev->dev_private; + struct radeon_device *rdev = dev_get_drvdata(dev); int temp; if (rdev->asic->pm.get_temperature) @@ -566,23 +565,14 @@ static ssize_t radeon_hwmon_show_temp_thresh(struct device *dev, return snprintf(buf, PAGE_SIZE, "%d\n", temp); } -static ssize_t radeon_hwmon_show_name(struct device *dev, - struct device_attribute *attr, - char *buf) -{ - return sprintf(buf, "radeon\n"); -} - static SENSOR_DEVICE_ATTR(temp1_input, S_IRUGO, radeon_hwmon_show_temp, NULL, 0); static SENSOR_DEVICE_ATTR(temp1_crit, S_IRUGO, radeon_hwmon_show_temp_thresh, NULL, 0); static SENSOR_DEVICE_ATTR(temp1_crit_hyst, S_IRUGO, radeon_hwmon_show_temp_thresh, NULL, 1); -static SENSOR_DEVICE_ATTR(name, S_IRUGO, radeon_hwmon_show_name, NULL, 0); static struct attribute *hwmon_attributes[] = { _dev_attr_temp1_input.dev_attr.attr, _dev_attr_temp1_crit.dev_attr.attr, _dev_attr_temp1_crit_hyst.dev_attr.attr, - _dev_attr_name.dev_attr.attr, NULL }; @@ -607,11 +597,15 @@ static const struct attribute_group hwmon_attrgroup = { .is_visible = hwmon_attributes_visible, }; +static const struct attribute_group *hwmon_groups[] = { + _attrgroup, + NULL +}; + static int radeon_hwmon_init(struct radeon_device *rdev) { int err = 0; - - rdev->pm.int_hwmon_dev = NULL; + struct device 
*hwmon_dev; switch (rdev->pm.int_thermal_type) { case THERMAL_TYPE_RV6XX: @@ -624,20 +618,13 @@ static int radeon_hwmon_init(struct radeon_device *rdev) case THERMAL_TYPE_KV: if (rdev->asic->pm.get_temperature == NULL) return err; - rdev->pm.int_hwmon_dev = hwmon_device_register(rdev->dev); - if (IS_ERR(rdev->pm.int_hwmon_dev)) { - err = PTR_ERR(rdev->pm.int_hwmon_dev); + hwmon_dev = hwmon_device_register_with_groups(rdev->dev, + "radeon", rdev, + hwmon_groups); + if (IS_ERR(hwmon_dev)) { + err = PTR_ERR(hwmon_dev); dev_err(rdev->dev, "Unable to register hwmon device: %d\n", err); - break; - } - dev_set_drvdata(rdev->pm.int_hwmon_dev, rdev->ddev); - err = sysfs_create_group(>pm.int_hwmon_dev->kobj, -_attrgroup); - if (err) { - dev_err(rdev->dev, - "Unable to create hwmon sysfs file: %d\n", err); - hwmon_device_unregister(rdev->dev); } break; default: @@ -647,14 +634,6 @@ static int radeon_hwmon_init(struct radeon_device *rdev) return err; } -static void radeon_hwmon_fini(struct radeon_device *rdev) -{ - if (rdev->pm.int_hwmon_dev) { - sysfs_remove_group(>pm.int_hwmon_dev->kobj, _attrgroup); - hwmon_device_unregister(rdev->pm.int_hwmon_dev); - } -} - static void radeon_dpm_thermal_work_handler(struct work_struct *work) { struct radeon_device *rdev = @@ -1337,8 +1316,6 @@ static void radeon_pm_fini_old(struct radeon_device *rdev) if (rdev->pm.power_state) kfree(rdev->pm.power_state); - - radeon_hwmon_fini(rdev); } static void radeon_pm_fini_dpm(struct radeon_device *rdev) @@ -1358,8 +1335,6 @@ static void radeon_pm_fini_dpm(struct radeon_device *rdev) if (rdev->pm.power_state) kfree(rdev->pm.power_state); - - radeon_hwmon_fini(rdev); } void radeon_pm_fini(struct radeon_device *rdev) -- 1.7.9.7 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at
[PATCH 2/2] drm/nouveau/hwmon: Convert to use devm_hwmon_device_register_with_groups
Simplify the code and resolve race conditions seen because
attribute files are created after hwmon device registration,
and removed before hwmon device removal.

Signed-off-by: Guenter Roeck
---
Compile tested only; unfortunately I don't have the necessary hardware.

 drivers/gpu/drm/nouveau/nouveau_hwmon.c | 243 +++
 drivers/gpu/drm/nouveau/nouveau_hwmon.h | 11 --
 2 files changed, 84 insertions(+), 170 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_hwmon.c b/drivers/gpu/drm/nouveau/nouveau_hwmon.c
index 38a4db5..d4144b8 100644
--- a/drivers/gpu/drm/nouveau/nouveau_hwmon.c
+++ b/drivers/gpu/drm/nouveau/nouveau_hwmon.c
@@ -42,9 +42,7 @@
 static ssize_t
 nouveau_hwmon_show_temp(struct device *d, struct device_attribute *a, char *buf)
 {
-	struct drm_device *dev = dev_get_drvdata(d);
-	struct nouveau_drm *drm = nouveau_drm(dev);
-	struct nouveau_therm *therm = nouveau_therm(drm->device);
+	struct nouveau_therm *therm = dev_get_drvdata(d);
 	int temp = therm->temp_get(therm);
 
 	if (temp < 0)
@@ -68,9 +66,7 @@ static ssize_t
 nouveau_hwmon_temp1_auto_point1_temp(struct device *d,
 				     struct device_attribute *a, char *buf)
 {
-	struct drm_device *dev = dev_get_drvdata(d);
-	struct nouveau_drm *drm = nouveau_drm(dev);
-	struct nouveau_therm *therm = nouveau_therm(drm->device);
+	struct nouveau_therm *therm = dev_get_drvdata(d);
 
 	return snprintf(buf, PAGE_SIZE, "%d\n",
 	      therm->attr_get(therm, NOUVEAU_THERM_ATTR_THRS_FAN_BOOST) * 1000);
@@ -80,9 +76,7 @@ nouveau_hwmon_set_temp1_auto_point1_temp(struct device *d,
 					 struct device_attribute *a,
 					 const char *buf, size_t count)
 {
-	struct drm_device *dev = dev_get_drvdata(d);
-	struct nouveau_drm *drm = nouveau_drm(dev);
-	struct nouveau_therm *therm = nouveau_therm(drm->device);
+	struct nouveau_therm *therm = dev_get_drvdata(d);
 	long value;
 
 	if (kstrtol(buf, 10, ) == -EINVAL)
@@ -101,9 +95,7 @@ static ssize_t
 nouveau_hwmon_temp1_auto_point1_temp_hyst(struct device *d,
 					  struct device_attribute *a, char *buf)
 {
-	struct drm_device *dev = dev_get_drvdata(d);
-	struct nouveau_drm *drm = nouveau_drm(dev);
-	struct nouveau_therm *therm = nouveau_therm(drm->device);
+	struct nouveau_therm *therm = dev_get_drvdata(d);
 
 	return snprintf(buf, PAGE_SIZE, "%d\n",
 	 therm->attr_get(therm, NOUVEAU_THERM_ATTR_THRS_FAN_BOOST_HYST) * 1000);
@@ -113,9 +105,7 @@ nouveau_hwmon_set_temp1_auto_point1_temp_hyst(struct device *d,
 					      struct device_attribute *a,
 					      const char *buf, size_t count)
 {
-	struct drm_device *dev = dev_get_drvdata(d);
-	struct nouveau_drm *drm = nouveau_drm(dev);
-	struct nouveau_therm *therm = nouveau_therm(drm->device);
+	struct nouveau_therm *therm = dev_get_drvdata(d);
 	long value;
 
 	if (kstrtol(buf, 10, ) == -EINVAL)
@@ -133,9 +123,7 @@ static SENSOR_DEVICE_ATTR(temp1_auto_point1_temp_hyst, S_IRUGO | S_IWUSR,
 static ssize_t
 nouveau_hwmon_max_temp(struct device *d, struct device_attribute *a, char *buf)
 {
-	struct drm_device *dev = dev_get_drvdata(d);
-	struct nouveau_drm *drm = nouveau_drm(dev);
-	struct nouveau_therm *therm = nouveau_therm(drm->device);
+	struct nouveau_therm *therm = dev_get_drvdata(d);
 
 	return snprintf(buf, PAGE_SIZE, "%d\n",
 	       therm->attr_get(therm, NOUVEAU_THERM_ATTR_THRS_DOWN_CLK) * 1000);
@@ -144,9 +132,7 @@ static ssize_t
 nouveau_hwmon_set_max_temp(struct device *d, struct device_attribute *a,
 			   const char *buf, size_t count)
 {
-	struct drm_device *dev = dev_get_drvdata(d);
-	struct nouveau_drm *drm = nouveau_drm(dev);
-	struct nouveau_therm *therm = nouveau_therm(drm->device);
+	struct nouveau_therm *therm = dev_get_drvdata(d);
 	long value;
 
 	if (kstrtol(buf, 10, ) == -EINVAL)
@@ -164,9 +150,7 @@ static ssize_t
 nouveau_hwmon_max_temp_hyst(struct device *d, struct device_attribute *a,
 			    char *buf)
 {
-	struct drm_device *dev = dev_get_drvdata(d);
-	struct nouveau_drm *drm = nouveau_drm(dev);
-	struct nouveau_therm *therm = nouveau_therm(drm->device);
+	struct nouveau_therm *therm = dev_get_drvdata(d);
 
 	return snprintf(buf, PAGE_SIZE, "%d\n",
 	  therm->attr_get(therm, NOUVEAU_THERM_ATTR_THRS_DOWN_CLK_HYST) * 1000);
@@ -175,9 +159,7 @@ static ssize_t
 nouveau_hwmon_set_max_temp_hyst(struct device *d, struct device_attribute *a,
 				const char *buf, size_t count)
 {
-	struct drm_device *dev =
Re: [PATCH 4/5] futex: Avoid taking hb lock if nothing to wakeup
On Fri, 2013-11-22 at 21:40 -0800, Darren Hart wrote:
> On Fri, 2013-11-22 at 16:56 -0800, Davidlohr Bueso wrote:
> > In futex_wake() there is clearly no point in taking the hb->lock if
> > we know beforehand that there are no tasks to be woken. This comes
> > at the smaller cost of doing some atomic operations to keep track of
> > the list's size. Specifically, increment the counter when an element is
> > added to the list, and decrement when it is removed. Of course, if the
> > counter is 0, then there are no tasks blocked on a futex. Some special
> > considerations:
> >
> > - increment the counter at queue_lock() as we always end up calling
> >   queue_me() which adds the element to the list. Upon any error,
> >   queue_unlock() is called for housekeeping, for which we decrement
> >   to mach the increment done in queue_lock().
> >
> > - decrement the counter at __unqueue_me() to reflect when an element is
> >   removed from the queue for wakeup related purposes.
>
> What is the problem you are trying to solve here?

Apologies, too quick on the trigger. I see plenty of detail in 0/5. Will
spend some time reviewing that.

--
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel
Re: [PATCH 4/5] futex: Avoid taking hb lock if nothing to wakeup
On Fri, 2013-11-22 at 16:56 -0800, Davidlohr Bueso wrote:
> In futex_wake() there is clearly no point in taking the hb->lock if
> we know beforehand that there are no tasks to be woken. This comes
> at the smaller cost of doing some atomic operations to keep track of
> the list's size. Specifically, increment the counter when an element is
> added to the list, and decrement when it is removed. Of course, if the
> counter is 0, then there are no tasks blocked on a futex. Some special
> considerations:
>
> - increment the counter at queue_lock() as we always end up calling
>   queue_me() which adds the element to the list. Upon any error,
>   queue_unlock() is called for housekeeping, for which we decrement
>   to mach the increment done in queue_lock().
>
> - decrement the counter at __unqueue_me() to reflect when an element is
>   removed from the queue for wakeup related purposes.

What is the problem you are trying to solve here?

The fast-path in futexes is not calling into the kernel at all, if
you've made the syscall, you've already lost. What kind of improvement
are you seeing by adding this optimization? Is it worth the introduction
of atomic operations into a file known to the state of California to
increase the risk of liver failure?

--
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel
Re: [PATCH v2 2/7] ARM: tegra: add gpiod_lookup table for paz00
On Sat, Nov 23, 2013 at 3:40 AM, Stephen Warren wrote:
> On 11/22/2013 05:14 AM, Mika Westerberg wrote:
>> From: Heikki Krogerus
>>
>> This makes it possible to request the gpio descriptors in
>> rfkill-gpio driver regardless of the platform.
>
> Patches 1-3,
> Tested-by: Stephen Warren
>
> All the testing I did was to "cat /sys/kernel/debug/gpios" to make sure
> those two GPIOs had been correctly acquired by the driver, and that they
> changed state as expected via the command-line "rfkill {un,}block" commands.
>
> However, please note that I had to apply a couple fixes to the gpiolib
> core to get this working:
>
> 1)
>
> To solve the following build warning:
>
>> In file included from arch/arm/mach-tegra/board-paz00.c:21:0:
>> include/linux/gpio/driver.h:102:17: warning: ‘struct of_phandle_args’
>> declared inside parameter list [enabled by default]
>> include/linux/gpio/driver.h:102:17: warning: its scope is only this
>> definition or declaration, which is probably not what you want [enabled by
>> default]
>
> I applied:
>
>> diff --git a/include/linux/gpio/driver.h b/include/linux/gpio/driver.h
>> index 82eac610ce1a..5133cf29803a 100644
>> --- a/include/linux/gpio/driver.h
>> +++ b/include/linux/gpio/driver.h
>> @@ -5,6 +5,7 @@
>>
>>  struct device;
>>  struct gpio_desc;
>> +struct of_phandle_args;
>>  struct seq_file;
>>
>>  /**
>
> 2)
>
> In order to get the GPIO lookups from the rfkill driver working, I applied:
>
>> diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
>> index ac53a9593662..b73c39f99858 100644
>> --- a/drivers/gpio/gpiolib.c
>> +++ b/drivers/gpio/gpiolib.c
>> @@ -2368,7 +2368,7 @@ static struct gpio_desc *gpiod_find(struct device
>> *dev, const char *con_id,
>>  			continue;
>>  		}
>>
>> -		if (chip->ngpio >= p->chip_hwnum) {
>> +		if (chip->ngpio <= p->chip_hwnum) {
>>  			dev_warn(dev, "GPIO chip %s has %d GPIOs\n",
>>  				 chip->label, chip->ngpio);
>>  			continue;

Mmm. I guess I should be blamed for this one. I haven't tested GPIO
platform lookup in a while and this seems to have slipped in at some
point. :(

I will send fixup patches to Linus for both issues, thanks for pointing
them out.

Alex.
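The second fix quoted above turns on which side of the comparison rejects a lookup. A minimal sketch of the intended range check follows; `struct fake_chip` and `hwnum_in_range()` are hypothetical stand-ins for illustration and are not gpiolib names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a GPIO chip; only the GPIO count matters here. */
struct fake_chip {
	unsigned int ngpio;	/* number of GPIOs the chip exposes */
};

/*
 * Valid hardware numbers are 0 .. ngpio - 1, so a lookup entry must be
 * skipped exactly when ngpio <= hwnum (the corrected condition).
 */
static bool hwnum_in_range(const struct fake_chip *chip, unsigned int hwnum)
{
	assert(chip != NULL);
	return hwnum < chip->ngpio;
}
```

With the original `>=` comparison, every entry with a valid hardware number would have been skipped with the warning, which matches the lookups failing for the rfkill driver.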
Re: [tip:sched/urgent] sched: Check sched_domain before computing group power
On Fri, Nov 22, 2013 at 4:07 AM, Peter Zijlstra wrote:
> On Thu, Nov 21, 2013 at 09:22:24AM -0800, Yinghai Lu wrote:
>> On Thu, Nov 21, 2013 at 7:03 AM, Peter Zijlstra wrote:
>> >>
>> >> This one seems fix NULL reference in compute_group_power.
>> >>
>> >> but get following on current Linus tree plus tip/sched/urgent.
>> >>
>> >> divide error: [#1] SMP
>> >> [ 28.190477] Modules linked in:
>> >> [ 28.192012] CPU: 11 PID: 484 Comm: kworker/u324:0 Not tainted
>> >> 3.12.0-yh-10487-g4b94e59-dirty #2044
>> >> [ 28.210488] Hardware name: Oracle Corporation Sun Fire
>> >> [ 28.229877] task: 88ff25205140 ti: 88ff2520a000 task.ti:
>> >> 88ff2520a000
>> >> [ 28.236139] RIP: 0010:[] []
>> >> find_busiest_group+0x2b4/0x8a0
>> >
>> > Hurmph.. what kind of hardware is that? and is there anything funny you
>> > do to make it do this?
>>
>> intel nehanem-ex or westmere-ex 8 sockets system.
>>
>> I tried without my local patches, the problem is still there.
>
> And I suppose a kernel before
>
>   863bffc80898 ("sched/fair: Fix group power_orig computation")
>
> work fine, eh?
>
> I'll further assume that your RIP points to:
>
>   sds.avg_load = (SCHED_POWER_SCALE * sds.total_load) / sds.total_pwr;
>
> indicating that sds.total_pwr := 0.
>
> update_sd_lb_stats() computes it like:
>
>   sds->total_pwr += sgs->group_power;
>
> which comes out of update_sg_lb_stats() like:
>
>   sgs->group_power = group->sgp->power;
>
> Which we compute in update_group_power() similarly to how we did before
> 863bffc80898.
>
> Which leaves me a bit puzzled.

Hi,

For Linus's tree, I need to revert commit-863bffc.

For Linus's tree + sched/urgent, I need to revert
commit-42eb088
commit-9abf24d
commit-863bffc

If I only revert commit-42eb088, the problem is still there.
If I only revert commit-9abf24d and commit-863bffc, the problem is still there.

I assume you need to dump sched/urgent, and revert commit-863bffc directly
from Linus's tree.

Thanks

Yinghai
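To make the failure mode in Peter's analysis concrete, here is a small sketch of the quoted `avg_load` computation with an explicit zero check. It only illustrates why `total_pwr == 0` traps; it is not the fix proposed or adopted in this thread. `struct lb_stats` and `compute_avg_load()` are hypothetical names, and `SCHED_POWER_SCALE` is assumed to be 1024.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SCHED_POWER_SCALE 1024UL	/* assumed value (1 << 10) */

/* Hypothetical, trimmed-down stand-in for the load-balance statistics. */
struct lb_stats {
	unsigned long total_load;
	unsigned long total_pwr;
	unsigned long avg_load;
};

/*
 * Same arithmetic as the quoted line, but refuses to divide when no group
 * contributed any power. Returns false in that case and leaves avg_load
 * untouched.
 */
static bool compute_avg_load(struct lb_stats *sds)
{
	assert(sds != NULL);
	if (sds->total_pwr == 0)
		return false;	/* the unguarded division would fault here */
	sds->avg_load = (SCHED_POWER_SCALE * sds->total_load) / sds->total_pwr;
	return true;
}
```

A guard like this would hide the symptom only; the open question in the thread is why every group reports zero power in the first place.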
Re: [PATCH 4/5] futex: Avoid taking hb lock if nothing to wakeup
On 11/22/2013 08:25 PM, Linus Torvalds wrote:
> On Fri, Nov 22, 2013 at 4:56 PM, Davidlohr Bueso wrote:
>> In futex_wake() there is clearly no point in taking the hb->lock if
>> we know beforehand that there are no tasks to be woken. This comes
>> at the smaller cost of doing some atomic operations to keep track of
>> the list's size.
>
> Hmm. Why? Afaik, you only care about "empty or not". And if you don't
> need the serialization from locking, then afaik you can just do a
> "plist_head_empty()" without holding the lock.
>
> NOTE!
>
> The "list_empty()" function is very much designed to work even without
> holding a lock (as long as the head itself exists reliably, of course)
> BUT you have to then guarantee yourself that your algorithm doesn't
> have any races wrt other CPU's adding an entry to the list at the same
> time. Not holding a lock obviously means that you are not serialized
> against that.. We've had problems with people doing
>
>     if (!list_empty(waiters))
>         wake_up_list(..)
>
> because they wouldn't wake people up who just got added.
>
> But considering that your atomic counter checking has the same lack of
> serialization, at least the plist_head_empty() check shouldn't be any
> worse than that counter thing.. And doesn't need any steenking atomic
> ops or a new counter field.
>
>     Linus

Before the patch, a waker will wake up one or more waiters who call
spin_lock() before the waker. If we just use plist_head_empty(), we will
miss waiters who have called spin_lock() but have not yet got the lock
or inserted themselves into the wait list. By adding an atomic counter
for checking purposes, we again have a single point where any waiters
who pass that point (inc atomic counter) can be woken up by a waker who
checks the atomic counter after they inc it. This acts sort of like
serialization without using a lock.

-Longman
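The ordering argument above can be modelled in a few lines. The following is a single-threaded sketch under stated assumptions: `struct bucket` and the helper names are hypothetical, the "lock" is implied rather than implemented, and memory-ordering questions between CPUs (which the thread goes on to debate) are deliberately left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one hash bucket; not the kernel's structure. */
struct bucket {
	atomic_int waiters;	/* bumped BEFORE the waiter contends for the lock */
	int queued;		/* list length, only reliable under the lock */
};

/* Waiter, step 1: announce intent before trying to take the lock. */
static void waiter_announce(struct bucket *b)
{
	assert(b != NULL);
	atomic_fetch_add(&b->waiters, 1);
}

/* Waiter, step 2: with the lock held, actually join the list. */
static void waiter_enqueue(struct bucket *b)
{
	b->queued++;
}

/* Waker fast path that looks only at the list: blind to step-1 waiters. */
static bool waker_may_skip_by_list(const struct bucket *b)
{
	return b->queued == 0;
}

/* Waker fast path that looks at the counter: sees step-1 waiters. */
static bool waker_may_skip_by_counter(struct bucket *b)
{
	return atomic_load(&b->waiters) == 0;
}
```

Between `waiter_announce()` and `waiter_enqueue()` the list-based check still reports "nothing to wake" while the counter-based check does not, which is the window described in the message above.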
Re: [PATCH v3 tip/core/locking 0/3] Memory-barrier documentation updates
On Fri, Nov 22, 2013 at 11:26:57AM -0800, Paul E. McKenney wrote:
> Hello!
>
> This series applies some long-needed updates to memory-barriers.txt:
>
> 1.	Add ACCESS_ONCE() calls where needed to ensure their inclusion
>	in code copy-and-pasted from this file.
>
> 2.	Add long atomic examples alongside the existing atomics.
>
> 3.	Prohibit architectures supporting the Linux kernel from
>	speculating stores.
>
> Changes from v2:
>
> o	Update examples so that that load against which the subsequent
>	store is to be ordered is part of the "if" condition.
>
> o	Add an example showing how the compiler can remove "if"
>	conditions and how to prevent it from doing so.
>
> o	Add ACCESS_ONCE() to the compiler-barrier section.
>
> o	Add a sentence noting that transitivity requires smp_mb().
>
> Changes from v1:
>
> o	Combined with Peter Zijlstra's speculative-store-prohibition patch.
>
> o	Added more pitfalls to avoid when prohibiting speculative
>	stores, along with how to avoid them.
>
> o	Applied Josh Triplett's review comments.

For v3, all three patches:
Reviewed-by: Josh Triplett
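For readers who have not seen the pattern the cover letter refers to, here is a minimal userspace sketch of a load that feeds an "if" condition followed by a dependent store, with both accesses wrapped in ACCESS_ONCE(). The macro is written in the commonly cited volatile-cast form using the GNU `__typeof__` extension; `store_if_set()` and its parameters are hypothetical names for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Volatile-cast form of ACCESS_ONCE(); relies on the GNU __typeof__ extension. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/*
 * The load of *flag is part of the "if" condition, and the store to *dst
 * happens only when that load returned non-zero. The volatile accesses keep
 * the compiler from refetching the flag or from merging or dropping the
 * store.
 */
static void store_if_set(int *flag, int *dst, int p)
{
	int q;

	assert(flag != NULL && dst != NULL);
	q = ACCESS_ONCE(*flag);
	if (q)
		ACCESS_ONCE(*dst) = p;
}
```

This shows only the compiler-facing half of the story; what ordering the hardware provides for such a control dependency is the subject of the patches themselves.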
Re: [PATCH 4/6] kexec: A new system call, kexec_file_load, for in kernel kexec
Vivek Goyal writes:

> On Thu, Nov 21, 2013 at 07:19:07PM +, Matthew Garrett wrote:
>> On Thu, Nov 21, 2013 at 02:13:05PM -0500, Vivek Goyal wrote:
>> > On Thu, Nov 21, 2013 at 07:06:20PM +, Matthew Garrett wrote:
>> > > That would require a certain degree of massaging from userspace if we
>> > > want to be able to use the existing Authenticode signatures. Otherwise
>> > > we need to sign kernels twice.
>> >
>> > I was thinking oof signing the same kernel twice. Can I sign authenticode
>> > signed kernel again (using RSA signature as we do for modules) and append
>> > the signature to bzImage.
>>
>> No, you'd need to do it the other way around.
>
> Hmm..., I am running out of ideas here. This is what I understand.
>
> - If I sign the bzImage (using PKCS1.5 signature), and later it is signed
>   with authenticode format signatures, then PKCS1.5 signatures will not be
>   valid as PE/COFF signing will do some modification to PE/COFF header in
>   bzImage. And another problem is that then I don't have a way to find
>   PKCS1.5 signature.
>
> - If bzImage is first signed with authenticode format signature and then
>   signed using PKCS1.5 signature, then authenticode format signature
>   will become invalid as it will also hash the data appened at the end
>   of file.
>
> So looks like both signatures can't co-exist on same file. That means
> one signature has to be detached.
>
> I am beginning to think that create a kernel option which allows to choose
> between attached and detached signatures. Extend kexec syscall to allow
> a parameter to pass in detached signatures. If detached signatures are
> not passed, then look for signatures at the end of file. That way, those
> who are signing kernels using platform specific format (authenticode) in
> this case, they can generate detached signature while others can just
> use attached signatures.
>
> Any thoughts on how this should be handled?

Inside of a modern bzImage there is an embedded ELF image. How about in
userspace we just strip out the embedded ELF image and write that to a
file. Then we can use the same signature checking scheme as we do for
kernel modules. And you only have to support one file format.

As I recall there are already some platforms on x86 like Xen that
already need to strip out the embedded ELF image for their loaders to
have all of the information they need to load the image.

Eric
Re: [PATCH 0/6] kexec: A new system call to allow in kernel loading
Vivek Goyal writes:

> On Fri, Nov 22, 2013 at 05:34:03AM -0800, Eric W. Biederman wrote:
>
> [..]
>> > Why ELF case is so interesting. I have not use kexec to boot ELF
>> > images in years and have not seen others using it too. In fact bzImage
>> > seems to be the most common kernel image format for x86, most of the
>> > distros
>> > ship and use.
>>
>> ELF is interesting because it is the minimal file format that does
>> everything you need. So especially for a proof of concept ELF needs to
>> come first. There is an extra virtual address field in the ELF segment
>> header but otherwise ELF does not have any unnecessary fields.
>>
>> ELF is interesting because it is the native kernel file format on all
>> architectures linux supports including x86.
>>
>> ELF is interesting because producing an ELF image in practice requires
>> a trivial amount of tooling so it is a good general purpose format to
>> support.
>
> Ok. I will have a look at ELF loader too. I was hoping to keep only one
> loader in initial patch. But looks like that's not acceptable.

Thank you. We really need an ELF loader to see that we have handled
the general case and to see that we have a solution that is portable to
non-x86.

> [..]
>> >> There is also a huge missing piece of this in that your purgatory is not
>> >> checking a hash of the loaded image before jumping too it. Without that
>> >> this is a huge regression at least for the kexec on panic case. We
>> >> absolutely need to check that the kernel sitting around in memory has
>> >> not been corrupted before we let it run very far.
>> >
>> > Agreed. This should not be hard. It is just a matter of calcualting
>> > digest of segments. I will store it in kimge and verify digest again
>> > before passing control to control page. Will fix it in next version.
>>
>> Nak. The verification needs to happen in purgatory.
>>
>> The verification needs to happen in code whose runtime environment is
>> does not depend on random parts of the kernel. Anything else is a
>> regression in maintainability and reliability.
>>
>> It is the wrong direction to add any code to what needs to run in the
>> known broken environment of the kernel when a panic happens.
>>
>> Which means that you almost certainly need to go to the trouble of
>> supporting the complexity needed to support purgatory code written in C.
>>
>> (For those just tuning in purgatory is our term for the code that runs
>> between the kernels to do those things that can not happen a priori).
>
> In general, I agree with not using kernel parts after crash.
>
> But what protects against that purgatory itself has been scribbled over.
> IOW, how different purgatory memory is as compared to kernel memory where
> digest routines are stored. They have got equal probably of being scribbled
> over and if that's the case one is not better than other?
>
> And if they both got equal probability to getting corrupted, then there does
> not seem to be an advantage in moving digest verification inside
> purgatory.

The primary reason is that maintenance of code in the kernel that is
safe during a crash dump is hard. That is why we boot a second kernel
after all. If the code to do the signature verification resides in
machine_kexec on the kexec on panic code path in the kernel that has
called panic it is almost a given that at some point or other someone
will add an option that will add a weird dependency that makes the code
unsafe when the kernel is crashing. I have seen it happen several times
on the existing kexec on panic code path. I have seen it on other code
paths like netconsole. Which can currently on some kernels I have
running cause the kernel to go into an endless printk loop if you call
printk from interrupt context.

So what we really gain by moving the verification into purgatory is
protection from inappropriate code reuse.

So having a completely separate piece of code may be a little harder to
write initially but the code is much simpler and more reliable to
maintain. Essentially requiring no maintenance effort.

Further getting to the point where purgatory is written in C makes small
changes much more approachable.

Eric
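As a rough illustration of the check being discussed (record a digest of the loaded segments at load time, then recompute and compare it just before handing over control), here is a freestanding sketch with no dependencies on other code. Everything in it is an assumption for illustration: `struct segment`, `digest_segments()` and `image_intact()` are hypothetical names, and FNV-1a is used only as a compact stand-in, since it is not a cryptographic hash and a real implementation would need one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for one loaded segment. */
struct segment {
	const unsigned char *buf;
	size_t len;
};

/* 64-bit FNV-1a over all segments, in order. Stand-in only, not secure. */
static uint64_t digest_segments(const struct segment *segs, size_t n)
{
	uint64_t h = 14695981039346656037ULL;	/* FNV offset basis */

	assert(segs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		for (size_t j = 0; j < segs[i].len; j++) {
			h ^= segs[i].buf[j];
			h *= 1099511628211ULL;	/* FNV prime */
		}
	}
	return h;
}

/* Recompute the digest and compare it with the value stored at load time. */
static bool image_intact(const struct segment *segs, size_t n,
			 uint64_t expected)
{
	return digest_segments(segs, n) == expected;
}
```

The point Eric makes is about where such code lives: a small self-contained routine like this has no hidden dependencies on the state of the crashed kernel.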
Re: [Doubt on Implict BUGs caused LRO] doubt about lro status in v3.12 of torvalds/linux.git
Hi Eric,

Thanks for the quick reply. :-)

Also, I am confused about why, in v3.12, the inet_lro.c file is not found
in the /net/ipv4/ directory, while the Makefile has
"obj-$(CONFIG_INET_LRO) += inet_lro.o" and the LRO header file exists.
If I am wrong, please correct me. :-)

See step 3 for details (I have repeated it several times):

3. cd net/ipv4,
[backyes@f14 ipv4]$ cat Makefile | grep lro
obj-$(CONFIG_INET_LRO) += inet_lro.o
[backyes@f14 ipv4]$ ls inet_*
inet_connection_sock.c inet_diag.c inet_fragment.c
inet_hashtables.c inet_timewait_sock.c

---
Best regards
Yanfei Wang

On Sat, Nov 23, 2013 at 12:32 AM, Eric Dumazet wrote:
> On Fri, 2013-11-22 at 23:00 +0800, Yanfei Wang wrote:
>> Hi,
>>
>> Compared to the earlier kernel version, I found that Large receive
>> offload(LRO) is obsolete in latest kernel, and that is verified from
>> other materials about Generic receive offload(GRO).
>>
>> However, I also found some LRO header files and lro-related
>> functions referred by some drivers, and the lro-related function
>> definition is already removed at the same time!
>>
>> TEST steps:
>> 1. my git local db config as follow, url =
>> http://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>> 2. git checkout v3.12 -b 3.12
>> 3. cd net/ipv4,
>> [backyes@f14 ipv4]$ cat Makefile | grep lro
>> obj-$(CONFIG_INET_LRO) += inet_lro.o
>> [backyes@f14 ipv4]$ ls inet_*
>> inet_connection_sock.c inet_diag.c inet_fragment.c
>> inet_hashtables.c inet_timewait_sock.c
>> [backyes@f14 ipv4]$
>> Here, Makefile has inet_lro.o config, while the inet_lro.c is removed.
>> 4. lro functions referred by some drivers, as follow,
>> [backyes@f14 linux]$ grep "inet_lro" * -r
>> drivers/net/ethernet/pasemi/pasemi_mac.c:#include
>> drivers/net/ethernet/pasemi/pasemi_mac_ethtool.c:#include
>> drivers/infiniband/hw/nes/nes_hw.c:#include
>> drivers/infiniband/hw/nes/nes_hw.h:#include
>> include/linux/inet_lro.h: * linux/include/linux/inet_lro.h
>> net/ipv4/Makefile:obj-$(CONFIG_INET_LRO) += inet_lro.o
>>
>> so, I understand that the removal of LRO is not clean, which will
>> cause compiler error for some kernel config.
>>
>
> I think you are slightly confused.
>
> LRO is not yet removed.
>
> Feel free to contribute ;)
>
> As long as at least one driver still uses LRO, we can not remove LRO.
>
> Converting a driver means testing the new driver, and owning the
> hardware.
Re: [PATCH 4/5] futex: Avoid taking hb lock if nothing to wakeup
On Fri, 2013-11-22 at 17:25 -0800, Linus Torvalds wrote:
> On Fri, Nov 22, 2013 at 4:56 PM, Davidlohr Bueso wrote:
> > In futex_wake() there is clearly no point in taking the hb->lock if
> > we know beforehand that there are no tasks to be woken. This comes
> > at the smaller cost of doing some atomic operations to keep track of
> > the list's size.
>
> Hmm. Why? Afaik, you only care about "empty or not". And if you don't
> need the serialization from locking, then afaik you can just do a
> "plist_head_empty()" without holding the lock.

I remember this being the original approach, but after noticing some
strange behavior we quickly decided it wasn't the path. And sure enough,
I just double checked and tried the patch without atomic ops and can see
things being off: one of the futextest performance cases is stuck
blocked on a futex and I couldn't reboot the machine either -- nothing
apparent in dmesg, just not 100% functional. The thing is, we can avoid
taking the lock only if nobody else is trying to add itself to the list.

Thanks,
Davidlohr
Re: [PATCH 4/5] futex: Avoid taking hb lock if nothing to wakeup
On Fri, Nov 22, 2013 at 5:25 PM, Linus Torvalds wrote:
> On Fri, Nov 22, 2013 at 4:56 PM, Davidlohr Bueso wrote:
>> In futex_wake() there is clearly no point in taking the hb->lock if
>> we know beforehand that there are no tasks to be woken. This comes
>> at the smaller cost of doing some atomic operations to keep track of
>> the list's size.
>
> Hmm. Why? Afaik, you only care about "empty or not". And if you don't
> need the serialization from locking, then afaik you can just do a
> "plist_head_empty()" without holding the lock.
>
> NOTE!
>
> The "list_empty()" function is very much designed to work even without
> holding a lock (as long as the head itself exists reliably, of course)
> BUT you have to then guarantee yourself that your algorithm doesn't
> have any races wrt other CPU's adding an entry to the list at the same
> time. Not holding a lock obviously means that you are not serialized
> against that.. We've had problems with people doing
>
>     if (!list_empty(waiters))
>         wake_up_list(..)
>
> because they wouldn't wake people up who just got added.
>
> But considering that your atomic counter checking has the same lack of
> serialization, at least the plist_head_empty() check shouldn't be any
> worse than that counter thing.. And doesn't need any steenking atomic
> ops or a new counter field.

Hi Linus,

In this patch, since we do the atomic increments before holding the
hb->lock during situations where we may potentially add a task to the
list, then we would only skip attempting the wake up if no other thread
has already held the hb->lock and is in the process of adding a task to
the list, in addition to the list being empty. Would this be enough to
protect it from any race condition?

Thanks,
Jason
RE: [f2fs-dev] [PATCH V3 2/2] f2fs: read contiguous sit entry pages by merging for mount performance
Hi,

> -----Original Message-----
> From: Jaegeuk Kim [mailto:jaegeuk@samsung.com]
> Sent: Friday, November 22, 2013 5:11 PM
> To: Chao Yu
> Cc: linux-fsde...@vger.kernel.org; linux-kernel@vger.kernel.org;
> linux-f2fs-de...@lists.sourceforge.net; 谭姝
> Subject: Re: [f2fs-dev] [PATCH V3 2/2] f2fs: read contiguous sit entry pages
> by merging for mount performance
>
> Hi,
>
> Just one niptick.
>
> 2013-11-22 (Fri), 09:09 +0800, Chao Yu:
> > Previously we read sit entries page one by one, this method lost the chance
> > of reading contiguous page together. So we read pages as contiguous as
> > possible for better mount performance.
> >
> > change log:
> > o merge judgements/use 'Continue' or 'Break' instead of 'Goto' as Gu Zheng
> >   suggested.
> > o add mark_page_accessed() before release page to delay VM reclaiming.
> > o remove '*order' for simplification of function as Jaegeuk Kim suggested.
> >
> > Signed-off-by: Chao Yu
> > ---
> >  fs/f2fs/segment.c | 103
> > +++--
> >  fs/f2fs/segment.h |2 ++
> >  2 files changed, 78 insertions(+), 27 deletions(-)
> >
> > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> > index 8149eba..998e7d3 100644
> > --- a/fs/f2fs/segment.c
> > +++ b/fs/f2fs/segment.c
> > @@ -14,6 +14,7 @@
> >  #include
> >  #include
> >  #include
> > +#include
> >
> >  #include "f2fs.h"
> >  #include "segment.h"
> > @@ -1488,41 +1489,89 @@ static int build_curseg(struct f2fs_sb_info *sbi)
> >  	return restore_curseg_summaries(sbi);
> >  }
> >
> > +static int ra_sit_pages(struct f2fs_sb_info *sbi, int start, int nrpages)
> > +{
> > +	struct address_space *mapping = sbi->meta_inode->i_mapping;
> > +	struct page *page;
> > +	block_t blk_addr, prev_blk_addr = 0;
> > +	int sit_blk_cnt = SIT_BLK_CNT(sbi);
>
> Can we remove SIT_BLK_CNT by replacing with sit_i->sit_blocks?

Because sit_i->sit_blocks is a multiple of the segment size (i.e.
512/1024), but we may use only part of sit_blocks, not all of them. So it
would be a waste to read all of them.
What's your opinion?

Regards,
Yu

>
> --
> Jaegeuk Kim
> Samsung
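Chao's argument is that the number of SIT blocks actually holding entries can be much smaller than the reserved count, which is rounded up to whole segments. A small arithmetic sketch, with loudly assumed constants: `ENTRIES_PER_BLOCK` (55) and `BLOCKS_PER_SEG` (512) are illustrative values, and both helper names are hypothetical rather than f2fs functions.

```c
#include <assert.h>
#include <limits.h>

#define ENTRIES_PER_BLOCK 55u	/* assumed SIT entries per block */
#define BLOCKS_PER_SEG    512u	/* assumed blocks per segment */

/* Blocks that actually hold SIT entries for 'total_segs' segments. */
static unsigned int sit_blocks_used(unsigned int total_segs)
{
	/* Guard the round-up addition against wrapping. */
	assert(total_segs <= UINT_MAX - ENTRIES_PER_BLOCK);
	return (total_segs + ENTRIES_PER_BLOCK - 1) / ENTRIES_PER_BLOCK;
}

/* The same count rounded up to whole segments, i.e. what is reserved. */
static unsigned int sit_blocks_reserved(unsigned int total_segs)
{
	unsigned int used = sit_blocks_used(total_segs);

	return (used + BLOCKS_PER_SEG - 1) / BLOCKS_PER_SEG * BLOCKS_PER_SEG;
}
```

Under these assumptions a volume with 1000 segments needs 19 blocks of entries while 512 are reserved, so reading ahead over the reserved range would fetch mostly unused blocks.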
Re: Linux 3.13-rc1 is out
James Cloos writes:

>
> This combination:
>
> # CONFIG_SYSTEM_TRUSTED_KEYRING is not set
> CONFIG_TRUSTED_KEYS=m
>
> (acquired by oldconfig and N to system keyring)
>
> fails with:
>
> Pass 2
> CC [M] crypto/asymmetric_keys/x509_rsakey-asn1.o
> CC [M] crypto/asymmetric_keys/x509_cert_parser.o
> CC [M] crypto/asymmetric_keys/x509_public_key.o
> crypto/asymmetric_keys/x509_public_key.c: In function ‘x509_key_preparse’:
> crypto/asymmetric_keys/x509_public_key.c:237:35: error: ‘system_trusted_keyring’
> undeclared (first use in this function)
>    ret = x509_validate_trust(cert, system_trusted_keyring);
>                                    ^
> crypto/asymmetric_keys/x509_public_key.c:237:35: note: each undeclared identifier is reported
> only once for each function it appears in
> make[2]: *** [crypto/asymmetric_keys/x509_public_key.o] Error 1
> make[1]: *** [crypto/asymmetric_keys] Error 2
> make: *** [crypto] Error 2
>
> Perhaps include/keys/system_keyring.h should have a definition for
> system_trusted_keyring in the #ifndef CONFIG_SYSTEM_TRUSTED_KEYRING
> case (which may entail just removing the #ifdef).
>
> Commits b56e5a17b6b9acd1 and 09fbc47373826d67 are relevant.
>
> -JimC

I have the same problem too.
With the following band-aid patch (though I'm not sure it's correct), the
build is OK:

--- a/security/integrity/digsig.c
+++ b/security/integrity/digsig.c
@@ -67,6 +67,7 @@ int integrity_digsig_verify(const unsigned int id, const char *sig, int siglen,
 	return -EOPNOTSUPP;
 }
 
+#ifdef CONFIG_INTEGRITY_ASYMMETRIC_KEYS
 int integrity_init_keyring(const unsigned int id)
 {
 	const struct cred *cred = current_cred();
@@ -84,3 +85,4 @@ int integrity_init_keyring(const unsigned int id)
 		keyring_name[id], PTR_ERR(keyring[id]));
 	return 0;
 }
+#endif

But I then run into another build issue:

  CC      crypto/asymmetric_keys/x509_public_key.o
crypto/asymmetric_keys/x509_public_key.c: In function ‘x509_key_preparse’:
crypto/asymmetric_keys/x509_public_key.c:237:35: error: ‘system_trusted_keyring’ undeclared (first use in this function)
   ret = x509_validate_trust(cert, system_trusted_keyring);
                                   ^
crypto/asymmetric_keys/x509_public_key.c:237:35: note: each undeclared identifier is reported only once for each function it appears in
make[2]: *** [crypto/asymmetric_keys/x509_public_key.o] Error 1
make[1]: *** [crypto/asymmetric_keys] Error 2
make: *** [crypto] Error 2

It looks like it's caused by the following combination:

# CONFIG_SYSTEM_TRUSTED_KEYRING is not set
CONFIG_X509_CERTIFICATE_PARSER=y

Jongman Heo
Re: [PATCH v3] mm/zswap: change zswap to writethrough cache
Hello Dan,

On Sat, Nov 23, 2013 at 6:10 AM, Dan Streetman wrote:
> Currently, zswap is writeback cache; stored pages are not sent
> to swap disk, and when zswap wants to evict old pages it must
> first write them back to swap cache/disk manually. This avoids
> swap out disk I/O up front, but only moves that disk I/O to
> the writeback case (for pages that are evicted), and adds the
> overhead of having to uncompress the evicted pages, and adds the
> need for an additional free page (to store the uncompressed page)
> at a time of likely high memory pressure. Additionally, being
> writeback adds complexity to zswap by having to perform the
> writeback on page eviction.
>
> This changes zswap to writethrough cache by enabling
> frontswap_writethrough() before registering, so that any
> successful page store will also be written to swap disk. All the
> writeback code is removed since it is no longer needed, and the
> only operation during a page eviction is now to remove the entry
> from the tree and free it.

I agree with Seth; it is not good for embedded devices. Maybe it can
find its place elsewhere, such as on servers. I guess it is good for
medium workloads where swap I/O is not frequent.

My suggestion: would you please make it configurable, so that the user
can choose between writethrough and writeback mode?

Regards,

> Signed-off-by: Dan Streetman
> ---
>
> I still owe the results of testing pre-patch and post-patch;
> I don't think I'll have SPECjbb results today, but I do have a
> small test program I can send with results of nozswap,
> zswap writeback, and zswap writethrough; I'll send the SPECjbb
> results also when I get them.
>
> Changes since v2:
> change the evict function to remove the entry from the tree,
> and avoid checking if the entry is in use by load. It is ok
> to return success for the eviction even if load is using the entry
> as zbud does not use the evict function return value in any significant
> way (I'll submit a future patch to remove the return value entirely
> from the evict function).
>
> Changes since v1:
> update to apply to latest -tip, previous patch missed several recent
> zswap patches.
>
>
>  mm/zswap.c | 209
> ++---
>  1 file changed, 19 insertions(+), 190 deletions(-)
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index e55bab9..fc35a7a 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -39,7 +39,6 @@
>  #include
>  #include
>  #include
> -#include
>  #include
>
>  /*
> @@ -59,8 +58,8 @@ static atomic_t zswap_stored_pages = ATOMIC_INIT(0);
>
>  /* Pool limit was hit (see zswap_max_pool_percent) */
>  static u64 zswap_pool_limit_hit;
> -/* Pages written back when pool limit was reached */
> -static u64 zswap_written_back_pages;
> +/* Pages evicted when pool limit was reached */
> +static u64 zswap_evicted_pages;
>  /* Store failed due to a reclaim failure after pool limit was reached */
>  static u64 zswap_reject_reclaim_fail;
>  /* Compressed page was too big for the allocator to (optimally) store */
> @@ -160,7 +159,7 @@ static void zswap_comp_exit(void)
>   * rbnode - links the entry into red-black tree for the appropriate swap type
>   * refcount - the number of outstanding reference to the entry. This is
> needed
>   *to protect against premature freeing of the entry by code
> - *concurent calls to load, invalidate, and writeback. The lock
> + *concurent calls to load, invalidate, and evict. The lock
>   *for the zswap_tree structure that contains the entry must
>   *be held while changing the refcount. Since the lock must
>   *be held, there is no reason to also make refcount atomic.
> @@ -412,132 +411,19 @@ static bool zswap_is_full(void)
>  }
>
>  /*
> -* writeback code
> +* evict
>  **/
> -/* return enum for zswap_get_swap_cache_page */
> -enum zswap_get_swap_ret {
> -	ZSWAP_SWAPCACHE_NEW,
> -	ZSWAP_SWAPCACHE_EXIST,
> -	ZSWAP_SWAPCACHE_FAIL,
> -};
>
>  /*
> - * zswap_get_swap_cache_page
> - *
> - * This is an adaption of read_swap_cache_async()
> - *
> - * This function tries to find a page with the given swap entry
> - * in the swapper_space address space (the swap cache). If the page
> - * is found, it is returned in retpage. Otherwise, a page is allocated,
> - * added to the swap cache, and returned in retpage.
> - *
> - * If success, the swap cache page is returned in retpage
> - * Returns ZSWAP_SWAPCACHE_EXIST if page was already in the swap cache
> - * Returns ZSWAP_SWAPCACHE_NEW if the new page needs to be populated,
> - * the new page is added to swapcache and locked
> - * Returns ZSWAP_SWAPCACHE_FAIL on error
> + * This is called from zbud to remove an entry that is being evicted.
>   */
> -static int zswap_get_swap_cache_page(swp_entry_t entry,
> -				struct page **retpage)
> -{
> -
Re: [PATCH] mm/zswap: reverse zswap_entry tree/refcount relationship
Hello Dan,

On Sat, Nov 23, 2013 at 6:10 AM, Dan Streetman wrote:
> Currently, zswap_entry_put removes the entry from its tree if
> the resulting refcount is 0. Several places in code put an
> entry's initial reference, but they also must remove the entry
> from its tree first, which makes the tree removal in zswap_entry_put
> redundant.
>
> I believe this has the refcount model backwards - the initial
> refcount reference shouldn't be managed by multiple different places
> in code, and the put function shouldn't be removing the entry
> from the tree. I think the correct model is for the tree to be
> the owner of the initial entry reference. This way, the only time
> any code needs to put the entry is if it's also done a get previously.
> The various places in code that remove the entry from the tree simply
> do that, and the zswap_rb_erase function does the put of the initial
> reference.
>
> This patch moves the initial referencing completely into the tree
> functions - zswap_rb_insert gets the entry, while zswap_rb_erase
> puts the entry. The zswap_entry_get/put functions are still available
> for any code that needs to use an entry outside of the tree lock.
> Also, the zswap_entry_find_get function is renamed to zswap_rb_search_get
> since the function behavior and return value is closer to zswap_rb_search
> than zswap_entry_get. All code that previously removed the entry from
> the tree and put it now only remove the entry from the tree.
>
> The comment headers for most of the tree insert/search/erase functions
> and the get/put functions are updated to clarify if the tree lock
> needs to be held as well as when the caller needs to get/put an
> entry (i.e. iff the caller is using the entry outside the tree lock).

I do not like the idea of this patch; it breaks the purity of the
zswap_rb_xxx() functions. I think zswap_rb_xxx() should focus only on
rbtree operations.

The current code may be redundant, but its logic is clear, so it does
not really need to be changed.

If I am missing something, please let me know.

Regards,

> Signed-off-by: Dan Streetman
> ---
>
> This patch requires the writethrough patch to have been applied, but
> the patch idea doesn't require the writethrough patch.
>
> mm/zswap.c | 130 ++---
> 1 file changed, 72 insertions(+), 58 deletions(-)
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index fc35a7a..8c27eb2 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -215,7 +215,7 @@ static struct zswap_entry *zswap_entry_cache_alloc(gfp_t gfp)
>  	entry = kmem_cache_alloc(zswap_entry_cache, gfp);
>  	if (!entry)
>  		return NULL;
> -	entry->refcount = 1;
> +	entry->refcount = 0;
>  	RB_CLEAR_NODE(&entry->rbnode);
>  	return entry;
>  }
> @@ -228,9 +228,51 @@ static void zswap_entry_cache_free(struct zswap_entry *entry)
>  /*
>  * rbtree functions
>  **/
> -static struct zswap_entry *zswap_rb_search(struct rb_root *root, pgoff_t offset)
> +
> +/*
> + * Carries out the common pattern of freeing and entry's zsmalloc allocation,
> + * freeing the entry itself, and decrementing the number of stored pages.
> + */
> +static void zswap_free_entry(struct zswap_tree *tree,
> +			struct zswap_entry *entry)
> +{
> +	zbud_free(tree->pool, entry->handle);
> +	zswap_entry_cache_free(entry);
> +	atomic_dec(&zswap_stored_pages);
> +	zswap_pool_pages = zbud_get_pool_size(tree->pool);
> +}
> +
> +/* caller must hold the tree lock
> + * this must be used if the entry will be used outside
> + * the tree lock
> + */
> +static void zswap_entry_get(struct zswap_entry *entry)
> +{
> +	entry->refcount++;
> +}
> +
> +/* caller must hold the tree lock
> +* remove from the tree and free it, if nobody reference the entry
> +*/
> +static void zswap_entry_put(struct zswap_tree *tree,
> +			struct zswap_entry *entry)
> +{
> +	int refcount = --entry->refcount;
> +
> +	BUG_ON(refcount < 0);
> +	if (refcount == 0)
> +		zswap_free_entry(tree, entry);
> +}
> +
> +/* caller much hold the tree lock
> + * This will find the entry for the offset, and return it
> + * If no entry is found, NULL is returned
> + * If the entry will be used outside the tree lock,
> + * then zswap_rb_search_get should be used instead
> + */
> +static struct zswap_entry *zswap_rb_search(struct zswap_tree *tree, pgoff_t offset)
>  {
> -	struct rb_node *node = root->rb_node;
> +	struct rb_node *node = tree->rbroot.rb_node;
>  	struct zswap_entry *entry;
>
>  	while (node) {
> @@ -246,13 +288,14 @@ static struct zswap_entry *zswap_rb_search(struct rb_root *root, pgoff_t offset)
>  }
>
>  /*
> + * caller must hold the tree lock
>   * In the case that a entry with the same offset is found, a pointer to
>   * the existing entry is stored in dupentry and the function returns -EEXIST
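The ownership model argued for in the quoted patch description, where the tree itself holds the initial reference and callers only put what they previously got, can be sketched in plain userspace C. This is an illustration under assumptions, not the zswap code: `struct entry`, `struct tree`, `tree_insert()`, `tree_erase()`, `tree_search_get()`, `entry_get()` and `entry_put()` are hypothetical stand-ins, a single slot replaces the rbtree, and the tree lock is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct zswap_entry. */
struct entry {
	int refcount;
	bool freed;	/* set instead of calling free(), so it can be inspected */
};

/* Hypothetical one-slot "tree"; the real code uses an rbtree under a lock. */
struct tree {
	struct entry *slot;
};

static void entry_get(struct entry *e)
{
	e->refcount++;
}

static void entry_put(struct entry *e)
{
	assert(e->refcount > 0);
	if (--e->refcount == 0)
		e->freed = true;
}

/* Inserting takes the initial reference on behalf of the tree. */
static void tree_insert(struct tree *t, struct entry *e)
{
	entry_get(e);
	t->slot = e;
}

/* Erasing drops the tree's reference; other holders keep the entry alive. */
static void tree_erase(struct tree *t, struct entry *e)
{
	if (t->slot == e) {
		t->slot = NULL;
		entry_put(e);
	}
}

/* Lookup for use outside the lock: the caller receives its own reference. */
static struct entry *tree_search_get(struct tree *t)
{
	if (t->slot)
		entry_get(t->slot);
	return t->slot;
}
```

In this sketch an entry that is erased while a load still holds a reference stays valid until that load calls `entry_put()`, which is the behaviour both the existing code and the proposed model aim for; the two differ only in who is responsible for the initial reference.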
(no subject)
Do you need a loan to start or expand your business? We give business loans of up to £100 million. If you need funding, please send a summary of your project to our email below: krisparker.loanage...@aim.com

Dr. Kris Parker
[PATCH resend] slab.h: remove duplicate kmalloc declaration and fix kernel-doc warnings
From: Randy Dunlap Fix kernel-doc warning for duplicate definition of 'kmalloc': Documentation/DocBook/kernel-api.xml:9483: element refentry: validity error : ID API-kmalloc already defined Also combine the kernel-doc info from the 2 kmalloc definitions into one block and remove the "see kcalloc" comment since kmalloc now contains the @flags info. Signed-off-by: Randy Dunlap Acked-by: Christoph Lameter --- include/linux/slab.h | 102 ++--- 1 file changed, 46 insertions(+), 56 deletions(-) --- lnx-312-rc1.orig/include/linux/slab.h +++ lnx-312-rc1/include/linux/slab.h @@ -381,10 +381,55 @@ static __always_inline void *kmalloc_lar /** * kmalloc - allocate memory * @size: how many bytes of memory are required. - * @flags: the type of memory to allocate (see kcalloc). + * @flags: the type of memory to allocate. * * kmalloc is the normal method of allocating memory * for objects smaller than page size in the kernel. + * + * The @flags argument may be one of: + * + * %GFP_USER - Allocate memory on behalf of user. May sleep. + * + * %GFP_KERNEL - Allocate normal kernel ram. May sleep. + * + * %GFP_ATOMIC - Allocation will not sleep. May use emergency pools. + * For example, use this inside interrupt handlers. + * + * %GFP_HIGHUSER - Allocate pages from high memory. + * + * %GFP_NOIO - Do not do any I/O at all while trying to get memory. + * + * %GFP_NOFS - Do not make any fs calls while trying to get memory. + * + * %GFP_NOWAIT - Allocation will not sleep. + * + * %GFP_THISNODE - Allocate node-local memory only. + * + * %GFP_DMA - Allocation suitable for DMA. + * Should only be used for kmalloc() caches. Otherwise, use a + * slab created with SLAB_DMA. + * + * Also it is possible to set different flags by OR'ing + * in one or more of the following additional @flags: + * + * %__GFP_COLD - Request cache-cold pages instead of + * trying to return cache-warm pages. + * + * %__GFP_HIGH - This allocation has high priority and may use emergency pools. 
+ * + * %__GFP_NOFAIL - Indicate that this allocation is in no way allowed to fail + * (think twice before using). + * + * %__GFP_NORETRY - If memory is not immediately available, + * then give up at once. + * + * %__GFP_NOWARN - If allocation fails, don't issue any warnings. + * + * %__GFP_REPEAT - If allocation fails initially, try once more before failing. + * + * There are other flags available as well, but these are not intended + * for general use, and so are not documented here. For a full list of + * potential flags, always refer to linux/gfp.h. */ static __always_inline void *kmalloc(size_t size, gfp_t flags) { @@ -495,61 +540,6 @@ int cache_show(struct kmem_cache *s, str void print_slabinfo_header(struct seq_file *m); /** - * kmalloc - allocate memory - * @size: how many bytes of memory are required. - * @flags: the type of memory to allocate. - * - * The @flags argument may be one of: - * - * %GFP_USER - Allocate memory on behalf of user. May sleep. - * - * %GFP_KERNEL - Allocate normal kernel ram. May sleep. - * - * %GFP_ATOMIC - Allocation will not sleep. May use emergency pools. - * For example, use this inside interrupt handlers. - * - * %GFP_HIGHUSER - Allocate pages from high memory. - * - * %GFP_NOIO - Do not do any I/O at all while trying to get memory. - * - * %GFP_NOFS - Do not make any fs calls while trying to get memory. - * - * %GFP_NOWAIT - Allocation will not sleep. - * - * %GFP_THISNODE - Allocate node-local memory only. - * - * %GFP_DMA - Allocation suitable for DMA. - * Should only be used for kmalloc() caches. Otherwise, use a - * slab created with SLAB_DMA. - * - * Also it is possible to set different flags by OR'ing - * in one or more of the following additional @flags: - * - * %__GFP_COLD - Request cache-cold pages instead of - * trying to return cache-warm pages. - * - * %__GFP_HIGH - This allocation has high priority and may use emergency pools. 
- * - * %__GFP_NOFAIL - Indicate that this allocation is in no way allowed to fail - * (think twice before using). - * - * %__GFP_NORETRY - If memory is not immediately available, - * then give up at once. - * - * %__GFP_NOWARN - If allocation fails, don't issue any warnings. - * - * %__GFP_REPEAT - If allocation fails initially, try once more before failing. - * - * There are other flags available as well, but these are not intended - * for general use, and so are not documented here. For a full list of - * potential flags, always refer to linux/gfp.h. - * - * kmalloc is the normal method of allocating memory - * in the kernel. - */ -static __always_inline void *kmalloc(size_t size, gfp_t flags); - -/** * kmalloc_array - allocate memory for an array. * @n: number of elements. * @size: element size. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] slab: remove the redundant declaration of kmalloc
On 11/21/13 17:07, Qiang Huang wrote:
> On 2013/11/22 3:11, Randy Dunlap wrote:
>> On 11/20/13 21:57, Qiang Huang wrote:
>>>
>>> Signed-off-by: Qiang Huang
>>
>> or use my patch from 2013-09-17:
>> http://marc.info/?l=linux-mm&m=137944291611467&w=2
>>
>> Would be nice to one of these merged...
>
> Yes, sorry for not notice this, merge your patch should be property :)
> But why it's still not be merged?
>
> Ping...

I don't know. I'll resend it now.

Thanks.
--
~Randy
Restore your vision without difficulty
Restore your vision without surgery http://glazmen.ru/
Re: [PATCH] of/platform: Fix no irq domain found errors when populating interrupts
* Tony Lindgren [131122 17:16]:
> * Tony Lindgren [131122 17:09]:
> > * Russell King - ARM Linux [131122 16:56]:
> > > On Fri, Nov 22, 2013 at 04:43:35PM -0800, Tony Lindgren wrote:
> > > > +		/* See of_device_resource_notify for populating interrupts */
> > > > +		for (i = 0; i < num_irq; i++, res++) {
> > > > +			res->flags = IORESOURCE_IRQ;
> > > > +			res->start = -EPROBE_DEFER;
> > > > +			res->end = -EPROBE_DEFER;
> > >
> > > NAK. Definitely a bad idea to start introducing magic values other into
> > > resources. Please don't do this.
> >
> > Do you have any better ideas on how to sort out this issue then?
>
> I guess we could allocate all the resources lazily here, I'll take a look
> at that.

Here's a version that allocates the resources lazily with the notifier.
Seems to boot, need to play with it a bit more though to make sure we're
not overwriting resources for any legacy devices.

Regards,

Tony

--- a/drivers/of/platform.c
+++ b/drivers/of/platform.c
@@ -141,13 +141,47 @@ struct platform_device *of_device_alloc(struct device_node *np,
 				  struct device *parent)
 {
 	struct platform_device *dev;
-	int rc, i, num_reg = 0, num_irq;
-	struct resource *res, temp_res;
 
 	dev = platform_device_alloc("", -1);
 	if (!dev)
 		return NULL;
 
+	dev->dev.of_node = of_node_get(np);
+#if defined(CONFIG_MICROBLAZE)
+	dev->dev.dma_mask = &dev->archdata.dma_mask;
+#endif
+	dev->dev.parent = parent;
+
+	if (bus_id)
+		dev_set_name(&dev->dev, "%s", bus_id);
+	else
+		of_device_make_bus_id(&dev->dev);
+
+	/* See of_device_resource_notify for populating the resources */
+
+	return dev;
+}
+EXPORT_SYMBOL(of_device_alloc);
+
+/*
+ * The device interrupts are not necessarily available for all
+ * irqdomains initially so we need to populate them using a
+ * notifier.
+ */
+static int of_device_resource_notify(struct notifier_block *nb,
+				     unsigned long event, void *dev)
+{
+	struct platform_device *pdev = to_platform_device(dev);
+	struct device_node *np = pdev->dev.of_node;
+	int rc, i, num_reg = 0, num_irq;
+	struct resource *res, temp_res;
+
+	if (event != BUS_NOTIFY_BIND_DRIVER)
+		return 0;
+
+	if (!np)
+		goto out;
+
 	/* count the io and irq resources */
 	if (of_can_translate_address(np))
 		while (of_address_to_resource(np, num_reg, &temp_res) == 0)
@@ -158,12 +192,12 @@ struct platform_device *of_device_alloc(struct device_node *np,
 	if (num_irq || num_reg) {
 		res = kzalloc(sizeof(*res) * (num_irq + num_reg), GFP_KERNEL);
 		if (!res) {
-			platform_device_put(dev);
-			return NULL;
+			platform_device_put(pdev);
+			goto out;
 		}
 
-		dev->num_resources = num_reg + num_irq;
-		dev->resource = res;
+		pdev->num_resources = num_reg + num_irq;
+		pdev->resource = res;
 		for (i = 0; i < num_reg; i++, res++) {
 			rc = of_address_to_resource(np, i, res);
 			WARN_ON(rc);
@@ -171,20 +205,9 @@ struct platform_device *of_device_alloc(struct device_node *np,
 		WARN_ON(of_irq_to_resource_table(np, res, num_irq) != num_irq);
 	}
 
-	dev->dev.of_node = of_node_get(np);
-#if defined(CONFIG_MICROBLAZE)
-	dev->dev.dma_mask = &dev->archdata.dma_mask;
-#endif
-	dev->dev.parent = parent;
-
-	if (bus_id)
-		dev_set_name(&dev->dev, "%s", bus_id);
-	else
-		of_device_make_bus_id(&dev->dev);
-
-	return dev;
+out:
+	return NOTIFY_DONE;
 }
-EXPORT_SYMBOL(of_device_alloc);
 
 /**
  * of_platform_device_create_pdata - Alloc, initialize and register an of_device
@@ -447,6 +470,8 @@ int of_platform_bus_probe(struct device_node *root,
 }
 EXPORT_SYMBOL(of_platform_bus_probe);
 
+static struct notifier_block resource_nb;
+
 /**
  * of_platform_populate() - Populate platform_devices from device tree data
  * @root: parent of the first level to probe or NULL for the root of the tree
@@ -478,6 +503,11 @@ int of_platform_populate(struct device_node *root,
 	if (!root)
 		return -EINVAL;
 
+	if (!resource_nb.notifier_call) {
+		resource_nb.notifier_call = of_device_resource_notify,
+		bus_register_notifier(&platform_bus_type, &resource_nb);
+	}
+
 	for_each_child_of_node(root, child) {
 		rc = of_platform_bus_create(child, matches, lookup, parent, true);
 		if (rc)
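The idea in the patch above, filling in a device's resources from a notifier callback just before a driver binds rather than at allocation time, can be sketched outside the kernel. Everything here is hypothetical and chosen for illustration: `enum bus_event`, `struct fake_device`, `resource_notify()` and the made-up IRQ numbers are not kernel APIs. The "already populated" check is an assumption added to show one way of not overwriting resources that something else set up, which is the concern raised about legacy devices; the posted patch does not contain such a check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical event codes, loosely modelled on bus notifier events. */
enum bus_event { EVENT_ADD_DEVICE, EVENT_BIND_DRIVER };

/* Hypothetical device: resources stay NULL until the first bind event. */
struct fake_device {
	int num_irq;		/* how many interrupts the node describes */
	int *resource;		/* filled in lazily */
	int num_resources;
};

/* Notifier-style callback; always returns 0 so it never vetoes an event. */
static int resource_notify(enum bus_event event, struct fake_device *dev)
{
	assert(dev != NULL);

	if (event != EVENT_BIND_DRIVER)
		return 0;	/* only populate right before probing */
	if (dev->resource)
		return 0;	/* assumption: never overwrite existing resources */
	if (dev->num_irq == 0)
		return 0;

	dev->resource = calloc((size_t)dev->num_irq, sizeof(*dev->resource));
	if (!dev->resource)
		return 0;	/* leave the device without resources */

	for (int i = 0; i < dev->num_irq; i++)
		dev->resource[i] = 100 + i;	/* made-up IRQ numbers */
	dev->num_resources = dev->num_irq;
	return 0;
}
```

Deferring the lookup to bind time means the interrupt controller has a chance to register first, at the cost of resources not being visible on the device until a driver is about to probe.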
Re: gem csum breakage with 018c5bba (Handle CHECKSUM_COMPLETE more adequately ...)
On Fri, Nov 22, 2013 at 07:08:16PM +0100, Alexander Graf wrote:
> Hi,
>
> With current Linus' master tree my ibook started emitting a lot of annoying
> csum calculation warnings. I've bisected it down to the commit seen in the
> subject line, but I suppose the real problem lies somewhere a lot deeper and
> only gets revealed thanks to the more clever checksum logic applied now.
>
> Any ideas how to dig further into this?

Sorry, I don't have any docs on this device. Perhaps Dave has the
docs which can tell us what RXDCTRL_TCPCSUM actually returns?

I assume it returns the correct checksum ending at the IP packet
boundary and only goes wrong when extra padding exists.

Hmm, I suppose the question then is what exactly does it do if
the packet isn't an IP packet?

Thanks,
--
Email: Herbert Xu
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
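Why trailing padding matters for a checksum taken over a whole frame can be shown with a small RFC 1071 style sum. This is only a sketch: `csum16()` and `frame_sum_matches()` are hypothetical helpers, the byte values are invented, and nothing here claims to describe what the GEM hardware or RXDCTRL_TCPCSUM actually does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16-bit ones' complement sum over len bytes, RFC 1071 style (not inverted). */
static uint16_t csum16(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	assert(buf != NULL || len == 0);

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (i < len)			/* odd trailing byte, padded with zero */
		sum += (uint32_t)buf[i] << 8;
	while (sum >> 16)		/* fold the carries back in */
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* True if a sum over the whole frame equals the sum over the packet alone. */
static int frame_sum_matches(const uint8_t *frame, size_t pkt_len,
			     size_t frame_len)
{
	return csum16(frame, frame_len) == csum16(frame, pkt_len);
}
```

With all-zero padding the two sums agree, because adding zero words does not change a ones' complement sum. Padding that contains other bytes generally changes the sum, so a value computed over the full frame would have to be corrected, or discarded, before it is trusted for the shorter packet.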
Crypto Update for 3.13
Hi Linus: This is a resend with the recipients in the cc field instead of the to field. Here is the crypto update for 3.13: * Made x86 ablk_helper generic for ARM. * Phase out chainiv in favour of eseqiv (affects IPsec). * Fixed aes-cbc IV corruption on s390. * Added constant-time crypto_memneq which replaces memcmp. * Fixed aes-ctr in omap-aes. * Added OMAP3 ROM RNG support. * Add PRNG support for MSM SoC's * Add and use Job Ring API in caam. * Misc fixes. Please pull from git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6.git Alex Porosanu (7): crypto: caam - fix RNG state handle instantiation descriptor crypto: caam - fix hash, alg and rng registration if CAAM driver not initialized crypto: caam - fix RNG4 instantiation crypto: caam - split RNG4 instantiation function crypto: caam - uninstantiate RNG state handle 0 if instantiated by caam driver crypto: caam - fix RNG4 AAI defines crypto: caam - enable instantiation of all RNG4 state handles Ard Biesheuvel (2): crypto: create generic version of ablk_helper crypto: move x86 to the generic version of ablk_helper Ben Hutchings (1): hwrng: via-rng - Mark device ID table as __maybe_unused Fabio Estevam (4): crypto: dcp - Use devm_ioremap_resource() crypto: dcp - Use devm_request_irq() crypto: dcp - Fix the path for releasing the resources crypto: dcp - Check the return value from devm_ioremap_resource() Herbert Xu (2): crypto: skcipher - Use eseqiv even on UP machines crypto: s390 - Fix aes-cbc IV corruption James Yonan (1): crypto: crypto_memneq - add equality testing of memory regions w/o timing leaks Joel Fernandes (1): crypto: omap-aes - Fix CTR mode counter length Joni Lapilainen (1): crypto: omap-sham - Add missing modalias Jussi Kivilinna (2): crypto: sha256_ssse3 - use correct module alias for sha224 crypto: x86 - restore avx2_supported check Linus Walleij (1): crypto: tegra - use kernel entropy instead of ad-hoc Mathias Krause (6): crypto: authenc - Export key parsing helper function crypto: 
authencesn - Simplify key parsing crypto: ixp4xx - Simplify and harden key parsing crypto: picoxcell - Simplify and harden key parsing crypto: talitos - Simplify key parsing padata: make the sequence counter an atomic_t Michael Ellerman (2): hwrng: pseries - Use KBUILD_MODNAME in pseries-rng.c hwrng: pseries - Return errors to upper levels in pseries-rng.c Michael Opdenacker (1): crypto: mv_cesa: remove deprecated IRQF_DISABLED Neil Horman (1): crypto: ansi_cprng - Fix off by one error in non-block size request Oliver Neukum (1): crypto: sha256_ssse3 - also test for BMI2 Pali Rohár (1): hwrng: OMAP3 ROM Random Number Generator support Ruchika Gupta (3): crypto: caam - Add Platform driver for Job Ring crypto: caam - Add API's to allocate/free Job Rings crypto: caam - Modify the interface layers to use JR API's Sachin Kamat (7): crypto: mv_cesa - Staticize local symbols crypto: omap-aes - Staticize local symbols crypto: tegra-aes - Staticize tegra_aes_cra_exit crypto: tegra-aes - Fix NULL pointer dereference crypto: tegra-aes - Use devm_clk_get crypto: sahara - Remove redundant of_match_ptr crypto: mv_cesa - Remove redundant of_match_ptr Stanimir Varbanov (2): ARM: DT: msm: Add Qualcomm's PRNG driver binding document hwrng: msm - Add PRNG support for MSM SoC's Stephen Warren (1): ARM: tegra: remove tegra_chip_uid() Yashpal Dutta (1): crypto: caam - map src buffer before access kbuild test robot (1): crypto: ablk_helper - Replace memcpy with struct assignment .../devicetree/bindings/rng/qcom,prng.txt | 17 + arch/arm/mach-tegra/fuse.c | 10 - arch/s390/crypto/aes_s390.c| 19 +- arch/x86/crypto/Makefile |3 +- arch/x86/crypto/aesni-intel_glue.c |2 +- arch/x86/crypto/camellia_aesni_avx2_glue.c |2 +- arch/x86/crypto/camellia_aesni_avx_glue.c |2 +- arch/x86/crypto/cast5_avx_glue.c |2 +- arch/x86/crypto/cast6_avx_glue.c |2 +- arch/x86/crypto/serpent_avx2_glue.c|2 +- arch/x86/crypto/serpent_avx_glue.c |2 +- arch/x86/crypto/serpent_sse2_glue.c|2 +- 
arch/x86/crypto/sha256_ssse3_glue.c|4 +- arch/x86/crypto/twofish_avx_glue.c |2 +- arch/x86/include/asm/simd.h| 11 + crypto/Kconfig | 23 +- crypto/Makefile|8 +- {arch/x86/crypto => crypto}/ablk_helper.c | 13 +- crypto/ablkcipher.c| 21 +-
[GIT PULL] NTB bug fixes and features for v3.13
Hi Linus,

Below are a number of NTB bug fixes and features for v3.13. Please
consider pulling them.

Thanks,
Jon

The following changes since commit 5e01dc7b26d9f24f39abace5da98ccbd6a5ceb52:

  Linux 3.12 (2013-11-03 15:41:51 -0800)

are available in the git repository at:

  git://github.com/jonmason/ntb.git tags/ntb-3.13

for you to fetch changes up to e8aeb60c389c2aa48d345bcf717f8cb7edf67680:

  NTB: Disable interrupts and poll under high load (2013-11-20 09:57:34 -0700)

NTB driver bug fixes to address a missed call to pci_enable_msix,
NTB-RP Link Up issue, Xeon Doorbell errata workaround, ntb_transport
link down race, and correct dmaengine_get/put usage. Also, clean-ups to
remove duplicate defines and document a hardware errata. Finally, some
changes to improve performance.

Alexander Gordeev (1):
      ntb: Fix missed call to pci_enable_msix()

Jon Mason (7):
      NTB: Xeon Doorbell errata workaround
      NTB: Fix NTB-RP Link Up
      NTB: Fix ntb_transport link down race
      NTB: correct dmaengine_get/put usage
      NTB: Document HW errata
      NTB: Enable Snoop on Primary Side
      NTB: Disable interrupts and poll under high load

Michael Opdenacker (1):
      NTB: remove duplicate defines

 drivers/ntb/ntb_hw.c        | 121 ---
 drivers/ntb/ntb_hw.h        |   7 +--
 drivers/ntb/ntb_regs.h      |  16 +++---
 drivers/ntb/ntb_transport.c |  77 ++-
 4 files changed, 156 insertions(+), 65 deletions(-)
Re: [PATCH 5/7] clocksource/cadence_ttc: Overhaul clocksource frequency adjustment
On Tue, Nov 12, 2013 at 08:01:37PM +0100, Daniel Lezcano wrote:
> On 11/08/2013 10:21 PM, Soren Brinkmann wrote:
> >The currently used method adjusting the clocksource to a changing input
> >frequency does not work on kernels from 3.11 on.
> >The new approach is to keep the timer frequency as constant as possible.
> >I.e.
> > - due to the TTC's prescaler limitations, allow frequency changes
> >   only if the frequency scales by a power of 2
> > - adjust the counter's divider on the fly when a frequency change
> >   occurs
> >
> >When suspending though, the driver should not prevent rate changes in
> >order to allow the system to enter its low power state. For that
> >reason a PM notifier is added so rate changes can be ignored during
> >suspend/resume.
>
> It sounds very weird you have to add a PM notifier in this driver.
>
> Have you been facing an issue or do you assume it could happen ?

As I said in my other email, we do have some issues related with
suspend. The construct with the PM notifier seemed to work but since we
upgraded our vendor tree to 3.12 I see issues with this code.
So, for now, I'll remove the PM notifier and associated code from this
patch. I have to revisit the suspend case later.

	Thanks,
	Sören
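The rule quoted from the patch description, accept an input frequency change only if it scales by a power of 2 and compensate in the divider, can be sketched as follows. This is a standalone illustration, not the cadence_ttc driver code: `adjust_prescaler()`, `MAX_SHIFT` and the divider model `input / (1 << shift)` are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SHIFT 16	/* hypothetical limit on the prescaler exponent */

/*
 * Decide whether a change of the input clock from old_hz to new_hz can be
 * absorbed by the prescaler. On success *shift is updated so that
 * input / (1 << *shift) is the same as before the change. On failure
 * *shift is left untouched and the rate change should be refused.
 */
static bool adjust_prescaler(uint64_t old_hz, uint64_t new_hz,
			     unsigned int *shift)
{
	unsigned int s;

	assert(shift != NULL);
	s = *shift;

	if (old_hz == 0 || new_hz == 0)
		return false;

	while (new_hz > old_hz) {	/* input got faster: divide more */
		if (new_hz % 2 || s == MAX_SHIFT)
			return false;
		new_hz /= 2;
		s++;
	}
	while (new_hz < old_hz) {	/* input got slower: divide less */
		if (s == 0)
			return false;
		new_hz *= 2;
		s--;
	}
	if (new_hz != old_hz)
		return false;		/* ratio is not a power of two */

	*shift = s;
	return true;
}
```

For example, going from 100 MHz to 400 MHz with a shift of 2 moves the shift to 4, so the counter still ticks at 25 MHz; a change to 300 MHz is rejected because no power-of-2 divider can keep the timer frequency constant.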
Re: [PATCH 4/5] futex: Avoid taking hb lock if nothing to wakeup
On Fri, Nov 22, 2013 at 4:56 PM, Davidlohr Bueso wrote:
> In futex_wake() there is clearly no point in taking the hb->lock if
> we know beforehand that there are no tasks to be woken. This comes
> at the smaller cost of doing some atomic operations to keep track of
> the list's size.

Hmm. Why? Afaik, you only care about "empty or not". And if you don't
need the serialization from locking, then afaik you can just do a
"plist_head_empty()" without holding the lock.

NOTE!

The "list_empty()" function is very much designed to work even without
holding a lock (as long as the head itself exists reliably, of course)
BUT you have to then guarantee yourself that your algorithm doesn't
have any races wrt other CPU's adding an entry to the list at the same
time. Not holding a lock obviously means that you are not serialized
against that.. We've had problems with people doing

    if (!list_empty(waiters))
        wake_up_list(..)

because they wouldn't wake people up who just got added.

But considering that your atomic counter checking has the same lack of
serialization, at least the plist_head_empty() check shouldn't be any
worse than that counter thing.. And doesn't need any steenking atomic
ops or a new counter field.

             Linus
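The pattern under discussion, peeking at "empty or not" without the lock and only locking when there may be something to wake, can be sketched in userspace C11. This is not futex code: `struct wait_queue`, `queue_waiter()`, `wake_all()` and the small spinlock standing in for hb->lock are hypothetical, and the memory ordering on the unlocked check is deliberately the weakest possible, so the sketch shows the shape of the fast path rather than a proof that it is safe.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct waiter {
	struct waiter *next;
	bool woken;
};

struct wait_queue {
	atomic_flag lock;		/* tiny spinlock standing in for hb->lock */
	struct waiter *_Atomic head;	/* NULL means "no waiters" */
};

static void wq_init(struct wait_queue *q)
{
	atomic_flag_clear(&q->lock);
	atomic_init(&q->head, NULL);
}

static void wq_lock(struct wait_queue *q)
{
	while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
		;	/* spin */
}

static void wq_unlock(struct wait_queue *q)
{
	atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

/* A waiter always queues itself under the lock. */
static void queue_waiter(struct wait_queue *q, struct waiter *w)
{
	assert(w != NULL);
	wq_lock(q);
	w->woken = false;
	w->next = atomic_load_explicit(&q->head, memory_order_relaxed);
	atomic_store_explicit(&q->head, w, memory_order_relaxed);
	wq_unlock(q);
}

/*
 * Wake every queued waiter and return how many were woken. The first check
 * runs without the lock: a waiter that queues itself just after the check
 * is not seen by this call, so the surrounding algorithm has to tolerate
 * (or otherwise close) that window.
 */
static int wake_all(struct wait_queue *q)
{
	struct waiter *w;
	int woken = 0;

	if (atomic_load_explicit(&q->head, memory_order_relaxed) == NULL)
		return 0;	/* fast path: lock never taken */

	wq_lock(q);
	w = atomic_load_explicit(&q->head, memory_order_relaxed);
	atomic_store_explicit(&q->head, NULL, memory_order_relaxed);
	for (; w; w = w->next) {
		w->woken = true;
		woken++;
	}
	wq_unlock(q);
	return woken;
}
```

Replacing the pointer check with a separately maintained atomic counter would not change the window described in the comment, which is the point made in the message above: both are unserialized reads, and only the counter costs extra atomic operations.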
Re: [PATCH 5/8] ARM: OMAP2+: timer: Introduce OF-friendly clocksource/clockevent system timers
Sorry, resending due to an MUA issue. Adding Thomas to the thread since discussion is about clocksource, and Mark Rutland as discussion is related to timers and DT, thanks. On 11/22/2013 02:01 PM, Rob Herring wrote: > On Fri, Nov 22, 2013 at 10:42 AM, Joel Fernandes wrote: >> On 11/22/2013 09:58 AM, Rob Herring wrote: >>> On Thu, Nov 21, 2013 at 7:56 PM, Joel Fernandes wrote: This work is a migration effort of OMAP system timers to the clocksource/clockevent framework. Consider this as a first-pass in this effort. There are few cleanups that need to be done first. The HWMOD code is intertwined with the timer code. HWMOD code cleanups in the future will hopefully make most of this code go away, so till then we separate out the power/clocks portion of the code from the actual timer bits. This will facilitate near-future work of adapting the system timer as a clocksource. New functions for OF-only boot are introduced, and we can soon delete the old versions once we migrate all platforms. Currently only AM335x is migrated and testedA new omap_generic_timer_init function is introduced for DT platforms. Code required earlier for non-DT platforms such as setup of timer IDs and timer parent clock is not required. parent clocks are automatically setup by the mux clock driver through DT so they no longer need to be hardcoded. The init code will try to pick the best timer for clocksource and clockevent however bindings are added to force a particular timer as clocksource or clockevent through DT. 
Signed-off-by: Joel Fernandes --- .../devicetree/bindings/arm/omap/timer.txt | 12 ++ arch/arm/mach-omap2/common.h | 1 + arch/arm/mach-omap2/timer.c| 235 + 3 files changed, 248 insertions(+) diff --git a/Documentation/devicetree/bindings/arm/omap/timer.txt b/Documentation/devicetree/bindings/arm/omap/timer.txt index d02e27c..6cf7a75 100644 --- a/Documentation/devicetree/bindings/arm/omap/timer.txt +++ b/Documentation/devicetree/bindings/arm/omap/timer.txt @@ -32,6 +32,18 @@ Optional properties: - ti,timer-secure: Indicates the timer is reserved on a secure OMAP device and therefore cannot be used by the kernel. +- ti,timer-clockevent, + ti,timer-clocksource These properties force the system timer code to choose + the particular timer as a clockevent or clocksource. + If these properties are not specified, the timer code + picks up a "ti,timer-alwon" as the clocksource and a + timer containing one of the following properties as + the clockevent in the following order: + ti,timer-alwon + ti,timer-dsp + ti,timer-pwm + ti,timer-secure >>> >>> These properties were added specifically for the reason of avoiding >>> linux specific properties like these. When is this not sufficient? >> >> Some platforms cannot use certain timers as clockevents and clocksource, to >> keep >> this code functionally equivalent and working, I added these properties so >> that >> its possible to select specific timers as clockevents/sources as was being >> done >> in the non-DT case. I'm open to suggestions for doing this in a better way >> for DT. > > There has to be a defined reason why a given timer cannot be used. You > are not explaining what that reason is. Define a property or set of > properties that describe the h/w feature or quirk. Reason? There are HW bugs that prevent a timer from being used. See [1]. And there may be other reasons, I haven't written this code but I know that there are other HW bugs that made authors pick a specific timer such as board issues or accuracy. 
This is nothing new, just that I'm trying to find a way to do it from DT. You can see the ifdef'ry below in mach-omap2/timer.c. All I'm trying to do is make it simpler and cleaner by adding these properties to DT so that we can delete this code entirely. #ifdef CONFIG_ARCH_OMAP2 OMAP_SYS_32K_TIMER_INIT(2, 1, "timer_32k_ck", "ti,timer-alwon", 2, "timer_sys_ck", NULL); #endif /* CONFIG_ARCH_OMAP2 */ #if defined(CONFIG_ARCH_OMAP3) || defined(CONFIG_SOC_AM43XX) OMAP_SYS_32K_TIMER_INIT(3, 1, "timer_32k_ck", "ti,timer-alwon", 2, "timer_sys_ck", NULL); OMAP_SYS_32K_TIMER_INIT(3_secure, 12, "secure_32k_fck", "ti,timer-secure", 2, "timer_sys_ck", NULL); #endif /* CONFIG_ARCH_OMAP3 */ etc... >>> >>> And I agree with the comment to use OF_CLKSRC. >>> >> >>
Re: [PATCH 1/6] powerpc: fix exception clearing in e500 SPE float emulation
On Fri, 22 Nov 2013, Scott Wood wrote: > This sounds like an incompatible change to userspace API. What about > older glibc? What about user code that directly manipulates these bits > rather than going through libc, or uses a libc other than glibc? Where > is this API requirement documented? The previous EGLIBC port, and the uClibc code copied from it, is fundamentally broken as regards any use of prctl for floating-point exceptions because it didn't use the PR_FP_EXC_SW_ENABLE bit in its prctl calls (and did various worse things, such as passing a pointer when prctl expected an integer). If you avoid anything where prctl is used, the clearing of sticky bits still means it will never give anything approximating correct exception semantics with existing kernels. I don't believe the patch makes things any worse for existing code that doesn't try to inform the kernel of changes to sticky bits - such code may get incorrect exceptions in some cases, but it would have done so anyway in other cases. This is the best API I could come up with to fix the fundamentally broken nature of what came before, taking into account that in many cases a prctl call is already needed along with userspace manipulation of exception bits. I'm not aware of any kernel documentation where this sort of subarchitecture-specific API detail is documented. (The API also includes such things as needing to leave the spefscr trap-enable bits set and use prctl to control whether SIGFPE results from exceptions.) > I think the impact of this could be reduced by using this mechanism only > to clear bits, rather than set them. That is, if the exception bit is > unset, don't set it just because it's set in spefscr_last -- but if it's > not set in spefscr_last, and the emulation code doesn't want to set it, > then clear it. It should already be the case in this patch that if a bit is clear in spefscr, and set in spefscr_last (i.e. 
userspace did not inform the kernel of clearing the bit, and no traps since then have resulted in the kernel noticing it was cleared), it won't get set unless the emulation code wants to set it. The sole place spefscr_last is read is in the statement "__FPU_FPSCR &= ~(FP_EX_INVALID | FP_EX_UNDERFLOW) | current->thread.spefscr_last;" - if the bit is already clear in spefscr, this statement has no effect on it. > Are there any cases where the exception bit can be set without the > kernel taking a trap, or is userspace manipulation limited to clearing > the bits? Userspace can both set and clear the bits without a trap. For example, fesetenv restores a saved value of spefscr which may both set and clear bits (and then it calls prctl because it needs to do so anyway to restore the saved state for which exceptions were enabled). fesetexceptflag restores saved state of particular exceptions without a trap (so needs to call prctl specially to inform the kernel of a change). -- Joseph S. Myers jos...@codesourcery.com -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[ANNOUNCE] iproute2 3.12.0 release
A little late but ready and toasty warm here is iproute2 to go with 3.12.0 (aka One Giant Leap for Frogkind).

In addition to the usual build documentation fixes, this version includes support for ipv6 on vxlan and GRE. As well as fair queue packet scheduler.

If you have been sitting on changes to iproute2 that are in net-next for 3.12 merge window, please submit them now.

Iproute2 package is available at:
  http://kernel.org/pub/linux/utils/net/iproute2/iproute2-3.12.0.tar.gz

You can download the source from:
  git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git

Stay Warm!

---
Andreas Henriksson (1):
      ss: avoid passing negative numbers to malloc

Christophe Gouault (1):
      xfrm: enable to set non-wildcard mark 0 on SAs and SPs

Eric Dumazet (3):
      pkt_sched: fq: Fair Queue packet scheduler
      tc: support TCA_STATS_RATE_EST64
      htb: add support for direct_qlen attribute

Fan Du (1):
      xfrm: use memcpy to suppress gcc phony buffer overflow warning.

Hangbin Liu (1):
      ipaddrlabel: use uint32_t instead of int32_t

Jamal Hadi Salim (2):
      tc: introduce simple action
      action: typo nat fix

Nicolas Dichtel (1):
      iplink: update available type list

Nigel Kukard (1):
      Fix tc stats when using -batch mode

Petr Písař (2):
      iproute2: bridge: document mdb
      iproute2: bridge: Close file with bridge monitor file

Sami Kerola (1):
      ip: make -resolve addr to print names rather than addresses

Stefan Tomanek (1):
      ip rule: add route suppression options

Stephen Hemminger (14):
      Update kernel headers to net-next for 3.12
      Update to 3.11 net-next kernel headers
      nstat: add json output format
      Update to 3.12-rc1 headers
      nstat: revise json output
      ifstat: add json output format
      lnstat: add json output format
      lnstat, nstat, ifstat: update man pages
      tc: add default action to kernel headers
      ipv6 gre: add entry to ether types
      Fix handling of qdis without options
      htb: remove old unused duplicate qdisc name
      update kernel headers
      v3.12.0

WANG Cong (1):
      vxlan: add ipv6 support

x...@mail.ru (2):
      iproute2: GRE over IPv6 tunnel support.
      iproute2: ip6gre: update man pages
Re: [PATCH] of/platform: Fix no irq domain found errors when populating interrupts
* Tony Lindgren [131122 17:09]:
> * Russell King - ARM Linux [131122 16:56]:
> > On Fri, Nov 22, 2013 at 04:43:35PM -0800, Tony Lindgren wrote:
> > > +	/* See of_device_resource_notify for populating interrupts */
> > > +	for (i = 0; i < num_irq; i++, res++) {
> > > +		res->flags = IORESOURCE_IRQ;
> > > +		res->start = -EPROBE_DEFER;
> > > +		res->end = -EPROBE_DEFER;
> >
> > NAK. Definitely a bad idea to start introducing magic values other into
> > resources. Please don't do this.
>
> Do you have any better ideas on how to sort out this issue then?

I guess we could allocate all the resources lazily here, I'll take a look at that.

Tony
Re: [RFC PATCH] double free in decompressor.c
2013/11/22 Phillip Lougher :
> On 22/11/13 21:50, Geyslan Gregório Bem wrote:
>>
>> Coverity caught double free possibility (CID 1130962).
>>
>> I can patch this, but I have to know if is correct to free comp_opts
>> in the function squashfs_decompressor_create() or it had to be done in
>> the caller. My bet is the caller.
>>
>> 128 void *squashfs_decompressor_setup(struct super_block *sb, unsigned short flags)
>> 129 {
>> 130     struct squashfs_sb_info *msblk = sb->s_fs_info;
>> 131     void *stream, *comp_opts = get_comp_opts(sb, flags);
>> 132
>>
>> 1. Condition "IS_ERR(comp_opts)", taking false branch
>> 133     if (IS_ERR(comp_opts))
>> 134         return comp_opts;
>> 135
>>
>> 2. freed_arg: "squashfs_decompressor_create(struct squashfs_sb_info *,
>> void *)" frees "comp_opts".
>> 136     stream = squashfs_decompressor_create(msblk, comp_opts);
>>
>> 3. Condition "IS_ERR(stream)", taking true branch
>> 137     if (IS_ERR(stream))
>
> FALSE positive.
>
> squashfs_decompressor_create() frees comp_opts only on success.
>
> If IS_ERR(stream) is true, then comp_opts has not been freed by
> squashfs_decompressor_create().
>
> Phillip
>
>>
>> CID 1130962 (#1 of 1): Double free (USE_AFTER_FREE)
>> 4. double_free: Calling "kfree(void const *)" frees pointer "comp_opts"
>> which has already been freed.
>> 138         kfree(comp_opts);
>> 139
>> 140     return stream;
>> 141 }
>>
>

Philip, set as false positive in Coverity.

Thanks.

--
Regards,

Geyslan G. Bem
hackingbits.com
Re: Linux 3.13-rc1 is out
This combination:

  # CONFIG_SYSTEM_TRUSTED_KEYRING is not set
  CONFIG_TRUSTED_KEYS=m

(acquired by oldconfig and N to system keyring) fails with:

  Pass 2
    CC [M]  crypto/asymmetric_keys/x509_rsakey-asn1.o
    CC [M]  crypto/asymmetric_keys/x509_cert_parser.o
    CC [M]  crypto/asymmetric_keys/x509_public_key.o
  crypto/asymmetric_keys/x509_public_key.c: In function ‘x509_key_preparse’:
  crypto/asymmetric_keys/x509_public_key.c:237:35: error: ‘system_trusted_keyring’ undeclared (first use in this function)
     ret = x509_validate_trust(cert, system_trusted_keyring);
                                     ^
  crypto/asymmetric_keys/x509_public_key.c:237:35: note: each undeclared identifier is reported only once for each function it appears in
  make[2]: *** [crypto/asymmetric_keys/x509_public_key.o] Error 1
  make[1]: *** [crypto/asymmetric_keys] Error 2
  make: *** [crypto] Error 2

Perhaps include/keys/system_keyring.h should have a definition for system_trusted_keyring in the #ifndef CONFIG_SYSTEM_TRUSTED_KEYRING case (which may entail just removing the #ifdef).

Commits b56e5a17b6b9acd1 and 09fbc47373826d67 are relevant.

-JimC
--
James Cloos OpenPGP: 1024D/ED7DAEA6
Re: [PATCH] of/platform: Fix no irq domain found errors when populating interrupts
* Russell King - ARM Linux [131122 16:56]:
> On Fri, Nov 22, 2013 at 04:43:35PM -0800, Tony Lindgren wrote:
> > +	/* See of_device_resource_notify for populating interrupts */
> > +	for (i = 0; i < num_irq; i++, res++) {
> > +		res->flags = IORESOURCE_IRQ;
> > +		res->start = -EPROBE_DEFER;
> > +		res->end = -EPROBE_DEFER;
>
> NAK. Definitely a bad idea to start introducing magic values other into
> resources. Please don't do this.

Do you have any better ideas on how to sort out this issue then?

Tony
Re: Linux 3.13-rc1 is out
On Fri, Nov 22, 2013 at 4:43 PM, Matthew Garrett wrote:
>
> I'd sent a pull yesterday (http://lkml.org/lkml/2013/11/21/599 ) which
> may have missed your cutoff, but it seems worth checking.

No, it didn't miss my cutoff, and it wasn't even eaten by the gmail spam filters, but it *had* missed my normal pattern for checking that I had merged all git pull requests, which is why I had overlooked it.

Pulled now (going through the allmodconfig build test before being pushed out shortly).

Btw, the reason it missed my normal check is that I search for "in:inbox git pull" as a sanity check that I don't have anything pending. Now, *most* of the time I tend to catch things regardless just by virtue of reading email carefully, but when I'm busy it definitely helps if your email matches that pattern.

The easiest (and most common) way to do that is to have the subject line be something like

    [GIT PULL] sound fixes #2 for 3.13-rc1

but there are other patterns. For example, David Miller tends to have instead a subject line like

    [GIT] Networking

and then "Please pull" in the body of the email. So the "in:inbox git pull" thing basically catches pretty much everybody.

Except for you... You use David Miller's subject line, but you don't have the "please pull" part, so your email missed my "did I pull everything that I had pending" check.

For very similar reasons, for people sending me a patch, it really *really* helps if there's a "[patch]" (possibly with a n/m number pattern) in the subject line.. Or make sure you send me a git diff, because I also tend to search for the string "diff --git".

Of course, the main person who sends me patches is Andrew Morton, so I mostly just search for "from:akpm" :)

Linus
Re: [PATCH] of/platform: Fix no irq domain found errors when populating interrupts
* Tony Lindgren [131122 16:44]:
> @@ -168,7 +218,13 @@ struct platform_device *of_device_alloc(struct device_node *np,
>  			rc = of_address_to_resource(np, i, res);
>  			WARN_ON(rc);
>  		}
> -		WARN_ON(of_irq_to_resource_table(np, res, num_irq) != num_irq);
> +
> +		/* See of_device_resource_notify for populating interrupts */
> +		for (i = 0; i < num_irq; i++, res++) {
> +			res->flags = IORESOURCE_IRQ;
> +			res->start = -EPROBE_DEFER;
> +			res->end = -EPROBE_DEFER;
> +		}

Hmm actually we want to use some other value here as res->start is not an int. Maybe some kind of hwirq_max + EPROBE_DEFER.

Tony
Re: Linux 3.13-rc1 is out
On 11/22/2013 01:36 PM, Linus Torvalds wrote: So you had an extra week to prepare your pull requests, and if you were planning on sending it in the last two days thinking I'd close the merge window on Sunday as usual, I can only laugh derisively in your general direction, and call you bad names. Because I'm not interested in your excuses. I did warn people about this in the 3.12 release notes. As it was, there were a few people who cut it fairly close today. You know who you are. If there are pull requests I missed (due to getting caught in spam filters, or not matching my normal search patterns), and you think you sent your pull request in time but it got overlooked, ping me - because I don't have anything pending I know about, but mistakes happen. Talking about mistakes... I suspect it was a mistake to have that extra week before the merge window opened, and I probably should just have done a 3.12-rc8 instead. Because the linux-next statistics look suspicious, and we had extra stuff show up there not just in that first week. Clearly people took that "let's have an extra week of merge window" and extrapolated it a bit too much. Oh, well. Live and learn. Anyway, other than that small oddity, this was a fairly normal merge window. By patch size we had a pretty usual ~55% drivers, 18% architecture code, 9% network updates, and the rest is spread out (fs, headers, tools, documentation). Featurewise, the big ones are likely the nftables and the multi-queue block layer stuff, but depending on your interests you might find all the incremental updates to various areas interesting. There are some odd ones in there (LE mode Powerpc support..) Go forth and test, and start sending me regression fixes. And really, if you didn't send me your pull request in time, don't whine about it. Because nobody likes a whiner. 3.13-rc1 fails to boot on Samsung Series 9. 3.12.1 is fine and the last mainline kernel that worked on it was from Nov 14th. 
It failed to mount /boot and after doing a manually mounting it, it came up and didn't detect USB mouse. Looking at the dmesg differences in dmesg, looks like usb probe fails on USB mouse. I will start a bisect and let you know what I find. dmesg 3.13-rc1: 63.637083] PM: Adding info for usb:2-1:1.0 [ 63.637128] driver: '2-1': driver_bound: bound to device 'usb' [ 63.637136] bus: 'usb': really_probe: bound device 2-1 to driver usb [ 123.660457] usb 2-1: USB disconnect, device number 3 [ 123.660505] bus: 'usb': remove device 2-1:1.0 [ 123.660508] PM: Removing info for usb:2-1:1.0 [ 123.661718] bus: 'usb': remove device 2-1 [ 123.661724] PM: Removing info for usb:2-1 [ 125.143833] usb 2-1: new low-speed USB device number 4 using xhci_hcd [ 125.190924] usb 2-1: New USB device found, idVendor=17ef, idProduct=6019 [ 125.190928] usb 2-1: New USB device strings: Mfr=0, Product=2, SerialNumber=0 [ 125.190930] usb 2-1: Product: Lenovo Optical USB Mouse [ 125.191042] bus: 'usb': add device 2-1 [ 125.191054] PM: Adding info for usb:2-1 [ 125.191081] bus: 'usb': driver_probe_device: matched device 2-1 with driver usb [ 125.191084] bus: 'usb': really_probe: probing driver usb with device 2-1 [ 125.191097] usb 2-1: ep 0x81 - rounding interval to 64 microframes, ep desc says 80 microframes [ 125.201952] bus: 'usb': add device 2-1:1.0 [ 125.201959] PM: Adding info for usb:2-1:1.0 [ 125.202010] driver: '2-1': driver_bound: bound to device 'usb' [ 125.202019] bus: 'usb': really_probe: bound device 2-1 to driver usb dmesg 3.12.1: [5.562700] bus: 'usb': add driver usbhid [5.562790] bus: 'usb': driver_probe_device: matched device 2-1:1.0 with driv er usbhid [5.562794] bus: 'usb': really_probe: probing driver usbhid with device 2-1:1 .0 [5.575965] driver: '2-1:1.0': driver_bound: bound to device 'usbhid' [5.575971] bus: 'usb': really_probe: bound device 2-1:1.0 to driver usbhid [5.575998] usbcore: registered new interface driver usbhid [5.576000] usbhid: USB HID core driver [5.601326] 
input: Lenovo Optical USB Mouse as /devices/pci:00/:00:1c.4/:03:00.0/usb2/2-1/2-1:1.0/input/input6 [5.601796] hid-generic 0003:17EF:6019.0001: input,hidraw0: USB HID v1.11 Mouse [Lenovo Optical USB Mouse] on usb-:03:00.0-1/input0 [5.670722] bus: 'usb': add driver btusb [5.670736] bus: 'usb': driver_probe_device: matched device 1-1.5:1.0 with driver btusb [5.670739] bus: 'usb': really_probe: probing driver btusb with device 1-1.5:1.0 -- Shuah -- Shuah Khan Senior Linux Kernel Developer - Open Source Group Samsung Research America(Silicon Valley) shuah...@samsung.com | (970) 672-0658 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 2/5] futex: Check for pi futex_q only once
All wake_futex() callers already verify that we are not dealing with a pi futex_q, so we can remove the redundant WARN() check, as this is never triggered anyway.

Cc: Ingo Molnar
Cc: Darren Hart
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: Mike Galbraith
Cc: Jeff Mahoney
Cc: Linus Torvalds
Cc: Scott Norton
Cc: Tom Vaden
Cc: Aswin Chandramouleeswaran
Cc: Waiman Long
Cc: Jason Low
Signed-off-by: Davidlohr Bueso
---
 kernel/futex.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index e6ffe73..0768c68 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -844,9 +844,6 @@ static void wake_futex(struct futex_q *q)
 {
 	struct task_struct *p = q->task;
 
-	if (WARN(q->pi_state || q->rt_waiter, "refusing to wake PI futex\n"))
-		return;
-
 	/*
 	 * We set q->lock_ptr = NULL _before_ we wake up the task. If
 	 * a non-futex wake up happens on another CPU then the task
--
1.8.1.4
[PATCH 1/5] futex: Misc cleanups
From: Jason Low - Remove unnecessary head variables. - Delete unused parameter in queue_unlock(). Cc: Ingo Molnar Cc: Darren Hart Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Mike Galbraith Cc: Jeff Mahoney Cc: Linus Torvalds Cc: Scott Norton Cc: Tom Vaden Cc: Aswin Chandramouleeswaran Cc: Waiman Long Signed-off-by: Jason Low Signed-off-by: Davidlohr Bueso --- kernel/futex.c | 39 --- 1 file changed, 12 insertions(+), 27 deletions(-) diff --git a/kernel/futex.c b/kernel/futex.c index 80ba086..e6ffe73 100644 --- a/kernel/futex.c +++ b/kernel/futex.c @@ -597,13 +597,10 @@ lookup_pi_state(u32 uval, struct futex_hash_bucket *hb, { struct futex_pi_state *pi_state = NULL; struct futex_q *this, *next; - struct plist_head *head; struct task_struct *p; pid_t pid = uval & FUTEX_TID_MASK; - head = >chain; - - plist_for_each_entry_safe(this, next, head, list) { + plist_for_each_entry_safe(this, next, >chain, list) { if (match_futex(>key, key)) { /* * Another waiter already exists - bump up @@ -985,7 +982,6 @@ futex_wake(u32 __user *uaddr, unsigned int flags, int nr_wake, u32 bitset) { struct futex_hash_bucket *hb; struct futex_q *this, *next; - struct plist_head *head; union futex_key key = FUTEX_KEY_INIT; int ret; @@ -998,9 +994,8 @@ futex_wake(u32 __user *uaddr, unsigned int flags, int nr_wake, u32 bitset) hb = hash_futex(); spin_lock(>lock); - head = >chain; - plist_for_each_entry_safe(this, next, head, list) { + plist_for_each_entry_safe(this, next, >chain, list) { if (match_futex (>key, )) { if (this->pi_state || this->rt_waiter) { ret = -EINVAL; @@ -1033,7 +1028,6 @@ futex_wake_op(u32 __user *uaddr1, unsigned int flags, u32 __user *uaddr2, { union futex_key key1 = FUTEX_KEY_INIT, key2 = FUTEX_KEY_INIT; struct futex_hash_bucket *hb1, *hb2; - struct plist_head *head; struct futex_q *this, *next; int ret, op_ret; @@ -1081,9 +1075,7 @@ retry_private: goto retry; } - head = >chain; - - plist_for_each_entry_safe(this, next, head, list) { + plist_for_each_entry_safe(this, next, 
>chain, list) { if (match_futex (>key, )) { if (this->pi_state || this->rt_waiter) { ret = -EINVAL; @@ -1096,10 +1088,8 @@ retry_private: } if (op_ret > 0) { - head = >chain; - op_ret = 0; - plist_for_each_entry_safe(this, next, head, list) { + plist_for_each_entry_safe(this, next, >chain, list) { if (match_futex (>key, )) { if (this->pi_state || this->rt_waiter) { ret = -EINVAL; @@ -1269,7 +1259,6 @@ static int futex_requeue(u32 __user *uaddr1, unsigned int flags, int drop_count = 0, task_count = 0, ret; struct futex_pi_state *pi_state = NULL; struct futex_hash_bucket *hb1, *hb2; - struct plist_head *head1; struct futex_q *this, *next; u32 curval2; @@ -1392,8 +1381,7 @@ retry_private: } } - head1 = >chain; - plist_for_each_entry_safe(this, next, head1, list) { + plist_for_each_entry_safe(this, next, >chain, list) { if (task_count - nr_wake >= nr_requeue) break; @@ -1493,7 +1481,7 @@ static inline struct futex_hash_bucket *queue_lock(struct futex_q *q) } static inline void -queue_unlock(struct futex_q *q, struct futex_hash_bucket *hb) +queue_unlock(struct futex_hash_bucket *hb) __releases(>lock) { spin_unlock(>lock); @@ -1866,7 +1854,7 @@ retry_private: ret = get_futex_value_locked(, uaddr); if (ret) { - queue_unlock(q, *hb); + queue_unlock(*hb); ret = get_user(uval, uaddr); if (ret) @@ -1880,7 +1868,7 @@ retry_private: } if (uval != val) { - queue_unlock(q, *hb); + queue_unlock(*hb); ret = -EWOULDBLOCK; } @@ -2028,7 +2016,7 @@ retry_private: * Task is exiting and we just wait for the * exit to complete. */ - queue_unlock(, hb); + queue_unlock(hb); put_futex_key(); cond_resched(); goto retry; @@ -2080,7 +2068,7 @@ retry_private: goto out_put_key; out_unlock_put_key: - queue_unlock(, hb); + queue_unlock(hb); out_put_key: put_futex_key(); @@ -2090,7 +2078,7 @@ out:
[PATCH 3/5] futex: Larger hash table
Currently, the futex global hash table suffers from its fixed, smallish (by today's standards) size of 256 entries, as well as its lack of NUMA awareness. Large systems that use many futexes can be prone to high numbers of collisions, where these futexes hash to the same bucket and lead to extra contention on the same hb->lock. Furthermore, cacheline bouncing is a reality when multiple hb->locks reside on the same cacheline and different futexes hash to adjacent buckets. This patch keeps the current static size of 16 entries for small systems (CONFIG_BASE_SMALL) and otherwise uses 256 * ncpus entries (or more, since the number is rounded up to a power of 2). Note that this number of CPUs accounts for all CPUs that can ever be available in the system, taking into consideration things like hotplugging. While we do impose extra overhead at bootup by making the hash table larger, this is a one-time cost and does not overshadow the benefits of this patch. Also, similar to other core kernel components (pid, dcache, tcp), by using alloc_large_system_hash() we benefit from its NUMA awareness, and thus the table is distributed among the nodes instead of residing on a single one. We impose a minimum limit of 256 entries on this function, so that in worst-case scenarios we still end up with the current size.
For a custom microbenchmark that pounds on the uaddr hashing -- making the wait path fail at futex_wait_setup() returning -EWOULDBLOCK for large numbers of futexes, we can see the following benefits on an 80-core, 8-socket 1TB server:

+---------+----------------------------------+------------------------------------------+----------+
| threads | baseline (ops/sec) [insns/cycle] | large hash table (ops/sec) [insns/cycle] | increase |
+---------+----------------------------------+------------------------------------------+----------+
|     512 | 34429   [0.07]                   | 255274  [0.48]                           | +641.45% |
|     256 | 65452   [0.07]                   | 443563  [0.41]                           | +577.69% |
|     128 | 125111  [0.07]                   | 742613  [0.33]                           | +493.56% |
|      80 | 203642  [0.09]                   | 1028147 [0.29]                           | +404.87% |
|      64 | 262944  [0.09]                   | 997300  [0.28]                           | +279.28% |
|      32 | 642390  [0.24]                   | 965996  [0.27]                           | +50.37%  |
+---------+----------------------------------+------------------------------------------+----------+

Cc: Ingo Molnar Cc: Darren Hart Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Mike Galbraith Cc: Jeff Mahoney Cc: Linus Torvalds Cc: Scott Norton Cc: Tom Vaden Cc: Aswin Chandramouleeswaran Signed-off-by: Waiman Long Signed-off-by: Jason Low Signed-off-by: Davidlohr Bueso --- kernel/futex.c | 26 +- 1 file changed, 21 insertions(+), 5 deletions(-) diff --git a/kernel/futex.c b/kernel/futex.c index 0768c68..5fa9eb0 100644 --- a/kernel/futex.c +++ b/kernel/futex.c @@ -63,6 +63,7 @@ #include #include #include +#include #include @@ -70,7 +71,11 @@ int __read_mostly futex_cmpxchg_enabled; -#define FUTEX_HASHBITS (CONFIG_BASE_SMALL ? 
4 : 8) +#if CONFIG_BASE_SMALL +static unsigned long futex_hashsize = 16; +#else +static unsigned long futex_hashsize; +#endif /* * Futex flags used to encode options to functions and preserve them across @@ -151,7 +156,11 @@ struct futex_hash_bucket { struct plist_head chain; }; -static struct futex_hash_bucket futex_queues[1both.word)+sizeof(key->both.ptr))/4, key->both.offset); - return _queues[hash & ((1 << FUTEX_HASHBITS)-1)]; + return _queues[hash & (futex_hashsize - 1)]; } /* @@ -2715,7 +2724,14 @@ SYSCALL_DEFINE6(futex, u32 __user *, uaddr, int, op, u32, val, static int __init futex_init(void) { u32 curval; - int i; + unsigned long i; + +#if !CONFIG_BASE_SMALL + futex_hashsize = roundup_pow_of_two((256 * num_possible_cpus())); + futex_queues = alloc_large_system_hash("futex", sizeof(*futex_queues), + futex_hashsize, 0, 0, NULL, NULL, + 256, futex_hashsize); +#endif /* * This will fail and we want it. Some arch implementations do @@ -2730,7 +2746,7 @@ static int __init futex_init(void) if (cmpxchg_futex_value_locked(, NULL, 0, 0) == -EFAULT) futex_cmpxchg_enabled = 1; - for (i = 0; i < ARRAY_SIZE(futex_queues); i++) { + for (i = 0; i < futex_hashsize; i++) { plist_head_init(_queues[i].chain); spin_lock_init(_queues[i].lock); } -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info
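The sizing rule from the changelog of patch 3 can be sketched in user space as follows. This is only an illustration under stated assumptions: roundup_pow2(), futex_table_size() and bucket_index() are hypothetical helper names, not kernel functions, and the real patch relies on roundup_pow_of_two(), num_possible_cpus() and alloc_large_system_hash() instead.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical user-space stand-in for the kernel's roundup_pow_of_two().
 * Returns the smallest power of two >= n (n must be >= 1).
 */
static unsigned long roundup_pow2(unsigned long n)
{
	unsigned long p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/*
 * Table size as described in the changelog: 256 buckets per possible CPU,
 * rounded up to a power of two. The name is made up for this sketch.
 */
static unsigned long futex_table_size(unsigned long possible_cpus)
{
	return roundup_pow2(256 * possible_cpus);
}

/*
 * Bucket selection by masking. This is only equivalent to a modulo
 * because the table size is a power of two.
 */
static unsigned long bucket_index(uint32_t hash, unsigned long size)
{
	return hash & (size - 1);
}
```

With these helpers, an 80-CPU machine would get futex_table_size(80) buckets, that is 20480 rounded up to 32768, and a single-CPU machine keeps 256. Rounding up to a power of two is what allows hash_futex() to replace the old FUTEX_HASHBITS mask with (futex_hashsize - 1).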
[PATCH 5/5] sched,futex: Provide delayed wakeup list
From: Peter Zijlstra Original patchset: https://lkml.org/lkml/2011/9/14/118 This is useful for locking primitives that can effect multiple wakeups per operation and want to avoid contention on the lock's internal locks by delaying the wakeups until we've released those internal locks. Alternatively it can be used to avoid issuing multiple wakeups, and thus save a few cycles, in packet processing. Queue all target tasks and wake up once you've processed all packets. That way you avoid waking the target task multiple times if there were multiple packets for the same task. This patch adds the needed infrastructure into the scheduler code and uses the new wake_list to delay the futex wakeups until after we've released the hash bucket locks. This prevents the newly woken tasks from immediately getting stuck on the hb->lock. Cc: Ingo Molnar Cc: Darren Hart Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Mike Galbraith Cc: Jeff Mahoney Cc: Linus Torvalds Cc: Scott Norton Cc: Tom Vaden Cc: Aswin Chandramouleeswaran Cc: Waiman Long Tested-by: Jason Low [forward ported] Signed-off-by: Davidlohr Bueso --- Please note that in the original thread there was some debate about spurious wakeups (https://lkml.org/lkml/2011/9/17/31), so you can consider this more of an RFC patch if folks believe that this functionality is incomplete/buggy. 
include/linux/sched.h | 41 + kernel/futex.c| 31 +++ kernel/sched/core.c | 19 +++ 3 files changed, 75 insertions(+), 16 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index 7e35d4b..679aabb 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -793,6 +793,20 @@ enum cpu_idle_type { extern int __weak arch_sd_sibiling_asym_packing(void); +struct wake_list_head { + struct wake_list_node *first; +}; + +struct wake_list_node { + struct wake_list_node *next; +}; + +#define WAKE_LIST_TAIL ((struct wake_list_node *) 0x01) + +#define WAKE_LIST(name)\ + struct wake_list_head name = { WAKE_LIST_TAIL } + + struct sched_domain_attr { int relax_domain_level; }; @@ -1078,6 +1092,8 @@ struct task_struct { unsigned int btrace_seq; #endif + struct wake_list_node wake_list; + unsigned int policy; int nr_cpus_allowed; cpumask_t cpus_allowed; @@ -2044,6 +2060,31 @@ extern void wake_up_new_task(struct task_struct *tsk); extern void sched_fork(unsigned long clone_flags, struct task_struct *p); extern void sched_dead(struct task_struct *p); +static inline void +wake_list_add(struct wake_list_head *head, struct task_struct *p) +{ + struct wake_list_node *n = >wake_list; + + /* +* Atomically grab the task, if ->wake_list is !0 already it means +* its already queued (either by us or someone else) and will get the +* wakeup due to that. +* +* This cmpxchg() implies a full barrier, which pairs with the write +* barrier implied by the wakeup in wake_up_list(). +*/ + if (cmpxchg(>next, NULL, head->first)) + return; + + /* +* The head is context local, there can be no concurrency. 
+*/ + get_task_struct(p); + head->first = n; +} + +extern void wake_up_list(struct wake_list_head *head, unsigned int state); + extern void proc_caches_init(void); extern void flush_signals(struct task_struct *); extern void __flush_signals(struct task_struct *); diff --git a/kernel/futex.c b/kernel/futex.c index 2f9dd5d..3d60a3d 100644 --- a/kernel/futex.c +++ b/kernel/futex.c @@ -851,20 +851,17 @@ static void __unqueue_futex(struct futex_q *q) * The hash bucket lock must be held when this is called. * Afterwards, the futex_q must not be accessed. */ -static void wake_futex(struct futex_q *q) +static void wake_futex(struct wake_list_head *wake_list, struct futex_q *q) { struct task_struct *p = q->task; /* -* We set q->lock_ptr = NULL _before_ we wake up the task. If -* a non-futex wake up happens on another CPU then the task -* might exit and p would dereference a non-existing task -* struct. Prevent this by holding a reference on p across the -* wake up. +* Queue up and delay the futex wakeups until the hb lock +* has been released. */ - get_task_struct(p); - + wake_list_add(wake_list, p); __unqueue_futex(q); + /* * The waiting task can free the futex_q as soon as * q->lock_ptr = NULL is written, without taking any locks. A @@ -873,9 +870,6 @@ static void wake_futex(struct futex_q *q) */ smp_wmb(); q->lock_ptr = NULL; - - wake_up_state(p, TASK_NORMAL); - put_task_struct(p); } static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_q *this) @@ -992,6 +986,7 @@ futex_wake(u32 __user
[PATCH 4/5] futex: Avoid taking hb lock if nothing to wakeup
In futex_wake() there is clearly no point in taking the hb->lock if we know beforehand that there are no tasks to be woken. This comes at the smaller cost of doing some atomic operations to keep track of the list's size. Specifically, increment the counter when an element is added to the list, and decrement when it is removed. Of course, if the counter is 0, then there are no tasks blocked on a futex. Some special considerations: - increment the counter at queue_lock() as we always end up calling queue_me() which adds the element to the list. Upon any error, queue_unlock() is called for housekeeping, for which we decrement to match the increment done in queue_lock(). - decrement the counter at __unqueue_futex() to reflect when an element is removed from the queue for wakeup related purposes. Cc: Ingo Molnar Cc: Darren Hart Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Mike Galbraith Cc: Jeff Mahoney Cc: Linus Torvalds Cc: Scott Norton Cc: Tom Vaden Cc: Aswin Chandramouleeswaran Signed-off-by: Waiman Long Signed-off-by: Jason Low Signed-off-by: Davidlohr Bueso --- kernel/futex.c | 14 ++ 1 file changed, 14 insertions(+) diff --git a/kernel/futex.c b/kernel/futex.c index 5fa9eb0..2f9dd5d 100644 --- a/kernel/futex.c +++ b/kernel/futex.c @@ -152,6 +152,7 @@ static const struct futex_q futex_q_init = { * waiting on a futex. 
*/ struct futex_hash_bucket { + atomic_t len; spinlock_t lock; struct plist_head chain; }; @@ -843,6 +844,7 @@ static void __unqueue_futex(struct futex_q *q) hb = container_of(q->lock_ptr, struct futex_hash_bucket, lock); plist_del(>list, >chain); + atomic_dec(>len); } /* @@ -999,6 +1001,10 @@ futex_wake(u32 __user *uaddr, unsigned int flags, int nr_wake, u32 bitset) goto out; hb = hash_futex(); + /* make sure we really have tasks to wakeup */ + if (atomic_read(>len) == 0) + goto out_put_key; + spin_lock(>lock); plist_for_each_entry_safe(this, next, >chain, list) { @@ -1019,6 +1025,7 @@ futex_wake(u32 __user *uaddr, unsigned int flags, int nr_wake, u32 bitset) } spin_unlock(>lock); +out_put_key: put_futex_key(); out: return ret; @@ -1137,7 +1144,9 @@ void requeue_futex(struct futex_q *q, struct futex_hash_bucket *hb1, */ if (likely(>chain != >chain)) { plist_del(>list, >chain); + atomic_dec(>len); plist_add(>list, >chain); + atomic_inc(>len); q->lock_ptr = >lock; } get_futex_key_refs(key2); @@ -1480,6 +1489,8 @@ static inline struct futex_hash_bucket *queue_lock(struct futex_q *q) struct futex_hash_bucket *hb; hb = hash_futex(>key); + atomic_inc(>len); + q->lock_ptr = >lock; spin_lock(>lock); @@ -1491,6 +1502,7 @@ queue_unlock(struct futex_hash_bucket *hb) __releases(>lock) { spin_unlock(>lock); + atomic_dec(>len); } /** @@ -,6 +2234,7 @@ int handle_early_requeue_pi_wakeup(struct futex_hash_bucket *hb, * Unqueue the futex_q and determine which it was. 
*/ plist_del(>list, >chain); + atomic_dec(>len); /* Handle spurious wakeups gracefully */ ret = -EWOULDBLOCK; @@ -2749,6 +2762,7 @@ static int __init futex_init(void) for (i = 0; i < futex_hashsize; i++) { plist_head_init(_queues[i].chain); spin_lock_init(_queues[i].lock); + atomic_set(_queues[i].len, 0); } return 0; -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
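The counter scheme of patch 4 can be approximated in user space with C11 atomics and a pthread mutex. The sketch below is not the kernel code: struct bucket, bucket_enqueue() and bucket_wake() are hypothetical names, the plist is reduced to a plain count, and lock_acquisitions exists only so the effect of the fast path can be observed. It also does not model the user-space futex value check, so it says nothing about the ordering concerns raised in the review of this patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical user-space analogue of struct futex_hash_bucket. */
struct bucket {
	atomic_int len;         /* tasks queued or about to queue */
	pthread_mutex_t lock;   /* stands in for hb->lock */
	int queued;             /* stands in for the plist chain */
	int lock_acquisitions;  /* instrumentation for this sketch only */
};

static void bucket_init(struct bucket *b)
{
	atomic_init(&b->len, 0);
	pthread_mutex_init(&b->lock, NULL);
	b->queued = 0;
	b->lock_acquisitions = 0;
}

/*
 * Waiter side: the counter is bumped before the lock is taken,
 * mirroring the atomic_inc() placed in queue_lock().
 */
static void bucket_enqueue(struct bucket *b)
{
	atomic_fetch_add(&b->len, 1);
	pthread_mutex_lock(&b->lock);
	b->lock_acquisitions++;
	b->queued++;
	pthread_mutex_unlock(&b->lock);
}

/*
 * Waker side: skip the lock entirely when the counter reads zero.
 * Returns the number of waiters removed from the bucket.
 */
static int bucket_wake(struct bucket *b, int nr_wake)
{
	int woken = 0;

	if (atomic_load(&b->len) == 0)
		return 0;	/* fast path: nothing to wake, lock untouched */

	pthread_mutex_lock(&b->lock);
	b->lock_acquisitions++;
	while (b->queued > 0 && woken < nr_wake) {
		b->queued--;
		atomic_fetch_sub(&b->len, 1);	/* as in __unqueue_futex() */
		woken++;
	}
	pthread_mutex_unlock(&b->lock);
	return woken;
}
```

Calling bucket_wake() on an empty bucket returns 0 without touching the mutex, which is the case the changelog targets. Whether a lockless check like this is safe against a concurrent waiter in the real futex code is exactly what the replies in this thread debate, so the sketch should be read as a description of the idea rather than a proof of its correctness.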
[PATCH 0/5] futex: Wakeup optimizations
We have been dealing with a customer database workload on a large 12TB, 240-core, 16-socket NUMA system that exhibits high amounts of contention on some of the locks that serialize internal futex data structures. This workload especially suffers in the wakeup paths, where waiting on the corresponding hb->lock can account for up to ~60% of the time. The result of such calls can mostly be classified as (i) nothing to wake up and (ii) wake up a large number of tasks. Before these patches are applied, we can see this pathological behavior:

37.12% 826174 xxx [kernel.kallsyms] [k] _raw_spin_lock --- _raw_spin_lock | |--97.14%-- futex_wake | do_futex | sys_futex | system_call_fastpath | | | |--99.70%-- 0x7f383fbdea1f | | yyy

43.71% 762296 xxx [kernel.kallsyms] [k] _raw_spin_lock --- _raw_spin_lock | |--53.74%-- futex_wake | do_futex | sys_futex | system_call_fastpath | | | |--99.40%-- 0x7fe7d44a4c05 | | zzz |--45.90%-- futex_wait_setup | futex_wait | do_futex | sys_futex | system_call_fastpath | 0x7fe7ba315789 | syscall

With these patches, contention is practically nonexistent:

0.10% 49 xxx [kernel.kallsyms] [k] _raw_spin_lock --- _raw_spin_lock | |--76.06%-- futex_wait_setup | futex_wait | do_futex | sys_futex | system_call_fastpath | | | |--99.90%-- 0x7f3165e63789 | | syscall| ... |--6.27%-- futex_wake | do_futex | sys_futex | system_call_fastpath | | | |--54.56%-- 0x7f317fff2c05 ...

Patches 1 & 2 are cleanups and micro optimizations. Patch 3 addresses the well known issue of the global hash table. By creating a larger and NUMA aware table, we can reduce the false sharing and collisions, thus reducing the chance of different futexes sharing the same hb->lock. Patch 4 reduces contention on the corresponding hb->lock by not trying to acquire it if there are no blocked tasks in the waitqueue. This particularly deals with point (i) above, where we see that it is not uncommon for up to 90% of wakeup calls to end up returning 0, indicating that no tasks were woken. 
Patch 5 resurrects a two year old idea from Peter Zijlstra to delay the waking of the blocked tasks to be done without holding the hb->lock: https://lkml.org/lkml/2011/9/14/118 This is useful for locking primitives that can effect multiple wakeups per operation and want to avoid the futex's internal spinlock contention by delaying the wakeups until we've released the hb->lock. This particularly deals with point (ii) above, where we can observe that on occasion the wake calls end up waking 125 to 200 waiters in what we believe are RW locks in the application. This patchset has also been tested on smaller systems for a variety of benchmarks, including java workloads, kernel builds and custom bang-the-hell-out-of hb locks programs. So far, no functional or performance regressions have been seen. Furthermore, no issues were found when running the different tests in the futextest suite: http://git.kernel.org/cgit/linux/kernel/git/dvhart/futextest.git/ This patchset applies on top of Linus' tree as of v3.13-rc1. Special thanks to Scott Norton, Tom Vaden and Mark Ray for help presenting, debugging and analyzing the data. futex: Misc cleanups futex: Check for pi futex_q only once futex: Larger hash table futex: Avoid taking hb lock if nothing to wakeup sched,futex: Provide delayed wakeup list include/linux/sched.h | 41 ++ kernel/futex.c| 113 +++--- kernel/sched/core.c | 19 + 3 files changed, 122 insertions(+), 51 deletions(-) -- 1.8.1.4
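The delayed wakeup idea behind patch 5 can be sketched in user space as well: tasks are collected on a caller-local list while the lock is held and are only woken once it has been dropped. Everything below is an assumption made for illustration. struct task, task_init() and the wakeups counter are hypothetical, a "wakeup" is just a counter increment, and the draining function is a guess at the shape of wake_up_list(), whose body is not quoted in this thread. The sentinel-terminated list and the compare-and-exchange on ->next follow wake_list_add() from the posted diff.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct wake_node {
	_Atomic(struct wake_node *) next;	/* NULL means "not queued" */
};

/* Hypothetical stand-in for task_struct. */
struct task {
	int wakeups;			/* how often this task was "woken" */
	struct wake_node wake_list;
};

struct wake_head {
	struct wake_node *first;
};

/* Non-NULL end marker, so a queued tail node never looks unqueued. */
#define WAKE_TAIL ((struct wake_node *)0x01)

static void task_init(struct task *t)
{
	t->wakeups = 0;
	atomic_init(&t->wake_list.next, NULL);
}

/* Queue a task once; a task already on some list is left alone. */
static void wake_list_add(struct wake_head *head, struct task *t)
{
	struct wake_node *n = &t->wake_list;
	struct wake_node *expected = NULL;

	if (!atomic_compare_exchange_strong(&n->next, &expected, head->first))
		return;		/* already queued, it will get its wakeup */

	head->first = n;	/* head is local to the caller, no concurrency */
}

/* Drain the list after the lock is released; returns tasks woken. */
static int wake_up_list(struct wake_head *head)
{
	struct wake_node *n = head->first;
	int woken = 0;

	while (n != WAKE_TAIL) {
		struct task *t = (struct task *)
			((char *)n - offsetof(struct task, wake_list));

		n = atomic_load(&n->next);
		atomic_store(&t->wake_list.next, NULL);	/* may be queued again */
		t->wakeups++;				/* stands in for the wakeup */
		woken++;
	}
	head->first = WAKE_TAIL;
	return woken;
}
```

A caller would declare a local struct wake_head initialised to WAKE_TAIL, call wake_list_add() for each waiter while holding the bucket lock, release the lock, and then call wake_up_list(). Queuing the same task twice results in a single wakeup, which is the property the packet-processing use case in the changelog relies on. The reference counting done by get_task_struct() and put_task_struct() in the kernel version is left out here.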
Re: [PATCH] of/platform: Fix no irq domain found errors when populating interrupts
On Fri, Nov 22, 2013 at 04:43:35PM -0800, Tony Lindgren wrote: > + /* See of_device_resource_notify for populating interrupts */ > + for (i = 0; i < num_irq; i++, res++) { > + res->flags = IORESOURCE_IRQ; > + res->start = -EPROBE_DEFER; > + res->end = -EPROBE_DEFER; NAK. Definitely a bad idea to start introducing magic values into resources. Please don't do this.
Re: [PATCH 6/8] devicetree: doc: Document ti,timer-parent property
* Joel Fernandes [131122 16:32]: > On 11/22/2013 11:08 AM, Tony Lindgren wrote: > > > > I don't think there's a dependency here to the omap clocks as the dmtimer > > can implement the clocksource separately and internally still use clk_get > > using the clock alias table. > > You mean implement clock-tree separately? Sorry I'm confused can you clarify > what you mean? You could implement the needed clocks for client drivers to use in dmtimer.c directly if dmtimer.c is the gating those clocks. > In clock tree data (not upstream), here is the system clock for am335x for > example: > sys_clkin_ck: sys_clkin_ck@44e10040 { > #clock-cells = <0>; > compatible = "mux-clock"; > > It uses the mux-clock driver. Are you saying we duplicate clock-tree stuff? I > don't think that's a good idea specially since once clock dt data is > available, > we will switch to using it. Oh OK, then that naturally could be used directly too. > Optional properties: > - ti,timer-alwon: Indicates the timer is in an alway-on power > domain. > >>> > >>> Hmm this we may not need, this can probably be deciphered from the > >>> compatible > >>> flag already? > >> > >> How? Compatible contains the same string, for example for OMAP4: > >> > >> timer8 has: > >> compatible = "ti,omap4430-timer"; > >> ti,timer-pwm; > >> ti,timer-dsp; > >> > >> and timer9 has: > >> compatible = "ti,omap4430-timer"; > >> ti,hwmods = "timer9"; > >> ti,timer-pwm; > >> > > > > Some of these features are always hardwired certain way and could be mapped > > to > > the right timer based on the timer offset and the compatible flag if we want > > to avoid adding the ti,timer-alwon property. It seems that most omaps have > > just one always on timer that's the first timer, and only on am33xx it does > > not exist? > > > > I'm fine adding ti,timer-alwon if it help to leave out static data in the > > driver and avoid patching the driver for every new SoC. 
But sounds like in > > this case we really have just the am33xx exception to the rule? > > ti,timer-alwon may not be needed yes, since on all platforms I've observed > first > timer has this property. However from OMAP3 dt, timer12 has it too. Not sure > what that implies. I guess you could mark timer1 and 12 as always on if the compatible flag matches ti,omap3430-timer. > I think what Jon was trying to do is to find a DT node by property, he had no > other way of getting a device_node * otherwise. > > But notice this macro used for HS (secure devices): > OMAP_SYS_32K_TIMER_INIT(3_secure, 12, "secure_32k_fck", "ti,timer-secure", > 2, "timer_sys_ck", NULL); > > How would you specify in DT that you want a node with the timer-secure > property > as the clocksource if we drop these kind of properties? timer { interrupt-parent = <>; ... } Should do the trick I think :) > >>> Then for the users of a specific dmtimer, they can select the right one > >>> using > >>> the interrupt-parent property: > >>> > >>> timer1: timer@0x4800abcd { > >>> compatible = "ti,omap5430-timer"; > >>> #interrupt-cells = <1>; /* needs irqchip implemented for > >>> dmtimer */ > >>> interrupt-controller; > >>> #clock-cells = <1>; /* needs clocksource implemented for > >>> dmtimer */ > >>> clock-output-names = "32k", "sys_ck"; > >>> ... > >>> }; > >> > >> In reference to my last thread reply, irqchip may not be available early > >> in the > >> boot process (.init_time) for system timer usage? > > > > Hmm it should be, looks like we have in arch/arm/kernel/irq.c: > > > > void __init init_IRQ(void) > > { > > if (IS_ENABLED(CONFIG_OF) && !machine_desc->init_irq) > > irqchip_init(); > > else > > machine_desc->init_irq(); > > } > > > > Then in init/main.c has: > > ... > > early_irq_init(); > > init_IRQ(); > > tick_init(); > > init_timers(); > > hrtimers_init(); > > softirq_init(); > > timekeeping_init(); > > time_init(); > > ... > > > > So looks like we should have irqchip available? > > Right. 
I think your idea of using irqchip is certainly a clean way. Let me go > back to the drawing board and try to see if we have all the pieces we need and > there are no surprises in doing it this way. OK cool. > Then we can have a general purpose clocksource driver that uses the irqchip > driver. I still worry about things like hwmod that may be need to be called on > specific timer, and runtime PM is not available that early in the boot > process. Yeah we may still need a piece of code in mach-omap2 for now if runtime PM is not available at that point yet. Regards, Tony -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to
mmotm 2013-11-22-16-50 uploaded
The mm-of-the-moment snapshot 2013-11-22-16-50 has been uploaded to http://www.ozlabs.org/~akpm/mmotm/ mmotm-readme.txt says README for mm-of-the-moment: http://www.ozlabs.org/~akpm/mmotm/ This is a snapshot of my -mm patch queue. Uploaded at random hopefully more than once a week. You will need quilt to apply these patches to the latest Linus release (3.x or 3.x-rcY). The series file is in broken-out.tar.gz and is duplicated in http://ozlabs.org/~akpm/mmotm/series The file broken-out.tar.gz contains two datestamp files: .DATE and .DATE-yyyy-mm-dd-hh-mm-ss. Both contain the string yyyy-mm-dd-hh-mm-ss, followed by the base kernel version against which this patch series is to be applied. This tree is partially included in linux-next. To see which patches are included in linux-next, consult the `series' file. Only the patches within the #NEXT_PATCHES_START/#NEXT_PATCHES_END markers are included in linux-next. A git tree which contains the memory management portion of this tree is maintained at git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git by Michal Hocko. It contains the patches which are between the "#NEXT_PATCHES_START mm" and "#NEXT_PATCHES_END" markers, from the series file, http://www.ozlabs.org/~akpm/mmotm/series. A full copy of the full kernel tree with the linux-next and mmotm patches already applied is available through git within an hour of the mmotm release. Individual mmotm releases are tagged. The master branch always points to the latest release, so it's constantly rebasing. http://git.cmpxchg.org/?p=linux-mmotm.git;a=summary To develop on top of mmotm git: $ git remote add mmotm git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git $ git remote update mmotm $ git checkout -b topic mmotm/master $ git send-email mmotm/master.. [...] 
To rebase a branch with older patches to a new mmotm release: $ git remote update mmotm $ git rebase --onto mmotm/master topic The directory http://www.ozlabs.org/~akpm/mmots/ (mm-of-the-second) contains daily snapshots of the -mm tree. It is updated more frequently than mmotm, and is untested. A git copy of this tree is available at http://git.cmpxchg.org/?p=linux-mmots.git;a=summary and use of this tree is similar to http://git.cmpxchg.org/?p=linux-mmotm.git, described above. This mmotm tree contains the following patches against 3.13-rc1: (patches marked "*" will be included in linux-next) arch-alpha-kernel-systblss-remove-debug-check.patch i-need-old-gcc.patch * include-linux-hugetlbh-make-isolate_huge_page-an-inline.patch * documentation-docbook-device-driverstmpl-fix-generation-of-device-drivers.patch * mm-memcg-do-not-declare-oom-from-__gfp_nofail-allocations.patch * drivers-rtc-rtc-at91rm9200c-correct-alarm-over-day-month-wrap.patch * drivers-rtc-rtc-s5mc-fix-info-rtc-assignment.patch * arch-x86-mnify-pte_to_pgoff-and-pgoff_to_pte-helpers.patch * x86-mm-get-aslr-work-for-hugetlb-mappings.patch * dma-debug-enhance-dma_debug_device_change-to-check-for-mapping-errors.patch * drm-fb-helper-dont-sleep-for-screen-unblank-when-an-oopps-is-in-progress.patch * drm-cirrus-correct-register-values-for-16bpp.patch * drm-nouveau-make-vga_switcheroo-code-depend-on-vga_switcheroo.patch * drivers-gpu-drm-drm_edid_loadc-make-edid_load-return-a-void.patch * genirq-correct-fuzzy-and-fragile-irq_retval-definition.patch * sched_clock-document-4mhz-vs-1mhz-decision.patch * kernel-timerc-convert-kmalloc_nodegfp_zero-to-kzalloc_node.patch * kernel-time-tick-commonc-document-tick_do_timer_cpu.patch * input-route-kbd-leds-through-the-generic-leds-layer.patch * drivers-net-irda-donauboe-convert-to-module_pci_driver.patch * ocfs2-free-allocated-clusters-if-error-occurs-after-ocfs2_claim_clusters.patch * 
ocfs2-o2net-incorrect-to-terminate-accepting-connections-loop-upon-rejecting-an-invalid-one.patch * ocfs2-should-call-ocfs2_journal_access_di-before-ocfs2_delete_entry-in-ocfs2_orphan_del.patch * ocfs2-llseek-requires-ocfs2-inode-lock-for-the-file-in-seek_end.patch * ocfs2-fix-issue-that-ocfs2_setattr-does-not-deal-with-new_i_size==i_size.patch * ocfs2-update-inode-size-after-zeronig-the-hole.patch * scsi-do-not-call-do_div-with-a-64-bit-divisor.patch * drivers-scsi-megaraid-megaraid_mmc-missing-bounds-check-in-mimd_to_kioc.patch * drivers-cdrom-gdromc-remove-deprecated-irqf_disabled.patch * drivers-block-sx8c-use-module_pci_driver.patch * drivers-block-sx8c-remove-unnecessary-pci_set_drvdata.patch * drivers-block-paride-pgc-underflow-bug-in-pg_write.patch * hpsa-return-0-from-driver-probe-function-on-success-not-1.patch * drivers-block-ccissc-cciss_init_one-use-proper-errnos.patch * blk-mq-use-__smp_call_function_single-directly.patch * block-blk-mq-cpuc-use-hotcpu_notifier.patch * block-remove-unrelated-header-files-and-export-symbol.patch * mtd-cmdlinepart-use-cmdline-partition-parser-lib.patch * mtd-cmdlinepart-use-cmdline-partition-parser-lib-fix.patch *
Re: Linux 3.13-rc1 is out
Hi Linus, I'd sent a pull yesterday (http://lkml.org/lkml/2013/11/21/599) which may have missed your cutoff, but it seems worth checking. -- Matthew Garrett | mj...@srcf.ucam.org
[PATCH] of/platform: Fix no irq domain found errors when populating interrupts
Currently we get the following kind of errors if we try to use interrupt phandles to irqchips that have not yet initialized: irq: no irq domain found for /ocp/pinmux@48002030 ! WARNING: CPU: 0 PID: 1 at drivers/of/platform.c:171 of_device_alloc+0x144/0x184() Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.12.0-00038-g42a9708 #1012 (show_stack+0x14/0x1c) (dump_stack+0x6c/0xa0) (warn_slowpath_common+0x64/0x84) (warn_slowpath_null+0x1c/0x24) (of_device_alloc+0x144/0x184) (of_platform_device_create_pdata+0x44/0x9c) (of_platform_bus_create+0xd0/0x170) (of_platform_bus_create+0x12c/0x170) (of_platform_populate+0x60/0x98) ... This is because we're wrongly trying to populate resources that are not yet available. It's perfectly valid to create irqchips dynamically, so let's fix up the issue by populating the interrupt resources based on a notifier call instead. Signed-off-by: Tony Lindgren --- Rob & Grant, care to merge this for the -rc if this looks OK to you? These happen for example when using interrupts-extended for omap wake-up interrupts where the irq domain is created by pinctrl-single.c at module_init time. --- a/drivers/of/platform.c +++ b/drivers/of/platform.c @@ -130,6 +130,56 @@ void of_device_make_bus_id(struct device *dev) dev_set_name(dev, "%s.%d", node->name, magic - 1); } +/* + * The device interrupts are not necessarily available for all + * irqdomains initially so we need to populate them using a + * notifier. 
+ */ +static int of_device_resource_notify(struct notifier_block *nb, +unsigned long event, void *dev) +{ + struct platform_device *pdev = to_platform_device(dev); + struct device_node *np = pdev->dev.of_node; + struct resource *res = pdev->resource; + struct resource *irqr = NULL; + int num_irq, i, found = 0; + + if (event != BUS_NOTIFY_BIND_DRIVER) + return 0; + + if (!np) + goto out; + + num_irq = of_irq_count(np); + if (!num_irq) + goto out; + + for (i = 0; i < pdev->num_resources; i++, res++) { + if (res->flags != IORESOURCE_IRQ || + res->start != -EPROBE_DEFER || + res->end != -EPROBE_DEFER) + continue; + + if (!irqr) + irqr = res; + found++; + } + + if (!found) + goto out; + + if (found != num_irq) { + dev_WARN(dev, "error populating irq resources: %i != %i\n", +found, num_irq); + goto out; + } + + WARN_ON(of_irq_to_resource_table(np, irqr, num_irq) != num_irq); + +out: + return NOTIFY_DONE; +} + /** * of_device_alloc - Allocate and initialize an of_device * @np: device node to assign to device @@ -168,7 +218,13 @@ struct platform_device *of_device_alloc(struct device_node *np, rc = of_address_to_resource(np, i, res); WARN_ON(rc); } - WARN_ON(of_irq_to_resource_table(np, res, num_irq) != num_irq); + + /* See of_device_resource_notify for populating interrupts */ + for (i = 0; i < num_irq; i++, res++) { + res->flags = IORESOURCE_IRQ; + res->start = -EPROBE_DEFER; + res->end = -EPROBE_DEFER; + } } dev->dev.of_node = of_node_get(np); @@ -447,6 +503,8 @@ int of_platform_bus_probe(struct device_node *root, } EXPORT_SYMBOL(of_platform_bus_probe); +static struct notifier_block resource_nb; + /** * of_platform_populate() - Populate platform_devices from device tree data * @root: parent of the first level to probe or NULL for the root of the tree @@ -478,6 +536,11 @@ int of_platform_populate(struct device_node *root, if (!root) return -EINVAL; + if (!resource_nb.notifier_call) { + resource_nb.notifier_call = of_device_resource_notify, + 
bus_register_notifier(_bus_type, _nb); + } + for_each_child_of_node(root, child) { rc = of_platform_bus_create(child, matches, lookup, parent, true); if (rc) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 6/8] devicetree: doc: Document ti,timer-parent property
On 11/22/2013 11:08 AM, Tony Lindgren wrote: > * Joel Fernandes [131122 08:37]: >> Hi Tony, >> >> Thanks for your comments, few replies inline below.. >> >> On 11/22/2013 09:58 AM, Tony Lindgren wrote: >>> * Joel Fernandes [131121 18:00]: +Required properties for system timers (clockevents/clocksource): +- ti,timer-parent:System timer's parent mux clock needs to be setup. + This is currently hardcoded in code, for DT boot we + move this to DT. + >>> >>> This can be replaced with just clk_set_rate, or clk_set_parent if needed. >>> Or by having a clocks = <&32k_clk> property in the dmtimer node in the >>> .dts file. >>> >> >> Sure, but clock-data is still not available to make this possible in >> mainline. >> We also discussed earlier right that we don't want dependencies as much as >> possible to get one chunk in and working at a time. I was thinking like for a >> first-pass since there's a lot of unrelated code that doesn't have >> dependencies, >> but needs this to work, we can introduce this property for now and drop it >> later >> as a "cost of migration"? > > I don't think there's a dependency here to the omap clocks as the dmtimer > can implement the clocksource separately and internally still use clk_get > using the clock alias table. You mean implement clock-tree separately? Sorry I'm confused can you clarify what you mean? In clock tree data (not upstream), here is the system clock for am335x for example: sys_clkin_ck: sys_clkin_ck@44e10040 { #clock-cells = <0>; compatible = "mux-clock"; It uses the mux-clock driver. Are you saying we duplicate clock-tree stuff? I don't think that's a good idea specially since once clock dt data is available, we will switch to using it. Optional properties: - ti,timer-alwon: Indicates the timer is in an alway-on power domain. >>> >>> Hmm this we may not need, this can probably be deciphered from the >>> compatible >>> flag already? >> >> How? 
Compatible contains the same string, for example for OMAP4: >> >> timer8 has: >> compatible = "ti,omap4430-timer"; >> ti,timer-pwm; >> ti,timer-dsp; >> >> and timer9 has: >> compatible = "ti,omap4430-timer"; >> ti,hwmods = "timer9"; >> ti,timer-pwm; >> > > Some of these features are always hardwired certain way and could be mapped to > the right timer based on the timer offset and the compatible flag if we want > to avoid adding the ti,timer-alwon property. It seems that most omaps have > just one always on timer that's the first timer, and only on am33xx it does > not exist? > > I'm fine adding ti,timer-alwon if it help to leave out static data in the > driver and avoid patching the driver for every new SoC. But sounds like in > this case we really have just the am33xx exception to the rule? ti,timer-alwon may not be needed yes, since on all platforms I've observed first timer has this property. However from OMAP3 dt, timer12 has it too. Not sure what that implies. I think what Jon was trying to do is to find a DT node by property, he had no other way of getting a device_node * otherwise. But notice this macro used for HS (secure devices): OMAP_SYS_32K_TIMER_INIT(3_secure, 12, "secure_32k_fck", "ti,timer-secure", 2, "timer_sys_ck", NULL); How would you specify in DT that you want a node with the timer-secure property as the clocksource if we drop these kind of properties? >>> Then for the users of a specific dmtimer, they can select the right one >>> using >>> the interrupt-parent property: >>> >>> timer1: timer@0x4800abcd { >>> compatible = "ti,omap5430-timer"; >>> #interrupt-cells = <1>; /* needs irqchip implemented for >>> dmtimer */ >>> interrupt-controller; >>> #clock-cells = <1>; /* needs clocksource implemented for >>> dmtimer */ >>> clock-output-names = "32k", "sys_ck"; >>> ... >>> }; >> >> In reference to my last thread reply, irqchip may not be available early in >> the >> boot process (.init_time) for system timer usage? 
> > Hmm it should be, looks like we have in arch/arm/kernel/irq.c: > > void __init init_IRQ(void) > { > if (IS_ENABLED(CONFIG_OF) && !machine_desc->init_irq) > irqchip_init(); > else > machine_desc->init_irq(); > } > > Then in init/main.c has: > ... > early_irq_init(); > init_IRQ(); > tick_init(); > init_timers(); > hrtimers_init(); > softirq_init(); > timekeeping_init(); > time_init(); > ... > > So looks like we should have irqchip available? Right. I think your idea of using irqchip is certainly a clean way. Let me go back to the drawing board and try to see if we have all the pieces we need and there are no surprises in doing it this way.
Re: BUG: mm, numa: test segfaults, only when NUMA balancing is on
On Fri, Nov 22, 2013 at 11:05:24PM +, Mel Gorman wrote: > On Fri, Nov 22, 2013 at 03:28:07PM -0600, Alex Thorlton wrote: > > > If the warning added by that patch does *not* trigger then can you also > > > test this patch? It removes the barriers which should not be necessary > > > and takes a reference to the page before waiting on the lock. The > > > previous version did not take the reference because otherwise the > > > WARN_ON could not distinguish between a migration waiter and a surprise > > > gup. > > > > Sorry for the delay; been a bit busy. I tested both of these patches on > > top of this one (separately, of course): > > > > http://www.spinics.net/lists/linux-mm/msg63919.html > > > > I think that's the one you were referring to, if not send me a pointer > > to the correct one and I'll give it another shot. Both patches still > > segfaulted, so it doesn't appear that either of these solved the > > problem. > > I see. Does THP have to be enabled or does it segfault even with THP > disabled? > On a semi-related note, is the large machine doing anything with xpmem or anything that depends on MMU notifiers to work properly? I noted while looking at this that THP migration is not invalidating pages which might be confusing a driver depending on it. -- Mel Gorman SUSE Labs
Re: recursive locking (coredump/vfs_write)
On Fri, Nov 22, 2013 at 10:11:56PM +0100, Jan Kara wrote: > Hi, > > On Wed 13-11-13 16:11:47, Dave Jones wrote: > > here's another one.. > > > > > > = > > [ INFO: possible recursive locking detected ] > > 3.12.0+ #2 Not tainted > > - > > trinity-child3/13302 is trying to acquire lock: > > (sb_writers#5){.+.+.+}, at: [] vfs_write+0x173/0x1f0 > > > > but task is already holding lock: > > (sb_writers#5){.+.+.+}, at: [] do_coredump+0xf1d/0x1070 > Thanks for report. Attached patch should fix this. Al, can you please > merge it? No. It's already fixed in mainline by commit 52da40.
Re: [PATCH 2/5] net: MOXA ART: connect to PHY and add ethtool support
On Fri, 2013-11-22 at 15:57 +0100, Jonas Jensen wrote: > The kernel now has a MDIO bus driver and a phy_driver (RTL8201CP), > connect to this PHY using OF and add ethtool support. > > Signed-off-by: Jonas Jensen > --- > > Notes: > Applies to next-20131122 > > drivers/net/ethernet/moxa/moxart_ether.c | 179 > ++- > drivers/net/ethernet/moxa/moxart_ether.h | 1 + > 2 files changed, 179 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/ethernet/moxa/moxart_ether.c > b/drivers/net/ethernet/moxa/moxart_ether.c > index 3c14afd..bcc6005 100644 > --- a/drivers/net/ethernet/moxa/moxart_ether.c > +++ b/drivers/net/ethernet/moxa/moxart_ether.c > @@ -26,9 +26,15 @@ > #include > #include > #include > +#include > +#include > > #include "moxart_ether.h" > > +#define DRV_NAME "moxart-ethernet" > +#define DRV_VERSION "0.2" > +#define MOXART_NUM_STATS 16 MOXART_NUM_STATS should be defined as ARRAY_SIZE(ethtool_stats_keys). [...] > +static void moxart_get_drvinfo(struct net_device *ndev, > +struct ethtool_drvinfo *info) > +{ > + strlcpy(info->driver, DRV_NAME, sizeof(info->driver)); > + strlcpy(info->version, DRV_VERSION, sizeof(info->version)); > + strlcpy(info->bus_info, dev_name(>dev), sizeof(info->bus_info)); > + info->n_stats = MOXART_NUM_STATS; [...] Don't initialise n_stats here; the core will initialise it using your get_sset_count implementation. Ben. -- Ben Hutchings, Staff Engineer, Solarflare Not speaking for my employer; that's the marketing department's job. They asked us to note that Solarflare product names are trademarked.
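Editorial sketch of the review comment above, in plain user-space C rather than driver code. It assumes a hypothetical key table (the driver's real stat names are not shown in the thread) and a simplified ARRAY_SIZE() macro; the kernel's own macro additionally rejects pointer arguments at compile time. The point is only that a count derived from the table cannot go stale the way a hand-maintained constant such as 16 can.

```c
#include <stddef.h>

/* Simplified stand-in for the kernel's ARRAY_SIZE(); no pointer check here. */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Plays the role of ETH_GSTRING_LEN: fixed width of one stat name. */
#define STAT_NAME_LEN 32

/* Hypothetical key table, for illustration only. */
static const char demo_stats_keys[][STAT_NAME_LEN] = {
	"rx_packets",
	"tx_packets",
	"rx_errors",
};

/* Derived from the table, so adding or removing a key updates the count. */
#define DEMO_NUM_STATS ARRAY_SIZE(demo_stats_keys)

/*
 * Models a get_sset_count-style callback: the single place where the
 * number of stats is reported, instead of also storing it elsewhere.
 */
static int demo_get_sset_count(void)
{
	return (int)DEMO_NUM_STATS;
}
```

With three entries in the table, demo_get_sset_count() reports 3 without any separate constant to keep in sync.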
Re: [PATCH] mm: add strictlimit knob -v2
On Wed, 06 Nov 2013 19:05:57 +0400 Maxim Patlasov wrote: > "strictlimit" feature was introduced to enforce per-bdi dirty limits for > FUSE which sets bdi max_ratio to 1% by default: > > http://article.gmane.org/gmane.linux.kernel.mm/105809 > > However the feature can be useful for other relatively slow or untrusted > BDIs like USB flash drives and DVD+RW. The patch adds a knob to enable the > feature: > > echo 1 > /sys/class/bdi/X:Y/strictlimit > > Being enabled, the feature enforces bdi max_ratio limit even if global (10%) > dirty limit is not reached. Of course, the effect is not visible until > /sys/class/bdi/X:Y/max_ratio is decreased to some reasonable value. > > ... > > --- a/Documentation/ABI/testing/sysfs-class-bdi > +++ b/Documentation/ABI/testing/sysfs-class-bdi > @@ -53,3 +53,11 @@ stable_pages_required (read-only) > > If set, the backing device requires that all pages comprising a write > request must not be changed until writeout is complete. > + > +strictlimit (read-write) > + > + Forces per-BDI checks for the share of given device in the write-back > + cache even before the global background dirty limit is reached. This > + is useful in situations where the global limit is much higher than > + affordable for given relatively slow (or untrusted) device. Turning > + strictlimit on has no visible effect if max_ratio is equal to 100%. 
> diff --git a/mm/backing-dev.c b/mm/backing-dev.c > index ce682f7..4ee1d64 100644 > --- a/mm/backing-dev.c > +++ b/mm/backing-dev.c > @@ -234,11 +234,46 @@ static ssize_t stable_pages_required_show(struct device > *dev, > } > static DEVICE_ATTR_RO(stable_pages_required); > > +static ssize_t strictlimit_store(struct device *dev, > + struct device_attribute *attr, const char *buf, size_t count) > +{ > + struct backing_dev_info *bdi = dev_get_drvdata(dev); > + unsigned int val; > + ssize_t ret; > + > + ret = kstrtouint(buf, 10, &val); > + if (ret < 0) > + return ret; > + > + switch (val) { > + case 0: > + bdi->capabilities &= ~BDI_CAP_STRICTLIMIT; > + break; > + case 1: > + bdi->capabilities |= BDI_CAP_STRICTLIMIT; > + break; > + default: > + return -EINVAL; > + } > + > + return count; > +} > +static ssize_t strictlimit_show(struct device *dev, > + struct device_attribute *attr, char *page) > +{ > + struct backing_dev_info *bdi = dev_get_drvdata(dev); > + > + return snprintf(page, PAGE_SIZE-1, "%d\n", > + !!(bdi->capabilities & BDI_CAP_STRICTLIMIT)); > +} > +static DEVICE_ATTR_RW(strictlimit); > + > static struct attribute *bdi_dev_attrs[] = { > &dev_attr_read_ahead_kb.attr, > &dev_attr_min_ratio.attr, > &dev_attr_max_ratio.attr, > &dev_attr_stable_pages_required.attr, > + &dev_attr_strictlimit.attr, > NULL, Well the patch is certainly simple and straightforward enough and *seems* like it will be useful. The main (and large!) downside is that it adds to the user interface so we'll have to maintain this feature and its functionality for ever. Given this, my concern is that while potentially useful, the feature might not be *sufficiently* useful to justify its inclusion. So we'll end up addressing these issues by other means, then we're left maintaining this obsolete legacy feature. So I'm thinking that unless someone can show that this is good and complete and sufficient for a "large enough" set of issues, I'll take a pass on the patch[1]. What do people think?
[1] Actually, I'll stick it in -mm and maintain it, so next time someone reports an issue I can say "hey, try this".
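Editorial sketch of the store/show logic in the quoted patch, modelled in user-space C so it can be compiled outside the kernel. The names strictlimit_set, strictlimit_get and DEMO_CAP_STRICTLIMIT, and the bit value, are hypothetical; strtoul() stands in for kstrtouint(), so edge-case parsing may differ slightly from the kernel helper.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical bit; the real BDI_CAP_STRICTLIMIT value is not shown above. */
#define DEMO_CAP_STRICTLIMIT 0x80u

/* Accept exactly "0" or "1" (optionally newline-terminated) and flip the bit. */
static int strictlimit_set(unsigned int *caps, const char *buf)
{
	char *end;
	unsigned long val;

	errno = 0;
	val = strtoul(buf, &end, 10);
	if (end == buf || errno)
		return -EINVAL;		/* no digits, or out of range */
	if (*end == '\n')
		end++;			/* sysfs writes usually end in a newline */
	if (*end != '\0')
		return -EINVAL;		/* trailing garbage */

	switch (val) {
	case 0:
		*caps &= ~DEMO_CAP_STRICTLIMIT;
		break;
	case 1:
		*caps |= DEMO_CAP_STRICTLIMIT;
		break;
	default:
		return -EINVAL;		/* anything but 0 or 1 is rejected */
	}
	return 0;
}

/* Mirrors the show side: report the flag as 0 or 1. */
static int strictlimit_get(unsigned int caps)
{
	return !!(caps & DEMO_CAP_STRICTLIMIT);
}
```

Other capability bits are left untouched by either write, which is the property the `&= ~` and `|=` pair in the patch is there to provide.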
Re: [PATCH] mmzone.h: constify some zone access functions
On Fri, 22 Nov 2013 14:30:47 -0800 Andrew Morton wrote: > On Fri, 22 Nov 2013 12:01:06 -0500 Luiz Capitulino > wrote: > > > Signed-off-by: Luiz capitulino > > --- > > include/linux/mmzone.h | 6 +++--- > > 1 file changed, 3 insertions(+), 3 deletions(-) > > > > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h > > index bd791e4..5e202d6 100644 > > --- a/include/linux/mmzone.h > > +++ b/include/linux/mmzone.h > > @@ -560,12 +560,12 @@ static inline bool zone_spans_pfn(const struct zone > > *zone, unsigned long pfn) > > return zone->zone_start_pfn <= pfn && pfn < zone_end_pfn(zone); > > } > > > > -static inline bool zone_is_initialized(struct zone *zone) > > +static inline bool zone_is_initialized(const struct zone *zone) > > { > > return !!zone->wait_table; > > } > > > > -static inline bool zone_is_empty(struct zone *zone) > > +static inline bool zone_is_empty(const struct zone *zone) > > { > > return zone->spanned_pages == 0; > > } > > @@ -843,7 +843,7 @@ unsigned long __init node_memmap_size_bytes(int, > > unsigned long, unsigned long); > > */ > > #define zone_idx(zone) ((zone) - > > (zone)->zone_pgdat->node_zones) > > > > -static inline int populated_zone(struct zone *zone) > > +static inline int populated_zone(const struct zone *zone) > > { > > return (!!zone->present_pages); > > } > > hm, why? I counted ten similarly constifyable functions in mm.h and > stopped only 1/4 of the way through. What's so special about these three? I spotted them while reading code. If you want me to convert the others, I can do that.
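Editorial sketch of what constifying an accessor buys, using a cut-down stand-in for struct zone (struct zone_model and every function name here are hypothetical). A caller that only holds a const pointer can use the accessors without casting const away; with non-const parameters the same call would draw a discarded-qualifier warning.

```c
#include <stdbool.h>

/* Cut-down model with only the two fields the accessors read. */
struct zone_model {
	unsigned long spanned_pages;
	unsigned long present_pages;
};

/* Read-only accessors take a const pointer, as in the patch. */
static inline bool zone_model_is_empty(const struct zone_model *zone)
{
	return zone->spanned_pages == 0;
}

static inline int zone_model_populated(const struct zone_model *zone)
{
	return !!zone->present_pages;
}

/*
 * A reporting helper that itself promises not to modify the zones.
 * It can only call the accessors cleanly because they are const-correct.
 */
static inline int count_populated(const struct zone_model *zones, int n)
{
	int i, populated = 0;

	for (i = 0; i < n; i++)
		populated += zone_model_populated(&zones[i]);
	return populated;
}
```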
[PATCH 4/4] ACPI: Add support to force header inclusion rules for .
From: Lv Zheng As there is only CONFIG_ACPI=n processing in the , it is not safe to include , and directly for source out of Linux ACPI subsystems. This patch adds error messaging to warn developers of such wrong inclusions. In order not to be bisected and reverted as a wrong commit, warning messages are carefully split into a seperate patch other than the wrong inclusion cleanups. Signed-off-by: Lv Zheng --- drivers/acpi/acpica/Makefile|2 +- include/acpi/acpi_bus.h |5 + include/acpi/acpi_drivers.h |5 + include/acpi/platform/aclinux.h | 10 ++ 4 files changed, 21 insertions(+), 1 deletion(-) diff --git a/drivers/acpi/acpica/Makefile b/drivers/acpi/acpica/Makefile index 4383040..7738677 100644 --- a/drivers/acpi/acpica/Makefile +++ b/drivers/acpi/acpica/Makefile @@ -2,7 +2,7 @@ # Makefile for ACPICA Core interpreter # -ccflags-y := -Os +ccflags-y := -Os -DLINUXIZED_ACPICA ccflags-$(CONFIG_ACPI_DEBUG) += -DACPI_DEBUG_OUTPUT # use acpi.o to put all files here into acpi.o modparam namespace diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h index 0af9667..0b1ea80 100644 --- a/include/acpi/acpi_bus.h +++ b/include/acpi/acpi_bus.h @@ -26,6 +26,11 @@ #ifndef __ACPI_BUS_H__ #define __ACPI_BUS_H__ +/* is not safe for CONFIG_ACPI=n environment */ +#ifndef _LINUX_ACPI_H +#error "Please don't include direclty, including instead." +#endif + #include diff --git a/include/acpi/acpi_drivers.h b/include/acpi/acpi_drivers.h index f3f1219..f5bed3a 100644 --- a/include/acpi/acpi_drivers.h +++ b/include/acpi/acpi_drivers.h @@ -26,6 +26,11 @@ #ifndef __ACPI_DRIVERS_H__ #define __ACPI_DRIVERS_H__ +/* is not safe for CONFIG_ACPI=n environment */ +#ifndef _LINUX_ACPI_H +#error "Please don't include directly, including instead." 
+#endif + #define ACPI_MAX_STRING 80 diff --git a/include/acpi/platform/aclinux.h b/include/acpi/platform/aclinux.h index 28f4f4d..0e05771 100644 --- a/include/acpi/platform/aclinux.h +++ b/include/acpi/platform/aclinux.h @@ -44,6 +44,16 @@ #ifndef __ACLINUX_H__ #define __ACLINUX_H__ +#ifdef __KERNEL__ + +/* ACPICA external files should not include ACPICA headers directly. */ + +#if !defined(LINUXIZED_ACPICA) && !defined(_LINUX_ACPI_H) +#error "Please don't include directly, including instead." +#endif + +#endif + /* Common (in-kernel/user-space) ACPICA configuration */ #define ACPI_USE_SYSTEM_CLIBRARY -- 1.7.10
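Editorial sketch of the enforcement technique used in this patch, collapsed into one translation unit. The macro names DEMO_UMBRELLA_H and DEMO_INTERNAL_BUILD are hypothetical stand-ins for the umbrella header's include guard and for a build flag like the -DLINUXIZED_ACPICA added in the Makefile hunk. An internal header checks for either one and stops the build with #error if neither is defined.

```c
/*
 * Pretend the umbrella header was included first: including it defines
 * its include guard, which is what the internal header tests for.
 */
#define DEMO_UMBRELLA_H

/* ---- what would sit at the top of the internal header ---- */
#if !defined(DEMO_INTERNAL_BUILD) && !defined(DEMO_UMBRELLA_H)
#error "Do not include the internal header directly; include the umbrella header instead."
#endif

/* Contents of the internal header, reachable only if the check passed. */
static inline int demo_internal_value(void)
{
	return 42;
}
```

Removing the DEMO_UMBRELLA_H define (and not passing -DDEMO_INTERNAL_BUILD) makes the compiler stop at the #error line, which is the behaviour the patch relies on to catch direct inclusions.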
[PATCH 3/4] ACPICA: Cleanup inclusions.
From: Lv Zheng This patch cleans up inclusions. 1. For files where is already included, this patch cleans up them by deleting inclusions. Note that if is included before including in the original file, the inclusion of is moved up to avoid issues from happening. 2. The sfi_acpi files have invoked acpi_table_parse(), so this patch converts inclusions into inclusions so that they can find the acpi_parse_table() prototypes. 3. For files where only included, this patch cleans up them by replacing inclusions with . Such files are those include ACPICA headers to find table definitions or ACPICA basic types. 4. For files where no actual ACPICA references, this patch cleans up them by deleting inclusions. Signed-off-by: Lv Zheng --- arch/x86/pci/mmconfig_32.c |1 - drivers/acpi/apei/einj.c |1 - drivers/acpi/osl.c |1 - drivers/char/tpm/tpm_acpi.c|2 +- drivers/gpu/drm/nouveau/nouveau_acpi.c |1 - drivers/hwmon/asus_atk0110.c |1 - drivers/ide/ide-acpi.c |1 - drivers/iommu/amd_iommu_init.c |1 - drivers/iommu/intel_irq_remapping.c|2 +- drivers/pci/hotplug/pciehp.h |1 - drivers/pci/pci-acpi.c |1 - drivers/platform/x86/tc1100-wmi.c |1 - drivers/sfi/sfi_acpi.c |2 +- include/linux/acpi_io.h|2 +- include/linux/ide.h|3 --- include/linux/iscsi_ibft.h |2 +- include/linux/pci_hotplug.h|1 - include/linux/sfi_acpi.h |2 +- include/linux/tboot.h |2 +- 19 files changed, 7 insertions(+), 21 deletions(-) diff --git a/arch/x86/pci/mmconfig_32.c b/arch/x86/pci/mmconfig_32.c index 5c90975..43984bc 100644 --- a/arch/x86/pci/mmconfig_32.c +++ b/arch/x86/pci/mmconfig_32.c @@ -14,7 +14,6 @@ #include #include #include -#include /* Assume systems with more busses have correct MCFG */ #define mmcfg_virt_addr ((void __iomem *) fix_to_virt(FIX_PCIE_MCFG)) diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c index fb57d03..ca0c6d7 100644 --- a/drivers/acpi/apei/einj.c +++ b/drivers/acpi/apei/einj.c @@ -33,7 +33,6 @@ #include #include #include -#include #include "apei-internal.h" diff --git 
a/drivers/acpi/osl.c b/drivers/acpi/osl.c index 8507034..c4a0e45 100644 --- a/drivers/acpi/osl.c +++ b/drivers/acpi/osl.c @@ -49,7 +49,6 @@ #include #include -#include #include #include "internal.h" diff --git a/drivers/char/tpm/tpm_acpi.c b/drivers/char/tpm/tpm_acpi.c index 64420b3..b9a57fa 100644 --- a/drivers/char/tpm/tpm_acpi.c +++ b/drivers/char/tpm/tpm_acpi.c @@ -23,7 +23,7 @@ #include #include #include -#include +#include #include "tpm.h" #include "tpm_eventlog.h" diff --git a/drivers/gpu/drm/nouveau/nouveau_acpi.c b/drivers/gpu/drm/nouveau/nouveau_acpi.c index 6763865..5910ef5 100644 --- a/drivers/gpu/drm/nouveau/nouveau_acpi.c +++ b/drivers/gpu/drm/nouveau/nouveau_acpi.c @@ -2,7 +2,6 @@ #include #include #include -#include #include #include diff --git a/drivers/hwmon/asus_atk0110.c b/drivers/hwmon/asus_atk0110.c index 9b64d26..eea1f60 100644 --- a/drivers/hwmon/asus_atk0110.c +++ b/drivers/hwmon/asus_atk0110.c @@ -17,7 +17,6 @@ #include #include -#include #include diff --git a/drivers/ide/ide-acpi.c b/drivers/ide/ide-acpi.c index d9e1f7c..333d405 100644 --- a/drivers/ide/ide-acpi.c +++ b/drivers/ide/ide-acpi.c @@ -14,7 +14,6 @@ #include #include #include -#include #include #include #include diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c index 8f798be..28b4bea 100644 --- a/drivers/iommu/amd_iommu_init.c +++ b/drivers/iommu/amd_iommu_init.c @@ -26,7 +26,6 @@ #include #include #include -#include #include #include #include diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c index bab10b1..01c1dd9 100644 --- a/drivers/iommu/intel_irq_remapping.c +++ b/drivers/iommu/intel_irq_remapping.c @@ -10,7 +10,7 @@ #include #include #include -#include +#include #include #include #include diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h index f773254..fc51cab 100644 --- a/drivers/pci/hotplug/pciehp.h +++ b/drivers/pci/hotplug/pciehp.h @@ -163,7 +163,6 @@ static inline const char 
*slot_name(struct slot *slot) } #ifdef CONFIG_ACPI -#include #include void __init pciehp_acpi_slot_detection_init(void); diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c index d0e4999..4c77380 100644 --- a/drivers/pci/pci-acpi.c +++ b/drivers/pci/pci-acpi.c @@ -12,7 +12,6 @@ #include #include #include -#include #include #include diff --git a/drivers/platform/x86/tc1100-wmi.c b/drivers/platform/x86/tc1100-wmi.c index d16193e..6a6ea28 100644 --- a/drivers/platform/x86/tc1100-wmi.c
[PATCH 2/4] ACPI: Cleanup and inclusions.
From: Lv Zheng This patch enfoces inclusion instead of direct and inclusions. It's required that included prior than to find declarations for acpi_pci_root. It is required that included prior than to find declarations for acpi_handle and acpi_notify_handler. In order to work around this, there are and inclusions in the . This messes up the inclusion order in the following style: +-- | | +-- | | | | +-- if CONFIG_ACPI=y | | | | | +->Included if CONFIG_ACPI=y | | | | +-- if CONFIG_ACPI=y | | | | | | +-- if CONFIG_ACPI=y and CONFIG_ACPI=n | | | | | | | +->Excluded | | | | | +->Included if CONFIG_ACPI=y | | | | +-- if CONFIG_ACPI=y | | | | | +->Excluded | | | +->Included if CONFIG_ACPI=y and CONFIG_ACPI=n | | +-- | | | | +-- if CONFIG_ACPI=y and CONFIG_ACPI=n | | | | | +->Inlcuded if CONFIG_ACPI=n | | | +->Included if CONFIG_ACPI=n | +->Included if CONFIG_ACPI=y and CONFIG_ACPI=n Figure 1. Current ACPI Headers Inclusion Order When CONFIG_ACPI is disabled, the result inclusion order of "#include turns out to be: Figure 2. ACPI Headers Inclusion Order CONFIG_ACPI=n When CONFIG_ACPI is enabled, the result inclusion order of "#include turns out to be: Figure 3. ACPI Headers Inclusion Order CONFIG_ACPI=n In order to follow the type dependencies, the best order should be: or Figure 4. Best ACPI Headers Inclusion Order By investigation, currently are all included for CONFIG_ACPI=y, thus this is not a real issue but would prevent further header cleanup work from being achieved. 
If we just include in the and to solve header inclusion order issue, it will easily form a wrong inclusion order, and a build test would end with the following error messages: :1220:2: warning: #warning syscall kcmp not implemented [-Wcpp] :1223:2: warning: #warning syscall finit_module not implemented [-Wcpp] In file included from include/linux/acpi.h:38:0, from include/acpi/acpi_bus.h:30, from drivers/acpi/ac.c:35: include/acpi/acpi_drivers.h:101:43: warning: 'struct acpi_pci_root' declared inside parameter list [enabled by default] include/acpi/acpi_drivers.h:101:43: warning: its scope is only this definition or declaration, which is probably not what you want [enabled by default] In file included from include/acpi/acpi_bus.h:30:0, from drivers/acpi/ac.c:35: include/linux/acpi.h:272:7: warning: 'struct acpi_device' declared inside parameter list [enabled by default] So this patch cleans up direct and inclusion by replacing them with inclusion and removing and inclusion from the so that the inclusion order is: Figure 5. ACPI Headers Inclusion Order CONFIG_ACPI=n This matches best known ACPI header inclusion order in the Figure 4. Note that, in some files that already include , this patch deletes and for them. If inclusion is after original and inclusions, it is moved up so that other ACPI header files are able to find the definitions of the ACPICA types. Currently is used to put prototypes that are referencing ACPICA defined types. As such types are not safe for "#ifndef CONFIG_ACPI" code blocks, includes this file with "#ifdef CONFIG_ACPI" protections. Directly including should only apply to ACPI dependent code while such code can also include . This patch tries to replace inclusions with inclusions or delete inclusions if there is already included for such files. When deleting, we carefully put inclusions before any other ACPI header file inclusions so that they are able to find ACPICA defined types. The actual cleanups are done in the following style: 1. 
CONFIG_ACPI dependent modules, where is already included. We could simply cleanup them by removing and inclusion. 2. Not CONFIG_ACPI dependent modules, where is already included. As there isn't code pieces surrounded by "#ifdef CONFIG_ACPI" we can simply cleanup them by removing and . 3. CONFIG_ACPI dependent modules, where is not included. We can simply cleanup them by replacing and with . 4. CONFIG_ACPI enabled code, but only ACPICA types are referenced, we can simply cleanup them by replacing and with . We carefully protect such inclusions with "#ifdef CONFIG_ACPI" undeleted. Tests are done by enabling the modules listed in this patch, build the kernel with and without this patch applied. The build output with static checkers enabled shows only line number differences. Signed-off-by: Lv Zheng --- arch/ia64/hp/common/aml_nfw.c|3 +-- arch/x86/kernel/apic/apic_flat_64.c |4 +--- arch/x86/kernel/apic/io_apic.c |3 --- arch/x86/platform/olpc/olpc-xo15-sci.c |3 +-- drivers/acpi/ac.c
[PATCH 1/4] ACPICA: Linux: Cleanup wrong ACPICA inclusions.
From: Lv Zheng The header file is the top level header for other kernel subsystems to include. It contains environment settings and architecture specific implementation that should be included before including other ACPICA headers in order to keep a consistent build environment for ACPICA users. All following internal ACPICA header files thus should be included from and not included by other kernel files: : included by : included by : included by : included by : included by This patch cleans up inclusions of above files from non ACPICA source files. Signed-off-by: Lv Zheng --- drivers/acpi/nvs.c |1 - drivers/hwmon/asus_atk0110.c |1 - 2 files changed, 2 deletions(-) diff --git a/drivers/acpi/nvs.c b/drivers/acpi/nvs.c index 266bc58..386a9fe 100644 --- a/drivers/acpi/nvs.c +++ b/drivers/acpi/nvs.c @@ -13,7 +13,6 @@ #include #include #include -#include /* ACPI NVS regions, APEI may use it */ diff --git a/drivers/hwmon/asus_atk0110.c b/drivers/hwmon/asus_atk0110.c index 1d7ff46..dafc63c 100644 --- a/drivers/hwmon/asus_atk0110.c +++ b/drivers/hwmon/asus_atk0110.c @@ -18,7 +18,6 @@ #include #include -#include #include #include -- 1.7.10
[PATCH 0/4] ACPI: Cleanup header inclusions.
From: Lv Zheng ACPICA header files are lacking in stub protections. This makes all ACPICA header files could only be included for CONFIG_ACPI=y environment. This infects new ACPI prototypes, if such prototypes referenced ACPICA defined types, they must be incldued only in the CONFIG_ACPI=y environment. The and are the monstrosity infants born under this background, directly including them in the CONFIG_ACPI=n environment is not safe. The infection spreads to whole kernel source tree that we can see "#ifdef CONFIG_ACPI" here and there. This patchset tries to enforces inclusions instead of , and inclusions. Lv Zheng (4): ACPICA: Linux: Cleanup wrong ACPICA inclusions. ACPI: Cleanup and inclusions. ACPICA: Cleanup inclusions. ACPI: Add support to force header inclusion rules for . arch/ia64/hp/common/aml_nfw.c|3 +-- arch/x86/kernel/apic/apic_flat_64.c |4 +--- arch/x86/kernel/apic/io_apic.c |3 --- arch/x86/pci/mmconfig_32.c |1 - arch/x86/platform/olpc/olpc-xo15-sci.c |3 +-- drivers/acpi/ac.c|3 +-- drivers/acpi/acpi_extlog.c |1 - drivers/acpi/acpi_pad.c |3 +-- drivers/acpi/acpica/Makefile |2 +- drivers/acpi/apei/einj.c |1 - drivers/acpi/battery.c |3 +-- drivers/acpi/blacklist.c |1 - drivers/acpi/bus.c |2 -- drivers/acpi/button.c|3 +-- drivers/acpi/custom_method.c |2 +- drivers/acpi/debugfs.c |2 +- drivers/acpi/dock.c |2 -- drivers/acpi/ec.c|3 +-- drivers/acpi/event.c |2 +- drivers/acpi/fan.c |3 +-- drivers/acpi/hed.c |2 -- drivers/acpi/numa.c |1 - drivers/acpi/nvs.c |1 - drivers/acpi/osl.c |2 -- drivers/acpi/pci_irq.c |2 -- drivers/acpi/pci_link.c |3 +-- drivers/acpi/pci_root.c |2 -- drivers/acpi/power.c |3 +-- drivers/acpi/proc.c |3 +-- drivers/acpi/processor_core.c|2 +- drivers/acpi/processor_idle.c|1 - drivers/acpi/processor_perflib.c |3 +-- drivers/acpi/processor_thermal.c |3 +-- drivers/acpi/processor_throttling.c |3 +-- drivers/acpi/sbshc.c |3 +-- drivers/acpi/scan.c |1 - drivers/acpi/sleep.c |2 -- drivers/acpi/sysfs.c |2 +- drivers/acpi/thermal.c |3 +-- 
drivers/acpi/utils.c |2 -- drivers/acpi/video.c |3 +-- drivers/acpi/wakeup.c|1 - drivers/ata/libata-acpi.c|1 - drivers/ata/pata_acpi.c |2 +- drivers/char/hpet.c |1 - drivers/char/tpm/tpm_acpi.c |2 +- drivers/char/tpm/tpm_ppi.c |1 - drivers/gpu/drm/i915/intel_acpi.c|1 - drivers/gpu/drm/nouveau/nouveau_acpi.c |3 --- drivers/gpu/drm/radeon/radeon_acpi.c |2 -- drivers/hv/vmbus_drv.c |1 - drivers/hwmon/acpi_power_meter.c |3 +-- drivers/hwmon/asus_atk0110.c |5 + drivers/ide/ide-acpi.c |1 - drivers/input/misc/atlas_btns.c |2 +- drivers/iommu/amd_iommu_init.c |1 - drivers/iommu/intel_irq_remapping.c |2 +- drivers/pci/hotplug/acpiphp_ibm.c|1 - drivers/pci/hotplug/pciehp.h |2 -- drivers/pci/ioapic.c |1 - drivers/pci/pci-acpi.c |2 -- drivers/pci/pci-label.c |1 - drivers/platform/x86/acer-wmi.c |1 - drivers/platform/x86/asus-laptop.c |3 +-- drivers/platform/x86/asus-wmi.c |3 +--
Re: [PATCH 3/4] printk: Defer printing to irq work when we printed too much
On Fri, 8 Nov 2013 11:21:13 +0100 Jan Kara wrote: > On Fri 08-11-13 00:46:49, Frederic Weisbecker wrote: > > On Thu, Nov 07, 2013 at 06:37:17PM -0500, Steven Rostedt wrote: > > > On Fri, 8 Nov 2013 00:21:51 +0100 > > > Frederic Weisbecker wrote: > > > > > > > > Offloading to a workqueue would be perhaps better, and writing to the > > > > serial > > > > console could then be done with interrupts enabled, preemptible > > > > context, etc... > > > > > > Oh God no ;-) Adding workqueue logic into printk just spells a > > > nightmare of much more complexity for a critical kernel infrastructure. > > > > But yeah that's scary, that means workqueues itself can't printk that > > safely. > > So, you're right after all. > Yeah, we've been there (that was actually my initial proposal). But > Andrew and Steven (rightfully) objected and suggested irq_work should be > used instead. I still hate the patchset and so does everyone else, including you ;) There must be something smarter we can do. Let's start by restating the problem: CPU A is in printk, emitting log_buf characters to a slow device. Meanwhile other CPUs come into printk(), see that the system is busy, dump their load into log_buf then scram, leaving CPU A to do even more work. Correct so far? If so, what is the role of local_irq_disabled() in this? Did CPU A call printk() with local interrupts disabled, or is printk (or the console driver) causing the irqs-off condition? Where and why is this IRQ disablement happening? Could we fix this problem by not permitting CPUs B, C and D to DoS CPU A? When CPU B comes into printk() and sees that printk is busy, make CPU A hand over to CPU B and let CPU A get out of there?
Re: [RFC PATCH] double free in decompressor.c
On 22/11/13 21:50, Geyslan Gregório Bem wrote: Coverity caught double free possibility (CID 1130962). I can patch this, but I have to know if it is correct to free comp_opts in the function squashfs_decompressor_create() or it had to be done in the caller. My bet is the caller. 128void *squashfs_decompressor_setup(struct super_block *sb, unsigned short flags) 129{ 130struct squashfs_sb_info *msblk = sb->s_fs_info; 131void *stream, *comp_opts = get_comp_opts(sb, flags); 132 1. Condition "IS_ERR(comp_opts)", taking false branch 133if (IS_ERR(comp_opts)) 134return comp_opts; 135 2. freed_arg: "squashfs_decompressor_create(struct squashfs_sb_info *, void *)" frees "comp_opts". 136stream = squashfs_decompressor_create(msblk, comp_opts); 3. Condition "IS_ERR(stream)", taking true branch 137if (IS_ERR(stream)) FALSE positive. squashfs_decompressor_create() frees comp_opts only on success. If IS_ERR(stream) is true, then comp_opts has not been freed by squashfs_decompressor_create(). Phillip CID 1130962 (#1 of 1): Double free (USE_AFTER_FREE)4. double_free: Calling "kfree(void const *)" frees pointer "comp_opts" which has already been freed. 138kfree(comp_opts); 139 140return stream; 141}
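Editorial sketch of the ownership convention Phillip describes, as a user-space model rather than the squashfs code. All names (struct demo_stream, demo_stream_create, demo_stream_setup) are hypothetical, NULL stands in for the kernel's ERR_PTR() error returns, and malloc()/free() stand in for kmalloc()/kfree(). The callee consumes the options buffer only when it succeeds, so the caller's free on the error path is the only free on that path.

```c
#include <stdlib.h>

struct demo_stream {
	int level;
};

/*
 * Takes ownership of opts only on success. On failure it returns NULL
 * and leaves opts untouched, so the caller still owns it.
 */
static struct demo_stream *demo_stream_create(int *opts)
{
	struct demo_stream *s;

	if (*opts < 0)
		return NULL;		/* failure: opts is NOT freed here */

	s = malloc(sizeof(*s));
	if (!s)
		return NULL;		/* failure: opts is NOT freed here */

	s->level = *opts;
	free(opts);			/* success: callee consumes opts */
	return s;
}

/* Caller side, shaped like the setup function in the Coverity report. */
static struct demo_stream *demo_stream_setup(int level)
{
	int *opts = malloc(sizeof(*opts));
	struct demo_stream *s;

	if (!opts)
		return NULL;
	*opts = level;

	s = demo_stream_create(opts);
	if (!s)
		free(opts);		/* reached only when the callee did not free */

	return s;
}
```

A static analyser that sees "callee may free" and "caller frees" without correlating the two with the return value will flag this shape, which matches the false positive discussed above.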
Re: [PATCH 2/2] e1000: fix possible reset_task running after adapter down
On Fri, 2013-11-22 at 12:20 +0400, Vladimir Davydov wrote:
> On e1000_down(), we should ensure every asynchronous work is canceled
> before proceeding. Since the watchdog_task can schedule other works
> apart from itself, it should be stopped first, but currently it is
> stopped after the reset_task. This can result in the following race
> leading to the reset_task running after the module unload:
>
> e1000_down_and_stop():                     e1000_watchdog():
> ----------------------                     -----------------
>
> cancel_work_sync(reset_task)
>                                            schedule_work(reset_task)
> cancel_delayed_work_sync(watchdog_task)
>
> The patch moves cancel_delayed_work_sync(watchdog_task) at the beginning
> of e1000_down_and_stop() thus ensuring the race is impossible.
>
> Signed-off-by: Vladimir Davydov
> Cc: Tushar Dave
> Cc: Patrick McHardy
> Cc: David S. Miller
> ---
>  drivers/net/ethernet/intel/e1000/e1000_main.c | 15 +++
>  1 file changed, 11 insertions(+), 4 deletions(-)

I have applied your patch to my queue, thanks!
Re: [PATCH 1/1] workqueue: swap set_cpus_allowed_ptr() and PF_NO_SETAFFINITY
On Thu, Nov 14, 2013 at 6:56 AM, Oleg Nesterov wrote:
> Move the setting of PF_NO_SETAFFINITY up before set_cpus_allowed()
> in create_worker(). Otherwise userland can change ->cpus_allowed
> in between.
>
> Signed-off-by: Oleg Nesterov

Applied to wq/for-3.13-fixes. Thanks!

--
tejun
Re: [PATCH 1/2] e1000: fix lockdep warning in e1000_reset_task
On Fri, 2013-11-22 at 12:20 +0400, Vladimir Davydov wrote: > The patch fixes the following lockdep warning, which is 100% > reproducible on network restart: > > == > [ INFO: possible circular locking dependency detected ] > 3.12.0+ #47 Tainted: GF > --- > kworker/1:1/27 is trying to acquire lock: > ((&(>watchdog_task)->work)){+.+...}, at: > [] flush_work+0x0/0x70 > > but task is already holding lock: > (>mutex){+.+...}, at: [] e1000_reset_task > +0x4a/0xa0 [e1000] > > which lock already depends on the new lock. > > the existing dependency chain (in reverse order) is: > > -> #1 (>mutex){+.+...}: >[] lock_acquire+0x9d/0x120 >[] mutex_lock_nested+0x4c/0x390 >[] e1000_watchdog+0x7d/0x5b0 [e1000] >[] process_one_work+0x1d2/0x510 >[] worker_thread+0x120/0x3a0 >[] kthread+0xee/0x110 >[] ret_from_fork+0x7c/0xb0 > > -> #0 ((&(>watchdog_task)->work)){+.+...}: >[] __lock_acquire+0x1710/0x1810 >[] lock_acquire+0x9d/0x120 >[] flush_work+0x3b/0x70 >[] __cancel_work_timer+0x98/0x140 >[] cancel_delayed_work_sync+0x13/0x20 >[] e1000_down_and_stop+0x3c/0x60 [e1000] >[] e1000_down+0x131/0x220 [e1000] >[] e1000_reset_task+0x52/0xa0 [e1000] >[] process_one_work+0x1d2/0x510 >[] worker_thread+0x120/0x3a0 >[] kthread+0xee/0x110 >[] ret_from_fork+0x7c/0xb0 > > other info that might help us debug this: > > Possible unsafe locking scenario: > >CPU0CPU1 > > lock(>mutex); > > lock((&(>watchdog_task)->work)); >lock(>mutex); > lock((&(>watchdog_task)->work)); > > *** DEADLOCK *** > > 3 locks held by kworker/1:1/27: > #0: (events){.+.+.+}, at: [] process_one_work > +0x166/0x510 > #1: ((>reset_task)){+.+...}, at: [] > process_one_work+0x166/0x510 > #2: (>mutex){+.+...}, at: [] > e1000_reset_task+0x4a/0xa0 [e1000] > > stack backtrace: > CPU: 1 PID: 27 Comm: kworker/1:1 Tainted: GF3.12.0+ #47 > Hardware name: System manufacturer System Product Name/P5B-VM SE, BIOS > 050105/31/2007 > Workqueue: events e1000_reset_task [e1000] > 820f6000 88007b9dba98 816b54a2 0002 > 820f5e50 88007b9dbae8 810ba936 
88007b9dbac8 > 88007b9dbb48 88007b9d8f00 88007b9d8780 88007b9d8f00 > Call Trace: > [] dump_stack+0x49/0x5f > [] print_circular_bug+0x216/0x310 > [] __lock_acquire+0x1710/0x1810 > [] ? __flush_work+0x250/0x250 > [] lock_acquire+0x9d/0x120 > [] ? __flush_work+0x250/0x250 > [] flush_work+0x3b/0x70 > [] ? __flush_work+0x250/0x250 > [] __cancel_work_timer+0x98/0x140 > [] cancel_delayed_work_sync+0x13/0x20 > [] e1000_down_and_stop+0x3c/0x60 [e1000] > [] e1000_down+0x131/0x220 [e1000] > [] e1000_reset_task+0x52/0xa0 [e1000] > [] process_one_work+0x1d2/0x510 > [] ? process_one_work+0x166/0x510 > [] worker_thread+0x120/0x3a0 > [] ? manage_workers+0x2c0/0x2c0 > [] kthread+0xee/0x110 > [] ? __init_kthread_worker+0x70/0x70 > [] ret_from_fork+0x7c/0xb0 > [] ? __init_kthread_worker+0x70/0x70 > > == The issue background == > > The problem occurs, because e1000_down(), which is called under > adapter->mutex by e1000_reset_task(), tries to synchronously cancel > e1000 auxiliary works (reset_task, watchdog_task, phy_info_task, > fifo_stall_task), which take adapter->mutex in their handlers. So the > question is what does adapter->mutex protect there? > > The adapter->mutex was introduced by commit 0ef4ee ("e1000: convert to > private mutex from rtnl") as a replacement for rtnl_lock() taken in > the > asynchronous handlers. It targeted on fixing a similar lockdep warning > issued when e1000_down() was called under rtnl_lock(), and it fixed > it, > but unfortunately it introduced the lockdep warning described above. > Anyway, that said the source of this bug is that the asynchronous > works > were made to take rtnl_lock() some time ago, so let's look deeper and > find why it was added there. 
> > The rtnl_lock() was added to asynchronous handlers by commit 338c15 > ("e1000: fix occasional panic on unload") in order to prevent > asynchronous handlers from execution after the module is unloaded > (e1000_down() is called) as it follows from the comment to the commit: > > > Net drivers in general have an issue where timers fired > > by mod_timer or work threads with schedule_work are running > > outside of the rtnl_lock. > > > > With no other lock protection these routines are vulnerable > > to races with driver unload or reset paths. > > > > The longer term solution to this might be a redesign with > > safer locks being taken in the driver to guarantee no > > reentrance, but for now a safe and effective fix is > > to take the rtnl_lock in these routines. > > I'm not sure if this
Re: [PATCH] ACPI / scan: User direct recurrence for device hierarchy walks
On Saturday, November 23, 2013 12:22:23 AM Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki
>
> Rework acpi_bus_trim() and acpi_bus_device_attach(), which is
> renamed as acpi_bus_attach(), to walk the list of each device
> object's children directly and call themselves recursively for
> each child instead of using acpi_walk_namespace(). This
> simplifies the code quite a bit and avoids the overhead of
> callbacks and the ACPICA's internal processing which are not
> really necessary for these two routines.
>
> Signed-off-by: Rafael J. Wysocki
> ---
>
> Hi,
>
> This is on top of the linux-pm/bleeding-edge branch that contains the series
> I posted previously: http://marc.info/?l=linux-pci=138470560909690=4

The subject should be "ACPI / scan: Use direct recurrence for device hierarchy walks", sorry for the typo.

Thanks!

--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
[PATCH] ACPI / scan: User direct recurrence for device hierarchy walks
From: Rafael J. Wysocki Rework acpi_bus_trim() and acpi_bus_device_attach(), which is renamed as acpi_bus_attach(), to walk the list of each device object's children directly and call themselves recursively for each child instead of using acpi_walk_namespace(). This simplifies the code quite a bit and avoids the overhead of callbacks and the ACPICA's internal processing which are not really necessary for these two routines. Signed-off-by: Rafael J. Wysocki --- Hi, This is on top of the linux-pm/bleeding-edge branch that contains the series I posted previously: http://marc.info/?l=linux-pci=138470560909690=4 Thanks, Rafael --- drivers/acpi/scan.c | 120 +++- 1 file changed, 45 insertions(+), 75 deletions(-) Index: linux-pm/drivers/acpi/scan.c === --- linux-pm.orig/drivers/acpi/scan.c +++ linux-pm/drivers/acpi/scan.c @@ -1909,54 +1909,40 @@ static int acpi_scan_attach_handler(stru return ret; } -static acpi_status acpi_bus_device_attach(acpi_handle handle, u32 lvl_not_used, - void *not_used, void **ret_not_used) +static void acpi_bus_attach(struct acpi_device *device) { - struct acpi_device *device; - unsigned long long sta; + struct acpi_device *child; int ret; - /* -* Ignore errors ignored by acpi_bus_check_add() to avoid terminating -* namespace walks prematurely. -*/ - if (acpi_bus_type_and_status(handle, , )) - return AE_OK; - - if (acpi_bus_get_device(handle, )) - return AE_CTRL_DEPTH; - - acpi_set_device_status(device, sta); + acpi_bus_get_status(device); /* Skip devices that are not present. 
*/ - if (!acpi_device_is_present(device)) - goto err; - + if (!acpi_device_is_present(device)) { + device->flags.visited = false; + return; + } if (device->handler) - return AE_OK; + goto ok; if (!device->flags.initialized) { acpi_bus_update_power(device, NULL); device->flags.initialized = true; } + device->flags.visited = false; ret = acpi_scan_attach_handler(device); if (ret < 0) - goto err; + return; device->flags.match_driver = true; - if (ret > 0) - goto ok; - - ret = device_attach(>dev); - if (ret < 0) - goto err; - - ok: + if (!ret) { + ret = device_attach(>dev); + if (ret < 0) + return; + } device->flags.visited = true; - return AE_OK; - err: - device->flags.visited = false; - return AE_CTRL_DEPTH; + ok: + list_for_each_entry(child, >children, node) + acpi_bus_attach(child); } /** @@ -1976,64 +1962,48 @@ static acpi_status acpi_bus_device_attac int acpi_bus_scan(acpi_handle handle) { void *device = NULL; - int error = 0; if (ACPI_SUCCESS(acpi_bus_check_add(handle, 0, NULL, ))) acpi_walk_namespace(ACPI_TYPE_ANY, handle, ACPI_UINT32_MAX, acpi_bus_check_add, NULL, NULL, ); - if (!device) - error = -ENODEV; - else if (ACPI_SUCCESS(acpi_bus_device_attach(handle, 0, NULL, NULL))) - acpi_walk_namespace(ACPI_TYPE_ANY, handle, ACPI_UINT32_MAX, - acpi_bus_device_attach, NULL, NULL, NULL); - - return error; -} -EXPORT_SYMBOL(acpi_bus_scan); - -static acpi_status acpi_bus_device_detach(acpi_handle handle, u32 lvl_not_used, - void *not_used, void **ret_not_used) -{ - struct acpi_device *device = NULL; - - if (!acpi_bus_get_device(handle, )) { - struct acpi_scan_handler *dev_handler = device->handler; - - if (dev_handler) { - if (dev_handler->detach) - dev_handler->detach(device); - - device->handler = NULL; - } else { - device_release_driver(>dev); - } - /* -* Most likely, the device is going away, so put it into D3cold -* before that. 
-*/ - acpi_device_set_power(device, ACPI_STATE_D3_COLD); - device->flags.initialized = false; - device->flags.visited = false; + if (device) { + acpi_bus_attach(device); + return 0; } - return AE_OK; + return -ENODEV; } +EXPORT_SYMBOL(acpi_bus_scan); /** - * acpi_bus_trim - Remove ACPI device node and all of its descendants - * @start: Root of the ACPI device nodes subtree to remove. + * acpi_bus_trim - Detach scan
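The direct-recursion walk described in the changelog (handle a device, then call the same routine for each of its children, and stop descending below a device that is not present) can be sketched outside the kernel as follows. This is only an illustration under stated assumptions: struct node, its fields and attach_tree() are hypothetical, and the real code links children through list_head and list_for_each_entry() rather than a sibling pointer.

```c
#include <stddef.h>

/* Hypothetical device node; not the kernel's struct acpi_device. */
struct node {
	int present;			/* stands in for acpi_device_is_present() */
	int visited;			/* stands in for flags.visited */
	struct node *first_child;
	struct node *next_sibling;
};

/*
 * Attach one node, then recurse into each of its children.
 * A node that is not present is marked unvisited and its whole
 * subtree is skipped. Returns the number of nodes attached.
 */
static int attach_tree(struct node *n)
{
	struct node *child;
	int count = 1;

	if (!n->present) {
		n->visited = 0;
		return 0;
	}

	n->visited = 1;
	for (child = n->first_child; child; child = child->next_sibling)
		count += attach_tree(child);

	return count;
}
```

Compared with a callback handed to a generic namespace walker, the recursion keeps the pruning decision and the per-device work in one function, which is the simplification the changelog claims.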
Re: BUG: mm, numa: test segfaults, only when NUMA balancing is on
On Fri, Nov 22, 2013 at 03:28:07PM -0600, Alex Thorlton wrote:
> > If the warning added by that patch does *not* trigger then can you also
> > test this patch? It removes the barriers which should not be necessary
> > and takes a reference to the page before waiting on the lock. The
> > previous version did not take the reference because otherwise the
> > WARN_ON could not distinguish between a migration waiter and a surprise
> > gup.
>
> Sorry for the delay; been a bit busy. I tested both of these patches on
> top of this one (separately, of course):
>
> http://www.spinics.net/lists/linux-mm/msg63919.html
>
> I think that's the one you were referring to, if not send me a pointer
> to the correct one and I'll give it another shot. Both patches still
> segfaulted, so it doesn't appear that either of these solved the
> problem.

I see. Does THP have to be enabled or does it segfault even with THP disabled?

--
Mel Gorman
SUSE Labs
Re: [PATCH 1/1] e1000: prevent oops when adapter is being closed and reset simultaneously
On Thu, Nov 21, 2013 at 11:54 PM, yzhu1 wrote:
> Hi, maintainers
>
> This change is based on a similar change made to e1000e support in
> commit bb9e44d0d0f4 ("e1000e: prevent oops when adapter is being closed
> and reset simultaneously"). The same issue has also been observed
> on the older e1000 cards.
>
> Here, we have increased the RESET_COUNT value to 50 because there are too
> many accesses to e1000 nic on stress tests to e1000 nic, it is not enough
> to set RESET_COUNT 25. Experimentation has shown that it is enough to set
> RESET_COUNT 50.
>
> Please help me to merge this patch. Thanks a lot.
>
> Zhu Yanjun

Please remember to actually include the driver maintainer for the driver your patch is against (i.e. me). I will add your patch to my queue.

--
Cheers,
Jeff
Re: [PATCH 1/3] percpu: stop the loop when a cpu belongs to a new group
Hello,

On Wed, Nov 20, 2013 at 12:51:21AM -0500, Tejun Heo wrote:
> The patch is just extremely marginal. Ah well... why not? I'll apply
> it once -rc1 drops.

So, I was about to apply this patch but decided against it. It doesn't really make anything better and the code looks worse afterwards.

Thanks.

--
tejun
Re: [PATCH 3/3] mm, memory-failure: fix the typo in me_pagecache_dirty()
On Fri, 8 Nov 2013 20:47:38 +0800 Zhi Yong Wu wrote:

> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -611,7 +611,7 @@ static int me_pagecache_clean(struct page *p, unsigned long pfn)
>  }
>
>  /*
> - * Dirty cache page page
> + * Dirty cache page
>   * Issues: when the error hit a hole page the error is not properly
>   * propagated.
>   */

The accurate and complete description of this page is actually "pagecache page", so...

--- a/mm/memory-failure.c~mm-memory-failure-fix-the-typo-in-me_pagecache_dirty-fix
+++ a/mm/memory-failure.c
@@ -611,7 +611,7 @@ static int me_pagecache_clean(struct pag
 }

 /*
- * Dirty cache page
+ * Dirty pagecache page
  * Issues: when the error hit a hole page the error is not properly
  * propagated.
  */
_
[git pull] Input updates for 3.13-rc0 (2nd)
Hi Linus,

Please pull from:

	git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input.git for-linus
or
	master.kernel.org:/pub/scm/linux/kernel/git/dtor/input.git for-linus

to receive 2nd round of updates for the input subsystem. You will get a new driver for Surface 2.0/Pixelsense touchscreen and a couple of driver fixups.

Changelog:
---------

Dan Carpenter (1):
      Input: hp_sdc_rtc - unlock on error in hp_sdc_rtc_read_i8042timer()

Dmitry Torokhov (1):
      Input: atmel-wm97xx - fix compile error

Florian Echtler (1):
      Input: add sur40 driver for Samsung SUR40 (aka MS Surface 2.0/Pixelsense)

Haiyang Zhang (1):
      MAINTAINERS - add keyboard driver to Hyper-V file list

Xie XiuQi (1):
      Input: cyttsp4 - remove unnecessary work pending test

Diffstat:

 drivers/input/misc/hp_sdc_rtc.c          |   5 +-
 drivers/input/touchscreen/Kconfig        |  11 +
 drivers/input/touchscreen/Makefile       |   1 +
 drivers/input/touchscreen/atmel-wm97xx.c |   2 +-
 drivers/input/touchscreen/cyttsp4_core.c |   3 +-
 drivers/input/touchscreen/sur40.c        | 466 +++
 6 files changed, 484 insertions(+), 4 deletions(-)
 create mode 100644 drivers/input/touchscreen/sur40.c

--
Dmitry
[PATCH v5 2/2] ARM: bcm281xx: watchdog configuration
This commit enables the watchdog driver for the BCM281xx family of SoCs.

Signed-off-by: Markus Mayer
Reviewed-by: Matt Porter
Acked-by: Guenter Roeck
---
 arch/arm/configs/bcm_defconfig |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/arm/configs/bcm_defconfig b/arch/arm/configs/bcm_defconfig
index 287ac1d..d4d0083 100644
--- a/arch/arm/configs/bcm_defconfig
+++ b/arch/arm/configs/bcm_defconfig
@@ -126,3 +126,6 @@ CONFIG_CRC_ITU_T=y
 CONFIG_CRC7=y
 CONFIG_XZ_DEC=y
 CONFIG_AVERAGE=y
+CONFIG_WATCHDOG=y
+CONFIG_BCM_KONA_WDT=y
+CONFIG_BCM_KONA_WDT_DEBUG=y
--
1.7.9.5
[PATCH v5 0/2] watchdog: bcm281xx: Watchdog Driver
This is version 5 of the watchdog driver for the BCM281xx family of mobile SoCs.

Changes since version 4:
- Fixed whitespace issue in Kconfig "help" section
- Rebased onto 3.13-rc1 (due to merge conflict in Makefile)

Changes since version 3:
- Removed #include
- Added Reviewed-by:/Acked-by: Guenter Roeck

Changes since version 2:
- BCM_KONA_WDT_NAME: "bcm-kona-wdt" -> "bcm_kona_wdt"
- Don't initialize static variable
- Removed '{'/'}' from single-line statements
- busy_count retains the maximum loop count of the longest loop rather than
  an over-all maximum
- Use seq_puts() in lieu of seq_printf() in 2 locations
- Removed error message after devm_kzalloc()
- Added MODULE_ALIAS

Changes since version 1:
- Added module name to "help" section in Kconfig
- A few cosmetic code simplifications and fixes
- Removed most dev_info() calls and changed the remaining ones to dev_dbg()
- Renamed SECWDOG_WD_LOAD_FLAG_MASK to SECWDOG_WD_LOAD_FLAG
- Added some comments to secure_register_read() and struct bcm_kona_wdt
- Added delay to secure_register_read()
- Reduced maximum retry loop from 1 to 1000
- Introduced "busy_count" variable to count how often secure_register_read()
  gets stalled; this is available through debugfs
- Simplified secure_register_read() to return -ETIMEDOUT rather than using a
  variable parameter to indicate a timeout error
- Got rid of all uses of -EAGAIN
- Fixed return value check for debugfs_create_dir()
- Simplified bcm_kona_wdt_debugfs_init() by getting rid of goto
- Created new generic function bcm_kona_wdt_ctrl_reg_modify()
- The following functions now use bcm_kona_wdt_ctrl_reg_modify():
    bcm_kona_wdt_set_resolution_reg()
    bcm_kona_wdt_set_timeout_reg()
    bcm_kona_wdt_stop()
- Made bcm_kona_wdt_set_timeout_reg() more generic, so bcm_kona_wdt_start()
  can use it
- Removed MODULE_ALIAS_MISCDEV(WATCHDOG_MINOR)

Markus Mayer (2):
  watchdog: bcm281xx: Watchdog Driver
  ARM: bcm281xx: watchdog configuration

 arch/arm/configs/bcm_defconfig  |    3 +
 drivers/watchdog/Kconfig        |   22 +++
 drivers/watchdog/Makefile       |    1 +
 drivers/watchdog/bcm_kona_wdt.c |  365 +++
 4 files changed, 391 insertions(+)
 create mode 100644 drivers/watchdog/bcm_kona_wdt.c

--
1.7.9.5
[PATCH v5 1/2] watchdog: bcm281xx: Watchdog Driver
This commit adds support for the watchdog timer used on the BCM281xx family of SoCs. Signed-off-by: Markus Mayer Reviewed-by: Matt Porter Reviewed-by: Guenter Roeck --- drivers/watchdog/Kconfig| 22 +++ drivers/watchdog/Makefile |1 + drivers/watchdog/bcm_kona_wdt.c | 365 +++ 3 files changed, 388 insertions(+) create mode 100644 drivers/watchdog/bcm_kona_wdt.c diff --git a/drivers/watchdog/Kconfig b/drivers/watchdog/Kconfig index 5be6e91..24bc93a 100644 --- a/drivers/watchdog/Kconfig +++ b/drivers/watchdog/Kconfig @@ -1139,6 +1139,28 @@ config BCM2835_WDT To compile this driver as a loadable module, choose M here. The module will be called bcm2835_wdt. +config BCM_KONA_WDT + tristate "BCM Kona Watchdog" + depends on ARCH_BCM + select WATCHDOG_CORE + help + Support for the watchdog timer on the following Broadcom BCM281xx + family, which includes BCM11130, BCM11140, BCM11351, BCM28145 and + BCM28155 variants. + + Say 'Y' or 'M' here to enable the driver. The module will be called + bcm_kona_wdt. + +config BCM_KONA_WDT_DEBUG + bool "DEBUGFS support for BCM Kona Watchdog" + depends on BCM_KONA_WDT + help + If enabled, adds /sys/kernel/debug/bcm-kona-wdt/info which provides + access to the driver's internal data structures as well as watchdog + timer hardware registres. + + If in doubt, say 'N'. 
+ config LANTIQ_WDT tristate "Lantiq SoC watchdog" depends on LANTIQ diff --git a/drivers/watchdog/Makefile b/drivers/watchdog/Makefile index 91bd95a..af22516 100644 --- a/drivers/watchdog/Makefile +++ b/drivers/watchdog/Makefile @@ -57,6 +57,7 @@ obj-$(CONFIG_RETU_WATCHDOG) += retu_wdt.o obj-$(CONFIG_BCM2835_WDT) += bcm2835_wdt.o obj-$(CONFIG_MOXART_WDT) += moxart_wdt.o obj-$(CONFIG_SIRFSOC_WATCHDOG) += sirfsoc_wdt.o +obj-$(CONFIG_BCM_KONA_WDT) += bcm_kona_wdt.o # AVR32 Architecture obj-$(CONFIG_AT32AP700X_WDT) += at32ap700x_wdt.o diff --git a/drivers/watchdog/bcm_kona_wdt.c b/drivers/watchdog/bcm_kona_wdt.c new file mode 100644 index 000..9fe174b --- /dev/null +++ b/drivers/watchdog/bcm_kona_wdt.c @@ -0,0 +1,365 @@ +/* + * Copyright (C) 2013 Broadcom Corporation + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License as + * published by the Free Software Foundation version 2. + * + * This program is distributed "as is" WITHOUT ANY WARRANTY of any + * kind, whether express or implied; without even the implied warranty + * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ */ + +#include +#include +#include +#include +#include +#include +#include +#include + +#define SECWDOG_CTRL_REG 0x +#define SECWDOG_COUNT_REG 0x0004 + +#define SECWDOG_RESERVED_MASK 0x1dff +#define SECWDOG_WD_LOAD_FLAG 0x1000 +#define SECWDOG_EN_MASK0x0800 +#define SECWDOG_SRSTEN_MASK0x0400 +#define SECWDOG_RES_MASK 0x00f0 +#define SECWDOG_COUNT_MASK 0x000f + +#define SECWDOG_MAX_COUNT SECWDOG_COUNT_MASK +#define SECWDOG_CLKS_SHIFT 20 +#define SECWDOG_MAX_RES15 +#define SECWDOG_DEFAULT_RESOLUTION 4 +#define SECWDOG_MAX_TRY1000 + +#define SECS_TO_TICKS(x, w)((x) << (w)->resolution) +#define TICKS_TO_SECS(x, w)((x) >> (w)->resolution) + +#define BCM_KONA_WDT_NAME "bcm_kona_wdt" + +struct bcm_kona_wdt { + void __iomem *base; + /* +* One watchdog tick is 1/(2^resolution) seconds. Resolution can take +* the values 0-15, meaning one tick can be 1s to 30.52us. Our default +* resolution of 4 means one tick is 62.5ms. +* +* The watchdog counter is 20 bits. Depending on resolution, the maximum +* counter value of 0xf expires after about 12 days (resolution 0) +* down to only 32s (resolution 15). The default resolution of 4 gives +* us a maximum of about 18 hours and 12 minutes before the watchdog +* times out. +*/ + int resolution; + spinlock_t lock; +#ifdef CONFIG_BCM_KONA_WDT_DEBUG + struct dentry *debugfs; +#endif +}; + +#ifdef CONFIG_BCM_KONA_WDT_DEBUG +static unsigned long busy_count; +#endif + +static int secure_register_read(void __iomem *addr) +{ + uint32_t val; + unsigned count = 0; + + /* +* If the WD_LOAD_FLAG is set, the watchdog counter field is being +* updated in hardware. Once the WD timer is updated in hardware, it +* gets cleared. +*/ + do { + if (unlikely(count > 1)) + udelay(5); + val =
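The resolution comment in struct bcm_kona_wdt (one tick is 1/(2^resolution) seconds, 20-bit counter) can be checked with a few lines of arithmetic. The sketch below is not part of the driver: WDT_COUNTER_BITS, WDT_MAX_COUNT and the three helper functions are hypothetical names, and only the shift-based conversion mirrors the SECS_TO_TICKS/TICKS_TO_SECS macros in the patch.

```c
#include <stdint.h>

/* The patch comment states that the watchdog counter is 20 bits wide. */
#define WDT_COUNTER_BITS	20
#define WDT_MAX_COUNT		((1u << WDT_COUNTER_BITS) - 1)

/* Seconds to ticks: one tick lasts 1/(2^resolution) seconds. */
static uint32_t secs_to_ticks(uint32_t secs, unsigned int resolution)
{
	return secs << resolution;
}

/* Ticks to whole seconds (fractions of a second are truncated). */
static uint32_t ticks_to_secs(uint32_t ticks, unsigned int resolution)
{
	return ticks >> resolution;
}

/* Longest timeout, in whole seconds, that fits in the counter. */
static uint32_t max_timeout_secs(unsigned int resolution)
{
	return ticks_to_secs(WDT_MAX_COUNT, resolution);
}
```

With these helpers, resolution 0 gives 1048575 s (a little over 12 days), the default resolution 4 gives 65535 s (18 hours 12 minutes), and resolution 15 gives 31 whole seconds, just under the 32 s quoted in the comment.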
Re: 3.10.16 cgroup_mutex deadlock
Hi Tejun,

On Fri, Nov 22, 2013 at 11:18 PM, Tejun Heo wrote:
> Just applied to cgroup/for-3.13-fixes w/ stable cc'd. Will push to
> Linus next week.

Thank you for your quick reply. Do you also have a backport for v3.10.x already available?

Best regards,
--
William
Re: [RFC PATCH 0/2] fs: sysfs: Add devres support
On Fri, Nov 22, 2013 at 02:47:38PM -0800, Dmitry Torokhov wrote: > On Sat, Mar 16, 2013 at 09:21:40AM -0700, Greg Kroah-Hartman wrote: > > On Thu, Mar 14, 2013 at 08:24:45PM -0700, Guenter Roeck wrote: > > > Provide devres functions for device_create_file, sysfs_create_file, > > > and sysfs_create_group plus the respective remove functions. > > > > > > Idea is to be able to drop calls to the remove functions from the various > > > drivers using those calls. > > > > Hm, despite the fact that almost every driver that makes these calls is > > broken? :) > > > > > Potential savings are substantial. There are more than 700 calls to > > > device_remove_file in the kernel, more than 500 calls to > > > sysfs_remove_group, > > > and some 50 calls to sysfs_remove_file (though not all of those use > > > dev->kobj > > > as parameter). Expanding the API to sysfs_create_bin_file would add > > > another 80+ > > > opportunities, and adding sysfs_create_link would create another 100 or > > > so. > > > > The idea is nice, but why are these drivers adding sysfs files on their > > own? Are they doing this in a way that is race-free with userspace > > (i.e. creating them before userspace is told about the device), or are > > they broken and need to have these calls added to the "default > > device/driver/bus" attribute list for them instead? > > Just stumbled upon this thread... > > There is a need for drivers to add driver-specific attributes to a > device. Since they are driver specific they can not go into bus or class > or whatever default attributes that are created when device is > instantiated, but rather attached to the device when a driver binds to > them. An example would be a PS/2 mouse driver allowing user to control > report rate and resolution of the device. Since it only applicable to > PS/2 mice the knob does not belong to the generic serio layer/bus nor > should it go into input layer as it is again PS/2 specific. So psmouse > creates it while binding to a serio port. 
Yeah, platform drivers also "need" this type of thing as well, so you are right, I can't forbid it for everyone.

> Do we send a uevent when driver binds/unbinds from a device?

No, we don't, see kobject_actions[] for what we implement.

> If not I think we should so that userspace can check for additional
> attributes, if any.

We might be able to do a "change" event for the device itself after it has been bound / unbound, but I don't know what userspace will do with that information at this point in time (i.e. 10+ years without that information...)

thanks,

greg k-h
Re: [PATCH] net: sctp: set chunk->tsn_gap_acked at the end of cycle
On 11/22/2013 02:24 PM, Chang wrote: > > On 11/22/2013 03:27 PM, Vlad Yasevich wrote: >> On 11/22/2013 02:49 AM, Chang Xiangzhong wrote: >>> tsn_gap_acked is an important state flag in chunk, which indicates if >>> the >>> chunk has been acked in gap reports before. >> Actually, this bit indicates simply that the chunk has been acked. It >> doesn't state whether it's been acked in a gap report or via >> cumulative tsn. > Thanks for pointing this out. Sorry for not having made that clear. >> >>> SFR-CACC algorithm depends on this >>> variable. So set this at the end of each iteration, otherwise the >>> SFR-CACC >>> algorithm would never be toggled. >>> >>> Signed-off-by: Chang Xiangzhong >>> --- >>> net/sctp/outqueue.c | 3 ++- >>> 1 file changed, 2 insertions(+), 1 deletion(-) >>> >>> diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c >>> index 1b494fa..bff828c 100644 >>> --- a/net/sctp/outqueue.c >>> +++ b/net/sctp/outqueue.c >>> @@ -1396,7 +1396,6 @@ static void sctp_check_transmitted(struct >>> sctp_outq *q, >>>* while DATA was outstanding). >>>*/ >>> if (!tchunk->tsn_gap_acked) { >>> -tchunk->tsn_gap_acked = 1; >>> if (TSN_lt(*highest_new_tsn_in_sack, tsn)) >>> *highest_new_tsn_in_sack = tsn; >>> bytes_acked += sctp_data_size(tchunk); >>> @@ -1460,6 +1459,8 @@ static void sctp_check_transmitted(struct >>> sctp_outq *q, >>>*/ >>> list_add_tail(lchunk, ); >>> } >>> + >>> +tchunk->tsn_gap_acked = 1; >> The current code will set the state if it hasn't been set yet. Why is >> this needed? > Because in line 1420 ~ 1440 The SFR-CACC algorithms use tsn_gap_acked. > That "if block" would never be triggered because the varaiable's been > set. That's why I move the "state changing sentence" to the end of the > iteration. >> Now there is an issue with tracking highest_new_tsn_in_sack. The spec >> is a little vague on this, but the highest_new_tsn_in_sack only supposed >> to track tsns that have not been resent. 
>> If a tsn has been reneged, and
>> then sent again, it is not considered 'new' thus should not count
>> toward highest_new_tsn_in_sack. We currently do not track this right.
> I'll try to figure this out.

So the solution that's been proposed before is to move the CACC block up and merge it with the !tsn_gap_acked block. This requires additional change though to only mark highest_new_tsn if the chunk has not been retransmitted (which you've added in a prior patch).

-vlad

>> -vlad
>>
>> -vlad
>>
>>>          } else {
>>>              if (tchunk->tsn_gap_acked) {
>>>                  pr_debug("%s: receiver reneged on data TSN:0x%x\n",
>>>
>
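The ordering problem raised in this thread (a flag set early in a loop iteration makes a later "not yet acked" check in the same iteration unreachable) can be shown with a toy model. This is a sketch of the argument only, not the sctp_check_transmitted() code: struct chunk, process_ack() and the step counter are hypothetical, and the second check merely stands in for the SFR-CACC block mentioned above.

```c
#include <stdbool.h>

/* Hypothetical chunk; "acked" stands in for tchunk->tsn_gap_acked. */
struct chunk {
	bool acked;
};

/*
 * Process one newly acknowledged chunk and return how many of the two
 * "first time acked" steps ran. set_early selects the placement of the
 * flag update that the patch under discussion wants to move.
 */
static int process_ack(struct chunk *c, bool set_early)
{
	int steps = 0;

	if (!c->acked) {
		if (set_early)
			c->acked = true;	/* early placement of the update */
		steps++;			/* e.g. account the bytes acked */
	}

	if (!c->acked)
		steps++;			/* later check that also needs "not yet acked" */

	c->acked = true;			/* placement at the end of the iteration */
	return steps;
}
```

With the early placement only the first step runs. Moving the update to the end of the iteration lets both run, and merging the two blocks, as suggested in the reply, achieves the same without a second test of the flag.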
Re: [PATCH v2] checkpatch.pl: Check for functions without a real prototype
On Fri, Nov 22, 2013 at 5:45 PM, Joe Perches wrote:
> On Fri, 2013-11-22 at 17:17 -0500, Elena Ufimtseva wrote:
>> Based on Richard Weinberger patch https://lkml.org/lkml/2012/3/16/510
>> Functions like this one are evil:
>>
>> void foo()
>> {
>> ...
>> }
> []
>> - added white space in regular expression between (), based
>> on comments from Joe Perches.
>
> Thanks Elena, but this is already applied in -next.
>

No problem, thanks Joe )

--
Elena
Re: [RFC PATCH 0/2] fs: sysfs: Add devres support
On Sat, Mar 16, 2013 at 09:21:40AM -0700, Greg Kroah-Hartman wrote:
> On Thu, Mar 14, 2013 at 08:24:45PM -0700, Guenter Roeck wrote:
> > Provide devres functions for device_create_file, sysfs_create_file,
> > and sysfs_create_group plus the respective remove functions.
> >
> > Idea is to be able to drop calls to the remove functions from the various
> > drivers using those calls.
>
> Hm, despite the fact that almost every driver that makes these calls is
> broken? :)
>
> > Potential savings are substantial. There are more than 700 calls to
> > device_remove_file in the kernel, more than 500 calls to sysfs_remove_group,
> > and some 50 calls to sysfs_remove_file (though not all of those use dev->kobj
> > as parameter). Expanding the API to sysfs_create_bin_file would add another 80+
> > opportunities, and adding sysfs_create_link would create another 100 or so.
>
> The idea is nice, but why are these drivers adding sysfs files on their
> own? Are they doing this in a way that is race-free with userspace
> (i.e. creating them before userspace is told about the device), or are
> they broken and need to have these calls added to the "default
> device/driver/bus" attribute list for them instead?

Just stumbled upon this thread...

There is a need for drivers to add driver-specific attributes to a device. Since they are driver-specific, they cannot go into the bus, class, or other default attributes that are created when the device is instantiated; instead they are attached to the device when a driver binds to it.

An example would be a PS/2 mouse driver allowing the user to control the report rate and resolution of the device. Since this is only applicable to PS/2 mice, the knob does not belong in the generic serio layer/bus, nor should it go into the input layer, as it is again PS/2-specific. So psmouse creates it while binding to a serio port.

Do we send a uevent when a driver binds to or unbinds from a device? If not, I think we should, so that userspace can check for additional attributes, if any.

Thanks.

--
Dmitry
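The managed ("devres") idea quoted above can be modelled in plain userspace C: every create call also records a matching release action on the device, and all recorded actions run, newest first, when the driver is detached. This is only a sketch under stated assumptions. Every name here (device_sketch, attr_sketch, managed_add, managed_create_attr, managed_release_all) is hypothetical and is not the API proposed in the patch or the kernel's devres interface.

```c
#include <stdlib.h>

/* Hypothetical userspace model of a devres-style list; not kernel code. */
typedef void (*release_fn)(void *data);

struct res_node {
    release_fn release;
    void *data;
    struct res_node *next;
};

struct device_sketch {
    struct res_node *resources; /* most recently added entry first */
};

/* Stand-in for a driver-specific attribute such as a report-rate knob. */
struct attr_sketch {
    const char *name;
    int present;
};

/* Record a release action to run at detach time. Returns 0 on success. */
int managed_add(struct device_sketch *dev, release_fn release, void *data)
{
    struct res_node *node = malloc(sizeof(*node));

    if (!node)
        return -1;
    node->release = release;
    node->data = data;
    node->next = dev->resources;
    dev->resources = node;
    return 0;
}

/* Release action for an attribute: mark it as gone. */
void attr_remove(void *data)
{
    ((struct attr_sketch *)data)->present = 0;
}

/* "Create" the attribute and register its removal in one step. */
int managed_create_attr(struct device_sketch *dev, struct attr_sketch *attr)
{
    attr->present = 1;
    if (managed_add(dev, attr_remove, attr) != 0) {
        attr->present = 0; /* undo the create if bookkeeping failed */
        return -1;
    }
    return 0;
}

/* Run every recorded release action, newest first, and free the list. */
void managed_release_all(struct device_sketch *dev)
{
    while (dev->resources) {
        struct res_node *node = dev->resources;

        dev->resources = node->next;
        node->release(node->data);
        free(node);
    }
}
```

In this model a driver's bind path would call managed_create_attr() for each knob and its unbind path would need no explicit remove calls, which is the saving the patch description is after. It does not address the userspace race Greg raises.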
Re: [PATCH v2] checkpatch.pl: Check for functions without a real prototype
On Fri, 2013-11-22 at 17:17 -0500, Elena Ufimtseva wrote:
> Based on Richard Weinberger patch https://lkml.org/lkml/2012/3/16/510
> Functions like this one are evil:
>
> void foo()
> {
> ...
> }
[]
> - added white space in regular expression between (), based
> on comments from Joe Perches.

Thanks Elena, but this is already applied in -next.
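Some background on why `void foo()` is flagged: in C before C23, an empty parameter list declares a function without a prototype, so the compiler is not required to reject calls that pass arguments. Writing `(void)` gives a real prototype. A minimal sketch follows; the function names are made up for illustration and do not come from the patch.

```c
/* Hypothetical names, for illustration only. */

/* No prototype: before C23 the parameters are unspecified, so the
 * compiler is not required to reject a call like counter_old(1, 2). */
int counter_old();

/* Real prototype: takes no arguments, so counter_new(1) must be
 * diagnosed at compile time. */
int counter_new(void);

int counter_old()
{
    return 1;
}

int counter_new(void)
{
    return 2;
}
```

Both forms behave the same when called correctly. The difference is only in what the compiler can check at the call site.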
Re: [PATCH 1/6] powerpc: fix exception clearing in e500 SPE float emulation
On Mon, 2013-11-04 at 16:52 +, Joseph S. Myers wrote:
> From: Joseph Myers
>
> The e500 SPE floating-point emulation code clears existing exceptions
> (__FPU_FPSCR &= ~FP_EX_MASK;) before ORing in the exceptions from the
> emulated operation. However, these exception bits are the "sticky",
> cumulative exception bits, and should only be cleared by the user
> program setting SPEFSCR, not implicitly by any floating-point
> instruction (whether executed purely by the hardware or emulated).
> The spurious clearing of these bits shows up as missing exceptions in
> glibc testing.
>
> Fixing this, however, is not as simple as just not clearing the bits,
> because while the bits may be from previous floating-point operations
> (in which case they should not be cleared), the processor can also set
> the sticky bits itself before the interrupt for an exception occurs,
> and this can happen in cases when IEEE 754 semantics are that the
> sticky bit should not be set. Specifically, the "invalid" sticky bit
> is set in various cases with non-finite operands, where IEEE 754
> semantics do not involve raising such an exception, and the
> "underflow" sticky bit is set in cases of exact underflow, whereas
> IEEE 754 semantics are that this flag is set only for inexact
> underflow. Thus, for correct emulation the kernel needs to know the
> setting of these two sticky bits before the instruction being
> emulated.
>
> When a floating-point operation raises an exception, the kernel can
> note the state of the sticky bits immediately afterwards. Some
> functions that affect the state of these bits, such as
> fesetenv and feholdexcept, need to use prctl with PR_GET_FPEXC and
> PR_SET_FPEXC anyway, and so it is natural to record the state of those
> bits during that call into the kernel and so avoid any need for a
> separate call into the kernel to inform it of a change to those bits.
> Thus, the interface I chose to use (in this patch and the glibc port)
> is that one of those prctl calls must be made after any userspace
> change to those sticky bits, other than through a floating-point
> operation that traps into the kernel anyway.

This sounds like an incompatible change to userspace API. What about older glibc? What about user code that directly manipulates these bits rather than going through libc, or uses a libc other than glibc? Where is this API requirement documented?

I think the impact of this could be reduced by using this mechanism only to clear bits, rather than set them. That is, if the exception bit is unset, don't set it just because it's set in spefscr_last -- but if it's not set in spefscr_last, and the emulation code doesn't want to set it, then clear it.

Are there any cases where the exception bit can be set without the kernel taking a trap, or is userspace manipulation limited to clearing the bits?

-Scott
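One possible reading of the "only clear, never set" suggestion above, written as a pure function on bit masks. This is a sketch, not code from the patch: the bit values, the CHECKED_MASK name, and merge_sticky_clear_only() are all hypothetical, and the constants do not reflect the real SPEFSCR layout. Here `hw` is the register content after the trapping instruction, `last` plays the role of spefscr_last, and `emu` holds the exceptions the emulation code wants to raise.

```c
#include <stdint.h>

/* Made-up bit positions for illustration; NOT the real SPEFSCR layout. */
#define STICKY_INVALID   0x1u
#define STICKY_UNDERFLOW 0x2u
#define STICKY_OVERFLOW  0x4u

/* The two bits the hardware may set spuriously before the trap. */
#define CHECKED_MASK (STICKY_INVALID | STICKY_UNDERFLOW)

/*
 * Clear-only merge:
 *  - a checked bit survives only if it is set in hw AND was set in last;
 *  - a checked bit that is unset in hw is never set just because last
 *    has it (userspace may have cleared it without telling the kernel);
 *  - bits outside CHECKED_MASK are taken from hw unchanged;
 *  - anything the emulation wants raised is ORed in at the end.
 */
uint32_t merge_sticky_clear_only(uint32_t hw, uint32_t last, uint32_t emu)
{
    uint32_t kept = hw & (last | ~CHECKED_MASK);

    return kept | emu;
}
```

With this rule, stale information in `last` can at worst cause a spuriously set bit to be left in place. It can never resurrect a bit that userspace has cleared, which is the reduced impact the reply is asking for.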
Re: [patch 0/9 v3] kexec kernel efi runtime support
On Thu, 2013-11-21 at 14:17 +0800, dyo...@redhat.com wrote:
> Hi,
>
> Here is the V3 for supporting kexec kernel efi runtime.
> Per pervious discussion I pass the 1st kernel efi runtime mapping
> via setup_data to 2nd kernel. Besides of the runtime mapping
> info I also pass the fw_vendor, runtime, config table, smbios
> physical address in setup_data. EFI spec mentioned fw_vendor,
> runtime, config table addresses will be converted to virt address
> after entering virtual mode, but we will use it as physical address
> in efi_init. For smbios EFI spec did not mention about the address
> updating, but during my test on a HP workstation, the bios will
> convert it to Virt addr, thus pass it in setup_data as well.
>
> For fw_vendor, runtime, config table, I export them in /sys/firmware/
> efi/, smbios is already in /sys/firmware/efi/systab.
>
> For efi runtime mapping I add a new directory /sys/firmware/efi/
> runtime-map/ like below
> [dave@darkstar ~]$ tree /sys/firmware/efi/runtime-map/
> /sys/firmware/efi/runtime-map/
> ├── 0
> │   ├── attribute
> │   ├── num_pages
> │   ├── phys_addr
> │   ├── type
> │   └── virt_addr
> ├── 1
> │   ├── attribute
> │   ├── num_pages
> │   ├── phys_addr
> │   ├── type
> │   └── virt_addr
> [snip]
>
> kexec-tools will assemble them as setup_data and pass to 2nd kernel.
> I will send userspace patches as well.
>
> Limitation is I only write support for x86_64, test on below machines:
> Lenovo thinkpad t420
> Dell inspiron 14 - 3421
> HP Z420 workstation
> Qemu + OVMF

Tested on an HP EFI-based 60-way server (prototype). For the series:

Tested-by: Toshi Kani

Thanks,
-Toshi
Re: [patch 05/34] idle: Implement generic idle function
On Mon, 18 Nov 2013, Viresh Kumar wrote:
> On Fri, Mar 22, 2013 at 3:23 AM, Thomas Gleixner wrote:
> > --- linux-2.6.orig/kernel/cpu/idle.c
> > +++ linux-2.6/kernel/cpu/idle.c
>
> > +void cpu_startup_entry(enum cpuhp_state state)
> > +{
> > +	current_set_polling();
> > +	arch_cpu_idle_prepare();
> > +	cpu_idle_loop();
> > +}
> > +#endif
>
> Hi Thomas..
>
> Manish (cc'd) recently asked me purpose of _state_ argument in this
> routine, As this is unused here..
>
> Was this reserved for future use?

Yep, if you dig through the mail archives you'll find my cpu hotplug series, which tries to convert it to a full state machine. So the state argument is due to that, because the boot cpu will call into that with a certain state and we want to allow archs to skip some states even on secondary cpus.

So yes, it's unused, but it's there on purpose. It's going to be used when my work backlog drops down to the cpuhotplug mess again. About 2 lightyears from now.

Thanks,

	tglx
Re: [PATCH] sysfs: handle duplicate removal attempts in sysfs_remove_group()
On Friday, November 22, 2013 08:43:55 AM Bjorn Helgaas wrote:
> [+cc Rafael, James]
>
> On Tue, Nov 19, 2013 at 11:18 PM, Tejun Heo wrote:
> > (cc'ing Bjorn)
> >
> > Hello,
> >
> > On Tue, Nov 19, 2013 at 03:09:58PM +0200, Mika Westerberg wrote:
> >> Commit bcdde7e221a8 (sysfs: make __sysfs_remove_dir() recursive) changed
> >> the behavior so that directory removals will be done recursively. This
> >> means that the sysfs group might already be removed if its parent directory
> >> has been removed.
> >>
> >> The current code outputs warnings similar to following log snippet when it
> >> detects that there is no group for the given kobject:
> >>
> >> WARNING: CPU: 0 PID: 4 at fs/sysfs/group.c:214 sysfs_remove_group+0xc6/0xd0()
> >> sysfs group 81c6f1e0 not found for kobject 'host7'
> >> Modules linked in:
> >> CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 3.12.0+ #13
> >> Hardware name: /D33217CK, BIOS GKPPT10H.86A.0042.2013.0422.1439 04/22/2013
> >> Workqueue: kacpi_hotplug acpi_hotplug_work_fn
> >> 0009 8801002459b0 817daab1 8801002459f8
> >> 8801002459e8 810436b8 81c6f1e0
> >> 88006d440358 88006d440188 88006e8b4c28 880100245a48
> >> Call Trace:
> >> [] dump_stack+0x45/0x56
> >> [] warn_slowpath_common+0x78/0xa0
> >> [] warn_slowpath_fmt+0x47/0x50
> >> [] sysfs_remove_group+0xc6/0xd0
> >> [] dpm_sysfs_remove+0x3e/0x50
> >> [] device_del+0x40/0x1b0
> >> [] device_unregister+0xd/0x20
> >> [] scsi_remove_host+0xba/0x110
> >> [] ata_host_detach+0xc6/0x100
> >> [] ata_pci_remove_one+0x18/0x20
> >> [] pci_device_remove+0x28/0x60
> >> [] __device_release_driver+0x64/0xd0
> >> [] device_release_driver+0x1e/0x30
> >> [] bus_remove_device+0xf7/0x140
> >> [] device_del+0x121/0x1b0
> >> [] pci_stop_bus_device+0x94/0xa0
> >> [] pci_stop_bus_device+0x3b/0xa0
> >> [] pci_stop_bus_device+0x3b/0xa0
> >> [] pci_stop_and_remove_bus_device+0xd/0x20
> >> [] trim_stale_devices+0x73/0xe0
> >> [] trim_stale_devices+0xbb/0xe0
> >> [] trim_stale_devices+0xbb/0xe0
> >> [] acpiphp_check_bridge+0x7e/0xd0
> >> [] hotplug_event+0xcd/0x160
> >> [] hotplug_event_work+0x25/0x60
> >> [] acpi_hotplug_work_fn+0x17/0x22
> >> [] process_one_work+0x17a/0x430
> >> [] worker_thread+0x119/0x390
> >> [] kthread+0xcd/0xf0
> >> [] ret_from_fork+0x7c/0xb0
> >
> > So, we do have cases where the parent is removed before the child. I
> > suppose the parent pci bridge is removed already? AFAICS this
> > shouldn't break anything but people did seem to expect the removals to
> > be ordered from child to parent. Bjorn, is this something you expect
> > to happened?
>
> I do not expect a PCI bridge to be removed before the devices below
> it. We should be removing all the children before removing the parent
> bridge.

Precisely.

> But is this related to PCI? I don't see the connection yet. I tried
> to look into this a bit (my notes are at
> https://bugzilla.kernel.org/show_bug.cgi?id=65281), but I haven't
> figured out the big-picture problem yet.
>
> I don't have warm fuzzies that adding a "have we already removed this"
> check is the best resolution, but maybe that's just because I don't
> understand the problem.

I don't think this is related to removing the parent before the child. It complains about the 'power' directory of all devices being removed it seems.

Thanks!

--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
Re: [PATCH] mmzone.h: constify some zone access functions
On Fri, 22 Nov 2013 12:01:06 -0500 Luiz Capitulino wrote:
> Signed-off-by: Luiz capitulino
> ---
> include/linux/mmzone.h | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index bd791e4..5e202d6 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -560,12 +560,12 @@ static inline bool zone_spans_pfn(const struct zone *zone, unsigned long pfn)
> 	return zone->zone_start_pfn <= pfn && pfn < zone_end_pfn(zone);
> }
>
> -static inline bool zone_is_initialized(struct zone *zone)
> +static inline bool zone_is_initialized(const struct zone *zone)
> {
> 	return !!zone->wait_table;
> }
>
> -static inline bool zone_is_empty(struct zone *zone)
> +static inline bool zone_is_empty(const struct zone *zone)
> {
> 	return zone->spanned_pages == 0;
> }
> @@ -843,7 +843,7 @@ unsigned long __init node_memmap_size_bytes(int, unsigned long, unsigned long);
> */
> #define zone_idx(zone) ((zone) - (zone)->zone_pgdat->node_zones)
>
> -static inline int populated_zone(struct zone *zone)
> +static inline int populated_zone(const struct zone *zone)
> {
> 	return (!!zone->present_pages);
> }

hm, why? I counted ten similarly constifyable functions in mm.h and stopped only 1/4 of the way through. What's so special about these three?
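For context on what constifying buys: a read-only accessor that takes a `const` pointer can be called from code that itself only holds a `const` pointer, without a cast or a "discards qualifiers" warning. The sketch below uses a cut-down stand-in struct, not the kernel's struct zone, and all names ending in _sketch are hypothetical.

```c
#include <stdbool.h>

/* Simplified stand-in; NOT the kernel's struct zone. */
struct zone_sketch {
    unsigned long spanned_pages;
    unsigned long present_pages;
};

/* Read-only accessors: the const parameter documents and enforces
 * that the zone is not modified. */
static inline bool zone_sketch_is_empty(const struct zone_sketch *zone)
{
    return zone->spanned_pages == 0;
}

static inline int zone_sketch_populated(const struct zone_sketch *zone)
{
    return !!zone->present_pages;
}

/* A caller that only has a const pointer can use both helpers as is.
 * If the helpers took a non-const pointer, these calls would discard
 * the const qualifier and draw a compiler diagnostic. */
static inline bool zone_sketch_unused(const struct zone_sketch *zone)
{
    return zone_sketch_is_empty(zone) && !zone_sketch_populated(zone);
}
```

This shows the general benefit only. It does not answer the question in the reply of why these three functions were singled out.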
Re: 3.12: kernel panic when resuming from suspend to RAM (x86_64)
On Fri, 22 Nov 2013, Rafael J. Wysocki wrote:
> On Friday, November 22, 2013 10:36:23 PM Francis Moreau wrote:
> > Ok, I've finally managed to find out the bad commit:
> > ad07277e82dedabacc52c82746633680a3187d25: ACPI / PM: Hold acpi_scan_lock
> > over system PM transitions
> >
> > I verified that the parent commit doesn't have the problem.
>
> Interesting.
>
> > Rafael, you're the man now ;)
>
> I kind of don't see how that commit may result in behavior that you
> described earlier in the thread.
>
> You get a memory corruption that seems to have started to happen because
> we're holding an additional lock over suspend resume now. Something's fishy
> on that machine and we need to figure out what it is.

The hiccup happens in the timer softirq.

@Francis: Did you try to enable DEBUG_OBJECTS.*? If not, please give it a try.

Thanks,

	tglx