Re: [PATCH 4/5] futex: Avoid taking hb lock if nothing to wakeup

2013-11-22 Thread Darren Hart
On Fri, 2013-11-22 at 19:19 -0800, Davidlohr Bueso wrote:
> On Fri, 2013-11-22 at 17:25 -0800, Linus Torvalds wrote:
> > On Fri, Nov 22, 2013 at 4:56 PM, Davidlohr Bueso  wrote:
> > > In futex_wake() there is clearly no point in taking the hb->lock if
> > > we know beforehand that there are no tasks to be woken. This comes
> > > at the smaller cost of doing some atomic operations to keep track of
> > > the list's size.
> > 
> > Hmm. Why? Afaik, you only care about "empty or not". And if you don't
> > need the serialization from locking, then afaik you can just do a
> > "plist_head_empty()" without holding the lock.
> 
> I remember this being the original approach, but after noticing some
> strange behavior we quickly decided it wasn't the path. And sure enough,
> I just double checked and tried the patch without atomic ops and can see
> things being off: one of the futextest performance cases is stuck
> blocked on a futex and I couldn't reboot the machine either -- nothing
> apparent in dmesg, just not 100% functional. The thing is, we can
> avoid taking the lock only if nobody else is trying to add itself to the
> list.

In your usage, the worst case scenario is that you detect 0 when locking
may have blocked and found a waiter. Correct?

In this case, you return 0, instead of 1 (or more).

This suggests to me a bug in the futextest testcase. Which test
specifically hung up waiting?

Futex hangs are almost always bad userspace code (my bad userspace code
in this case ;-)

-- 
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 4/5] futex: Avoid taking hb lock if nothing to wakeup

2013-11-22 Thread Darren Hart
On Fri, 2013-11-22 at 16:56 -0800, Davidlohr Bueso wrote:
> In futex_wake() there is clearly no point in taking the hb->lock if
> we know beforehand that there are no tasks to be woken. This comes
> at the smaller cost of doing some atomic operations to keep track of
> the list's size. Specifically, increment the counter when an element is
> added to the list, and decrement when it is removed. Of course, if the
> counter is 0, then there are no tasks blocked on a futex. Some special
> considerations:
> 
> - increment the counter at queue_lock() as we always end up calling
>   queue_me() which adds the element to the list. Upon any error,
>   queue_unlock() is called for housekeeping, for which we decrement
>   to mach the increment done in queue_lock().

  ^match

> 
> - decrement the counter at __unqueue_me() to reflect when an element is
>   removed from the queue for wakeup related purposes.

__unqueue_futex (not __unqueue_me)
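
For readers following the counter scheme described above, here is a minimal userspace sketch of the idea. It is not the patch itself: struct bucket, the spinlock helpers and the waiter_*/bucket_wake names are hypothetical stand-ins, the wait list is reduced to a plain length, and C11 atomics with default ordering are assumed to be sufficient for illustration only.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct futex_hash_bucket. */
struct bucket {
    atomic_flag lock;    /* stands in for hb->lock */
    atomic_int waiters;  /* tasks queued or about to be queued */
    int queued;          /* length of the wait list, protected by lock */
};

void bucket_init(struct bucket *b)
{
    atomic_flag_clear(&b->lock);
    atomic_init(&b->waiters, 0);
    b->queued = 0;
}

static void bucket_lock(struct bucket *b)
{
    while (atomic_flag_test_and_set_explicit(&b->lock, memory_order_acquire))
        ; /* spin until the flag was previously clear */
}

static void bucket_unlock(struct bucket *b)
{
    atomic_flag_clear_explicit(&b->lock, memory_order_release);
}

/* Waiter side, cf. queue_lock(): count ourselves before taking the lock. */
void waiter_lock(struct bucket *b)
{
    atomic_fetch_add(&b->waiters, 1);
    bucket_lock(b);
}

/* Waiter error path, cf. queue_unlock(): drop the lock and the count. */
void waiter_abort(struct bucket *b)
{
    bucket_unlock(b);
    atomic_fetch_sub(&b->waiters, 1);
}

/* Waiter success path, cf. queue_me(): enqueue, then drop the lock. */
void waiter_enqueue(struct bucket *b)
{
    b->queued++;
    bucket_unlock(b);
}

/* Waker side: returns the number of waiters woken and reports whether
 * the lock had to be taken at all. */
int bucket_wake(struct bucket *b, int nr_wake, bool *took_lock)
{
    int woken = 0;

    *took_lock = false;
    if (atomic_load(&b->waiters) == 0)
        return 0;   /* nobody queued or queueing: skip the lock */

    *took_lock = true;
    bucket_lock(b);
    while (b->queued > 0 && woken < nr_wake) {
        b->queued--;                        /* cf. __unqueue_futex() */
        atomic_fetch_sub(&b->waiters, 1);
        woken++;
    }
    bucket_unlock(b);
    return woken;
}
```

The point of incrementing before the lock is taken is that a waker which reads zero knows no waiter is even in the middle of queueing itself; whether the extra atomics are worth it is exactly what the thread is debating.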


> @@ -999,6 +1001,10 @@ futex_wake(u32 __user *uaddr, unsigned int flags, int nr_wake, u32 bitset)
>   goto out;
>  
>   hb = hash_futex(&key);
> + /* make sure we really have tasks to wakeup */

Nit, but please use proper sentence formatting for consistency with the
rest of the comments in futex.c (most of them anyway).

/* Make sure we really have tasks to wake up. */

Now... I'm not thrilled with adding atomics if we don't need to,
especially for an optimization since the atomics themselves cause enough
problems, especially across a large number of CPUs... I'll respond to
Linus's thread though.

-- 
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel




Re: [PATCH 3/5] futex: Larger hash table

2013-11-22 Thread Darren Hart
On Fri, 2013-11-22 at 16:56 -0800, Davidlohr Bueso wrote:
> Currently, the futex global hash table suffers from its fixed, smallish
> (for today's standards) size of 256 entries, as well as its lack of NUMA
> awareness. Large systems, using many futexes, can be prone to high amounts
> of collisions; where these futexes hash to the same bucket and lead to
> extra contention on the same hb->lock. Furthermore, cacheline bouncing is a
> reality when we have multiple hb->locks residing on the same cacheline and
> different futexes hash to adjacent buckets.
> 
> This patch keeps the current static size of 16 entries for small systems,
> or otherwise, 256 * ncpus (or larger as we need to round the number to a
> power of 2). Note that this number of CPUs accounts for all CPUs that can
> ever be available in the system, taking into consideration things like
> hotplugging. While we do impose extra overhead at bootup by making the hash
> table larger, this is a one time thing, and does not shadow the benefits
> of this patch.
> 
> Also, similar to other core kernel components (pid, dcache, tcp), by using
> alloc_large_system_hash() we benefit from its NUMA awareness and thus the
> table is distributed among the nodes instead of in a single one. We impose
> this function's minimum limit of 256 entries, so that in worst case scenarios
> or issues, we still end up using the current amount anyways.
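
As a rough illustration of the sizing rule described in the commit message above (16 entries for small configurations, otherwise 256 per possible CPU rounded up to a power of two, with a floor of 256), here is a small standalone sketch. The helper names are hypothetical and this is not the code from the patch, which relies on alloc_large_system_hash() for the actual allocation.

```c
/* Round n up to the next power of two. Assumes the result fits in an
 * unsigned long; values of 0 or 1 yield 1. */
static unsigned long roundup_pow2(unsigned long n)
{
    unsigned long p = 1;

    while (p < n)
        p <<= 1;
    return p;
}

/* Hypothetical helper: number of hash buckets under the rule above.
 * base_small mirrors CONFIG_BASE_SMALL; possible_cpus is the count of
 * CPUs that can ever be present, including hotpluggable ones. */
unsigned long futex_table_entries(int base_small, unsigned long possible_cpus)
{
    unsigned long entries;

    if (base_small)
        return 16;

    entries = roundup_pow2(256 * possible_cpus);
    return entries < 256 ? 256 : entries;   /* keep the old size as a floor */
}
```

Under this reading, and assuming 80 possible CPUs on the benchmark machine, 256 * 80 = 20480 would round up to 32768 buckets.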
> 
> For a custom microbenchmark that pounds on the uaddr hashing -- making the
> wait path fail at futex_wait_setup() returning -EWOULDBLOCK for large amounts
> of futexes, we can see the following benefits on a 80-core, 8-socket 1Tb server:
> 
> +---------+----------------------------------+------------------------------------------+----------+
> | threads | baseline (ops/sec) [insns/cycle] | large hash table (ops/sec) [insns/cycle] | increase |
> +---------+----------------------------------+------------------------------------------+----------+
> |     512 | 34429   [0.07]                   | 255274  [0.48]                           | +641.45% |
> |     256 | 65452   [0.07]                   | 443563  [0.41]                           | +577.69% |
> |     128 | 125111  [0.07]                   | 742613  [0.33]                           | +493.56% |
> |      80 | 203642  [0.09]                   | 1028147 [0.29]                           | +404.87% |
> |      64 | 262944  [0.09]                   | 997300  [0.28]                           | +279.28% |
> |      32 | 642390  [0.24]                   | 965996  [0.27]                           | +50.37%  |
> +---------+----------------------------------+------------------------------------------+----------+
> 
> Cc: Ingo Molnar 
> Cc: Darren Hart 
> Cc: Peter Zijlstra 
> Cc: Thomas Gleixner 
> Cc: Mike Galbraith 
> Cc: Jeff Mahoney 
> Cc: Linus Torvalds 
> Cc: Scott Norton 
> Cc: Tom Vaden 
> Cc: Aswin Chandramouleeswaran 
> Signed-off-by: Waiman Long 
> Signed-off-by: Jason Low 
> Signed-off-by: Davidlohr Bueso 
> ---
>  kernel/futex.c | 26 +-
>  1 file changed, 21 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/futex.c b/kernel/futex.c
> index 0768c68..5fa9eb0 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -63,6 +63,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  
> @@ -70,7 +71,11 @@
>  
>  int __read_mostly futex_cmpxchg_enabled;
>  
> -#define FUTEX_HASHBITS (CONFIG_BASE_SMALL ? 4 : 8)
> +#if CONFIG_BASE_SMALL
> +static unsigned long futex_hashsize = 16;
> +#else
> +static unsigned long futex_hashsize;
> +#endif
>  
>  /*
>   * Futex flags used to encode options to functions and preserve them across
> @@ -151,7 +156,11 @@ struct futex_hash_bucket {
>   struct plist_head chain;
>  };
>  
> -static struct futex_hash_bucket futex_queues[1<<FUTEX_HASHBITS];
> +#if CONFIG_BASE_SMALL
> +static struct futex_hash_bucket futex_queues[futex_hashsize];
> +#else
> +static struct futex_hash_bucket *futex_queues;
> +#endif

Something in me squirms at the #if/#else here, but I'll leave that to
the tip maintainers to call out if they are so inclined. Same below.
Otherwise, looks reasonable to me.

Reviewed-by: Darren Hart 

-- 
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel




Re: [PATCH 1/5] futex: Misc cleanups

2013-11-22 Thread Darren Hart
On Fri, 2013-11-22 at 16:56 -0800, Davidlohr Bueso wrote:
> From: Jason Low 
> 
> - Remove unnecessary head variables.
> - Delete unused parameter in queue_unlock().
> 
> Cc: Ingo Molnar 
> Cc: Darren Hart 
> Cc: Peter Zijlstra 
> Cc: Thomas Gleixner 
> Cc: Mike Galbraith 
> Cc: Jeff Mahoney 
> Cc: Linus Torvalds 
> Cc: Scott Norton 
> Cc: Tom Vaden 
> Cc: Aswin Chandramouleeswaran 
> Cc: Waiman Long 
> Signed-off-by: Jason Low 
> Signed-off-by: Davidlohr Bueso 

My concern was that we saved off head for a reason, but the hb lock is
held in these cases, and the call is safe against removal, so I don't
see a problem. Thanks.

Reviewed-by: Darren Hart 

-- 
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel




Re: [PATCH 0/5] futex: Wakeup optimizations

2013-11-22 Thread Davidlohr Bueso
Hi Darren,

On Fri, 2013-11-22 at 21:55 -0800, Darren Hart wrote:
> On Fri, 2013-11-22 at 16:56 -0800, Davidlohr Bueso wrote:
> > We have been dealing with a customer database workload on a large
> > 12Tb, 240 core 16 socket NUMA system that exhibits high amounts
> > of contention on some of the locks that serialize internal futex
> > data structures. This workload especially suffers in the wakeup
> > paths, where waiting on the corresponding hb->lock can account for
> > up to ~60% of the time. The result of such calls can mostly be
> > classified as (i) nothing to wake up and (ii) waking up a large number
> > of tasks.
> 
> With as many cores as you have, have you done any analysis of how
> > effective the hashing algorithm is, and would more buckets relieve some
> > of the contention? Ah, I see below that you did. Nice work.
> 
> > Before these patches are applied, we can see this pathological behavior:
> > 
> >  37.12%  826174  xxx  [kernel.kallsyms] [k] _raw_spin_lock
> > --- _raw_spin_lock
> >  |
> >  |--97.14%-- futex_wake
> >  |  do_futex
> >  |  sys_futex
> >  |  system_call_fastpath
> >  |  |
> >  |  |--99.70%-- 0x7f383fbdea1f
> >  |  |   yyy
> > 
> >  43.71%  762296  xxx  [kernel.kallsyms] [k] _raw_spin_lock
> > --- _raw_spin_lock
> >  |
> >  |--53.74%-- futex_wake
> >  |  do_futex
> >  |  sys_futex
> >  |  system_call_fastpath
> >  |  |
> >  |  |--99.40%-- 0x7fe7d44a4c05
> >  |  |   zzz
> >  |--45.90%-- futex_wait_setup
> >  |  futex_wait
> >  |  do_futex
> >  |  sys_futex
> >  |  system_call_fastpath
> >  |  0x7fe7ba315789
> >  |  syscall
> > 
> 
> Sorry to be dense, can you spell out how 60% falls out of these numbers?

By adding the respective percentages of futex_wake()*_raw_spin_lock
calls.
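
As a worked check of that answer (a sketch, not part of the original exchange): reading "futex_wake()*_raw_spin_lock" as the product of the two percentages perf prints for each entry, the first entry contributes 37.12% x 97.14% and the second 43.71% x 53.74%. The function names below are hypothetical.

```c
/* Share of all samples spent in _raw_spin_lock reached via futex_wake,
 * given the two percentages perf prints for one profile entry. */
double futex_wake_lock_share(double spin_lock_pct, double futex_wake_pct)
{
    return spin_lock_pct * futex_wake_pct / 100.0;
}

/* Sum over the two profile entries quoted above. */
double total_futex_wake_lock_share(void)
{
    return futex_wake_lock_share(37.12, 97.14)    /* about 36.06 */
         + futex_wake_lock_share(43.71, 53.74);   /* about 23.49 */
}
```

The sum comes to roughly 59.5%, which is consistent with the "~60%" figure in the cover letter under this interpretation.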

> 
> > 
> > With these patches, contention is practically non existent:
> > 
> >  0.10% 49   xxx  [kernel.kallsyms]   [k] _raw_spin_lock
> >--- _raw_spin_lock
> > |
> > |--76.06%-- futex_wait_setup
> > |  futex_wait
> > |  do_futex
> > |  sys_futex
> > |  system_call_fastpath
> > |  |
> > |  |--99.90%-- 0x7f3165e63789
> > |  |  syscall|
> >...
> > |--6.27%-- futex_wake
> > |  do_futex
> > |  sys_futex
> > |  system_call_fastpath
> > |  |
> > |  |--54.56%-- 0x7f317fff2c05
> > ...
> > 
> > Patches 1 & 2 are cleanups and micro optimizations.
> > 
> > Patch 3 addresses the well known issue of the global hash table.
> > By creating a larger and NUMA aware table, we can reduce the false
> > sharing and collisions, thus reducing the chance of different futexes 
> > using hb->lock.
> > 
> > Patch 4 reduces contention on the corresponding hb->lock by not trying to
> > acquire it if there are no blocked tasks in the waitqueue.
> > This particularly deals with point (i) above, where we see that it is not
> > uncommon for up to 90% of wakeup calls to end up returning 0, indicating
> > that no tasks were woken.
> 
> Can you determine how much benefit comes from 3 and how much additional
> benefit comes from 4?

While I don't have specific per-patch data, there are indications that
the workload mostly deals with a handful of futexes. So it's pretty safe
to assume that patch 4 is the one with the most benefit for _this_
particular workload.

> 
> > 
> > Patch 5 resurrects a two year old idea from Peter Zijlstra to delay
> > the waking of the blocked tasks to be done without holding the hb->lock:
> > https://lkml.org/lkml/2011/9/14/118
> > 
> > This is useful for locking primitives that can effect multiple wakeups
> > per operation and want to avoid the futex's internal spinlock contention by
> > delaying the wakeups until we've released the hb->lock.
> > This particularly deals with point (ii) above, where we can observe that
> > on occasion the wake calls end up waking 125 to 200 waiters in what we
> > believe are RW locks in the application.
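
The delayed-wakeup idea quoted above can be sketched in userspace roughly as follows: detach the waiters onto a private list while holding the lock, drop the lock, and only then perform the wakeups. This is a simplified illustration under stated assumptions, not the code from patch 5: struct waiter, struct bucket and the function names are hypothetical, a flag stands in for actually waking a task, and the list is a plain LIFO rather than the kernel's priority list.

```c
#include <stdatomic.h>
#include <stddef.h>

struct waiter {
    struct waiter *next;
    int woken;            /* set by the waker; stands in for waking a task */
};

struct bucket {
    atomic_flag lock;     /* stands in for hb->lock */
    struct waiter *head;  /* queued waiters, protected by lock */
};

static void bucket_lock(struct bucket *b)
{
    while (atomic_flag_test_and_set_explicit(&b->lock, memory_order_acquire))
        ; /* spin */
}

static void bucket_unlock(struct bucket *b)
{
    atomic_flag_clear_explicit(&b->lock, memory_order_release);
}

void bucket_init(struct bucket *b)
{
    atomic_flag_clear(&b->lock);
    b->head = NULL;
}

void bucket_enqueue(struct bucket *b, struct waiter *w)
{
    w->woken = 0;
    bucket_lock(b);
    w->next = b->head;
    b->head = w;
    bucket_unlock(b);
}

/* Wake up to nr_wake waiters; returns how many were woken. */
int bucket_wake_deferred(struct bucket *b, int nr_wake)
{
    struct waiter *batch = NULL;
    int woken = 0;

    /* Phase 1: under the lock, only move waiters to a private list. */
    bucket_lock(b);
    while (b->head != NULL && woken < nr_wake) {
        struct waiter *w = b->head;

        b->head = w->next;
        w->next = batch;
        batch = w;
        woken++;
    }
    bucket_unlock(b);

    /* Phase 2: the potentially slow wakeups happen without the lock. */
    while (batch != NULL) {
        struct waiter *w = batch;

        batch = w->next;
        w->next = NULL;
        w->woken = 1;
    }
    return woken;
}
```

With 125 to 200 waiters per call, the lock hold time in this sketch covers only the pointer shuffling in phase 1, which is the effect the cover letter is after.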
> > 
> > This patchset has also been tested on smaller systems for a variety of
> > benchmarks, including java workloads, kernel builds and custom
> > bang-the-hell-out-of hb locks programs. So far, no functional or
> > performance regressions have been seen.
> > Furthermore, no issues were found when running the 

Re: [PATCH 0/5] futex: Wakeup optimizations

2013-11-22 Thread Mike Galbraith
On Fri, 2013-11-22 at 21:55 -0800, Darren Hart wrote: 
> On Fri, 2013-11-22 at 16:56 -0800, Davidlohr Bueso wrote:

> > This patchset has also been tested on smaller systems for a variety of
> > benchmarks, including java workloads, kernel builds and custom
> > bang-the-hell-out-of hb locks programs. So far, no functional or
> > performance regressions have been seen.
> > Furthermore, no issues were found when running the different tests in the
> > futextest suite: http://git.kernel.org/cgit/linux/kernel/git/dvhart/futextest.git/
> 
> Excellent. Would you be able to contribute any of these (C only please)
> to the stress test group?

FWIW, I plugged this series into an rt kernel (extra raciness) and beat
it up a bit on a 64 core box too.  Nothing fell out, nor did futextest
numbers change outside variance (poor box has 8 whole gig ram, single
numa node, so kinda crippled/wimpy, and not good box for benchmarking).

What concerned me most about the series was 5/5.. looks like a great
idea to me, but the original thread did not have a happy ending.

-Mike



Re: [PATCH 2/5] futex: Check for pi futex_q only once

2013-11-22 Thread Darren Hart
On Fri, 2013-11-22 at 16:56 -0800, Davidlohr Bueso wrote:
> All wake_futex() callers already verify that we are not dealing with
> a pi futex_q, so we can remove the redundant WARN() check, as this is never
> triggered anyway.
> 
> Cc: Ingo Molnar 
> Cc: Darren Hart 
> Cc: Peter Zijlstra 
> Cc: Thomas Gleixner 
> Cc: Mike Galbraith 
> Cc: Jeff Mahoney 
> Cc: Linus Torvalds 
> Cc: Scott Norton 
> Cc: Tom Vaden 
> Cc: Aswin Chandramouleeswaran 
> Cc: Waiman Long 
> Cc: Jason Low 
> Signed-off-by: Davidlohr Bueso 
> ---
>  kernel/futex.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/kernel/futex.c b/kernel/futex.c
> index e6ffe73..0768c68 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -844,9 +844,6 @@ static void wake_futex(struct futex_q *q)
>  {
>   struct task_struct *p = q->task;
>  
> - if (WARN(q->pi_state || q->rt_waiter, "refusing to wake PI futex\n"))
> - return;
> -

This was added deliberately after adding said checks to the callers...
admittedly after a very long debug session I didn't ever want to repeat.
Sometimes warnings are added to make sure we caught everything and later
removed; sometimes they are added to make sure nothing new ever
breaks this again. Since the failure scenario is non-obvious, unless
this is causing some significant performance issues for you, I'd prefer
this stays.

See commit aa10990e028cac3d5e255711fb9fb47e00700e35 for details.

-- 
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel




Re: [GIT] Security subsystem updates for 3.13

2013-11-22 Thread George Spelvin
On Thu, 21 Nov 2013, Linus Torvalds wrote:
> I do see *some* minimal comments on it from George Spelvin on lkml.

I'd like to apologize for dropping the ball on that.  I started working
on it seriously, but with various emergencies, I've been AFK from lkml
for the last month.

I'm not really thrilled with it; I think the fanout of 16 is low for
something with its scale ambitions, and the properties expected of the
chunked key access method are not documented as clearly as they should be.

The way the key is fiddled to put keyring objects in a contiguous
range of the trie is a particularly egregious layering violation.

But I am convinced that it's been tested and works; my complaints are
in the areas of ugliness and efficiency.  And it's layered well enough that
it can be fixed later without radical surgery.


[PATCH] usb: chipidea: add support for USB OTG controller on TI-NSPIRE

2013-11-22 Thread dt . tangr
From: Daniel Tang 

The USB controllers in TI-NSPIRE calculators are based
off either Freescale's USB OTG controller or the USB
controller found in the IMX233, both of which are
Chipidea compatible.

This patch adds a device tree binding for the controller.

Signed-off-by: Daniel Tang 
---
 .../devicetree/bindings/usb/ci13xxx-nspire.txt | 17 +
 drivers/usb/chipidea/Makefile  |  1 +
 drivers/usb/chipidea/ci_hdrc_nspire.c  | 74 ++
 3 files changed, 92 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/usb/ci13xxx-nspire.txt
 create mode 100644 drivers/usb/chipidea/ci_hdrc_nspire.c

diff --git a/Documentation/devicetree/bindings/usb/ci13xxx-nspire.txt b/Documentation/devicetree/bindings/usb/ci13xxx-nspire.txt
new file mode 100644
index 000..5ba8e90
--- /dev/null
+++ b/Documentation/devicetree/bindings/usb/ci13xxx-nspire.txt
@@ -0,0 +1,17 @@
+* TI-Nspire USB OTG Controller
+
+Required properties:
+- compatible: Should be "zevio,nspire-usb"
+- reg: Should contain registers location and length
+- interrupts: Should contain controller interrupt
+
+Recommended properties:
+- vbus-supply: regulator for vbus
+
+Examples:
+   usb0: usb@B000 {
+   reg = <0xB000 0x1000>;
+   compatible = "zevio,nspire-usb";
+   interrupts = <8>;
+   vbus-supply = <_reg>;
+   };
diff --git a/drivers/usb/chipidea/Makefile b/drivers/usb/chipidea/Makefile
index a99d980..245ea4d 100644
--- a/drivers/usb/chipidea/Makefile
+++ b/drivers/usb/chipidea/Makefile
@@ -10,6 +10,7 @@ ci_hdrc-$(CONFIG_USB_CHIPIDEA_DEBUG)  += debug.o
 # Glue/Bridge layers go here

 obj-$(CONFIG_USB_CHIPIDEA) += ci_hdrc_msm.o
+obj-$(CONFIG_USB_CHIPIDEA) += ci_hdrc_nspire.o

 # PCI doesn't provide stubs, need to check
 ifneq ($(CONFIG_PCI),)
diff --git a/drivers/usb/chipidea/ci_hdrc_nspire.c b/drivers/usb/chipidea/ci_hdrc_nspire.c
new file mode 100644
index 000..318761d
--- /dev/null
+++ b/drivers/usb/chipidea/ci_hdrc_nspire.c
@@ -0,0 +1,74 @@
+/*
+ * Copyright (C) 2013 Daniel Tang 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ *
+ * Based off drivers/usb/chipidea/ci_hdrc_msm.c
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include "ci.h"
+
+static struct ci_hdrc_platform_data ci_hdrc_nspire_platdata = {
+   .name   = "ci_hdrc_nspire",
+   .flags  = CI_HDRC_REGS_SHARED,
+   .capoffset  = DEF_CAPOFFSET,
+};
+
+static int ci_hdrc_nspire_probe(struct platform_device *pdev)
+{
+   struct platform_device *ci_pdev;
+
+   dev_dbg(&pdev->dev, "ci_hdrc_nspire_probe\n");
+
+   ci_pdev = ci_hdrc_add_device(&pdev->dev,
+   pdev->resource, pdev->num_resources,
+   &ci_hdrc_nspire_platdata);
+
+   if (IS_ERR(ci_pdev)) {
+   dev_err(&pdev->dev, "ci_hdrc_add_device failed!\n");
+   return PTR_ERR(ci_pdev);
+   }
+
+   platform_set_drvdata(pdev, ci_pdev);
+
+   return 0;
+}
+
+static int ci_hdrc_nspire_remove(struct platform_device *pdev)
+{
+   struct platform_device *ci_pdev = platform_get_drvdata(pdev);
+
+   ci_hdrc_remove_device(ci_pdev);
+
+   return 0;
+}
+
+static const struct of_device_id ci_hdrc_nspire_dt_ids[] = {
+   { .compatible = "zevio,nspire-usb", },
+   { /* sentinel */ }
+};
+
+static struct platform_driver ci_hdrc_nspire_driver = {
+   .probe = ci_hdrc_nspire_probe,
+   .remove = ci_hdrc_nspire_remove,
+   .driver = {
+   .name = "nspire_usb",
+   .owner = THIS_MODULE,
+   .of_match_table = ci_hdrc_nspire_dt_ids,
+   },
+};
+
+MODULE_DEVICE_TABLE(of, ci_hdrc_nspire_dt_ids);
+module_platform_driver(ci_hdrc_nspire_driver);
+
+MODULE_ALIAS("platform:nspire_usb");
+MODULE_ALIAS("platform:ci13xxx_nspire");
+MODULE_LICENSE("GPL v2");
--
1.8.1.3



[PATCH] gpiolib: fix lookup of platform-mapped GPIOs

2013-11-22 Thread Alexandre Courbot
A typo resulted in GPIO lookup failing unconditionally.

Signed-off-by: Alexandre Courbot 
Reported-by: Stephen Warren 
---
 drivers/gpio/gpiolib.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
index 980b08d..86caf17 100644
--- a/drivers/gpio/gpiolib.c
+++ b/drivers/gpio/gpiolib.c
@@ -2367,7 +2367,7 @@ static struct gpio_desc *gpiod_find(struct device *dev, const char *con_id,
continue;
}
 
-   if (chip->ngpio >= p->chip_hwnum) {
+   if (chip->ngpio <= p->chip_hwnum) {
dev_warn(dev, "GPIO chip %s has %d GPIOs\n",
 chip->label, chip->ngpio);
continue;
-- 
1.8.4.2
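
To spell out why the original comparison made the lookup fail, here is a small standalone sketch of the range check. The function names are hypothetical and the code is not taken from gpiolib; it only assumes that valid hardware offsets on a chip with ngpio lines run from 0 to ngpio - 1.

```c
#include <stdbool.h>

/* Corrected test: an offset is out of range when it is at or beyond
 * the number of lines the chip provides. */
bool hwnum_out_of_range(unsigned int ngpio, unsigned int hwnum)
{
    return ngpio <= hwnum;
}

/* The pre-fix comparison, kept only for contrast: it flags every offset
 * from 0 up to and including ngpio, i.e. all of the valid ones. */
bool hwnum_out_of_range_buggy(unsigned int ngpio, unsigned int hwnum)
{
    return ngpio >= hwnum;
}
```

For an 8-line chip, the corrected check accepts offsets 0 through 7 and rejects 8, while the old comparison rejected 0 through 7 as well.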



Re: [PATCH 0/5] futex: Wakeup optimizations

2013-11-22 Thread Darren Hart
On Fri, 2013-11-22 at 16:56 -0800, Davidlohr Bueso wrote:
> We have been dealing with a customer database workload on a large
> 12Tb, 240 core 16 socket NUMA system that exhibits high amounts
> of contention on some of the locks that serialize internal futex
> data structures. This workload especially suffers in the wakeup
> paths, where waiting on the corresponding hb->lock can account for
> up to ~60% of the time. The result of such calls can mostly be
> classified as (i) nothing to wake up and (ii) waking up a large number
> of tasks.

With as many cores as you have, have you done any analysis of how
effective the hashing algorithm is, and would more buckets relieve some
of the contention? Ah, I see below that you did. Nice work.

> Before these patches are applied, we can see this pathological behavior:
> 
>  37.12%  826174  xxx  [kernel.kallsyms] [k] _raw_spin_lock
> --- _raw_spin_lock
>  |
>  |--97.14%-- futex_wake
>  |  do_futex
>  |  sys_futex
>  |  system_call_fastpath
>  |  |
>  |  |--99.70%-- 0x7f383fbdea1f
>  |  |   yyy
> 
>  43.71%  762296  xxx  [kernel.kallsyms] [k] _raw_spin_lock
> --- _raw_spin_lock
>  |
>  |--53.74%-- futex_wake
>  |  do_futex
>  |  sys_futex
>  |  system_call_fastpath
>  |  |
>  |  |--99.40%-- 0x7fe7d44a4c05
>  |  |   zzz
>  |--45.90%-- futex_wait_setup
>  |  futex_wait
>  |  do_futex
>  |  sys_futex
>  |  system_call_fastpath
>  |  0x7fe7ba315789
>  |  syscall
> 

Sorry to be dense, can you spell out how 60% falls out of these numbers?

> 
> With these patches, contention is practically non existent:
> 
>  0.10% 49   xxx  [kernel.kallsyms]   [k] _raw_spin_lock
>--- _raw_spin_lock
> |
> |--76.06%-- futex_wait_setup
> |  futex_wait
> |  do_futex
> |  sys_futex
> |  system_call_fastpath
> |  |
> |  |--99.90%-- 0x7f3165e63789
> |  |  syscall|
>...
> |--6.27%-- futex_wake
> |  do_futex
> |  sys_futex
> |  system_call_fastpath
> |  |
> |  |--54.56%-- 0x7f317fff2c05
> ...
> 
> Patches 1 & 2 are cleanups and micro optimizations.
> 
> Patch 3 addresses the well known issue of the global hash table.
> By creating a larger and NUMA aware table, we can reduce the false
> sharing and collisions, thus reducing the chance of different futexes 
> using hb->lock.
> 
> Patch 4 reduces contention on the corresponding hb->lock by not trying to
> acquire it if there are no blocked tasks in the waitqueue.
> This particularly deals with point (i) above, where we see that it is not
> uncommon for up to 90% of wakeup calls to end up returning 0, indicating
> that no tasks were woken.

Can you determine how much benefit comes from 3 and how much additional
benefit comes from 4?

> 
> Patch 5 resurrects a two year old idea from Peter Zijlstra to delay
> the waking of the blocked tasks to be done without holding the hb->lock:
> https://lkml.org/lkml/2011/9/14/118
> 
> This is useful for locking primitives that can effect multiple wakeups
> per operation and want to avoid the futex's internal spinlock contention by
> delaying the wakeups until we've released the hb->lock.
> This particularly deals with point (ii) above, where we can observe that
> on occasion the wake calls end up waking 125 to 200 waiters in what we
> believe are RW locks in the application.
> 
> This patchset has also been tested on smaller systems for a variety of
> benchmarks, including java workloads, kernel builds and custom
> bang-the-hell-out-of hb locks programs. So far, no functional or
> performance regressions have been seen.
> Furthermore, no issues were found when running the different tests in the
> futextest suite: http://git.kernel.org/cgit/linux/kernel/git/dvhart/futextest.git/

Excellent. Would you be able to contribute any of these (C only please)
to the stress test group?

> 
> This patchset applies on top of Linus' tree as of v3.13-rc1.
> 
> Special thanks to Scott Norton, Tom Vanden and Mark Ray for help presenting, 
> debugging and analyzing the data.
> 
>   futex: Misc cleanups
>   futex: Check for pi futex_q only once
>   futex: Larger hash table
>   futex: Avoid taking hb lock if nothing to wakeup
>   sched,futex: Provide delayed wakeup list
> 

[PATCH] gpiolib: add missing declarations

2013-11-22 Thread Alexandre Courbot
Add missing declarations and include files to avoid warnings during
compilation of include/linux/gpio/driver.h.

Signed-off-by: Alexandre Courbot 
Reported-by: Stephen Warren 
---
 include/linux/gpio/driver.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/linux/gpio/driver.h b/include/linux/gpio/driver.h
index 82eac61..3ea2cf6 100644
--- a/include/linux/gpio/driver.h
+++ b/include/linux/gpio/driver.h
@@ -2,9 +2,12 @@
 #define __LINUX_GPIO_DRIVER_H
 
 #include 
+#include 
 
 struct device;
 struct gpio_desc;
+struct of_phandle_args;
+struct device_node;
 struct seq_file;
 
 /**
-- 
1.8.4.2



[PATCH 1/2] drm/radeon/dpm: Convert to use devm_hwmon_register_with_groups

2013-11-22 Thread Guenter Roeck
Simplify the code and fix race condition seen because
attribute files were created after hwmon device registration.

Signed-off-by: Guenter Roeck 
---
Compile tested only; unfortunately I don't have the necessary hardware.

 drivers/gpu/drm/radeon/radeon_pm.c |   49 +---
 1 file changed, 12 insertions(+), 37 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_pm.c b/drivers/gpu/drm/radeon/radeon_pm.c
index d1385cc..dc75bb6 100644
--- a/drivers/gpu/drm/radeon/radeon_pm.c
+++ b/drivers/gpu/drm/radeon/radeon_pm.c
@@ -537,8 +537,7 @@ static ssize_t radeon_hwmon_show_temp(struct device *dev,
  struct device_attribute *attr,
  char *buf)
 {
-   struct drm_device *ddev = dev_get_drvdata(dev);
-   struct radeon_device *rdev = ddev->dev_private;
+   struct radeon_device *rdev = dev_get_drvdata(dev);
int temp;
 
if (rdev->asic->pm.get_temperature)
@@ -566,23 +565,14 @@ static ssize_t radeon_hwmon_show_temp_thresh(struct device *dev,
return snprintf(buf, PAGE_SIZE, "%d\n", temp);
 }
 
-static ssize_t radeon_hwmon_show_name(struct device *dev,
- struct device_attribute *attr,
- char *buf)
-{
-   return sprintf(buf, "radeon\n");
-}
-
 static SENSOR_DEVICE_ATTR(temp1_input, S_IRUGO, radeon_hwmon_show_temp, NULL, 0);
 static SENSOR_DEVICE_ATTR(temp1_crit, S_IRUGO, radeon_hwmon_show_temp_thresh, NULL, 0);
 static SENSOR_DEVICE_ATTR(temp1_crit_hyst, S_IRUGO, radeon_hwmon_show_temp_thresh, NULL, 1);
-static SENSOR_DEVICE_ATTR(name, S_IRUGO, radeon_hwmon_show_name, NULL, 0);
 
 static struct attribute *hwmon_attributes[] = {
&sensor_dev_attr_temp1_input.dev_attr.attr,
&sensor_dev_attr_temp1_crit.dev_attr.attr,
&sensor_dev_attr_temp1_crit_hyst.dev_attr.attr,
-   &sensor_dev_attr_name.dev_attr.attr,
NULL
 };
 
@@ -607,11 +597,15 @@ static const struct attribute_group hwmon_attrgroup = {
.is_visible = hwmon_attributes_visible,
 };
 
+static const struct attribute_group *hwmon_groups[] = {
+   &hwmon_attrgroup,
+   NULL
+};
+
 static int radeon_hwmon_init(struct radeon_device *rdev)
 {
int err = 0;
-
-   rdev->pm.int_hwmon_dev = NULL;
+   struct device *hwmon_dev;
 
switch (rdev->pm.int_thermal_type) {
case THERMAL_TYPE_RV6XX:
@@ -624,20 +618,13 @@ static int radeon_hwmon_init(struct radeon_device *rdev)
case THERMAL_TYPE_KV:
if (rdev->asic->pm.get_temperature == NULL)
return err;
-   rdev->pm.int_hwmon_dev = hwmon_device_register(rdev->dev);
-   if (IS_ERR(rdev->pm.int_hwmon_dev)) {
-   err = PTR_ERR(rdev->pm.int_hwmon_dev);
+   hwmon_dev = hwmon_device_register_with_groups(rdev->dev,
+ "radeon", rdev,
+ hwmon_groups);
+   if (IS_ERR(hwmon_dev)) {
+   err = PTR_ERR(hwmon_dev);
dev_err(rdev->dev,
"Unable to register hwmon device: %d\n", err);
-   break;
-   }
-   dev_set_drvdata(rdev->pm.int_hwmon_dev, rdev->ddev);
-   err = sysfs_create_group(&rdev->pm.int_hwmon_dev->kobj,
-&hwmon_attrgroup);
-   if (err) {
-   dev_err(rdev->dev,
-   "Unable to create hwmon sysfs file: %d\n", err);
-   hwmon_device_unregister(rdev->dev);
}
break;
default:
@@ -647,14 +634,6 @@ static int radeon_hwmon_init(struct radeon_device *rdev)
return err;
 }
 
-static void radeon_hwmon_fini(struct radeon_device *rdev)
-{
-   if (rdev->pm.int_hwmon_dev) {
-   sysfs_remove_group(&rdev->pm.int_hwmon_dev->kobj, &hwmon_attrgroup);
-   hwmon_device_unregister(rdev->pm.int_hwmon_dev);
-   }
-}
-
 static void radeon_dpm_thermal_work_handler(struct work_struct *work)
 {
struct radeon_device *rdev =
@@ -1337,8 +1316,6 @@ static void radeon_pm_fini_old(struct radeon_device *rdev)
 
if (rdev->pm.power_state)
kfree(rdev->pm.power_state);
-
-   radeon_hwmon_fini(rdev);
 }
 
 static void radeon_pm_fini_dpm(struct radeon_device *rdev)
@@ -1358,8 +1335,6 @@ static void radeon_pm_fini_dpm(struct radeon_device *rdev)
 
if (rdev->pm.power_state)
kfree(rdev->pm.power_state);
-
-   radeon_hwmon_fini(rdev);
 }
 
 void radeon_pm_fini(struct radeon_device *rdev)
-- 
1.7.9.7


[PATCH 2/2] drm/nouveau/hwmon: Convert to use devm_hwmon_device_register_with_groups

2013-11-22 Thread Guenter Roeck
Simplify the code and resolve race conditions seen because
attribute files are created after hwmon device registration,
and removed before hwmon device removal.

Signed-off-by: Guenter Roeck 
---
Compile tested only; unfortunately I don't have the necessary hardware.

 drivers/gpu/drm/nouveau/nouveau_hwmon.c |  243 +++
 drivers/gpu/drm/nouveau/nouveau_hwmon.h |   11 --
 2 files changed, 84 insertions(+), 170 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_hwmon.c b/drivers/gpu/drm/nouveau/nouveau_hwmon.c
index 38a4db5..d4144b8 100644
--- a/drivers/gpu/drm/nouveau/nouveau_hwmon.c
+++ b/drivers/gpu/drm/nouveau/nouveau_hwmon.c
@@ -42,9 +42,7 @@
 static ssize_t
 nouveau_hwmon_show_temp(struct device *d, struct device_attribute *a, char *buf)
 {
-   struct drm_device *dev = dev_get_drvdata(d);
-   struct nouveau_drm *drm = nouveau_drm(dev);
-   struct nouveau_therm *therm = nouveau_therm(drm->device);
+   struct nouveau_therm *therm = dev_get_drvdata(d);
int temp = therm->temp_get(therm);
 
if (temp < 0)
@@ -68,9 +66,7 @@ static ssize_t
 nouveau_hwmon_temp1_auto_point1_temp(struct device *d,
 struct device_attribute *a, char *buf)
 {
-   struct drm_device *dev = dev_get_drvdata(d);
-   struct nouveau_drm *drm = nouveau_drm(dev);
-   struct nouveau_therm *therm = nouveau_therm(drm->device);
+   struct nouveau_therm *therm = dev_get_drvdata(d);
 
return snprintf(buf, PAGE_SIZE, "%d\n",
  therm->attr_get(therm, NOUVEAU_THERM_ATTR_THRS_FAN_BOOST) * 1000);
@@ -80,9 +76,7 @@ nouveau_hwmon_set_temp1_auto_point1_temp(struct device *d,
 struct device_attribute *a,
 const char *buf, size_t count)
 {
-   struct drm_device *dev = dev_get_drvdata(d);
-   struct nouveau_drm *drm = nouveau_drm(dev);
-   struct nouveau_therm *therm = nouveau_therm(drm->device);
+   struct nouveau_therm *therm = dev_get_drvdata(d);
long value;
 
if (kstrtol(buf, 10, &value) == -EINVAL)
@@ -101,9 +95,7 @@ static ssize_t
 nouveau_hwmon_temp1_auto_point1_temp_hyst(struct device *d,
  struct device_attribute *a, char *buf)
 {
-   struct drm_device *dev = dev_get_drvdata(d);
-   struct nouveau_drm *drm = nouveau_drm(dev);
-   struct nouveau_therm *therm = nouveau_therm(drm->device);
+   struct nouveau_therm *therm = dev_get_drvdata(d);
 
return snprintf(buf, PAGE_SIZE, "%d\n",
 therm->attr_get(therm, NOUVEAU_THERM_ATTR_THRS_FAN_BOOST_HYST) * 1000);
@@ -113,9 +105,7 @@ nouveau_hwmon_set_temp1_auto_point1_temp_hyst(struct device *d,
  struct device_attribute *a,
  const char *buf, size_t count)
 {
-   struct drm_device *dev = dev_get_drvdata(d);
-   struct nouveau_drm *drm = nouveau_drm(dev);
-   struct nouveau_therm *therm = nouveau_therm(drm->device);
+   struct nouveau_therm *therm = dev_get_drvdata(d);
long value;
 
if (kstrtol(buf, 10, &value) == -EINVAL)
@@ -133,9 +123,7 @@ static SENSOR_DEVICE_ATTR(temp1_auto_point1_temp_hyst, S_IRUGO | S_IWUSR,
 static ssize_t
 nouveau_hwmon_max_temp(struct device *d, struct device_attribute *a, char *buf)
 {
-   struct drm_device *dev = dev_get_drvdata(d);
-   struct nouveau_drm *drm = nouveau_drm(dev);
-   struct nouveau_therm *therm = nouveau_therm(drm->device);
+   struct nouveau_therm *therm = dev_get_drvdata(d);
 
return snprintf(buf, PAGE_SIZE, "%d\n",
   therm->attr_get(therm, NOUVEAU_THERM_ATTR_THRS_DOWN_CLK) * 1000);
@@ -144,9 +132,7 @@ static ssize_t
 nouveau_hwmon_set_max_temp(struct device *d, struct device_attribute *a,
const char *buf, size_t count)
 {
-   struct drm_device *dev = dev_get_drvdata(d);
-   struct nouveau_drm *drm = nouveau_drm(dev);
-   struct nouveau_therm *therm = nouveau_therm(drm->device);
+   struct nouveau_therm *therm = dev_get_drvdata(d);
long value;
 
if (kstrtol(buf, 10, &value) == -EINVAL)
@@ -164,9 +150,7 @@ static ssize_t
 nouveau_hwmon_max_temp_hyst(struct device *d, struct device_attribute *a,
char *buf)
 {
-   struct drm_device *dev = dev_get_drvdata(d);
-   struct nouveau_drm *drm = nouveau_drm(dev);
-   struct nouveau_therm *therm = nouveau_therm(drm->device);
+   struct nouveau_therm *therm = dev_get_drvdata(d);
 
return snprintf(buf, PAGE_SIZE, "%d\n",
  therm->attr_get(therm, NOUVEAU_THERM_ATTR_THRS_DOWN_CLK_HYST) * 1000);
@@ -175,9 +159,7 @@ static ssize_t
 nouveau_hwmon_set_max_temp_hyst(struct device *d, struct device_attribute *a,
const char *buf, size_t count)
 {
-   struct drm_device *dev = 

Re: [PATCH 4/5] futex: Avoid taking hb lock if nothing to wakeup

2013-11-22 Thread Hart, Darren
On Fri, 2013-11-22 at 21:40 -0800, Darren Hart wrote:
> On Fri, 2013-11-22 at 16:56 -0800, Davidlohr Bueso wrote:
> > In futex_wake() there is clearly no point in taking the hb->lock if
> > we know beforehand that there are no tasks to be woken. This comes
> > at the smaller cost of doing some atomic operations to keep track of
> > the list's size. Specifically, increment the counter when an element is
> > added to the list, and decrement when it is removed. Of course, if the
> > counter is 0, then there are no tasks blocked on a futex. Some special
> > considerations:
> > 
> > - increment the counter at queue_lock() as we always end up calling
> >   queue_me() which adds the element to the list. Upon any error,
> >   queue_unlock() is called for housekeeping, for which we decrement
> > >   to match the increment done in queue_lock().
> > 
> > - decrement the counter at __unqueue_me() to reflect when an element is
> >   removed from the queue for wakeup related purposes.
> 
> What is the problem you are trying to solve here?

Apologies, too quick on the trigger. I see plenty of detail in 0/5. Will
spend some time reviewing that.

-- 
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel



Re: [PATCH 4/5] futex: Avoid taking hb lock if nothing to wakeup

2013-11-22 Thread Darren Hart
On Fri, 2013-11-22 at 16:56 -0800, Davidlohr Bueso wrote:
> In futex_wake() there is clearly no point in taking the hb->lock if
> we know beforehand that there are no tasks to be woken. This comes
> at the smaller cost of doing some atomic operations to keep track of
> the list's size. Specifically, increment the counter when an element is
> added to the list, and decrement when it is removed. Of course, if the
> counter is 0, then there are no tasks blocked on a futex. Some special
> considerations:
> 
> - increment the counter at queue_lock() as we always end up calling
>   queue_me() which adds the element to the list. Upon any error,
>   queue_unlock() is called for housekeeping, for which we decrement
> >   to match the increment done in queue_lock().
> 
> - decrement the counter at __unqueue_me() to reflect when an element is
>   removed from the queue for wakeup related purposes.

What is the problem you are trying to solve here?

The fast-path in futexes is not calling into the kernel at all, if
you've made the syscall, you've already lost. What kind of improvement
are you seeing by adding this optimization? Is it worth the introduction
of atomic operations into a file known to the state of California to
increase the risk of liver failure?

-- 
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel




Re: [PATCH v2 2/7] ARM: tegra: add gpiod_lookup table for paz00

2013-11-22 Thread Alexandre Courbot
On Sat, Nov 23, 2013 at 3:40 AM, Stephen Warren  wrote:
> On 11/22/2013 05:14 AM, Mika Westerberg wrote:
>> From: Heikki Krogerus 
>>
>> This makes it possible to request the gpio descriptors in
>> rfkill-gpio driver regardless of the platform.
>
> Patches 1-3,
> Tested-by: Stephen Warren 
>
> All the testing I did was to "cat /sys/kernel/debug/gpios" to make sure
> those two GPIOs had been correctly acquired by the driver, and that they
> changed state as expected via the command-line "rfkill {un,}block" commands.
>
> However, please note that I had to apply a couple fixes to the gpiolib
> core to get this working:
>
> 1)
>
> To solve the following build warning:
>
>> In file included from arch/arm/mach-tegra/board-paz00.c:21:0:
>> include/linux/gpio/driver.h:102:17: warning: ‘struct of_phandle_args’ 
>> declared inside parameter list [enabled by default]
>> include/linux/gpio/driver.h:102:17: warning: its scope is only this 
>> definition or declaration, which is probably not what you want [enabled by 
>> default]
>
> I applied:
>
>
>> diff --git a/include/linux/gpio/driver.h b/include/linux/gpio/driver.h
>> index 82eac610ce1a..5133cf29803a 100644
>> --- a/include/linux/gpio/driver.h
>> +++ b/include/linux/gpio/driver.h
>> @@ -5,6 +5,7 @@
>>
>>  struct device;
>>  struct gpio_desc;
>> +struct of_phandle_args;
>>  struct seq_file;
>>
>>  /**
>
> 2)
>
> In order to get the GPIO lookups from the rfkill driver working, I applied:
>
>> diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
>> index ac53a9593662..b73c39f99858 100644
>> --- a/drivers/gpio/gpiolib.c
>> +++ b/drivers/gpio/gpiolib.c
>> @@ -2368,7 +2368,7 @@ static struct gpio_desc *gpiod_find(struct device 
>> *dev, const char *con_id,
>>   continue;
>>   }
>>
>> - if (chip->ngpio >= p->chip_hwnum) {
>> + if (chip->ngpio <= p->chip_hwnum) {
>>   dev_warn(dev, "GPIO chip %s has %d GPIOs\n",
>>chip->label, chip->ngpio);
>>   continue;

Mmm. I guess I should be blamed for this one. I haven't tested GPIO
platform lookup in a while and this seems to have slipped in at some
point. :( I will send fixup patches to Linus for both issues, thanks
for pointing them out.
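
For anyone following along, the corrected comparison can be checked in
isolation. This is only a userspace sketch: the helper names below are
hypothetical and not part of gpiolib, and the only assumption taken from
the patch is that a chip with ngpio lines has valid hardware numbers
0 .. ngpio - 1.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Corrected test from the hunk above: an entry is skipped when
 * ngpio <= hwnum, so it is usable only when hwnum < ngpio.
 * Hypothetical helper, not a gpiolib function.
 */
static inline bool lookup_entry_usable(unsigned int ngpio, unsigned int hwnum)
{
	assert(ngpio > 0);	/* a chip without lines is outside this sketch */
	return !(ngpio <= hwnum);
}

/*
 * Original test (ngpio >= hwnum means "skip"): it rejects every
 * valid index and accepts only indexes past the end of the chip.
 */
static inline bool buggy_entry_usable(unsigned int ngpio, unsigned int hwnum)
{
	assert(ngpio > 0);
	return !(ngpio >= hwnum);
}
```

With the old comparison every in-range lookup is skipped with the
"GPIO chip %s has %d GPIOs" warning, which would explain why the rfkill
lookups did not work before the fix.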

Alex.


Re: [tip:sched/urgent] sched: Check sched_domain before computing group power

2013-11-22 Thread Yinghai Lu
On Fri, Nov 22, 2013 at 4:07 AM, Peter Zijlstra  wrote:
> On Thu, Nov 21, 2013 at 09:22:24AM -0800, Yinghai Lu wrote:
>> On Thu, Nov 21, 2013 at 7:03 AM, Peter Zijlstra  wrote:
>> >>
>> >> This one seems to fix the NULL reference in compute_group_power.
>> >>
>> >> but I get the following on current Linus tree plus tip/sched/urgent.
>> >>
>> >> divide error:  [#1]  SMP
>> >> [   28.190477] Modules linked in:
>> >> [   28.192012] CPU: 11 PID: 484 Comm: kworker/u324:0 Not tainted
>> >> 3.12.0-yh-10487-g4b94e59-dirty #2044
>> >> [   28.210488] Hardware name: Oracle Corporation  Sun Fire
>> >> [   28.229877] task: 88ff25205140 ti: 88ff2520a000 task.ti:
>> >> 88ff2520a000
>> >> [   28.236139] RIP: 0010:[]  []
>> >> find_busiest_group+0x2b4/0x8a0
>> >
>> > Hurmph.. what kind of hardware is that? and is there anything funny you
>> > do to make it do this?
>>
>> Intel Nehalem-EX or Westmere-EX 8-socket system.
>>
>> I tried without my local patches, the problem is still there.
>
> And I suppose a kernel before
>
>   863bffc80898 ("sched/fair: Fix group power_orig computation")
>
> work fine, eh?
>
> I'll further assume that your RIP points to:
>
> sds.avg_load = (SCHED_POWER_SCALE * sds.total_load) / sds.total_pwr;
>
> indicating that sds.total_pwr := 0.
>
> update_sd_lb_stats() computes it like:
>
> sds->total_pwr += sgs->group_power;
>
> which comes out of update_sg_lb_stats() like:
>
> sgs->group_power = group->sgp->power;
>
> Which we compute in update_group_power() similarly to how we did before
> 863bffc80898.
>
> Which leaves me a bit puzzled.
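
The failure mode described above, an integer division by a zero
total_pwr, can be reproduced outside the kernel. The following is a
minimal sketch under stated assumptions: the struct and function names
are hypothetical, SCHED_POWER_SCALE is assumed to be 1024, and the guard
only illustrates the condition. It is not the fix that was applied to
the scheduler.

```c
#include <assert.h>
#include <stdbool.h>

#define SCHED_POWER_SCALE 1024UL	/* assumed value for this sketch */

/* Hypothetical reduction of the load-balance statistics to two fields. */
struct sd_lb_stats_sketch {
	unsigned long total_load;
	unsigned long total_pwr;
};

/*
 * Returns false instead of dividing when total_pwr is 0, which is the
 * condition that raises the divide error on x86.
 */
static inline bool compute_avg_load(const struct sd_lb_stats_sketch *sds,
				    unsigned long *avg_load)
{
	assert(sds && avg_load);
	if (sds->total_pwr == 0)
		return false;
	*avg_load = (SCHED_POWER_SCALE * sds->total_load) / sds->total_pwr;
	return true;
}
```

A guard like this hides the symptom only; the open question in the
thread is why group power sums to zero in the first place.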

Hi,
For Linus' tree I need to revert:
   commit-863bffc

For Linus' tree + sched/urgent, I need to revert:
   commit-42eb088
   commit-9abf24d
   commit-863bffc

If I only revert commit-42eb088, the problem is still there.
If I only revert commit-9abf24d and commit-863bffc, the problem is still there.

I assume you need to dump sched/urgent,
and revert commit-863bffc directly from Linus's tree.

Thanks

Yinghai


Re: [PATCH 4/5] futex: Avoid taking hb lock if nothing to wakeup

2013-11-22 Thread Waiman Long

On 11/22/2013 08:25 PM, Linus Torvalds wrote:
> On Fri, Nov 22, 2013 at 4:56 PM, Davidlohr Bueso  wrote:
>> In futex_wake() there is clearly no point in taking the hb->lock if
>> we know beforehand that there are no tasks to be woken. This comes
>> at the smaller cost of doing some atomic operations to keep track of
>> the list's size.
>
> Hmm. Why? Afaik, you only care about "empty or not". And if you don't
> need the serialization from locking, then afaik you can just do a
> "plist_head_empty()" without holding the lock.
>
> NOTE!
>
> The "list_empty()" function is very much designed to work even without
> holding a lock (as long as the head itself exists reliably, of course)
> BUT you have to then guarantee yourself that your algorithm doesn't
> have any races wrt other CPU's adding an entry to the list at the same
> time. Not holding a lock obviously means that you are not serialized
> against that.. We've had problems with people doing
>
>  if (!list_empty(waiters))
>  wake_up_list(..)
>
> because they wouldn't wake people up who just got added.
>
> But considering that your atomic counter checking has the same lack of
> serialization, at least the plist_head_empty() check shouldn't be any
> worse than that counter thing.. And doesn't need any steenking atomic
> ops or a new counter field.
>
>  Linus


Before the patch, a waker will wake up one or more waiters who called
spin_lock() before the waker. If we just use plist_head_empty(), we will
miss waiters who have called spin_lock() but have not yet got the lock
or inserted themselves into the wait list. By adding an atomic counter for
checking purposes, we again have a single point where any waiter who passes
that point (incrementing the atomic counter) can be woken up by a waker who
checks the atomic counter after the increment. This acts sort of like
serialization without using a lock.
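
The window between the two checks can be sketched in userspace C. This
is an illustration only, under stated assumptions: struct bucket and all
helper names are hypothetical, C11 atomics stand in for the kernel's
atomic_t, a plain count of queued entries stands in for the plist, and
the hb->lock itself is not modeled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a futex hash bucket. */
struct bucket {
	atomic_int waiters;	/* bumped before the waiter takes the lock */
	int queued;		/* entries actually on the list (lock held) */
};

/* Waiter, step 1: announce intent before contending for the lock. */
static inline void waiter_announce(struct bucket *hb)
{
	atomic_fetch_add(&hb->waiters, 1);
}

/* Waiter, step 2: with the lock held, add itself to the list. */
static inline void waiter_enqueue(struct bucket *hb)
{
	hb->queued++;
}

/* Removal from the list, as on wakeup; undoes both steps. */
static inline void waiter_dequeue(struct bucket *hb)
{
	assert(hb->queued > 0);
	hb->queued--;
	atomic_fetch_sub(&hb->waiters, 1);
}

/* Lockless check in the style of plist_head_empty(). */
static inline bool waker_sees_list_entry(const struct bucket *hb)
{
	return hb->queued != 0;
}

/* Lockless check against the counter. */
static inline bool waker_sees_counter(struct bucket *hb)
{
	return atomic_load(&hb->waiters) != 0;
}
```

Between waiter_announce() and waiter_enqueue() the list still looks
empty while the counter is already nonzero, which is the window the
counter is meant to cover. Whether that is sufficient in the kernel also
depends on memory ordering between the counter update and the futex
value read, which this sketch does not attempt to show.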


-Longman


Re: [PATCH v3 tip/core/locking 0/3] Memory-barrier documentation updates

2013-11-22 Thread Josh Triplett
On Fri, Nov 22, 2013 at 11:26:57AM -0800, Paul E. McKenney wrote:
> Hello!
> 
> This series applies some long-needed updates to memory-barriers.txt:
> 
> 1.Add ACCESS_ONCE() calls where needed to ensure their inclusion
>   in code copy-and-pasted from this file.
> 
> 2.Add long atomic examples alongside the existing atomics.
> 
> 3.Prohibit architectures supporting the Linux kernel from
>   speculating stores.
> 
> Changes from v2:
> 
> o Update examples so that the load against which the subsequent
>   store is to be ordered is part of the "if" condition.
> 
> o Add an example showing how the compiler can remove "if"
>   conditions and how to prevent it from doing so.
> 
> o Add ACCESS_ONCE() to the compiler-barrier section.
> 
> o Add a sentence noting that transitivity requires smp_mb().
> 
> Changes from v1:
> 
> o Combined with Peter Zijlstra's speculative-store-prohibition patch.
> 
> o Added more pitfalls to avoid when prohibiting speculative
>   stores, along with how to avoid them.
> 
> o Applied Josh Triplett's review comments.

For v3, all three patches:
Reviewed-by: Josh Triplett 
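
For readers without the patches at hand, the pattern the series is about
looks roughly like the userspace sketch below. Assumptions: the
ACCESS_ONCE() definition mirrors the volatile-cast form used by the
kernel (with __typeof__ so it builds with GCC or Clang outside the
kernel), and store_if_set() is a hypothetical name, not an example
copied from memory-barriers.txt.

```c
#include <assert.h>

/* Volatile cast so the compiler emits exactly one access. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/*
 * The load that the store depends on is part of the "if" condition,
 * and both accesses go through ACCESS_ONCE() so the compiler cannot
 * merge, repeat, or drop them.
 */
static inline void store_if_set(int *flag, int *out)
{
	assert(flag && out);
	if (ACCESS_ONCE(*flag))
		ACCESS_ONCE(*out) = 1;
}
```

This only constrains the compiler; any ordering guarantee between CPUs
is a property of the architecture and of the rules the documentation
patches spell out, not of this macro alone.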


Re: [PATCH 4/6] kexec: A new system call, kexec_file_load, for in kernel kexec

2013-11-22 Thread Eric W. Biederman
Vivek Goyal  writes:

> On Thu, Nov 21, 2013 at 07:19:07PM +, Matthew Garrett wrote:
>> On Thu, Nov 21, 2013 at 02:13:05PM -0500, Vivek Goyal wrote:
>> > On Thu, Nov 21, 2013 at 07:06:20PM +, Matthew Garrett wrote:
>> > > That would require a certain degree of massaging from userspace if we 
>> > > want to be able to use the existing Authenticode signatures. Otherwise 
>> > > we need to sign kernels twice.
>> > 
>> > I was thinking of signing the same kernel twice. Can I sign authenticode
>> > signed kernel again (using RSA signature as we do for modules) and append
>> > the signature to bzImage. 
>> 
>> No, you'd need to do it the other way around.
>
> Hmm..., I am running out of ideas here. This is what I understand.
>
> - If I sign the bzImage (using PKCS1.5 signature), and later it is signed
>   with authenticode format signatures, then PKCS1.5 signatures will not be
>   valid as PE/COFF signing will do some modification to PE/COFF header in
>   bzImage. And another problem is that then I don't have a way to find
>   PKCS1.5 signature.
>
> - If bzImage is first signed with authenticode format signature and then
>   signed using PKCS1.5 signature, then  authenticode format signature
>   will become invalid as it will also hash the data appended at the end
>   of file.
>
> So looks like both signatures can't co-exist on same file. That means
> one signature has to be detached.
>
> I am beginning to think we should create a kernel option which allows choosing
> between attached and detached signatures. Extend kexec syscall to allow
> a parameter to pass in detached signatures. If detached signatures are
> not passed, then look for signatures at the end of file. That way, those
> who are signing kernels using platform specific format (authenticode) in 
> this case, they can generate detached signature while others can just
> use attached signatures.
>
> Any thoughts on how this should be handled?

Inside of a modern bzImage there is an embedded ELF image.  How about in
userspace we just strip out the embedded ELF image and write that to a
file.  Then we can use the same signature checking scheme as we do for
kernel modules.  And you only have to support one file format.

As I recall there are already some platforms on x86 like Xen that
already need to strip out the embedded ELF image for their loaders to
have all of the information they need to load the image.

Eric


Re: [PATCH 0/6] kexec: A new system call to allow in kernel loading

2013-11-22 Thread Eric W. Biederman
Vivek Goyal  writes:

> On Fri, Nov 22, 2013 at 05:34:03AM -0800, Eric W. Biederman wrote:
>
> [..]
>> > Why is the ELF case so interesting? I have not used kexec to boot ELF
>> > images in years and have not seen others using it either. In fact bzImage
>> > seems to be the most common kernel image format for x86, and the one most
>> > distros ship and use.
>> 
>> ELF is interesting because it is the minimal file format that does
>> everything you need.   So especially for a proof of concept ELF needs to
>> come first.  There is an extra virtual address field in the ELF segment
>> header but otherwise ELF does not have any unnecessary fields.
>> 
>> ELF is interesting because it is the native kernel file format on all
>> architectures linux supports including x86.
>> 
>> ELF is interesting because producing an ELF image in practice requires
>> a trivial amount of tooling so it is a good general purpose format to
>> support.
>
> Ok. I will have a look at ELF loader too. I was hoping to keep only one
> loader in initial patch. But looks like that's not acceptable.

Thank you.  We really need an ELF loader to see that we have handled the
general case and to see that we have a solution that is portable to
non-x86.

> [..]
>> >> There is also a huge missing piece of this in that your purgatory is not
>> >> checking a hash of the loaded image before jumping to it.  Without that
>> >> this is a huge regression at least for the kexec on panic case.  We
>> >> absolutely need to check that the kernel sitting around in memory has
>> >> not been corrupted before we let it run very far.
>> >
>> > Agreed. This should not be hard. It is just a matter of calculating
>> > digest of segments. I will store it in kimage and verify digest again
>> > before passing control to control page. Will fix it in next version.
>> 
>> Nak.  The verification needs to happen in purgatory. 
>> 
>> The verification needs to happen in code whose runtime environment
>> does not depend on random parts of the kernel.  Anything else is a
>> regression in maintainability and reliability.
>> 
>> It is the wrong direction to add any code to what needs to run in the
>> known broken environment of the kernel when a panic happens.
>> 
>> Which means that you almost certainly need to go to the trouble of
>> supporting the complexity needed to support purgatory code written in C.
>> 
>> (For those just tuning in purgatory is our term for the code that runs
>> between the kernels to do those things that can not happen a priori).
>
> In general, I agree with not using kernel parts after crash.
>
> But what protects against purgatory itself having been scribbled over?
> IOW, how different is purgatory memory compared to kernel memory where
> digest routines are stored? They have an equal probability of being scribbled
> over, and if that's the case one is not better than the other.
>
> And if they both have an equal probability of getting corrupted, then there does
> not seem to be an advantage in moving digest verification inside
> purgatory.

The primary reason is that maintenance of code in the kernel that is
safe during a crash dump is hard.  That is why we boot a second kernel
after all.  If the code to do the signature verification resides in
machine_kexec on the kexec on panic code path in the kernel that has
called panic it is almost a given that at some point or other someone
will add an option that will add a weird dependency that makes the code
unsafe when the kernel is crashing.  I have seen it happen several times
on the existing kexec on panic code path.  I have seen it on other code
paths like netconsole, which on some kernels I have running can
currently cause the kernel to go into an endless printk loop if you call
printk from interrupt context.  So what we really gain by moving the
verification into purgatory is protection from inappropriate code reuse.

So having a completely separate piece of code may be a little harder to
write initially but the code is much simpler and more reliable to
maintain.  Essentially requiring no maintenance effort.  Further getting
to the point where purgatory is written in C makes small changes much
more approachable.
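
As a rough illustration of what a separate, dependency-free check might
look like, here is a sketch of the verify-before-jump step. Everything
in it is an assumption made for illustration: struct segment and both
function names are hypothetical, and FNV-1a is used only because it fits
in a few lines. It is not a cryptographic hash and a real purgatory
would need one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one loaded segment. */
struct segment {
	const unsigned char *start;
	size_t len;
};

/* FNV-1a over all segments; a small stand-in for a real digest. */
static inline uint64_t digest_segments(const struct segment *segs, size_t nr)
{
	uint64_t h = 0xcbf29ce484222325ULL;	/* FNV offset basis */

	for (size_t i = 0; i < nr; i++) {
		for (size_t j = 0; j < segs[i].len; j++) {
			h ^= segs[i].start[j];
			h *= 0x100000001b3ULL;	/* FNV prime */
		}
	}
	return h;
}

/* Recompute the digest and refuse to continue on a mismatch. */
static inline bool image_is_intact(const struct segment *segs, size_t nr,
				   uint64_t expected)
{
	assert(segs || nr == 0);
	return digest_segments(segs, nr) == expected;
}
```

The point of the sketch is the shape of the code: it touches only the
segment memory and its own locals, so nothing added elsewhere in the
kernel can change its behavior after a panic.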

Eric



Re: [Doubt on Implict BUGs caused LRO] doubt about lro status in v3.12 of torvalds/linux.git

2013-11-22 Thread Yanfei Wang
Hi Eric,

Thanks for quick reply.  :-)

Also, I am confused about why, in v3.12, the inet_lro.c file is not found in
the /net/ipv4/ directory, while the Makefile has "obj-$(CONFIG_INET_LRO) +=
inet_lro.o" and the LRO header file exists.

If I am wrong, please correct me. :-)

See step 3 for details (I have repeated it several times):

3. cd net/ipv4,
 [backyes@f14 ipv4]$ cat Makefile | grep lro
obj-$(CONFIG_INET_LRO) += inet_lro.o
[backyes@f14 ipv4]$ ls inet_*
inet_connection_sock.c  inet_diag.c  inet_fragment.c
inet_hashtables.c  inet_timewait_sock.c
---

Best regards

Yanfei Wang




On Sat, Nov 23, 2013 at 12:32 AM, Eric Dumazet  wrote:
> On Fri, 2013-11-22 at 23:00 +0800, Yanfei Wang wrote:
>> Hi,
>>
>> Compared to the earlier kernel version, I found that Large receive
>> offload(LRO) is obsolete in latest kernel, and that is verified from
>> other materials about Generic receive offload(GRO).
>>
>> However,  I also found some LRO header files and  lro-related
>> functions referred by some drivers, and the lro-related function
>> definition is already removed at the same time!
>>
>> TEST steps:
>> 1. my git local db config as follow, url =
>> http://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>> 2. git checkout v3.12 -b 3.12
>> 3. cd net/ipv4,
>>  [backyes@f14 ipv4]$ cat Makefile | grep lro
>> obj-$(CONFIG_INET_LRO) += inet_lro.o
>> [backyes@f14 ipv4]$ ls inet_*
>> inet_connection_sock.c  inet_diag.c  inet_fragment.c
>> inet_hashtables.c  inet_timewait_sock.c
>> [backyes@f14 ipv4]$
>> Here, Makefile has inet_lro.o config, while the inet_lro.c is removed.
>> 4. lro functions referred by some drivers, as follow,
>> [backyes@f14 linux]$ grep "inet_lro" * -r
>> drivers/net/ethernet/pasemi/pasemi_mac.c:#include <linux/inet_lro.h>
>> drivers/net/ethernet/pasemi/pasemi_mac_ethtool.c:#include <linux/inet_lro.h>
>> drivers/infiniband/hw/nes/nes_hw.c:#include <linux/inet_lro.h>
>> drivers/infiniband/hw/nes/nes_hw.h:#include <linux/inet_lro.h>
>> include/linux/inet_lro.h: *  linux/include/linux/inet_lro.h
>> net/ipv4/Makefile:obj-$(CONFIG_INET_LRO) += inet_lro.o
>>
>> so, I understand that the removal of LRO is not clean, which will
>> cause compiler error for some kernel config.
>>
>
> I think you are slightly confused.
>
> LRO is not yet removed.
>
> Feel free to contribute ;)
>
> As long as at least one driver still uses LRO, we can not remove LRO.
>
> Converting a driver means testing the new driver, and owning the
> hardware.
>
>
>



Re: [PATCH 4/5] futex: Avoid taking hb lock if nothing to wakeup

2013-11-22 Thread Davidlohr Bueso
On Fri, 2013-11-22 at 17:25 -0800, Linus Torvalds wrote:
> On Fri, Nov 22, 2013 at 4:56 PM, Davidlohr Bueso  wrote:
> > In futex_wake() there is clearly no point in taking the hb->lock if
> > we know beforehand that there are no tasks to be woken. This comes
> > at the smaller cost of doing some atomic operations to keep track of
> > the list's size.
> 
> Hmm. Why? Afaik, you only care about "empty or not". And if you don't
> need the serialization from locking, then afaik you can just do a
> "plist_head_empty()" without holding the lock.

I remember this being the original approach, but after noticing some
strange behavior we quickly decided it wasn't the path. And sure enough,
I just double checked and tried the patch without atomic ops and can see
things being off: one of the futextest performance cases is stuck
blocked on a futex and I couldn't reboot the machine either -- nothing
apparent in dmesg, just not 100% functional. The thing is, we can
avoid taking the lock only if nobody else is trying to add itself to the
list.

Thanks,
Davidlohr



Re: [PATCH 4/5] futex: Avoid taking hb lock if nothing to wakeup

2013-11-22 Thread Jason Low
On Fri, Nov 22, 2013 at 5:25 PM, Linus Torvalds
 wrote:
> On Fri, Nov 22, 2013 at 4:56 PM, Davidlohr Bueso  wrote:
>> In futex_wake() there is clearly no point in taking the hb->lock if
>> we know beforehand that there are no tasks to be woken. This comes
>> at the smaller cost of doing some atomic operations to keep track of
>> the list's size.
>
> Hmm. Why? Afaik, you only care about "empty or not". And if you don't
> need the serialization from locking, then afaik you can just do a
> "plist_head_empty()" without holding the lock.
>
> NOTE!
>
> The "list_empty()" function is very much designed to work even without
> holding a lock (as long as the head itself exists reliably, of course)
> BUT you have to then guarantee yourself that your algorithm doesn't
> have any races wrt other CPU's adding an entry to the list at the same
> time. Not holding a lock obviously means that you are not serialized
> against that.. We've had problems with people doing
>
> if (!list_empty(waiters))
> wake_up_list(..)
>
> because they wouldn't wake people up who just got added.
>
> But considering that your atomic counter checking has the same lack of
> serialization, at least the plist_head_empty() check shouldn't be any
> worse than that counter thing.. And doesn't need any steenking atomic
> ops or a new counter field.

Hi Linus,

In this patch, we do the atomic increment before taking the hb->lock
in every situation where we may potentially add a task to the list.
So we would only skip attempting the wake up if the list is empty and
no other thread holds, or is about to take, the hb->lock in order to
add a task to the list. Would this be enough to protect it from any
race condition?

Thanks,
Jason


RE: [f2fs-dev] [PATCH V3 2/2] f2fs: read contiguous sit entry pages by merging for mount performance

2013-11-22 Thread Chao Yu
Hi,

> -Original Message-
> From: Jaegeuk Kim [mailto:jaegeuk@samsung.com]
> Sent: Friday, November 22, 2013 5:11 PM
> To: Chao Yu
> Cc: linux-fsde...@vger.kernel.org; linux-kernel@vger.kernel.org; 
> linux-f2fs-de...@lists.sourceforge.net; 谭姝
> Subject: Re: [f2fs-dev] [PATCH V3 2/2] f2fs: read contiguous sit entry pages 
> by merging for mount performance
> 
> Hi,
> 
> Just one nitpick.
> 
> 2013-11-22 (금), 09:09 +0800, Chao Yu:
> > Previously we read sit entries page one by one, this method lost the chance
> > of reading contiguous page together. So we read pages as contiguous as
> > possible for better mount performance.
> >
> > change log:
> >  o merge judgements/use 'Continue' or 'Break' instead of 'Goto' as Gu Zheng
> >suggested.
> >  o add mark_page_accessed() before release page to delay VM reclaiming.
> >  o remove '*order' for simplification of function as Jaegeuk Kim suggested.
> >
> > Signed-off-by: Chao Yu 
> > ---
> >  fs/f2fs/segment.c |  103 +++--
> >  fs/f2fs/segment.h |2 ++
> >  2 files changed, 78 insertions(+), 27 deletions(-)
> >
> > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> > index 8149eba..998e7d3 100644
> > --- a/fs/f2fs/segment.c
> > +++ b/fs/f2fs/segment.c
> > @@ -14,6 +14,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >
> >  #include "f2fs.h"
> >  #include "segment.h"
> > @@ -1488,41 +1489,89 @@ static int build_curseg(struct f2fs_sb_info *sbi)
> > return restore_curseg_summaries(sbi);
> >  }
> >
> > +static int ra_sit_pages(struct f2fs_sb_info *sbi, int start, int nrpages)
> > +{
> > +   struct address_space *mapping = sbi->meta_inode->i_mapping;
> > +   struct page *page;
> > +   block_t blk_addr, prev_blk_addr = 0;
> > +   int sit_blk_cnt = SIT_BLK_CNT(sbi);
> 
> Can we remove SIT_BLK_CNT by replacing with sit_i->sit_blocks?

Because sit_i->sit_blocks is a multiple of the segment size (i.e. 512/1024),
but we may use only part of sit_blocks, not all of them, it would be a
waste to read all of them.
What's your opinion?

Regards,
Yu

> 
> --
> Jaegeuk Kim
> Samsung



Re: Linux 3.13-rc1 is out

2013-11-22 Thread Jongman Heo
James Cloos  jhcloos.com> writes:

> 
> This combination:
> 
> # CONFIG_SYSTEM_TRUSTED_KEYRING is not set
> CONFIG_TRUSTED_KEYS=m
> 
> (acquired by oldconfig and N to system keyring)
> 
> fails with:
> 
> Pass 2
>   CC [M]  crypto/asymmetric_keys/x509_rsakey-asn1.o
>   CC [M]  crypto/asymmetric_keys/x509_cert_parser.o
>   CC [M]  crypto/asymmetric_keys/x509_public_key.o
> crypto/asymmetric_keys/x509_public_key.c: In function ‘x509_key_preparse’:
> crypto/asymmetric_keys/x509_public_key.c:237:35: error: ‘system_trusted_keyring’ undeclared (first use in this function)
>    ret = x509_validate_trust(cert, system_trusted_keyring);
>                                    ^
> crypto/asymmetric_keys/x509_public_key.c:237:35: note: each undeclared identifier is reported only once for each function it appears in
> make[2]: *** [crypto/asymmetric_keys/x509_public_key.o] Error 1
> make[1]: *** [crypto/asymmetric_keys] Error 2
> make: *** [crypto] Error 2
> 
> Perhaps include/keys/system_keyring.h should have a definition for
> system_trusted_keyring in the #ifndef CONFIG_SYSTEM_TRUSTED_KEYRING
> case (which may entail just removing the #ifdef).
> 
> Commits b56e5a17b6b9acd1 and 09fbc47373826d67 are relevant.
> 
> -JimC

I have the same problem.

With the following band-aid patch (though I'm not sure it's correct), the
build is OK:

--- a/security/integrity/digsig.c
+++ b/security/integrity/digsig.c
@@ -67,6 +67,7 @@ int integrity_digsig_verify(const unsigned int id, const char *sig, int siglen,
return -EOPNOTSUPP;
 }
 
+#ifdef CONFIG_INTEGRITY_ASYMMETRIC_KEYS
 int integrity_init_keyring(const unsigned int id)
 {
const struct cred *cred = current_cred();
@@ -84,3 +85,4 @@ int integrity_init_keyring(const unsigned int id)
keyring_name[id], PTR_ERR(keyring[id]));
return 0;
 }
+#endif

But then I run into another build issue:

  CC  crypto/asymmetric_keys/x509_public_key.o
crypto/asymmetric_keys/x509_public_key.c: In function ‘x509_key_preparse’:
crypto/asymmetric_keys/x509_public_key.c:237:35: error: ‘system_trusted_keyring’ undeclared (first use in this function)
   ret = x509_validate_trust(cert, system_trusted_keyring);
                                   ^
crypto/asymmetric_keys/x509_public_key.c:237:35: note: each undeclared identifier is reported only once for each function it appears in
make[2]: *** [crypto/asymmetric_keys/x509_public_key.o] Error 1
make[1]: *** [crypto/asymmetric_keys] Error 2
make: *** [crypto] Error 2


It looks like it's caused by the following combination:

# CONFIG_SYSTEM_TRUSTED_KEYRING is not set
CONFIG_X509_CERTIFICATE_PARSER=y
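
For illustration, one common way to keep callers compiling when an option is off is a header stub. The sketch below is a userspace model only: the accessor get_system_trusted_keyring() and the validate_trust() caller are hypothetical names for this example, not the fix that was merged, and -1 stands in for whatever error the real code would return.

```c
#include <assert.h>
#include <stddef.h>

struct key;  /* opaque; stands in for the kernel's struct key */

/* Uncomment to model a build with the option enabled. */
/* #define CONFIG_SYSTEM_TRUSTED_KEYRING 1 */

#ifdef CONFIG_SYSTEM_TRUSTED_KEYRING
extern struct key *system_trusted_keyring;

static inline struct key *get_system_trusted_keyring(void)
{
	return system_trusted_keyring;
}
#else
/* Stub: callers still compile and see "no keyring" at run time. */
static inline struct key *get_system_trusted_keyring(void)
{
	return NULL;
}
#endif

/* Hypothetical caller, loosely modelled on the failing call site. */
static int validate_trust(struct key *trust_keyring)
{
	if (!trust_keyring)
		return -1;  /* hypothetical "no trust keyring" result */
	return 0;
}
```

The point of the pattern is that the #ifdef lives in one header rather than at every call site, so a configuration like the one above builds and the caller decides what a missing keyring means.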


Jongman Heo



Re: [PATCH v3] mm/zswap: change zswap to writethrough cache

2013-11-22 Thread Weijie Yang
Hello Dan,

On Sat, Nov 23, 2013 at 6:10 AM, Dan Streetman  wrote:
> Currently, zswap is writeback cache; stored pages are not sent
> to swap disk, and when zswap wants to evict old pages it must
> first write them back to swap cache/disk manually.  This avoids
> swap out disk I/O up front, but only moves that disk I/O to
> the writeback case (for pages that are evicted), and adds the
> overhead of having to uncompress the evicted pages, and adds the
> need for an additional free page (to store the uncompressed page)
> at a time of likely high memory pressure.  Additionally, being
> writeback adds complexity to zswap by having to perform the
> writeback on page eviction.
>
> This changes zswap to writethrough cache by enabling
> frontswap_writethrough() before registering, so that any
> successful page store will also be written to swap disk.  All the
> writeback code is removed since it is no longer needed, and the
> only operation during a page eviction is now to remove the entry
> from the tree and free it.

I agree with Seth: this is not good for embedded devices.
Maybe it has its place elsewhere, such as on servers.
I guess it suits medium workloads where swap I/O is infrequent.

My suggestion: could you please make it configurable, so that the user
can choose between writethrough and writeback mode?
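
As a rough sketch of what a configurable mode would trade off, here is a toy userspace model that only counts device writes. Every name in it (toy_cache, toy_store, toy_evict, the enum) is made up for illustration and has nothing to do with the zswap or frontswap code itself.

```c
#include <assert.h>

enum cache_mode { CACHE_WRITEBACK, CACHE_WRITETHROUGH };

struct toy_cache {
	enum cache_mode mode;
	unsigned int cached;       /* pages currently held in the cache */
	unsigned int disk_writes;  /* pages written to the backing device */
};

/* Store a page: writethrough pays the device write up front. */
static void toy_store(struct toy_cache *c)
{
	c->cached++;
	if (c->mode == CACHE_WRITETHROUGH)
		c->disk_writes++;
}

/* Evict a page: writeback must write it out before dropping it. */
static void toy_evict(struct toy_cache *c)
{
	assert(c->cached > 0);
	c->cached--;
	if (c->mode == CACHE_WRITEBACK)
		c->disk_writes++;
}
```

In this model, writethrough writes every stored page once and makes eviction free, while writeback writes only the pages that are evicted. That matches the concern above: on flash-backed embedded devices the extra writes hurt, while under memory pressure the cheap eviction helps.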

Regards,

> Signed-off-by: Dan Streetman 
> ---
>
> I still owe the results of testing pre-patch and post-patch;
> I don't think I'll have SPECjbb results today, but I do have a
> small test program I can send with results of nozswap,
> zswap writeback, and zswap writethrough; I'll send the SPECjbb
> results also when I get them.
>
> Changes since v2:
> change the evict function to remove the entry from the tree,
> and avoid checking if the entry is in use by load.  It is ok
> to return success for the eviction even if load is using the entry
> as zbud does not use the evict function return value in any significant
> way (I'll submit a future patch to remove the return value entirely
> from the evict function).
>
> Changes since v1:
> update to apply to latest -tip, previous patch missed several recent
> zswap patches.
>
>
>  mm/zswap.c | 209 
> ++---
>  1 file changed, 19 insertions(+), 190 deletions(-)
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index e55bab9..fc35a7a 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -39,7 +39,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  #include 
>
>  /*
> @@ -59,8 +58,8 @@ static atomic_t zswap_stored_pages = ATOMIC_INIT(0);
>
>  /* Pool limit was hit (see zswap_max_pool_percent) */
>  static u64 zswap_pool_limit_hit;
> -/* Pages written back when pool limit was reached */
> -static u64 zswap_written_back_pages;
> +/* Pages evicted when pool limit was reached */
> +static u64 zswap_evicted_pages;
>  /* Store failed due to a reclaim failure after pool limit was reached */
>  static u64 zswap_reject_reclaim_fail;
>  /* Compressed page was too big for the allocator to (optimally) store */
> @@ -160,7 +159,7 @@ static void zswap_comp_exit(void)
>   * rbnode - links the entry into red-black tree for the appropriate swap type
>   * refcount - the number of outstanding reference to the entry. This is 
> needed
>   *to protect against premature freeing of the entry by code
> - *concurent calls to load, invalidate, and writeback.  The lock
> + *concurent calls to load, invalidate, and evict.  The lock
>   *for the zswap_tree structure that contains the entry must
>   *be held while changing the refcount.  Since the lock must
>   *be held, there is no reason to also make refcount atomic.
> @@ -412,132 +411,19 @@ static bool zswap_is_full(void)
>  }
>
>  /*
> -* writeback code
> +* evict
>  **/
> -/* return enum for zswap_get_swap_cache_page */
> -enum zswap_get_swap_ret {
> -   ZSWAP_SWAPCACHE_NEW,
> -   ZSWAP_SWAPCACHE_EXIST,
> -   ZSWAP_SWAPCACHE_FAIL,
> -};
>
>  /*
> - * zswap_get_swap_cache_page
> - *
> - * This is an adaption of read_swap_cache_async()
> - *
> - * This function tries to find a page with the given swap entry
> - * in the swapper_space address space (the swap cache).  If the page
> - * is found, it is returned in retpage.  Otherwise, a page is allocated,
> - * added to the swap cache, and returned in retpage.
> - *
> - * If success, the swap cache page is returned in retpage
> - * Returns ZSWAP_SWAPCACHE_EXIST if page was already in the swap cache
> - * Returns ZSWAP_SWAPCACHE_NEW if the new page needs to be populated,
> - * the new page is added to swapcache and locked
> - * Returns ZSWAP_SWAPCACHE_FAIL on error
> + * This is called from zbud to remove an entry that is being evicted.
>   */
> -static int zswap_get_swap_cache_page(swp_entry_t entry,
> -   struct page **retpage)
> -{
> - 

Re: [PATCH] mm/zswap: reverse zswap_entry tree/refcount relationship

2013-11-22 Thread Weijie Yang
Hello Dan,

On Sat, Nov 23, 2013 at 6:10 AM, Dan Streetman  wrote:
> Currently, zswap_entry_put removes the entry from its tree if
> the resulting refcount is 0.  Several places in code put an
> entry's initial reference, but they also must remove the entry
> from its tree first, which makes the tree removal in zswap_entry_put
> redundant.
>
> I believe this has the refcount model backwards - the initial
> refcount reference shouldn't be managed by multiple different places
> in code, and the put function shouldn't be removing the entry
> from the tree.  I think the correct model is for the tree to be
> the owner of the initial entry reference.  This way, the only time
> any code needs to put the entry is if it's also done a get previously.
> The various places in code that remove the entry from the tree simply
> do that, and the zswap_rb_erase function does the put of the initial
> reference.
>
> This patch moves the initial referencing completely into the tree
> functions - zswap_rb_insert gets the entry, while zswap_rb_erase
> puts the entry.  The zswap_entry_get/put functions are still available
> for any code that needs to use an entry outside of the tree lock.
> Also, the zswap_entry_find_get function is renamed to zswap_rb_search_get
> since the function behavior and return value is closer to zswap_rb_search
> than zswap_entry_get.  All code that previously removed the entry from
> the tree and put it now only remove the entry from the tree.
>
> The comment headers for most of the tree insert/search/erase functions
> and the get/put functions are updated to clarify if the tree lock
> needs to be held as well as when the caller needs to get/put an
> entry (i.e. iff the caller is using the entry outside the tree lock).

I do not like the idea of this patch; it breaks the purity of zswap_rb_xxx().
I think zswap_rb_xxx() should focus only on rbtree operations.

The current code may be redundant, but its logic is clear,
so there is no essential need to change it.

If I missed something, please let me know.
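
For readers following the disagreement, here is a minimal userspace sketch of the "tree owns the initial reference" model described in the quoted patch. The toy_* names and the flags are assumptions for illustration; locking is left out, so every call is assumed to happen under the tree lock.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_entry {
	int refcount;
	bool in_tree;
	bool freed;   /* stands in for releasing the allocation */
};

static void toy_get(struct toy_entry *e)
{
	e->refcount++;
}

static void toy_put(struct toy_entry *e)
{
	assert(e->refcount > 0);
	if (--e->refcount == 0)
		e->freed = true;
}

/* Inserting takes the tree's own reference. */
static void toy_tree_insert(struct toy_entry *e)
{
	e->in_tree = true;
	toy_get(e);
}

/* Erasing drops the tree's reference; the entry is freed once unused. */
static void toy_tree_erase(struct toy_entry *e)
{
	if (!e->in_tree)
		return;
	e->in_tree = false;
	toy_put(e);
}
```

In this model a caller only ever puts what it got itself, which is the patch's argument. The cost, and the objection above, is that the insert and erase helpers now do reference counting in addition to pure rbtree work.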

Regards,

> Signed-off-by: Dan Streetman 
> ---
>
> This patch requires the writethrough patch to have been applied, but
> the patch idea doesn't require the writethrough patch.
>
>  mm/zswap.c | 130 
> ++---
>  1 file changed, 72 insertions(+), 58 deletions(-)
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index fc35a7a..8c27eb2 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -215,7 +215,7 @@ static struct zswap_entry *zswap_entry_cache_alloc(gfp_t gfp)
> entry = kmem_cache_alloc(zswap_entry_cache, gfp);
> if (!entry)
> return NULL;
> -   entry->refcount = 1;
> +   entry->refcount = 0;
> RB_CLEAR_NODE(&entry->rbnode);
> return entry;
>  }
> @@ -228,9 +228,51 @@ static void zswap_entry_cache_free(struct zswap_entry *entry)
>  /*
>  * rbtree functions
>  **/
> -static struct zswap_entry *zswap_rb_search(struct rb_root *root, pgoff_t 
> offset)
> +
> +/*
> + * Carries out the common pattern of freeing and entry's zsmalloc allocation,
> + * freeing the entry itself, and decrementing the number of stored pages.
> + */
> +static void zswap_free_entry(struct zswap_tree *tree,
> +   struct zswap_entry *entry)
> +{
> +   zbud_free(tree->pool, entry->handle);
> +   zswap_entry_cache_free(entry);
> +   atomic_dec(&zswap_stored_pages);
> +   zswap_pool_pages = zbud_get_pool_size(tree->pool);
> +}
> +
> +/* caller must hold the tree lock
> + * this must be used if the entry will be used outside
> + * the tree lock
> + */
> +static void zswap_entry_get(struct zswap_entry *entry)
> +{
> +   entry->refcount++;
> +}
> +
> +/* caller must hold the tree lock
> +* remove from the tree and free it, if nobody reference the entry
> +*/
> +static void zswap_entry_put(struct zswap_tree *tree,
> +   struct zswap_entry *entry)
> +{
> +   int refcount = --entry->refcount;
> +
> +   BUG_ON(refcount < 0);
> +   if (refcount == 0)
> +   zswap_free_entry(tree, entry);
> +}
> +
> +/* caller much hold the tree lock
> + * This will find the entry for the offset, and return it
> + * If no entry is found, NULL is returned
> + * If the entry will be used outside the tree lock,
> + * then zswap_rb_search_get should be used instead
> + */
> +static struct zswap_entry *zswap_rb_search(struct zswap_tree *tree, pgoff_t 
> offset)
>  {
> -   struct rb_node *node = root->rb_node;
> +   struct rb_node *node = tree->rbroot.rb_node;
> struct zswap_entry *entry;
>
> while (node) {
> @@ -246,13 +288,14 @@ static struct zswap_entry *zswap_rb_search(struct rb_root *root, pgoff_t offset)
>  }
>
>  /*
> + * caller must hold the tree lock
>   * In the case that a entry with the same offset is found, a pointer to
>   * the existing entry is stored in dupentry and the function returns -EEXIST


[PATCH resend] slab.h: remove duplicate kmalloc declaration and fix kernel-doc warnings

2013-11-22 Thread Randy Dunlap
From: Randy Dunlap 

Fix kernel-doc warning for duplicate definition of 'kmalloc':

Documentation/DocBook/kernel-api.xml:9483: element refentry: validity error : 
ID API-kmalloc already defined


Also combine the kernel-doc info from the 2 kmalloc definitions into one
block and remove the "see kcalloc" comment since kmalloc now contains the
@flags info.

Signed-off-by: Randy Dunlap 
Acked-by: Christoph Lameter 
---
 include/linux/slab.h |  102 ++---
 1 file changed, 46 insertions(+), 56 deletions(-)

--- lnx-312-rc1.orig/include/linux/slab.h
+++ lnx-312-rc1/include/linux/slab.h
@@ -381,10 +381,55 @@ static __always_inline void *kmalloc_lar
 /**
  * kmalloc - allocate memory
  * @size: how many bytes of memory are required.
- * @flags: the type of memory to allocate (see kcalloc).
+ * @flags: the type of memory to allocate.
  *
  * kmalloc is the normal method of allocating memory
  * for objects smaller than page size in the kernel.
+ *
+ * The @flags argument may be one of:
+ *
+ * %GFP_USER - Allocate memory on behalf of user.  May sleep.
+ *
+ * %GFP_KERNEL - Allocate normal kernel ram.  May sleep.
+ *
+ * %GFP_ATOMIC - Allocation will not sleep.  May use emergency pools.
+ *   For example, use this inside interrupt handlers.
+ *
+ * %GFP_HIGHUSER - Allocate pages from high memory.
+ *
+ * %GFP_NOIO - Do not do any I/O at all while trying to get memory.
+ *
+ * %GFP_NOFS - Do not make any fs calls while trying to get memory.
+ *
+ * %GFP_NOWAIT - Allocation will not sleep.
+ *
+ * %GFP_THISNODE - Allocate node-local memory only.
+ *
+ * %GFP_DMA - Allocation suitable for DMA.
+ *   Should only be used for kmalloc() caches. Otherwise, use a
+ *   slab created with SLAB_DMA.
+ *
+ * Also it is possible to set different flags by OR'ing
+ * in one or more of the following additional @flags:
+ *
+ * %__GFP_COLD - Request cache-cold pages instead of
+ *   trying to return cache-warm pages.
+ *
+ * %__GFP_HIGH - This allocation has high priority and may use emergency pools.
+ *
+ * %__GFP_NOFAIL - Indicate that this allocation is in no way allowed to fail
+ *   (think twice before using).
+ *
+ * %__GFP_NORETRY - If memory is not immediately available,
+ *   then give up at once.
+ *
+ * %__GFP_NOWARN - If allocation fails, don't issue any warnings.
+ *
+ * %__GFP_REPEAT - If allocation fails initially, try once more before failing.
+ *
+ * There are other flags available as well, but these are not intended
+ * for general use, and so are not documented here. For a full list of
+ * potential flags, always refer to linux/gfp.h.
  */
 static __always_inline void *kmalloc(size_t size, gfp_t flags)
 {
@@ -495,61 +540,6 @@ int cache_show(struct kmem_cache *s, str
 void print_slabinfo_header(struct seq_file *m);
 
 /**
- * kmalloc - allocate memory
- * @size: how many bytes of memory are required.
- * @flags: the type of memory to allocate.
- *
- * The @flags argument may be one of:
- *
- * %GFP_USER - Allocate memory on behalf of user.  May sleep.
- *
- * %GFP_KERNEL - Allocate normal kernel ram.  May sleep.
- *
- * %GFP_ATOMIC - Allocation will not sleep.  May use emergency pools.
- *   For example, use this inside interrupt handlers.
- *
- * %GFP_HIGHUSER - Allocate pages from high memory.
- *
- * %GFP_NOIO - Do not do any I/O at all while trying to get memory.
- *
- * %GFP_NOFS - Do not make any fs calls while trying to get memory.
- *
- * %GFP_NOWAIT - Allocation will not sleep.
- *
- * %GFP_THISNODE - Allocate node-local memory only.
- *
- * %GFP_DMA - Allocation suitable for DMA.
- *   Should only be used for kmalloc() caches. Otherwise, use a
- *   slab created with SLAB_DMA.
- *
- * Also it is possible to set different flags by OR'ing
- * in one or more of the following additional @flags:
- *
- * %__GFP_COLD - Request cache-cold pages instead of
- *   trying to return cache-warm pages.
- *
- * %__GFP_HIGH - This allocation has high priority and may use emergency pools.
- *
- * %__GFP_NOFAIL - Indicate that this allocation is in no way allowed to fail
- *   (think twice before using).
- *
- * %__GFP_NORETRY - If memory is not immediately available,
- *   then give up at once.
- *
- * %__GFP_NOWARN - If allocation fails, don't issue any warnings.
- *
- * %__GFP_REPEAT - If allocation fails initially, try once more before failing.
- *
- * There are other flags available as well, but these are not intended
- * for general use, and so are not documented here. For a full list of
- * potential flags, always refer to linux/gfp.h.
- *
- * kmalloc is the normal method of allocating memory
- * in the kernel.
- */
-static __always_inline void *kmalloc(size_t size, gfp_t flags);
-
-/**
  * kmalloc_array - allocate memory for an array.
  * @n: number of elements.
  * @size: element size.

Re: [PATCH] slab: remove the redundant declaration of kmalloc

2013-11-22 Thread Randy Dunlap
On 11/21/13 17:07, Qiang Huang wrote:
> On 2013/11/22 3:11, Randy Dunlap wrote:
>> On 11/20/13 21:57, Qiang Huang wrote:
>>>
>>> Signed-off-by: Qiang Huang 
>>
>> or use my patch from 2013-09-17:
>> http://marc.info/?l=linux-mm&m=137944291611467&w=2
>>
>> Would be nice to get one of these merged...
> 
> Yes, sorry for not noticing this; merging your patch would be proper :)
> But why has it still not been merged?
> 
> Ping...

I don't know.  I'll resend it now.


Thanks.


-- 
~Randy


Re: [PATCH] of/platform: Fix no irq domain found errors when populating interrupts

2013-11-22 Thread Tony Lindgren
* Tony Lindgren  [131122 17:16]:
> * Tony Lindgren  [131122 17:09]:
> > * Russell King - ARM Linux  [131122 16:56]:
> > > On Fri, Nov 22, 2013 at 04:43:35PM -0800, Tony Lindgren wrote:
> > > > +   /* See of_device_resource_notify for populating 
> > > > interrupts */
> > > > +   for (i = 0; i < num_irq; i++, res++) {
> > > > +   res->flags = IORESOURCE_IRQ;
> > > > +   res->start = -EPROBE_DEFER;
> > > > +   res->end = -EPROBE_DEFER;
> > > 
> > > NAK.  Definitely a bad idea to start introducing magic values other into
> > > resources.  Please don't do this.
> > 
> > Do you have any better ideas on how to sort out this issue then?
> 
> I guess we could allocate all the resources lazily here, I'll take a look
> at that.

Here's a version that allocates the resources lazily with the notifier.
Seems to boot, need to play with it a bit more though to make sure we're
not overwriting resources for any legacy devices.
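
To show the shape of the idea outside the kernel, here is a toy userspace sketch of deferring resource population to a bind-time callback. All toy_* names, the single-callback "bus", and the pretend resource count are assumptions for illustration; the actual patch follows below and uses the real bus notifier API.

```c
#include <assert.h>
#include <stddef.h>

enum toy_event { TOY_ADD_DEVICE, TOY_BIND_DRIVER };

struct toy_device {
	int num_resources;   /* stays 0 until populated */
	int populate_calls;
};

typedef int (*toy_notifier_fn)(enum toy_event ev, struct toy_device *dev);

static toy_notifier_fn toy_notifier;  /* the single registered callback */

/* Register once; later registrations are ignored. */
static void toy_register_notifier(toy_notifier_fn fn)
{
	if (!toy_notifier)
		toy_notifier = fn;
}

/* Populate resources only when a driver is about to bind. */
static int toy_populate_on_bind(enum toy_event ev, struct toy_device *dev)
{
	assert(dev != NULL);
	if (ev != TOY_BIND_DRIVER)
		return 0;
	dev->num_resources = 2;  /* pretend: one register range, one interrupt */
	dev->populate_calls++;
	return 0;
}

/* Deliver an event to the registered callback, if any. */
static void toy_raise(enum toy_event ev, struct toy_device *dev)
{
	if (toy_notifier)
		toy_notifier(ev, dev);
}
```

The device exists with no resources after it is added, and only gets them when the bind event fires, by which time the interrupt controller it depends on is more likely to be available.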

Regards,

Tony


--- a/drivers/of/platform.c
+++ b/drivers/of/platform.c
@@ -141,13 +141,47 @@ struct platform_device *of_device_alloc(struct device_node *np,
  struct device *parent)
 {
struct platform_device *dev;
-   int rc, i, num_reg = 0, num_irq;
-   struct resource *res, temp_res;
 
dev = platform_device_alloc("", -1);
if (!dev)
return NULL;
 
+   dev->dev.of_node = of_node_get(np);
+#if defined(CONFIG_MICROBLAZE)
+   dev->dev.dma_mask = &dev->archdata.dma_mask;
+#endif
+   dev->dev.parent = parent;
+
+   if (bus_id)
+   dev_set_name(&dev->dev, "%s", bus_id);
+   else
+   of_device_make_bus_id(&dev->dev);
+
+   /* See of_device_resource_notify for populating the resources */
+
+   return dev;
+}
+EXPORT_SYMBOL(of_device_alloc);
+
+/*
+ * The device interrupts are not necessarily available for all
+ * irqdomains initially so we need to populate them using a
+ * notifier.
+ */
+static int of_device_resource_notify(struct notifier_block *nb,
+unsigned long event, void *dev)
+{
+   struct platform_device *pdev = to_platform_device(dev);
+   struct device_node *np = pdev->dev.of_node;
+   int rc, i, num_reg = 0, num_irq;
+   struct resource *res, temp_res;
+
+   if (event != BUS_NOTIFY_BIND_DRIVER)
+   return 0;
+
+   if (!np)
+   goto out;
+
/* count the io and irq resources */
if (of_can_translate_address(np))
while (of_address_to_resource(np, num_reg, &temp_res) == 0)
@@ -158,12 +192,12 @@ struct platform_device *of_device_alloc(struct device_node *np,
if (num_irq || num_reg) {
res = kzalloc(sizeof(*res) * (num_irq + num_reg), GFP_KERNEL);
if (!res) {
-   platform_device_put(dev);
-   return NULL;
+   platform_device_put(pdev);
+   goto out;
}
 
-   dev->num_resources = num_reg + num_irq;
-   dev->resource = res;
+   pdev->num_resources = num_reg + num_irq;
+   pdev->resource = res;
for (i = 0; i < num_reg; i++, res++) {
rc = of_address_to_resource(np, i, res);
WARN_ON(rc);
@@ -171,20 +205,9 @@ struct platform_device *of_device_alloc(struct device_node *np,
WARN_ON(of_irq_to_resource_table(np, res, num_irq) != num_irq);
}
 
-   dev->dev.of_node = of_node_get(np);
-#if defined(CONFIG_MICROBLAZE)
-   dev->dev.dma_mask = &dev->archdata.dma_mask;
-#endif
-   dev->dev.parent = parent;
-
-   if (bus_id)
-   dev_set_name(&dev->dev, "%s", bus_id);
-   else
-   of_device_make_bus_id(&dev->dev);
-
-   return dev;
+out:
+   return NOTIFY_DONE;
 }
-EXPORT_SYMBOL(of_device_alloc);
 
 /**
  * of_platform_device_create_pdata - Alloc, initialize and register an of_device
@@ -447,6 +470,8 @@ int of_platform_bus_probe(struct device_node *root,
 }
 EXPORT_SYMBOL(of_platform_bus_probe);
 
+static struct notifier_block resource_nb;
+
 /**
  * of_platform_populate() - Populate platform_devices from device tree data
  * @root: parent of the first level to probe or NULL for the root of the tree
@@ -478,6 +503,11 @@ int of_platform_populate(struct device_node *root,
if (!root)
return -EINVAL;
 
+   if (!resource_nb.notifier_call) {
+   resource_nb.notifier_call = of_device_resource_notify,
+   bus_register_notifier(&platform_bus_type, &resource_nb);
+   }
+
for_each_child_of_node(root, child) {
rc = of_platform_bus_create(child, matches, lookup, parent, true);
if (rc)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  

Re: gem csum breakage with 018c5bba (Handle CHECKSUM_COMPLETE more adequately ...)

2013-11-22 Thread Herbert Xu
On Fri, Nov 22, 2013 at 07:08:16PM +0100, Alexander Graf wrote:
> Hi,
> 
> With current Linus' master tree my ibook started emitting a lot of annoying 
> csum calculation warnings. I've bisected it down to the commit seen in the 
> subject line, but I suppose the real problem lies somewhere a lot deeper and 
> only gets revealed thanks to the more clever checksum logic applied now.
> 
> Any ideas how to dig further into this?

Sorry, I don't have any docs on this device.  Perhaps Dave has
the docs which can tell us what RXDCTRL_TCPCSUM actually returns?

I assume it returns the correct checksum ending at the IP
packet boundary and only goes wrong when extra padding exists.

Hmm, I suppose the question then is what exactly does it do
if the packet isn't an IP packet?
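
To illustrate why trailing padding matters for a checksum computed over the whole frame, here is a small userspace sketch. It is not a description of what the gem hardware does, which is exactly the open question; the helper names are made up, and the trimming step assumes the padding starts at an even byte offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16-bit ones' complement sum over big-endian words, not yet folded.
 * Fine for frame-sized buffers; very large inputs could overflow 32 bits. */
static uint32_t sum16(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (i < len)
		sum += (uint32_t)buf[i] << 8;  /* odd trailing byte, zero-padded */
	return sum;
}

/* Fold the carries back into 16 bits. */
static uint16_t fold16(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Remove the tail's contribution: a - b equals a + ~b in ones' complement. */
static uint16_t csum_without_tail(uint16_t full, uint16_t tail)
{
	return fold16((uint32_t)full + (uint16_t)~tail);
}
```

If the device sums the whole frame, any non-zero bytes after the IP packet end up in the value, and the stack has to subtract them again (or fall back to computing the checksum in software) before it can trust the result.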

Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Crypto Update for 3.13

2013-11-22 Thread Herbert Xu
Hi Linus:

This is a resend.

Here is the crypto update for 3.13:

* Made x86 ablk_helper generic for ARM.
* Phase out chainiv in favour of eseqiv (affects IPsec).
* Fixed aes-cbc IV corruption on s390.
* Added constant-time crypto_memneq which replaces memcmp.

* Fixed aes-ctr in omap-aes.
* Added OMAP3 ROM RNG support.
* Add PRNG support for MSM SoC's
* Add and use Job Ring API in caam.

* Misc fixes.
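
As background for the crypto_memneq item, here is a minimal sketch of the general technique of comparing memory without an early exit. The function name ct_memneq is made up for this example and this is not the kernel implementation; in particular, a production version needs stronger guarantees that the compiler will not reintroduce a data-dependent branch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Returns 0 if the two regions are equal and nonzero otherwise.
 * Every byte is always examined, so the running time does not
 * depend on where the first difference is.
 */
static unsigned long ct_memneq(const void *a, const void *b, size_t size)
{
	const volatile unsigned char *pa = a;
	const volatile unsigned char *pb = b;
	unsigned long neq = 0;

	while (size--)
		neq |= (unsigned long)(*pa++ ^ *pb++);
	return neq;
}
```

Unlike memcmp, the result only says "equal" or "not equal", which is all that is needed when checking a MAC or digest, and it avoids leaking how many leading bytes matched.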


Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6.git



Alex Porosanu (7):
  crypto: caam - fix RNG state handle instantiation descriptor
  crypto: caam - fix hash, alg and rng registration if CAAM driver not 
initialized
  crypto: caam - fix RNG4 instantiation
  crypto: caam - split RNG4 instantiation function
  crypto: caam - uninstantiate RNG state handle 0 if instantiated by caam 
driver
  crypto: caam - fix RNG4 AAI defines
  crypto: caam - enable instantiation of all RNG4 state handles

Ard Biesheuvel (2):
  crypto: create generic version of ablk_helper
  crypto: move x86 to the generic version of ablk_helper

Ben Hutchings (1):
  hwrng: via-rng - Mark device ID table as __maybe_unused

Fabio Estevam (4):
  crypto: dcp - Use devm_ioremap_resource()
  crypto: dcp - Use devm_request_irq()
  crypto: dcp - Fix the path for releasing the resources
  crypto: dcp - Check the return value from devm_ioremap_resource()

Herbert Xu (2):
  crypto: skcipher - Use eseqiv even on UP machines
  crypto: s390 - Fix aes-cbc IV corruption

James Yonan (1):
  crypto: crypto_memneq - add equality testing of memory regions w/o timing 
leaks

Joel Fernandes (1):
  crypto: omap-aes - Fix CTR mode counter length

Joni Lapilainen (1):
  crypto: omap-sham - Add missing modalias

Jussi Kivilinna (2):
  crypto: sha256_ssse3 - use correct module alias for sha224
  crypto: x86 - restore avx2_supported check

Linus Walleij (1):
  crypto: tegra - use kernel entropy instead of ad-hoc

Mathias Krause (6):
  crypto: authenc - Export key parsing helper function
  crypto: authencesn - Simplify key parsing
  crypto: ixp4xx - Simplify and harden key parsing
  crypto: picoxcell - Simplify and harden key parsing
  crypto: talitos - Simplify key parsing
  padata: make the sequence counter an atomic_t

Michael Ellerman (2):
  hwrng: pseries - Use KBUILD_MODNAME in pseries-rng.c
  hwrng: pseries - Return errors to upper levels in pseries-rng.c

Michael Opdenacker (1):
  crypto: mv_cesa: remove deprecated IRQF_DISABLED

Neil Horman (1):
  crypto: ansi_cprng - Fix off by one error in non-block size request

Oliver Neukum (1):
  crypto: sha256_ssse3 - also test for BMI2

Pali Rohár (1):
  hwrng: OMAP3 ROM Random Number Generator support

Ruchika Gupta (3):
  crypto: caam - Add Platform driver for Job Ring
  crypto: caam - Add API's to allocate/free Job Rings
  crypto: caam - Modify the interface layers to use JR API's

Sachin Kamat (7):
  crypto: mv_cesa - Staticize local symbols
  crypto: omap-aes - Staticize local symbols
  crypto: tegra-aes - Staticize tegra_aes_cra_exit
  crypto: tegra-aes - Fix NULL pointer dereference
  crypto: tegra-aes - Use devm_clk_get
  crypto: sahara - Remove redundant of_match_ptr
  crypto: mv_cesa - Remove redundant of_match_ptr

Stanimir Varbanov (2):
  ARM: DT: msm: Add Qualcomm's PRNG driver binding document
  hwrng: msm - Add PRNG support for MSM SoC's

Stephen Warren (1):
  ARM: tegra: remove tegra_chip_uid()

Yashpal Dutta (1):
  crypto: caam - map src buffer before access

kbuild test robot (1):
  crypto: ablk_helper - Replace memcpy with struct assignment

 .../devicetree/bindings/rng/qcom,prng.txt  |   17 +
 arch/arm/mach-tegra/fuse.c |   10 -
 arch/s390/crypto/aes_s390.c|   19 +-
 arch/x86/crypto/Makefile   |3 +-
 arch/x86/crypto/aesni-intel_glue.c |2 +-
 arch/x86/crypto/camellia_aesni_avx2_glue.c |2 +-
 arch/x86/crypto/camellia_aesni_avx_glue.c  |2 +-
 arch/x86/crypto/cast5_avx_glue.c   |2 +-
 arch/x86/crypto/cast6_avx_glue.c   |2 +-
 arch/x86/crypto/serpent_avx2_glue.c|2 +-
 arch/x86/crypto/serpent_avx_glue.c |2 +-
 arch/x86/crypto/serpent_sse2_glue.c|2 +-
 arch/x86/crypto/sha256_ssse3_glue.c|4 +-
 arch/x86/crypto/twofish_avx_glue.c |2 +-
 arch/x86/include/asm/simd.h|   11 +
 crypto/Kconfig |   23 +-
 crypto/Makefile|8 +-
 {arch/x86/crypto => crypto}/ablk_helper.c  |   13 +-
 crypto/ablkcipher.c|   21 +-
 crypto/ansi_cprng.c|4 +-
 

[GIT PULL] NTB bug fixes and features for v3.13

2013-11-22 Thread Jon Mason
Hi Linus,
Below are a number of NTB bug fixes and features for v3.13.  Please
consider pulling them.

Thanks,
Jon

The following changes since commit 5e01dc7b26d9f24f39abace5da98ccbd6a5ceb52:

  Linux 3.12 (2013-11-03 15:41:51 -0800)

are available in the git repository at:

  git://github.com/jonmason/ntb.git tags/ntb-3.13

for you to fetch changes up to e8aeb60c389c2aa48d345bcf717f8cb7edf67680:

  NTB: Disable interrupts and poll under high load (2013-11-20 09:57:34 -0700)


NTB driver bug fixes to address a missed call to pci_enable_msix,
NTB-RP Link Up issue, Xeon Doorbell errata workaround, ntb_transport
link down race, and correct dmaengine_get/put usage.  Also, clean-ups
to remove duplicate defines and document a hardware errata.  Finally,
some changes to improve performance.


Alexander Gordeev (1):
  ntb: Fix missed call to pci_enable_msix()

Jon Mason (7):
  NTB: Xeon Doorbell errata workaround
  NTB: Fix NTB-RP Link Up
  NTB: Fix ntb_transport link down race
  NTB: correct dmaengine_get/put usage
  NTB: Document HW errata
  NTB: Enable Snoop on Primary Side
  NTB: Disable interrupts and poll under high load

Michael Opdenacker (1):
  NTB: remove duplicate defines

 drivers/ntb/ntb_hw.c|  121 ---
 drivers/ntb/ntb_hw.h|7 +--
 drivers/ntb/ntb_regs.h  |   16 +++---
 drivers/ntb/ntb_transport.c |   77 ++-
 4 files changed, 156 insertions(+), 65 deletions(-)


Re: [PATCH 5/7] clocksource/cadence_ttc: Overhaul clocksource frequency adjustment

2013-11-22 Thread Sören Brinkmann
On Tue, Nov 12, 2013 at 08:01:37PM +0100, Daniel Lezcano wrote:
> On 11/08/2013 10:21 PM, Soren Brinkmann wrote:
> >The currently used method adjusting the clocksource to a changing input
> >frequency does not work on kernels from 3.11 on.
> >The new approach is to keep the timer frequency as constant as possible.
> >I.e.
> >  - due to the TTC's prescaler limitations, allow frequency changes
> >only if the frequency scales by a power of 2
> >  - adjust the counter's divider on the fly when a frequency change
> >occurs
> >
> >When suspending though, the driver should not prevent rate changes in
> >order to allow the system to enter its low power state. For that
> >reason a PM notifier is added so rate changes can be ignored during
> >suspend/resume.
> 
> It sounds very weird you have to add a PM notifier in this driver.
> 
> Have you been facing an issue or do you assume it could happen ?

As I said in my other email, we do have some issues related with
suspend. The construct with the PM notifier seemed to work but since we
upgraded our vendor tree to 3.12 I see issues with this code.

So, for now, I'll remove the PM notifier and associated code from this
patch.
I have to revisit the suspend case later.

Thanks,
Sören
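
For illustration, the power-of-two rule from the quoted patch description can be sketched in plain C. This is not the cadence_ttc driver code: the function names, the `max_shift` limit and the "timer rate = input rate >> shift" model are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: log2 of v if v is a power of two, otherwise -1. */
static int pow2_steps(uint64_t v)
{
	int steps = 0;

	if (v == 0 || (v & (v - 1)) != 0)
		return -1;
	while (v > 1) {
		v >>= 1;
		steps++;
	}
	return steps;
}

/*
 * Decide whether an input clock change from old_rate to new_rate can be
 * absorbed by a power-of-two divider (timer rate = input rate >> shift).
 * On success, *shift is updated so the timer rate stays the same.
 * max_shift is an assumed hardware limit supplied by the caller.
 */
static bool absorb_rate_change(uint64_t old_rate, uint64_t new_rate,
			       int max_shift, int *shift)
{
	int steps, new_shift;

	if (old_rate == 0 || new_rate == 0)
		return false;

	if (new_rate >= old_rate) {
		if (new_rate % old_rate != 0)
			return false;
		steps = pow2_steps(new_rate / old_rate);
		new_shift = *shift + steps;
	} else {
		if (old_rate % new_rate != 0)
			return false;
		steps = pow2_steps(old_rate / new_rate);
		new_shift = *shift - steps;
	}

	if (steps < 0 || new_shift < 0 || new_shift > max_shift)
		return false;	/* not a power of two, or divider out of range */

	*shift = new_shift;
	return true;
}
```

With a shift of 2, an input change from 100 MHz to 200 MHz moves the shift to 3 and the timer stays at 25 MHz, while a change to 150 MHz is refused.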




Re: [PATCH 4/5] futex: Avoid taking hb lock if nothing to wakeup

2013-11-22 Thread Linus Torvalds
On Fri, Nov 22, 2013 at 4:56 PM, Davidlohr Bueso  wrote:
> In futex_wake() there is clearly no point in taking the hb->lock if
> we know beforehand that there are no tasks to be woken. This comes
> at the smaller cost of doing some atomic operations to keep track of
> the list's size.

Hmm. Why? Afaik, you only care about "empty or not". And if you don't
need the serialization from locking, then afaik you can just do a
"plist_head_empty()" without holding the lock.

NOTE!

The "list_empty()" function is very much designed to work even without
holding a lock (as long as the head itself exists reliably, of course)
BUT you have to then guarantee yourself that your algorithm doesn't
have any races wrt other CPU's adding an entry to the list at the same
time. Not holding a lock obviously means that you are not serialized
against that.. We've had problems with people doing

if (!list_empty(waiters))
wake_up_list(..)

because they wouldn't wake people up who just got added.
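
The interleaving described above can be replayed step by step in a single thread. The sketch below is userspace C with made-up names (`wait_queue`, `replay_lost_wakeup`), not the kernel list or futex code; it only shows how an unlocked emptiness check can go stale before the waker acts on it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct waiter {
	struct waiter *next;
	bool woken;
};

/* The lock is left out on purpose: the race is replayed deterministically. */
struct wait_queue {
	struct waiter *head;
};

static bool queue_empty(const struct wait_queue *q)
{
	return q->head == NULL;
}

static void enqueue(struct wait_queue *q, struct waiter *w)
{
	w->woken = false;
	w->next = q->head;
	q->head = w;
}

/* Wake and remove every queued waiter; returns how many were woken. */
static int wake_all(struct wait_queue *q)
{
	int n = 0;

	while (q->head != NULL) {
		struct waiter *w = q->head;

		q->head = w->next;
		w->woken = true;
		n++;
	}
	return n;
}

/*
 * Replay of the problematic ordering: the waker samples "empty",
 * a waiter adds itself, and the waker then acts on the stale answer.
 */
static int replay_lost_wakeup(struct wait_queue *q, struct waiter *late)
{
	bool was_empty = queue_empty(q);	/* waker: unlocked check */

	enqueue(q, late);			/* waiter: arrives just after */
	if (was_empty)
		return 0;			/* waker: wakes nobody */
	return wake_all(q);
}
```

Whether the missed waiter is a real bug depends on the protocol around the queue, for example on whether the waiter rechecks its condition after queueing itself and before sleeping.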

But considering that your atomic counter checking has the same lack of
serialization, at least the plist_head_empty() check shouldn't be any
worse than that counter thing.. And doesn't need any steenking atomic
ops or a new counter field.

Linus


Re: [PATCH 5/8] ARM: OMAP2+: timer: Introduce OF-friendly clocksource/clockevent system timers

2013-11-22 Thread Joel Fernandes
Sorry, resending due to an MUA issue.

Adding Thomas to the thread since discussion is about clocksource, and Mark
Rutland as discussion is related to timers and DT, thanks.

On 11/22/2013 02:01 PM, Rob Herring wrote:
> On Fri, Nov 22, 2013 at 10:42 AM, Joel Fernandes  wrote:
>> On 11/22/2013 09:58 AM, Rob Herring wrote:
>>> On Thu, Nov 21, 2013 at 7:56 PM, Joel Fernandes  wrote:
>>>> This work is a migration effort of OMAP system timers to the
>>>> clocksource/clockevent framework. Consider this as a first-pass in this
>>>> effort.
>>>> There are few cleanups that need to be done first. The HWMOD code is
>>>> intertwined with the timer code. HWMOD code cleanups in the future will
>>>> hopefully make most of this code go away, so till then we separate out the
>>>> power/clocks portion of the code from the actual timer bits.  This will
>>>> facilitate near-future work of adapting the system timer as a clocksource.
>>>>
>>>> New functions for OF-only boot are introduced, and we can soon delete the old
>>>> versions once we migrate all platforms. Currently only AM335x is migrated and
>>>> tested. A new omap_generic_timer_init function is introduced for DT platforms.
>>>> Code required earlier for non-DT platforms such as setup of timer IDs and timer
>>>> parent clock is not required.  parent clocks are automatically setup by the mux
>>>> clock driver through DT so they no longer need to be hardcoded.
>>>>
>>>> The init code will try to pick the best timer for clocksource and clockevent
>>>> however bindings are added to force a particular timer as clocksource or
>>>> clockevent through DT.
>>>>
>>>> Signed-off-by: Joel Fernandes
>>>> ---
>>>>  .../devicetree/bindings/arm/omap/timer.txt |  12 ++
>>>>  arch/arm/mach-omap2/common.h               |   1 +
>>>>  arch/arm/mach-omap2/timer.c                | 235 +
>>>>  3 files changed, 248 insertions(+)
>>>>
>>>> diff --git a/Documentation/devicetree/bindings/arm/omap/timer.txt b/Documentation/devicetree/bindings/arm/omap/timer.txt
>>>> index d02e27c..6cf7a75 100644
>>>> --- a/Documentation/devicetree/bindings/arm/omap/timer.txt
>>>> +++ b/Documentation/devicetree/bindings/arm/omap/timer.txt
>>>> @@ -32,6 +32,18 @@ Optional properties:
>>>>  - ti,timer-secure: Indicates the timer is reserved on a secure OMAP device
>>>> and therefore cannot be used by the kernel.
>>>>
>>>> +- ti,timer-clockevent,
>>>> +  ti,timer-clocksource These properties force the system timer code to choose
>>>> +   the particular timer as a clockevent or clocksource.
>>>> +   If these properties are not specified, the timer code
>>>> +   picks up a "ti,timer-alwon" as the clocksource and a
>>>> +   timer containing one of the following properties as
>>>> +   the clockevent in the following order:
>>>> +   ti,timer-alwon
>>>> +   ti,timer-dsp
>>>> +   ti,timer-pwm
>>>> +   ti,timer-secure
>>>
>>> These properties were added specifically for the reason of avoiding
>>> linux specific properties like these. When is this not sufficient?
>>
>> Some platforms cannot use certain timers as clockevents and clocksource, to keep
>> this code functionally equivalent and working, I added these properties so that
>> its possible to select specific timers as clockevents/sources as was being done
>> in the non-DT case. I'm open to suggestions for doing this in a better way for DT.
> 
> There has to be a defined reason why a given timer cannot be used. You
> are not explaining what that reason is. Define a property or set of
> properties that describe the h/w feature or quirk.

Reason? There are HW bugs that prevent a timer from being used. See [1]. And
there may be other reasons, I haven't written this code but I know that there
are other HW bugs that made authors pick a specific timer such as board issues
or accuracy.

This is nothing new, just that I'm trying to find a way to do it from DT.

You can see the ifdef'ry below in mach-omap2/timer.c. All I'm trying to do is
make it simpler and cleaner by adding these properties to DT so that we can
delete this code entirely.


#ifdef CONFIG_ARCH_OMAP2
OMAP_SYS_32K_TIMER_INIT(2, 1, "timer_32k_ck", "ti,timer-alwon",
2, "timer_sys_ck", NULL);
#endif /* CONFIG_ARCH_OMAP2 */

#if defined(CONFIG_ARCH_OMAP3) || defined(CONFIG_SOC_AM43XX)
OMAP_SYS_32K_TIMER_INIT(3, 1, "timer_32k_ck", "ti,timer-alwon",
2, "timer_sys_ck", NULL);
OMAP_SYS_32K_TIMER_INIT(3_secure, 12, "secure_32k_fck", "ti,timer-secure",
2, "timer_sys_ck", NULL);
#endif /* CONFIG_ARCH_OMAP3 */

etc...

>>>
>>> And I agree with the comment to use OF_CLKSRC.
>>>
>>
>> 

Re: [PATCH 1/6] powerpc: fix exception clearing in e500 SPE float emulation

2013-11-22 Thread Joseph S. Myers
On Fri, 22 Nov 2013, Scott Wood wrote:

> This sounds like an incompatible change to userspace API.  What about
> older glibc?  What about user code that directly manipulates these bits
> rather than going through libc, or uses a libc other than glibc?  Where
> is this API requirement documented?

The previous EGLIBC port, and the uClibc code copied from it, is 
fundamentally broken as regards any use of prctl for floating-point 
exceptions because it didn't use the PR_FP_EXC_SW_ENABLE bit in its prctl 
calls (and did various worse things, such as passing a pointer when prctl 
expected an integer).  If you avoid anything where prctl is used, the 
clearing of sticky bits still means it will never give anything 
approximating correct exception semantics with existing kernels.  I don't 
believe the patch makes things any worse for existing code that doesn't 
try to inform the kernel of changes to sticky bits - such code may get 
incorrect exceptions in some cases, but it would have done so anyway in 
other cases.

This is the best API I could come up with to fix the fundamentally broken 
nature of what came before, taking into account that in many cases a prctl 
call is already needed along with userspace manipulation of exception 
bits.  I'm not aware of any kernel documentation where this sort of 
subarchitecture-specific API detail is documented.  (The API also includes 
such things as needing to leave the spefscr trap-enable bits set and use 
prctl to control whether SIGFPE results from exceptions.)

> I think the impact of this could be reduced by using this mechanism only
> to clear bits, rather than set them.  That is, if the exception bit is
> unset, don't set it just because it's set in spefscr_last -- but if it's
> not set in spefscr_last, and the emulation code doesn't want to set it,
> then clear it.

It should already be the case in this patch that if a bit is clear in 
spefscr, and set in spefscr_last (i.e. userspace did not inform the kernel 
of clearing the bit, and no traps since then have resulted in the kernel 
noticing it was cleared), it won't get set unless the emulation code wants 
to set it.  The sole place spefscr_last is read is in the statement 
"__FPU_FPSCR &= ~(FP_EX_INVALID | FP_EX_UNDERFLOW) | 
current->thread.spefscr_last;" - if the bit is already clear in spefscr, 
this statement has no effect on it.
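
A small C sketch makes the effect of that masking statement easy to check. The bit positions below are placeholders chosen for the example, not the real SPEFSCR layout, and `apply_last` is a hypothetical wrapper around the quoted expression.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder bit positions for the example, not the real register layout. */
#define EX_INVALID	(1u << 0)
#define EX_UNDERFLOW	(1u << 1)
#define EX_OVERFLOW	(1u << 2)

/*
 * Same shape as the quoted statement: within the masked group, a sticky bit
 * survives only if it is also set in spefscr_last; bits outside the group
 * are left alone. Nothing is ever set by this operation.
 */
static uint32_t apply_last(uint32_t spefscr, uint32_t spefscr_last)
{
	spefscr &= ~(EX_INVALID | EX_UNDERFLOW) | spefscr_last;
	return spefscr;
}
```

For example, `apply_last(0, EX_INVALID)` stays 0 (a bit already clear is never set), while `apply_last(EX_INVALID, 0)` clears the bit because it was not recorded in the last-seen value.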

> Are there any cases where the exception bit can be set without the
> kernel taking a trap, or is userspace manipulation limited to clearing
> the bits?

Userspace can both set and clear the bits without a trap.  For example, 
fesetenv restores a saved value of spefscr which may both set and clear 
bits (and then it calls prctl because it needs to do so anyway to restore 
the saved state for which exceptions were enabled).  fesetexceptflag 
restores saved state of particular exceptions without a trap (so needs to 
call prctl specially to inform the kernel of a change).

-- 
Joseph S. Myers
jos...@codesourcery.com


[ANNOUNCE] iproute2 3.12.0 release

2013-11-22 Thread Stephen Hemminger
A little late, but ready and toasty warm: here is iproute2 to go with
3.12.0 (aka One Giant Leap for Frogkind).

In addition to the usual build and documentation fixes, this
version includes support for IPv6 on vxlan and GRE, as well as the
fair queue packet scheduler.

If you have been sitting on changes to iproute2 that are in
net-next for the 3.12 merge window, please submit them now.

Iproute2 package is available at:
  http://kernel.org/pub/linux/utils/net/iproute2/iproute2-3.12.0.tar.gz

You can download the source from:
  git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git

Stay Warm!

---
Andreas Henriksson (1):
  ss: avoid passing negative numbers to malloc

Christophe Gouault (1):
  xfrm: enable to set non-wildcard mark 0 on SAs and SPs

Eric Dumazet (3):
  pkt_sched: fq: Fair Queue packet scheduler
  tc: support TCA_STATS_RATE_EST64
  htb: add support for direct_qlen attribute

Fan Du (1):
  xfrm: use memcpy to suppress gcc phony buffer overflow warning.

Hangbin Liu (1):
  ipaddrlabel: use uint32_t instead of int32_t

Jamal Hadi Salim (2):
  tc: introduce simple action
  action: typo nat fix

Nicolas Dichtel (1):
  iplink: update available type list

Nigel Kukard (1):
  Fix tc stats when using -batch mode

Petr Písař (2):
  iproute2: bridge: document mdb
  iproute2: bridge: Close file with bridge monitor file

Sami Kerola (1):
  ip: make -resolve addr to print names rather than addresses

Stefan Tomanek (1):
  ip rule: add route suppression options

Stephen Hemminger (14):
  Update kernel headers to net-next for 3.12
  Update to 3.11 net-next kernel headers
  nstat: add json output format
  Update to 3.12-rc1 headers
  nstat: revise json output
  ifstat: add json output format
  lnstat: add json output format
  lnstat, nstat, ifstat: update man pages
  tc: add default action to kernel headers
  ipv6 gre: add entry to ether types
  Fix handling of qdis without options
  htb: remove old unused duplicate qdisc name
  update kernel headers
  v3.12.0

WANG Cong (1):
  vxlan: add ipv6 support

x...@mail.ru (2):
  iproute2: GRE over IPv6 tunnel support.
  iproute2: ip6gre: update man pages



Re: [PATCH] of/platform: Fix no irq domain found errors when populating interrupts

2013-11-22 Thread Tony Lindgren
* Tony Lindgren  [131122 17:09]:
> * Russell King - ARM Linux  [131122 16:56]:
> > On Fri, Nov 22, 2013 at 04:43:35PM -0800, Tony Lindgren wrote:
> > > + /* See of_device_resource_notify for populating interrupts */
> > > + for (i = 0; i < num_irq; i++, res++) {
> > > + res->flags = IORESOURCE_IRQ;
> > > + res->start = -EPROBE_DEFER;
> > > + res->end = -EPROBE_DEFER;
> > 
> > NAK.  Definitely a bad idea to start introducing magic values other into
> > resources.  Please don't do this.
> 
> Do you have any better ideas on how to sort out this issue then?

I guess we could allocate all the resources lazily here, I'll take a look
at that.

Tony


Re: [RFC PATCH] double free in decompressor.c

2013-11-22 Thread Geyslan Gregório Bem
2013/11/22 Phillip Lougher :
> On 22/11/13 21:50, Geyslan Gregório Bem wrote:
>>
>> Coverity caught double free possibility (CID 1130962).
>>
>> I can patch this, but I have to know if is correct to free comp_opts
>> in the function squashfs_decompressor_create() or it had to be done in
>> the caller. My bet is the caller.
>>
>>
>> 128 void *squashfs_decompressor_setup(struct super_block *sb, unsigned short flags)
>> 129 {
>> 130         struct squashfs_sb_info *msblk = sb->s_fs_info;
>> 131         void *stream, *comp_opts = get_comp_opts(sb, flags);
>> 132
>>
>> 1. Condition "IS_ERR(comp_opts)", taking false branch
>> 133         if (IS_ERR(comp_opts))
>> 134                 return comp_opts;
>> 135
>>
>> 2. freed_arg: "squashfs_decompressor_create(struct squashfs_sb_info *,
>> void *)" frees "comp_opts".
>> 136         stream = squashfs_decompressor_create(msblk, comp_opts);
>>
>> 3. Condition "IS_ERR(stream)", taking true branch
>> 137         if (IS_ERR(stream))
>
>
> FALSE positive.
>
> squashfs_decompressor_create() frees comp_opts only on success.
>
> If IS_ERR(stream) is true, then comp_opts has not been freed by
> squashfs_decompressor_create().
>
> Phillip
>
>
>
>>
>> CID 1130962 (#1 of 1): Double free (USE_AFTER_FREE)
>> 4. double_free: Calling "kfree(void const *)" frees pointer "comp_opts"
>> which has already been freed.
>> 138                 kfree(comp_opts);
>> 139
>> 140         return stream;
>> 141 }
>>
>>
>

Phillip, set as false positive in Coverity. Thanks.
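
The ownership rule Phillip describes (the callee frees the options only on success, the caller frees them only on error) can be sketched in userspace C. The names `stream_create` and `stream_setup` are hypothetical stand-ins, not the squashfs functions, and NULL replaces the kernel's IS_ERR() convention.

```c
#include <assert.h>
#include <stdlib.h>

struct stream {
	int level;
};

/*
 * Takes ownership of opts, and frees it, only when it succeeds.
 * On failure it returns NULL and leaves opts untouched.
 */
static struct stream *stream_create(int *opts)
{
	struct stream *s;

	if (opts == NULL || *opts < 0)
		return NULL;	/* failure: opts still belongs to the caller */

	s = malloc(sizeof(*s));
	if (s == NULL)
		return NULL;

	s->level = *opts;
	free(opts);		/* success: options are consumed here */
	return s;
}

static struct stream *stream_setup(int level)
{
	int *opts = malloc(sizeof(*opts));
	struct stream *s;

	if (opts == NULL)
		return NULL;
	*opts = level;

	s = stream_create(opts);
	if (s == NULL)
		free(opts);	/* reached only when the callee did not free */
	return s;
}
```

Each path frees the options exactly once, which is why a checker that assumes the callee always frees its argument would report a double free here.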

-- 
Regards,

Geyslan G. Bem
hackingbits.com


Re: [PATCH 5/8] ARM: OMAP2+: timer: Introduce OF-friendly clocksource/clockevent system timers

2013-11-22 Thread Joel Fernandes
Adding Thomas to the thread since discussion is about clocksource, and Mark
Rutland as discussion is related to timers and DT, thanks.

On 11/22/2013 02:01 PM, Rob Herring wrote:
> On Fri, Nov 22, 2013 at 10:42 AM, Joel Fernandes  wrote:
>> On 11/22/2013 09:58 AM, Rob Herring wrote:
>>> On Thu, Nov 21, 2013 at 7:56 PM, Joel Fernandes  wrote:
>>>> This work is a migration effort of OMAP system timers to the
>>>> clocksource/clockevent framework. Consider this as a first-pass in this
>>>> effort.
>>>> There are few cleanups that need to be done first. The HWMOD code is
>>>> intertwined with the timer code. HWMOD code cleanups in the future will
>>>> hopefully make most of this code go away, so till then we separate out the
>>>> power/clocks portion of the code from the actual timer bits.  This will
>>>> facilitate near-future work of adapting the system timer as a clocksource.
>>>>
>>>> New functions for OF-only boot are introduced, and we can soon delete the old
>>>> versions once we migrate all platforms. Currently only AM335x is migrated and
>>>> tested. A new omap_generic_timer_init function is introduced for DT platforms.
>>>> Code required earlier for non-DT platforms such as setup of timer IDs and timer
>>>> parent clock is not required.  parent clocks are automatically setup by the mux
>>>> clock driver through DT so they no longer need to be hardcoded.
>>>>
>>>> The init code will try to pick the best timer for clocksource and clockevent
>>>> however bindings are added to force a particular timer as clocksource or
>>>> clockevent through DT.
>>>>
>>>> Signed-off-by: Joel Fernandes
>>>> ---
>>>>  .../devicetree/bindings/arm/omap/timer.txt |  12 ++
>>>>  arch/arm/mach-omap2/common.h               |   1 +
>>>>  arch/arm/mach-omap2/timer.c                | 235 +
>>>>  3 files changed, 248 insertions(+)
>>>>
>>>> diff --git a/Documentation/devicetree/bindings/arm/omap/timer.txt b/Documentation/devicetree/bindings/arm/omap/timer.txt
>>>> index d02e27c..6cf7a75 100644
>>>> --- a/Documentation/devicetree/bindings/arm/omap/timer.txt
>>>> +++ b/Documentation/devicetree/bindings/arm/omap/timer.txt
>>>> @@ -32,6 +32,18 @@ Optional properties:
>>>>  - ti,timer-secure: Indicates the timer is reserved on a secure OMAP device
>>>> and therefore cannot be used by the kernel.
>>>>
>>>> +- ti,timer-clockevent,
>>>> +  ti,timer-clocksource These properties force the system timer code to choose
>>>> +   the particular timer as a clockevent or clocksource.
>>>> +   If these properties are not specified, the timer code
>>>> +   picks up a "ti,timer-alwon" as the clocksource and a
>>>> +   timer containing one of the following properties as
>>>> +   the clockevent in the following order:
>>>> +   ti,timer-alwon
>>>> +   ti,timer-dsp
>>>> +   ti,timer-pwm
>>>> +   ti,timer-secure
>>>
>>> These properties were added specifically for the reason of avoiding
>>> linux specific properties like these. When is this not sufficient?
>>
>> Some platforms cannot use certain timers as clockevents and clocksource, to keep
>> this code functionally equivalent and working, I added these properties so that
>> its possible to select specific timers as clockevents/sources as was being done
>> in the non-DT case. I'm open to suggestions for doing this in a better way for DT.
> 
> There has to be a defined reason why a given timer cannot be used. You
> are not explaining what that reason is. Define a property or set of
> properties that describe the h/w feature or quirk.

Reason? There are HW bugs that prevent a timer from being used. See [1]. And
there may be other reasons, I haven't written this code but I know that there
are other HW bugs that made authors pick a specific timer such as board issues
or accuracy.

This is nothing new, just that I'm trying to find a way to do it from DT.

You can see the ifdef'ry below in mach-omap2/timer.c. All I'm trying to do is
make it simpler and cleaner by adding these properties to DT so that we can
delete this code entirely.


#ifdef CONFIG_ARCH_OMAP2
OMAP_SYS_32K_TIMER_INIT(2, 1, "timer_32k_ck", "ti,timer-alwon",
2, "timer_sys_ck", NULL);
#endif /* CONFIG_ARCH_OMAP2 */

#if defined(CONFIG_ARCH_OMAP3) || defined(CONFIG_SOC_AM43XX)
OMAP_SYS_32K_TIMER_INIT(3, 1, "timer_32k_ck", "ti,timer-alwon",
2, "timer_sys_ck", NULL);
OMAP_SYS_32K_TIMER_INIT(3_secure, 12, "secure_32k_fck", "ti,timer-secure",
2, "timer_sys_ck", NULL);
#endif /* CONFIG_ARCH_OMAP3 */

etc...

>>>
>>> And I agree with the comment to use OF_CLKSRC.
>>>
>>
>> There are difficulties I mentioned in a 

Re: Linux 3.13-rc1 is out

2013-11-22 Thread James Cloos
This combination:

# CONFIG_SYSTEM_TRUSTED_KEYRING is not set
CONFIG_TRUSTED_KEYS=m

(acquired by oldconfig and N to system keyring)

fails with:

Pass 2
  CC [M]  crypto/asymmetric_keys/x509_rsakey-asn1.o
  CC [M]  crypto/asymmetric_keys/x509_cert_parser.o
  CC [M]  crypto/asymmetric_keys/x509_public_key.o
crypto/asymmetric_keys/x509_public_key.c: In function ‘x509_key_preparse’:
crypto/asymmetric_keys/x509_public_key.c:237:35: error: ‘system_trusted_keyring’ undeclared (first use in this function)
   ret = x509_validate_trust(cert, system_trusted_keyring);
                                   ^
crypto/asymmetric_keys/x509_public_key.c:237:35: note: each undeclared identifier is reported only once for each function it appears in
make[2]: *** [crypto/asymmetric_keys/x509_public_key.o] Error 1
make[1]: *** [crypto/asymmetric_keys] Error 2
make: *** [crypto] Error 2

Perhaps include/keys/system_keyring.h should have a definition for
system_trusted_keyring in the #ifndef CONFIG_SYSTEM_TRUSTED_KEYRING
case (which may entail just removing the #ifdef).
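
One possible shape for such a header, sketched here as an alternative to simply dropping the #ifdef, is an inline accessor that yields NULL when the option is off, so callers compile and link either way. The accessor name `example_trusted_keyring` is hypothetical and is not claimed to be what include/keys/system_keyring.h provides; callers would also have to cope with a NULL keyring.

```c
#include <assert.h>
#include <stddef.h>

struct key;	/* opaque for this sketch */

#ifdef CONFIG_SYSTEM_TRUSTED_KEYRING
/* Option enabled: the keyring object exists and is defined elsewhere. */
extern struct key *system_trusted_keyring;

static inline struct key *example_trusted_keyring(void)
{
	return system_trusted_keyring;
}
#else
/* Option disabled: no keyring object, so report "none" instead of failing to build. */
static inline struct key *example_trusted_keyring(void)
{
	return NULL;
}
#endif
```

Built without CONFIG_SYSTEM_TRUSTED_KEYRING defined, `example_trusted_keyring()` returns NULL and no reference to the missing symbol is emitted.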

Commits b56e5a17b6b9acd1 and 09fbc47373826d67 are relevant.

-JimC
-- 
James Cloos  OpenPGP: 1024D/ED7DAEA6


Re: [PATCH] of/platform: Fix no irq domain found errors when populating interrupts

2013-11-22 Thread Tony Lindgren
* Russell King - ARM Linux  [131122 16:56]:
> On Fri, Nov 22, 2013 at 04:43:35PM -0800, Tony Lindgren wrote:
> > +   /* See of_device_resource_notify for populating interrupts */
> > +   for (i = 0; i < num_irq; i++, res++) {
> > +   res->flags = IORESOURCE_IRQ;
> > +   res->start = -EPROBE_DEFER;
> > +   res->end = -EPROBE_DEFER;
> 
> NAK.  Definitely a bad idea to start introducing magic values other into
> resources.  Please don't do this.

Do you have any better ideas on how to sort out this issue then?

Tony


Re: Linux 3.13-rc1 is out

2013-11-22 Thread Linus Torvalds
On Fri, Nov 22, 2013 at 4:43 PM, Matthew Garrett  wrote:
>
> I'd sent a pull yesterday (http://lkml.org/lkml/2013/11/21/599 ) which
> may have missed your cutoff, but it seems worth checking.

No, it didn't miss my cutoff, and it wasn't even eaten by the gmail
spam filters, but it *had* missed my normal pattern for checking that
I had merged all git pull requests, which is why I had overlooked it.

Pulled now (going through the allmodconfig build test before being
pushed out shortly).

Btw, the reason it missed my normal check is that I search for
"in:inbox git pull" as a sanity check that I don't have anything
pending. Now, *most* of the time I tend to catch things regardless
just by virtue of reading email carefully, but when I'm busy it
definitely helps if your email matches that pattern.

The easiest (and most common) way to do that is to have the subject
line be something like

[GIT PULL] sound fixes #2 for 3.13-rc1

but there are other patterns. For example, David Miller tends to have
instead a subject line like

[GIT] Networking

and then "Please pull" in the body of the email. So the "in:inbox git
pull" thing basically catches pretty much everybody.

Except for you...

You use David Miller's subject line, but you don't have the "please
pull" part, so your email missed my "did I pull everything that I had
pending" check.

For very similar reasons, for people sending me a patch, it really
*really* helps if there's a "[patch]" (possibly with a n/m number
pattern) in the subject line..  Or make sure you send me a git diff,
because I also tend to search for the string "diff --git". Of course,
the main person who sends me patches is Andrew Morton, so I mostly
just search for "from:akpm" :)

   Linus


Re: [PATCH] of/platform: Fix no irq domain found errors when populating interrupts

2013-11-22 Thread Tony Lindgren
* Tony Lindgren  [131122 16:44]:
> @@ -168,7 +218,13 @@ struct platform_device *of_device_alloc(struct device_node *np,
>   rc = of_address_to_resource(np, i, res);
>   WARN_ON(rc);
>   }
> - WARN_ON(of_irq_to_resource_table(np, res, num_irq) != num_irq);
> +
> + /* See of_device_resource_notify for populating interrupts */
> + for (i = 0; i < num_irq; i++, res++) {
> + res->flags = IORESOURCE_IRQ;
> + res->start = -EPROBE_DEFER;
> + res->end = -EPROBE_DEFER;
> + }

Hmm actually we want to use some other value here as res->start
is not an int. Maybe some kind of hwirq_max + EPROBE_DEFER.
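
The type problem can be seen in a few lines of userspace C. EPROBE_DEFER is 517 in the kernel; `res_size_t` is a stand-in for the kernel's unsigned resource type (assumed 64 bits wide here, although the real width depends on the configuration), and the helper names are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EPROBE_DEFER 517		/* value used by the Linux kernel */

typedef uint64_t res_size_t;		/* stand-in for an unsigned resource type */

struct res {
	res_size_t start;
	res_size_t end;
};

/* Storing a negative errno in an unsigned field wraps to a huge value. */
static void mark_deferred(struct res *r)
{
	r->start = (res_size_t)-EPROBE_DEFER;
	r->end = (res_size_t)-EPROBE_DEFER;
}

/* A "start < 0" test can never fire; the check must compare the wrapped value. */
static bool is_deferred(const struct res *r)
{
	return r->start == (res_size_t)-EPROBE_DEFER;
}
```

After `mark_deferred()`, `start` holds 2^64 - 517 rather than -517, so any consumer that treats it as an interrupt number without an explicit check sees a very large, bogus value.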

Tony


Re: Linux 3.13-rc1 is out

2013-11-22 Thread Shuah Khan

On 11/22/2013 01:36 PM, Linus Torvalds wrote:

> So you had an extra week to prepare your pull requests, and if you
> were planning on sending it in the last two days thinking I'd close
> the merge window on Sunday as usual, I can only laugh derisively in
> your general direction, and call you bad names. Because I'm not
> interested in your excuses. I did warn people about this in the 3.12
> release notes. As it was, there were a few people who cut it fairly
> close today. You know who you are.
>
> If there are pull requests I missed (due to getting caught in spam
> filters, or not matching my normal search patterns), and you think you
> sent your pull request in time but it got overlooked, ping me -
> because I don't have anything pending I know about, but mistakes
> happen.
>
> Talking about mistakes... I suspect it was a mistake to have that
> extra week before the merge window opened, and I probably should just
> have done a 3.12-rc8 instead. Because the linux-next statistics look
> suspicious, and we had extra stuff show up there not just in that
> first week. Clearly people took that "let's have an extra week of
> merge window" and extrapolated it a bit too much. Oh, well. Live and
> learn.
>
> Anyway, other than that small oddity, this was a fairly normal merge
> window. By patch size we had a pretty usual ~55% drivers, 18%
> architecture code, 9% network updates, and the rest is spread out (fs,
> headers, tools, documentation). Featurewise, the big ones are likely
> the nftables and the multi-queue block layer stuff, but depending on
> your interests you might find all the incremental updates to various
> areas interesting. There are some odd ones in there (LE mode Powerpc
> support..)
>
> Go forth and test, and start sending me regression fixes. And really,
> if you didn't send me your pull request in time, don't whine about it.
> Because nobody likes a whiner.



3.13-rc1 fails to boot on Samsung Series 9. 3.12.1 is fine, and the last
mainline kernel that worked on it was from Nov 14th. It failed to mount
/boot, and after I mounted it manually, it came up but didn't
detect the USB mouse. Looking at the differences in dmesg, it looks like
the usb probe fails on the USB mouse. I will start a bisect and let you know
what I find.


dmesg 3.13-rc1:

   63.637083] PM: Adding info for usb:2-1:1.0
[   63.637128] driver: '2-1': driver_bound: bound to device 'usb'
[   63.637136] bus: 'usb': really_probe: bound device 2-1 to driver usb
[  123.660457] usb 2-1: USB disconnect, device number 3
[  123.660505] bus: 'usb': remove device 2-1:1.0
[  123.660508] PM: Removing info for usb:2-1:1.0
[  123.661718] bus: 'usb': remove device 2-1
[  123.661724] PM: Removing info for usb:2-1
[  125.143833] usb 2-1: new low-speed USB device number 4 using xhci_hcd
[  125.190924] usb 2-1: New USB device found, idVendor=17ef, idProduct=6019
[  125.190928] usb 2-1: New USB device strings: Mfr=0, Product=2, SerialNumber=0
[  125.190930] usb 2-1: Product: Lenovo Optical USB Mouse
[  125.191042] bus: 'usb': add device 2-1
[  125.191054] PM: Adding info for usb:2-1
[  125.191081] bus: 'usb': driver_probe_device: matched device 2-1 with driver usb
[  125.191084] bus: 'usb': really_probe: probing driver usb with device 2-1
[  125.191097] usb 2-1: ep 0x81 - rounding interval to 64 microframes, ep desc says 80 microframes
[  125.201952] bus: 'usb': add device 2-1:1.0
[  125.201959] PM: Adding info for usb:2-1:1.0
[  125.202010] driver: '2-1': driver_bound: bound to device 'usb'
[  125.202019] bus: 'usb': really_probe: bound device 2-1 to driver usb

dmesg 3.12.1:


[5.562700] bus: 'usb': add driver usbhid
[5.562790] bus: 'usb': driver_probe_device: matched device 2-1:1.0 with driver usbhid
[5.562794] bus: 'usb': really_probe: probing driver usbhid with device 2-1:1.0
[5.575965] driver: '2-1:1.0': driver_bound: bound to device 'usbhid'
[5.575971] bus: 'usb': really_probe: bound device 2-1:1.0 to driver usbhid
[5.575998] usbcore: registered new interface driver usbhid
[5.576000] usbhid: USB HID core driver
[5.601326] input: Lenovo Optical USB Mouse as /devices/pci:00/:00:1c.4/:03:00.0/usb2/2-1/2-1:1.0/input/input6
[5.601796] hid-generic 0003:17EF:6019.0001: input,hidraw0: USB HID v1.11 Mouse [Lenovo Optical USB Mouse] on usb-:03:00.0-1/input0
[5.670722] bus: 'usb': add driver btusb
[5.670736] bus: 'usb': driver_probe_device: matched device 1-1.5:1.0 with driver btusb
[5.670739] bus: 'usb': really_probe: probing driver btusb with device 1-1.5:1.0


-- Shuah

--
Shuah Khan
Senior Linux Kernel Developer - Open Source Group
Samsung Research America(Silicon Valley)
shuah...@samsung.com | (970) 672-0658


[PATCH 2/5] futex: Check for pi futex_q only once

2013-11-22 Thread Davidlohr Bueso
All wake_futex() callers already verify that we are not dealing with
a pi futex_q, so we can remove the redundant WARN() check, as this is never
triggered anyway.

Cc: Ingo Molnar 
Cc: Darren Hart 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Mike Galbraith 
Cc: Jeff Mahoney 
Cc: Linus Torvalds 
Cc: Scott Norton 
Cc: Tom Vaden 
Cc: Aswin Chandramouleeswaran 
Cc: Waiman Long 
Cc: Jason Low 
Signed-off-by: Davidlohr Bueso 
---
 kernel/futex.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index e6ffe73..0768c68 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -844,9 +844,6 @@ static void wake_futex(struct futex_q *q)
 {
struct task_struct *p = q->task;
 
-   if (WARN(q->pi_state || q->rt_waiter, "refusing to wake PI futex\n"))
-   return;
-
/*
 * We set q->lock_ptr = NULL _before_ we wake up the task. If
 * a non-futex wake up happens on another CPU then the task
-- 
1.8.1.4



[PATCH 1/5] futex: Misc cleanups

2013-11-22 Thread Davidlohr Bueso
From: Jason Low 

- Remove unnecessary head variables.
- Delete unused parameter in queue_unlock().

Cc: Ingo Molnar 
Cc: Darren Hart 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Mike Galbraith 
Cc: Jeff Mahoney 
Cc: Linus Torvalds 
Cc: Scott Norton 
Cc: Tom Vaden 
Cc: Aswin Chandramouleeswaran 
Cc: Waiman Long 
Signed-off-by: Jason Low 
Signed-off-by: Davidlohr Bueso 
---
 kernel/futex.c | 39 ---
 1 file changed, 12 insertions(+), 27 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 80ba086..e6ffe73 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -597,13 +597,10 @@ lookup_pi_state(u32 uval, struct futex_hash_bucket *hb,
 {
struct futex_pi_state *pi_state = NULL;
struct futex_q *this, *next;
-   struct plist_head *head;
struct task_struct *p;
pid_t pid = uval & FUTEX_TID_MASK;
 
-   head = >chain;
-
-   plist_for_each_entry_safe(this, next, head, list) {
+   plist_for_each_entry_safe(this, next, >chain, list) {
if (match_futex(>key, key)) {
/*
 * Another waiter already exists - bump up
@@ -985,7 +982,6 @@ futex_wake(u32 __user *uaddr, unsigned int flags, int 
nr_wake, u32 bitset)
 {
struct futex_hash_bucket *hb;
struct futex_q *this, *next;
-   struct plist_head *head;
union futex_key key = FUTEX_KEY_INIT;
int ret;
 
@@ -998,9 +994,8 @@ futex_wake(u32 __user *uaddr, unsigned int flags, int 
nr_wake, u32 bitset)
 
hb = hash_futex();
spin_lock(>lock);
-   head = >chain;
 
-   plist_for_each_entry_safe(this, next, head, list) {
+   plist_for_each_entry_safe(this, next, >chain, list) {
if (match_futex (>key, )) {
if (this->pi_state || this->rt_waiter) {
ret = -EINVAL;
@@ -1033,7 +1028,6 @@ futex_wake_op(u32 __user *uaddr1, unsigned int flags, u32 
__user *uaddr2,
 {
union futex_key key1 = FUTEX_KEY_INIT, key2 = FUTEX_KEY_INIT;
struct futex_hash_bucket *hb1, *hb2;
-   struct plist_head *head;
struct futex_q *this, *next;
int ret, op_ret;
 
@@ -1081,9 +1075,7 @@ retry_private:
goto retry;
}
 
-   head = >chain;
-
-   plist_for_each_entry_safe(this, next, head, list) {
+   plist_for_each_entry_safe(this, next, >chain, list) {
if (match_futex (>key, )) {
if (this->pi_state || this->rt_waiter) {
ret = -EINVAL;
@@ -1096,10 +1088,8 @@ retry_private:
}
 
if (op_ret > 0) {
-   head = >chain;
-
op_ret = 0;
-   plist_for_each_entry_safe(this, next, head, list) {
+   plist_for_each_entry_safe(this, next, >chain, list) {
if (match_futex (>key, )) {
if (this->pi_state || this->rt_waiter) {
ret = -EINVAL;
@@ -1269,7 +1259,6 @@ static int futex_requeue(u32 __user *uaddr1, unsigned int 
flags,
int drop_count = 0, task_count = 0, ret;
struct futex_pi_state *pi_state = NULL;
struct futex_hash_bucket *hb1, *hb2;
-   struct plist_head *head1;
struct futex_q *this, *next;
u32 curval2;
 
@@ -1392,8 +1381,7 @@ retry_private:
}
}
 
-   head1 = >chain;
-   plist_for_each_entry_safe(this, next, head1, list) {
+   plist_for_each_entry_safe(this, next, >chain, list) {
if (task_count - nr_wake >= nr_requeue)
break;
 
@@ -1493,7 +1481,7 @@ static inline struct futex_hash_bucket *queue_lock(struct 
futex_q *q)
 }
 
 static inline void
-queue_unlock(struct futex_q *q, struct futex_hash_bucket *hb)
+queue_unlock(struct futex_hash_bucket *hb)
__releases(>lock)
 {
spin_unlock(>lock);
@@ -1866,7 +1854,7 @@ retry_private:
ret = get_futex_value_locked(, uaddr);
 
if (ret) {
-   queue_unlock(q, *hb);
+   queue_unlock(*hb);
 
ret = get_user(uval, uaddr);
if (ret)
@@ -1880,7 +1868,7 @@ retry_private:
}
 
if (uval != val) {
-   queue_unlock(q, *hb);
+   queue_unlock(*hb);
ret = -EWOULDBLOCK;
}
 
@@ -2028,7 +2016,7 @@ retry_private:
 * Task is exiting and we just wait for the
 * exit to complete.
 */
-   queue_unlock(, hb);
+   queue_unlock(hb);
put_futex_key();
cond_resched();
goto retry;
@@ -2080,7 +2068,7 @@ retry_private:
goto out_put_key;
 
 out_unlock_put_key:
-   queue_unlock(, hb);
+   queue_unlock(hb);
 
 out_put_key:
put_futex_key();
@@ -2090,7 +2078,7 @@ out:

[PATCH 3/5] futex: Larger hash table

2013-11-22 Thread Davidlohr Bueso
Currently, the futex global hash table suffers from its fixed, smallish
(by today's standards) size of 256 entries, as well as its lack of NUMA
awareness. Large systems, using many futexes, can be prone to a high number
of collisions, where these futexes hash to the same bucket and lead to
extra contention on the same hb->lock. Furthermore, cacheline bouncing is a
reality when we have multiple hb->locks residing on the same cacheline and
different futexes hash to adjacent buckets.

This patch keeps the current static size of 16 entries for small systems,
and otherwise uses 256 * ncpus (or larger, as we need to round the number up
to a power of 2). Note that this number of CPUs accounts for all CPUs that can
ever be available in the system, taking into consideration things like
hotplugging. While we do impose extra overhead at bootup by making the hash
table larger, this is a one-time cost and does not overshadow the benefits
of this patch.
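
As a rough illustration of the sizing rule described above, here is a
user-space sketch (not the kernel code). The helper names roundup_pow2(),
futex_table_size() and bucket_index() are hypothetical; they only stand in
for roundup_pow_of_two() and the hash masking that the patch performs.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round n up to the next power of two. Assumes 1 <= n <= ULONG_MAX / 2 + 1;
 * this is a plain loop, not the kernel's roundup_pow_of_two().
 */
static unsigned long roundup_pow2(unsigned long n)
{
	unsigned long p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/*
 * Table size as described in the changelog: 16 buckets for small
 * configurations, otherwise 256 buckets per possible CPU, rounded up.
 */
static unsigned long futex_table_size(int base_small, unsigned long possible_cpus)
{
	if (base_small)
		return 16;
	return roundup_pow2(256 * possible_cpus);
}

/* With a power-of-two table size, a mask can replace a modulo. */
static unsigned long bucket_index(uint32_t hash, unsigned long table_size)
{
	return hash & (table_size - 1);
}
```

With 80 possible CPUs this gives 256 * 80 = 20480, which rounds up to
32768 buckets.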

Also, similar to other core kernel components (pid, dcache, tcp), by using
alloc_large_system_hash() we benefit from its NUMA awareness and thus the
table is distributed among the nodes instead of in a single one. We impose
this function's minimum limit of 256 entries, so that in worst case scenarios
or issues, we still end up using the current amount anyways.

For a custom microbenchmark that pounds on the uaddr hashing -- making the wait
path fail at futex_wait_setup() returning -EWOULDBLOCK for large amounts of
futexes -- we can see the following benefits on an 80-core, 8-socket 1Tb server:

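+---------+----------------------------------+------------------------------------------+----------+
| threads | baseline (ops/sec) [insns/cycle] | large hash table (ops/sec) [insns/cycle] | increase |
+---------+----------------------------------+------------------------------------------+----------+
|     512 | 34429    [0.07]                  | 255274   [0.48]                          | +641.45% |
|     256 | 65452    [0.07]                  | 443563   [0.41]                          | +577.69% |
|     128 | 125111   [0.07]                  | 742613   [0.33]                          | +493.56% |
|      80 | 203642   [0.09]                  | 1028147  [0.29]                          | +404.87% |
|      64 | 262944   [0.09]                  | 997300   [0.28]                          | +279.28% |
|      32 | 642390   [0.24]                  | 965996   [0.27]                          | +50.37%  |
+---------+----------------------------------+------------------------------------------+----------+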
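
The "increase" column appears to be the relative gain over the baseline,
i.e. (large - baseline) / baseline * 100. A small sketch to reproduce it
follows; percent_increase() is a hypothetical helper and not part of the
patch.

```c
#include <assert.h>

/* Relative gain of 'improved' over 'baseline', in percent. */
static double percent_increase(double baseline, double improved)
{
	return (improved - baseline) / baseline * 100.0;
}
```

For the 512-thread row, percent_increase(34429, 255274) comes out at
roughly 641.45, matching the table.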

Cc: Ingo Molnar 
Cc: Darren Hart 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Mike Galbraith 
Cc: Jeff Mahoney 
Cc: Linus Torvalds 
Cc: Scott Norton 
Cc: Tom Vaden 
Cc: Aswin Chandramouleeswaran 
Signed-off-by: Waiman Long 
Signed-off-by: Jason Low 
Signed-off-by: Davidlohr Bueso 
---
 kernel/futex.c | 26 +-
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 0768c68..5fa9eb0 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -63,6 +63,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -70,7 +71,11 @@
 
 int __read_mostly futex_cmpxchg_enabled;
 
-#define FUTEX_HASHBITS (CONFIG_BASE_SMALL ? 4 : 8)
+#if CONFIG_BASE_SMALL
+static unsigned long futex_hashsize = 16;
+#else
+static unsigned long futex_hashsize;
+#endif
 
 /*
  * Futex flags used to encode options to functions and preserve them across
@@ -151,7 +156,11 @@ struct futex_hash_bucket {
struct plist_head chain;
 };
 
-static struct futex_hash_bucket futex_queues[1both.word)+sizeof(key->both.ptr))/4,
  key->both.offset);
-   return _queues[hash & ((1 << FUTEX_HASHBITS)-1)];
+   return _queues[hash & (futex_hashsize - 1)];
 }
 
 /*
@@ -2715,7 +2724,14 @@ SYSCALL_DEFINE6(futex, u32 __user *, uaddr, int, op, 
u32, val,
 static int __init futex_init(void)
 {
u32 curval;
-   int i;
+   unsigned long i;
+
+#if !CONFIG_BASE_SMALL
+   futex_hashsize = roundup_pow_of_two((256 * num_possible_cpus()));
+   futex_queues = alloc_large_system_hash("futex", sizeof(*futex_queues),
+  futex_hashsize, 0, 0, NULL, NULL,
+  256, futex_hashsize);
+#endif
 
/*
 * This will fail and we want it. Some arch implementations do
@@ -2730,7 +2746,7 @@ static int __init futex_init(void)
if (cmpxchg_futex_value_locked(, NULL, 0, 0) == -EFAULT)
futex_cmpxchg_enabled = 1;
 
-   for (i = 0; i < ARRAY_SIZE(futex_queues); i++) {
+   for (i = 0; i < futex_hashsize; i++) {
plist_head_init(_queues[i].chain);
spin_lock_init(_queues[i].lock);
}
-- 
1.8.1.4


[PATCH 5/5] sched,futex: Provide delayed wakeup list

2013-11-22 Thread Davidlohr Bueso
From: Peter Zijlstra 

Original patchset: https://lkml.org/lkml/2011/9/14/118

This is useful for locking primitives that can effect multiple
wakeups per operation and want to avoid lock-internal lock contention
by delaying the wakeups until we've released the lock-internal locks.

Alternatively it can be used to avoid issuing multiple wakeups, and
thus save a few cycles, in packet processing. Queue all target tasks
and wakeup once you've processed all packets. That way you avoid
waking the target task multiple times if there were multiple packets
for the same task.

This patch adds the needed infrastructure into the scheduler code
and uses the new wake_list to delay the futex wakeups until
after we've released the hash bucket locks. This prevents the newly
woken tasks from immediately getting stuck on the hb->lock.
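
To make the idea concrete, here is a single-threaded user-space sketch of
a delayed wakeup list: tasks are collected while the lock would be held
and only "woken" afterwards. It borrows the sentinel-tail trick from the
patch, but struct task, the wakeups counter and the body of
wake_up_list() are assumptions made for illustration. A plain NULL check
stands in for the cmpxchg() the patch uses.

```c
#include <assert.h>
#include <stddef.h>

struct wake_node { struct wake_node *next; };
struct wake_head { struct wake_node *first; };

/* Non-NULL sentinel for end-of-list, so next == NULL means "not queued". */
#define WAKE_TAIL ((struct wake_node *)0x01)

/* Hypothetical stand-in for task_struct. */
struct task {
	struct wake_node wake;
	int wakeups;	/* counts simulated wakeups */
};

static void wake_head_init(struct wake_head *h)
{
	h->first = WAKE_TAIL;
}

/* Returns 1 if the task was queued, 0 if it was already on a wake list. */
static int wake_list_add(struct wake_head *h, struct task *t)
{
	struct wake_node *n = &t->wake;

	if (n->next != NULL)	/* the patch does this check with cmpxchg() */
		return 0;
	n->next = h->first;
	h->first = n;
	return 1;
}

/* Called after the lock is dropped: wake everything that was queued. */
static int wake_up_list(struct wake_head *h)
{
	struct wake_node *n = h->first;
	int woken = 0;

	while (n != WAKE_TAIL) {
		struct task *t = (struct task *)((char *)n - offsetof(struct task, wake));

		n = n->next;
		t->wake.next = NULL;	/* task may be queued again later */
		t->wakeups++;		/* stands in for the real wakeup */
		woken++;
	}
	h->first = WAKE_TAIL;
	return woken;
}
```

A caller would queue tasks with wake_list_add() inside the locked
section, drop the lock, and then call wake_up_list() once.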

Cc: Ingo Molnar 
Cc: Darren Hart 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Mike Galbraith 
Cc: Jeff Mahoney 
Cc: Linus Torvalds 
Cc: Scott Norton 
Cc: Tom Vaden 
Cc: Aswin Chandramouleeswaran 
Cc: Waiman Long 
Tested-by: Jason Low 
[forward ported]
Signed-off-by: Davidlohr Bueso 
---
Please note that in the original thread there was some debate
about spurious wakeups (https://lkml.org/lkml/2011/9/17/31), so
you can consider this more of an RFC patch if folks believe that
this functionality is incomplete/buggy.

 include/linux/sched.h | 41 +
 kernel/futex.c| 31 +++
 kernel/sched/core.c   | 19 +++
 3 files changed, 75 insertions(+), 16 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 7e35d4b..679aabb 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -793,6 +793,20 @@ enum cpu_idle_type {
 
 extern int __weak arch_sd_sibiling_asym_packing(void);
 
+struct wake_list_head {
+   struct wake_list_node *first;
+};
+
+struct wake_list_node {
+   struct wake_list_node *next;
+};
+
+#define WAKE_LIST_TAIL ((struct wake_list_node *) 0x01)
+
+#define WAKE_LIST(name)\
+   struct wake_list_head name = { WAKE_LIST_TAIL }
+
+
 struct sched_domain_attr {
int relax_domain_level;
 };
@@ -1078,6 +1092,8 @@ struct task_struct {
unsigned int btrace_seq;
 #endif
 
+   struct wake_list_node wake_list;
+
unsigned int policy;
int nr_cpus_allowed;
cpumask_t cpus_allowed;
@@ -2044,6 +2060,31 @@ extern void wake_up_new_task(struct task_struct *tsk);
 extern void sched_fork(unsigned long clone_flags, struct task_struct *p);
 extern void sched_dead(struct task_struct *p);
 
+static inline void
+wake_list_add(struct wake_list_head *head, struct task_struct *p)
+{
+   struct wake_list_node *n = &p->wake_list;
+
+   /*
+* Atomically grab the task, if ->wake_list is !0 already it means
+* its already queued (either by us or someone else) and will get the
+* wakeup due to that.
+*
+* This cmpxchg() implies a full barrier, which pairs with the write
+* barrier implied by the wakeup in wake_up_list().
+*/
+   if (cmpxchg(&n->next, NULL, head->first))
+   return;
+
+   /*
+* The head is context local, there can be no concurrency.
+*/
+   get_task_struct(p);
+   head->first = n;
+}
+
+extern void wake_up_list(struct wake_list_head *head, unsigned int state);
+
 extern void proc_caches_init(void);
 extern void flush_signals(struct task_struct *);
 extern void __flush_signals(struct task_struct *);
diff --git a/kernel/futex.c b/kernel/futex.c
index 2f9dd5d..3d60a3d 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -851,20 +851,17 @@ static void __unqueue_futex(struct futex_q *q)
  * The hash bucket lock must be held when this is called.
  * Afterwards, the futex_q must not be accessed.
  */
-static void wake_futex(struct futex_q *q)
+static void wake_futex(struct wake_list_head *wake_list, struct futex_q *q)
 {
struct task_struct *p = q->task;
 
/*
-* We set q->lock_ptr = NULL _before_ we wake up the task. If
-* a non-futex wake up happens on another CPU then the task
-* might exit and p would dereference a non-existing task
-* struct. Prevent this by holding a reference on p across the
-* wake up.
+* Queue up and delay the futex wakeups until the hb lock
+* has been released.
 */
-   get_task_struct(p);
-
+   wake_list_add(wake_list, p);
__unqueue_futex(q);
+
/*
 * The waiting task can free the futex_q as soon as
 * q->lock_ptr = NULL is written, without taking any locks. A
@@ -873,9 +870,6 @@ static void wake_futex(struct futex_q *q)
 */
smp_wmb();
q->lock_ptr = NULL;
-
-   wake_up_state(p, TASK_NORMAL);
-   put_task_struct(p);
 }
 
 static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_q *this)
@@ -992,6 +986,7 @@ futex_wake(u32 __user 

[PATCH 4/5] futex: Avoid taking hb lock if nothing to wakeup

2013-11-22 Thread Davidlohr Bueso
In futex_wake() there is clearly no point in taking the hb->lock if
we know beforehand that there are no tasks to be woken. This comes
at the smaller cost of doing some atomic operations to keep track of
the list's size. Specifically, increment the counter when an element is
added to the list, and decrement when it is removed. Of course, if the
counter is 0, then there are no tasks blocked on a futex. Some special
considerations:

- increment the counter at queue_lock() as we always end up calling
  queue_me() which adds the element to the list. Upon any error,
  queue_unlock() is called for housekeeping, for which we decrement
  to match the increment done in queue_lock().

- decrement the counter at __unqueue_futex() to reflect when an element is
  removed from the queue for wakeup related purposes.
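
The counting scheme above can be modelled in user space roughly as
follows. This is only a sketch under stated assumptions: struct bucket,
the toy spinlock, the queued field and the lock_acquisitions counter are
all hypothetical, and the example is exercised single-threaded. It does
not try to settle whether reading the counter without the lock is safe
against a concurrent waiter; it only shows where the increments,
decrements and the early exit sit.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical user-space stand-in for struct futex_hash_bucket. */
struct bucket {
	atomic_int len;		/* waiters queued or about to queue */
	atomic_flag lock;	/* toy spinlock standing in for hb->lock */
	int queued;		/* protected by lock; models the plist */
	int lock_acquisitions;	/* instrumentation for this sketch only */
};

static void bucket_init(struct bucket *b)
{
	atomic_init(&b->len, 0);
	atomic_flag_clear(&b->lock);
	b->queued = 0;
	b->lock_acquisitions = 0;
}

static void bucket_lock(struct bucket *b)
{
	while (atomic_flag_test_and_set_explicit(&b->lock, memory_order_acquire))
		;	/* spin */
	b->lock_acquisitions++;
}

static void bucket_unlock(struct bucket *b)
{
	atomic_flag_clear_explicit(&b->lock, memory_order_release);
}

/* Waiter side: count first, then lock and queue. */
static void waiter_queue(struct bucket *b)
{
	atomic_fetch_add(&b->len, 1);
	bucket_lock(b);
	b->queued++;
	bucket_unlock(b);
}

/* Waker side: returns the number of waiters removed, at most nr_wake. */
static int bucket_wake(struct bucket *b, int nr_wake)
{
	int woken = 0;

	if (atomic_load(&b->len) == 0)
		return 0;	/* nothing queued: skip the lock entirely */

	bucket_lock(b);
	while (b->queued > 0 && woken < nr_wake) {
		b->queued--;
		atomic_fetch_sub(&b->len, 1);	/* decrement on unqueue */
		woken++;
	}
	bucket_unlock(b);
	return woken;
}
```

A wake on an empty bucket returns 0 without ever touching the lock, which
is the case the changelog is targeting.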

Cc: Ingo Molnar 
Cc: Darren Hart 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Mike Galbraith 
Cc: Jeff Mahoney 
Cc: Linus Torvalds 
Cc: Scott Norton 
Cc: Tom Vaden 
Cc: Aswin Chandramouleeswaran 
Signed-off-by: Waiman Long 
Signed-off-by: Jason Low 
Signed-off-by: Davidlohr Bueso 
---
 kernel/futex.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/kernel/futex.c b/kernel/futex.c
index 5fa9eb0..2f9dd5d 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -152,6 +152,7 @@ static const struct futex_q futex_q_init = {
  * waiting on a futex.
  */
 struct futex_hash_bucket {
+   atomic_t len;
spinlock_t lock;
struct plist_head chain;
 };
@@ -843,6 +844,7 @@ static void __unqueue_futex(struct futex_q *q)
 
hb = container_of(q->lock_ptr, struct futex_hash_bucket, lock);
plist_del(&q->list, &hb->chain);
+   atomic_dec(&hb->len);
 }
 
 /*
@@ -999,6 +1001,10 @@ futex_wake(u32 __user *uaddr, unsigned int flags, int 
nr_wake, u32 bitset)
goto out;
 
hb = hash_futex(&key);
+   /* make sure we really have tasks to wakeup */
+   if (atomic_read(&hb->len) == 0)
+   goto out_put_key;
+
spin_lock(&hb->lock);
 
plist_for_each_entry_safe(this, next, &hb->chain, list) {
@@ -1019,6 +1025,7 @@ futex_wake(u32 __user *uaddr, unsigned int flags, int 
nr_wake, u32 bitset)
}
 
spin_unlock(&hb->lock);
+out_put_key:
put_futex_key(&key);
 out:
return ret;
@@ -1137,7 +1144,9 @@ void requeue_futex(struct futex_q *q, struct 
futex_hash_bucket *hb1,
 */
if (likely(&hb1->chain != &hb2->chain)) {
plist_del(&q->list, &hb1->chain);
+   atomic_dec(&hb1->len);
plist_add(&q->list, &hb2->chain);
+   atomic_inc(&hb2->len);
q->lock_ptr = &hb2->lock;
}
get_futex_key_refs(key2);
@@ -1480,6 +1489,8 @@ static inline struct futex_hash_bucket *queue_lock(struct 
futex_q *q)
struct futex_hash_bucket *hb;
 
hb = hash_futex(&q->key);
+   atomic_inc(&hb->len);
+
q->lock_ptr = &hb->lock;
 
spin_lock(&hb->lock);
@@ -1491,6 +1502,7 @@ queue_unlock(struct futex_hash_bucket *hb)
__releases(&hb->lock)
 {
spin_unlock(&hb->lock);
+   atomic_dec(&hb->len);
 }
 
 /**
@@ -,6 +2234,7 @@ int handle_early_requeue_pi_wakeup(struct 
futex_hash_bucket *hb,
 * Unqueue the futex_q and determine which it was.
 */
plist_del(&q->list, &hb->chain);
+   atomic_dec(&hb->len);
 
/* Handle spurious wakeups gracefully */
ret = -EWOULDBLOCK;
@@ -2749,6 +2762,7 @@ static int __init futex_init(void)
for (i = 0; i < futex_hashsize; i++) {
plist_head_init(&futex_queues[i].chain);
spin_lock_init(&futex_queues[i].lock);
+   atomic_set(&futex_queues[i].len, 0);
}
 
return 0;
-- 
1.8.1.4



[PATCH 0/5] futex: Wakeup optimizations

2013-11-22 Thread Davidlohr Bueso
We have been dealing with a customer database workload on a large
12Tb, 240 core 16 socket NUMA system that exhibits high amounts
of contention on some of the locks that serialize internal futex
data structures. This workload especially suffers in the wakeup
paths, where waiting on the corresponding hb->lock can account for
up to ~60% of the time. The result of such calls can mostly be
classified as (i) nothing to wake up and (ii) waking up a large number
of tasks.

Before these patches are applied, we can see this pathological behavior:

 37.12%  826174  xxx  [kernel.kallsyms] [k] _raw_spin_lock
--- _raw_spin_lock
 |
 |--97.14%-- futex_wake
 |  do_futex
 |  sys_futex
 |  system_call_fastpath
 |  |
 |  |--99.70%-- 0x7f383fbdea1f
 |  |   yyy

 43.71%  762296  xxx  [kernel.kallsyms] [k] _raw_spin_lock
--- _raw_spin_lock
 |
 |--53.74%-- futex_wake
 |  do_futex
 |  sys_futex
 |  system_call_fastpath
 |  |
 |  |--99.40%-- 0x7fe7d44a4c05
 |  |   zzz
 |--45.90%-- futex_wait_setup
 |  futex_wait
 |  do_futex
 |  sys_futex
 |  system_call_fastpath
 |  0x7fe7ba315789
 |  syscall


With these patches, contention is practically nonexistent:

 0.10% 49   xxx  [kernel.kallsyms]   [k] _raw_spin_lock
   --- _raw_spin_lock
|
|--76.06%-- futex_wait_setup
|  futex_wait
|  do_futex
|  sys_futex
|  system_call_fastpath
|  |
|  |--99.90%-- 0x7f3165e63789
|  |  syscall|
   ...
|--6.27%-- futex_wake
|  do_futex
|  sys_futex
|  system_call_fastpath
|  |
|  |--54.56%-- 0x7f317fff2c05
...

Patches 1 & 2 are cleanups and micro optimizations.

Patch 3 addresses the well known issue of the global hash table.
By creating a larger and NUMA aware table, we can reduce the false
sharing and collisions, thus reducing the chance of different futexes
using the same hb->lock.

Patch 4 reduces contention on the corresponding hb->lock by not trying to
acquire it if there are no blocked tasks in the waitqueue.
This particularly deals with point (i) above, where we see that it is not
uncommon for up to 90% of wakeup calls to end up returning 0, indicating
that no tasks were woken.

Patch 5 resurrects a two year old idea from Peter Zijlstra to delay
the waking of the blocked tasks to be done without holding the hb->lock:
https://lkml.org/lkml/2011/9/14/118

This is useful for locking primitives that can effect multiple wakeups
per operation and want to avoid the futex's internal spinlock contention by
delaying the wakeups until we've released the hb->lock.
This particularly deals with point (ii) above, where we can observe that
on occasion the wake calls end up waking 125 to 200 waiters in what we
believe are RW locks in the application.

This patchset has also been tested on smaller systems for a variety of
benchmarks, including java workloads, kernel builds and custom
bang-the-hell-out-of hb locks programs. So far, no functional or performance
regressions have been seen. Furthermore, no issues were found when running
the different tests in the futextest suite:
http://git.kernel.org/cgit/linux/kernel/git/dvhart/futextest.git/

This patchset applies on top of Linus' tree as of v3.13-rc1.

Special thanks to Scott Norton, Tom Vaden and Mark Ray for help presenting,
debugging and analyzing the data.

  futex: Misc cleanups
  futex: Check for pi futex_q only once
  futex: Larger hash table
  futex: Avoid taking hb lock if nothing to wakeup
  sched,futex: Provide delayed wakeup list

 include/linux/sched.h |  41 ++
 kernel/futex.c| 113 +++---
 kernel/sched/core.c   |  19 +
 3 files changed, 122 insertions(+), 51 deletions(-)

-- 
1.8.1.4



Re: [PATCH] of/platform: Fix no irq domain found errors when populating interrupts

2013-11-22 Thread Russell King - ARM Linux
On Fri, Nov 22, 2013 at 04:43:35PM -0800, Tony Lindgren wrote:
> + /* See of_device_resource_notify for populating interrupts */
> + for (i = 0; i < num_irq; i++, res++) {
> + res->flags = IORESOURCE_IRQ;
> + res->start = -EPROBE_DEFER;
> + res->end = -EPROBE_DEFER;

NAK.  Definitely a bad idea to start introducing magic values into
resources.  Please don't do this.


Re: [PATCH 6/8] devicetree: doc: Document ti,timer-parent property

2013-11-22 Thread Tony Lindgren
* Joel Fernandes  [131122 16:32]:
> On 11/22/2013 11:08 AM, Tony Lindgren wrote:
> > 
> > I don't think there's a dependency here to the omap clocks as the dmtimer
> > can implement the clocksource separately and internally still use clk_get
> > using the clock alias table.
> 
> You mean implement clock-tree separately? Sorry I'm confused can you clarify
> what you mean?

You could implement the needed clocks for client drivers to use in dmtimer.c
directly if dmtimer.c is the one gating those clocks.
 
> In clock tree data (not upstream), here is the system clock for am335x for 
> example:
> sys_clkin_ck: sys_clkin_ck@44e10040 {
> #clock-cells = <0>;
> compatible = "mux-clock";
> 
> It uses the mux-clock driver. Are you saying we duplicate clock-tree stuff? I
> don't think that's a good idea specially since once clock dt data is 
> available,
> we will switch to using it.

Oh OK, then that naturally could be used directly too.
 
>   Optional properties:
>   - ti,timer-alwon:   Indicates the timer is in an alway-on power 
>  domain.
> >>>
> >>> Hmm this we may not need, this can probably be deciphered from the 
> >>> compatible
> >>> flag already?
> >>
> >> How? Compatible contains the same string, for example for OMAP4:
> >>
> >> timer8 has:
> >> compatible = "ti,omap4430-timer";
> >> ti,timer-pwm;
> >> ti,timer-dsp;
> >>
> >> and timer9 has:
> >> compatible = "ti,omap4430-timer";
> >> ti,hwmods = "timer9";
> >> ti,timer-pwm;
> >>
> > 
> > Some of these features are always hardwired certain way and could be mapped 
> > to
> > the right timer based on the timer offset and the compatible flag if we want
> > to avoid adding the ti,timer-alwon property. It seems that most omaps have
> > just one always on timer that's the first timer, and only on am33xx it does
> > not exist?
> > 
> > I'm fine adding ti,timer-alwon if it help to leave out static data in the
> > driver and avoid patching the driver for every new SoC. But sounds like in
> > this case we really have just the am33xx exception to the rule?
> 
> ti,timer-alwon may not be needed yes, since on all platforms I've observed 
> first
> timer has this property. However from OMAP3 dt, timer12 has it too. Not sure
> what that implies.

I guess you could mark timer1 and 12 as always on if the compatible flag 
matches ti,omap3430-timer.
 
> I think what Jon was trying to do is to find a DT node by property, he had no
> other way of getting a device_node * otherwise.
> 
> But notice this macro used for HS (secure devices):
> OMAP_SYS_32K_TIMER_INIT(3_secure, 12, "secure_32k_fck", "ti,timer-secure",
> 2, "timer_sys_ck", NULL);
> 
> How would you specify in DT that you want a node with the timer-secure 
> property
> as the clocksource if we drop these kind of properties?

timer {
interrupt-parent = <>;
...
}

Should do the trick I think :)
 
> >>> Then for the users of a specific dmtimer, they can select the right one 
> >>> using
> >>> the interrupt-parent property:
> >>>
> >>> timer1: timer@0x4800abcd {
> >>>   compatible = "ti,omap5430-timer";
> >>>   #interrupt-cells = <1>; /* needs irqchip implemented for 
> >>> dmtimer */
> >>>   interrupt-controller;
> >>>   #clock-cells = <1>; /* needs clocksource implemented for 
> >>> dmtimer */
> >>>   clock-output-names = "32k", "sys_ck";
> >>>   ...
> >>> };
> >>
> >> In reference to my last thread reply, irqchip may not be available early 
> >> in the
> >> boot process (.init_time) for system timer usage?
> > 
> > Hmm it should be, looks like we have in arch/arm/kernel/irq.c:
> > 
> > void __init init_IRQ(void)
> > {
> > if (IS_ENABLED(CONFIG_OF) && !machine_desc->init_irq)
> > irqchip_init();
> > else
> > machine_desc->init_irq();
> > }   
> > 
> > Then in init/main.c has:
> > ...
> > early_irq_init();
> > init_IRQ();
> > tick_init();
> > init_timers();
> > hrtimers_init();
> > softirq_init();
> > timekeeping_init();
> > time_init();
> > ...
> > 
> > So looks like we should have irqchip available?
> 
> Right. I think your idea of using irqchip is certainly a clean way. Let me go
> back to the drawing board and try to see if we have all the pieces we need and
> there are no surprises in doing it this way.

OK cool.
 
> Then we can have a general purpose clocksource driver that uses the irqchip
> driver. I still worry about things like hwmod that may be need to be called on
> specific timer, and runtime PM is not available that early in the boot 
> process.

Yeah we may still need a piece of code in mach-omap2 for now if runtime PM is
not available at that point yet.

Regards,

Tony

mmotm 2013-11-22-16-50 uploaded

2013-11-22 Thread akpm
The mm-of-the-moment snapshot 2013-11-22-16-50 has been uploaded to

   http://www.ozlabs.org/~akpm/mmotm/

mmotm-readme.txt says

README for mm-of-the-moment:

http://www.ozlabs.org/~akpm/mmotm/

This is a snapshot of my -mm patch queue.  Uploaded at random hopefully
more than once a week.

You will need quilt to apply these patches to the latest Linus release (3.x
or 3.x-rcY).  The series file is in broken-out.tar.gz and is duplicated in
http://ozlabs.org/~akpm/mmotm/series

The file broken-out.tar.gz contains two datestamp files: .DATE and
.DATE-yyyy-mm-dd-hh-mm-ss.  Both contain the string yyyy-mm-dd-hh-mm-ss,
followed by the base kernel version against which this patch series is to
be applied.

This tree is partially included in linux-next.  To see which patches are
included in linux-next, consult the `series' file.  Only the patches
within the #NEXT_PATCHES_START/#NEXT_PATCHES_END markers are included in
linux-next.

A git tree which contains the memory management portion of this tree is
maintained at git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git
by Michal Hocko.  It contains the patches which are between the
"#NEXT_PATCHES_START mm" and "#NEXT_PATCHES_END" markers, from the series
file, http://www.ozlabs.org/~akpm/mmotm/series.


A full copy of the full kernel tree with the linux-next and mmotm patches
already applied is available through git within an hour of the mmotm
release.  Individual mmotm releases are tagged.  The master branch always
points to the latest release, so it's constantly rebasing.

http://git.cmpxchg.org/?p=linux-mmotm.git;a=summary

To develop on top of mmotm git:

  $ git remote add mmotm git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git
  $ git remote update mmotm
  $ git checkout -b topic mmotm/master
  
  $ git send-email mmotm/master.. [...]

To rebase a branch with older patches to a new mmotm release:

  $ git remote update mmotm
  $ git rebase --onto mmotm/master  topic




The directory http://www.ozlabs.org/~akpm/mmots/ (mm-of-the-second)
contains daily snapshots of the -mm tree.  It is updated more frequently
than mmotm, and is untested.

A git copy of this tree is available at

http://git.cmpxchg.org/?p=linux-mmots.git;a=summary

and use of this tree is similar to
http://git.cmpxchg.org/?p=linux-mmotm.git, described above.


This mmotm tree contains the following patches against 3.13-rc1:
(patches marked "*" will be included in linux-next)

  arch-alpha-kernel-systblss-remove-debug-check.patch
  i-need-old-gcc.patch
* include-linux-hugetlbh-make-isolate_huge_page-an-inline.patch
* documentation-docbook-device-driverstmpl-fix-generation-of-device-drivers.patch
* mm-memcg-do-not-declare-oom-from-__gfp_nofail-allocations.patch
* drivers-rtc-rtc-at91rm9200c-correct-alarm-over-day-month-wrap.patch
* drivers-rtc-rtc-s5mc-fix-info-rtc-assignment.patch
* arch-x86-mnify-pte_to_pgoff-and-pgoff_to_pte-helpers.patch
* x86-mm-get-aslr-work-for-hugetlb-mappings.patch
* dma-debug-enhance-dma_debug_device_change-to-check-for-mapping-errors.patch
* drm-fb-helper-dont-sleep-for-screen-unblank-when-an-oopps-is-in-progress.patch
* drm-cirrus-correct-register-values-for-16bpp.patch
* drm-nouveau-make-vga_switcheroo-code-depend-on-vga_switcheroo.patch
* drivers-gpu-drm-drm_edid_loadc-make-edid_load-return-a-void.patch
* genirq-correct-fuzzy-and-fragile-irq_retval-definition.patch
* sched_clock-document-4mhz-vs-1mhz-decision.patch
* kernel-timerc-convert-kmalloc_nodegfp_zero-to-kzalloc_node.patch
* kernel-time-tick-commonc-document-tick_do_timer_cpu.patch
* input-route-kbd-leds-through-the-generic-leds-layer.patch
* drivers-net-irda-donauboe-convert-to-module_pci_driver.patch
* ocfs2-free-allocated-clusters-if-error-occurs-after-ocfs2_claim_clusters.patch
* ocfs2-o2net-incorrect-to-terminate-accepting-connections-loop-upon-rejecting-an-invalid-one.patch
* ocfs2-should-call-ocfs2_journal_access_di-before-ocfs2_delete_entry-in-ocfs2_orphan_del.patch
* ocfs2-llseek-requires-ocfs2-inode-lock-for-the-file-in-seek_end.patch
* ocfs2-fix-issue-that-ocfs2_setattr-does-not-deal-with-new_i_size==i_size.patch
* ocfs2-update-inode-size-after-zeronig-the-hole.patch
* scsi-do-not-call-do_div-with-a-64-bit-divisor.patch
* drivers-scsi-megaraid-megaraid_mmc-missing-bounds-check-in-mimd_to_kioc.patch
* drivers-cdrom-gdromc-remove-deprecated-irqf_disabled.patch
* drivers-block-sx8c-use-module_pci_driver.patch
* drivers-block-sx8c-remove-unnecessary-pci_set_drvdata.patch
* drivers-block-paride-pgc-underflow-bug-in-pg_write.patch
* hpsa-return-0-from-driver-probe-function-on-success-not-1.patch
* drivers-block-ccissc-cciss_init_one-use-proper-errnos.patch
* blk-mq-use-__smp_call_function_single-directly.patch
* block-blk-mq-cpuc-use-hotcpu_notifier.patch
* block-remove-unrelated-header-files-and-export-symbol.patch
* mtd-cmdlinepart-use-cmdline-partition-parser-lib.patch
* mtd-cmdlinepart-use-cmdline-partition-parser-lib-fix.patch
* 

Re: Linux 3.13-rc1 is out

2013-11-22 Thread Matthew Garrett
Hi Linus,

I'd sent a pull yesterday (http://lkml.org/lkml/2013/11/21/599 ) which 
may have missed your cutoff, but it seems worth checking.

-- 
Matthew Garrett | mj...@srcf.ucam.org


[PATCH] of/platform: Fix no irq domain found errors when populating interrupts

2013-11-22 Thread Tony Lindgren
Currently we get the following kind of errors if we try to use
interrupt phandles to irqchips that have not yet initialized:

irq: no irq domain found for /ocp/pinmux@48002030 !
WARNING: CPU: 0 PID: 1 at drivers/of/platform.c:171 of_device_alloc+0x144/0x184()
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.12.0-00038-g42a9708 #1012
(show_stack+0x14/0x1c)
(dump_stack+0x6c/0xa0)
(warn_slowpath_common+0x64/0x84)
(warn_slowpath_null+0x1c/0x24)
(of_device_alloc+0x144/0x184)
(of_platform_device_create_pdata+0x44/0x9c)
(of_platform_bus_create+0xd0/0x170)
(of_platform_bus_create+0x12c/0x170)
(of_platform_populate+0x60/0x98)
...

This is because we're wrongly trying to populate resources that are not
yet available. It's perfectly valid to create irqchips dynamically,
so let's fix up the issue by populating the interrupt resources based
on a notifier call instead.

Signed-off-by: Tony Lindgren 

---

Rob & Grant, care to merge this for the -rc if this looks OK to you?

These happen for example when using interrupts-extended for omap
wake-up interrupts where the irq domain is created by pinctrl-single.c
at module_init time.

--- a/drivers/of/platform.c
+++ b/drivers/of/platform.c
@@ -130,6 +130,56 @@ void of_device_make_bus_id(struct device *dev)
dev_set_name(dev, "%s.%d", node->name, magic - 1);
 }
 
+/*
+ * The device interrupts are not necessarily available for all
+ * irqdomains initially so we need to populate them using a
+ * notifier.
+ */
+static int of_device_resource_notify(struct notifier_block *nb,
+unsigned long event, void *dev)
+{
+   struct platform_device *pdev = to_platform_device(dev);
+   struct device_node *np = pdev->dev.of_node;
+   struct resource *res = pdev->resource;
+   struct resource *irqr = NULL;
+   int num_irq, i, found = 0;
+
+   if (event != BUS_NOTIFY_BIND_DRIVER)
+   return 0;
+
+   if (!np)
+   goto out;
+
+   num_irq = of_irq_count(np);
+   if (!num_irq)
+   goto out;
+
+   for (i = 0; i < pdev->num_resources; i++, res++) {
+   if (res->flags != IORESOURCE_IRQ ||
+   res->start != -EPROBE_DEFER ||
+   res->end != -EPROBE_DEFER)
+   continue;
+
+   if (!irqr)
+   irqr = res;
+   found++;
+   }
+
+   if (!found)
+   goto out;
+
+   if (found != num_irq) {
+   dev_WARN(dev, "error populating irq resources: %i != %i\n",
+found, num_irq);
+   goto out;
+   }
+
+   WARN_ON(of_irq_to_resource_table(np, irqr, num_irq) != num_irq);
+
+out:
+   return NOTIFY_DONE;
+}
+
 /**
  * of_device_alloc - Allocate and initialize an of_device
  * @np: device node to assign to device
@@ -168,7 +218,13 @@ struct platform_device *of_device_alloc(struct device_node *np,
rc = of_address_to_resource(np, i, res);
WARN_ON(rc);
}
-   WARN_ON(of_irq_to_resource_table(np, res, num_irq) != num_irq);
+
+   /* See of_device_resource_notify for populating interrupts */
+   for (i = 0; i < num_irq; i++, res++) {
+   res->flags = IORESOURCE_IRQ;
+   res->start = -EPROBE_DEFER;
+   res->end = -EPROBE_DEFER;
+   }
}
 
dev->dev.of_node = of_node_get(np);
@@ -447,6 +503,8 @@ int of_platform_bus_probe(struct device_node *root,
 }
 EXPORT_SYMBOL(of_platform_bus_probe);
 
+static struct notifier_block resource_nb;
+
 /**
  * of_platform_populate() - Populate platform_devices from device tree data
  * @root: parent of the first level to probe or NULL for the root of the tree
@@ -478,6 +536,11 @@ int of_platform_populate(struct device_node *root,
if (!root)
return -EINVAL;
 
+   if (!resource_nb.notifier_call) {
+   resource_nb.notifier_call = of_device_resource_notify,
+   bus_register_notifier(&platform_bus_type, &resource_nb);
+   }
+
for_each_child_of_node(root, child) {
rc = of_platform_bus_create(child, matches, lookup, parent, true);
if (rc)
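
The placeholder scheme in this patch can be sketched in plain user-space C. This is only an illustration under stated assumptions, not the kernel code: struct demo_resource, the DEMO_* constants and both function names are hypothetical stand-ins, and the step that would translate the device tree interrupts is left out.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel definitions used by the patch. */
#define DEMO_IORESOURCE_IRQ 0x00000400UL
#define DEMO_EPROBE_DEFER   517

struct demo_resource {
	long start;
	long end;
	unsigned long flags;
};

/* Allocation time: mark every IRQ slot as "not known yet". */
void demo_mark_irqs_deferred(struct demo_resource *res, int num_irq)
{
	int i;

	for (i = 0; i < num_irq; i++) {
		res[i].flags = DEMO_IORESOURCE_IRQ;
		res[i].start = -DEMO_EPROBE_DEFER;
		res[i].end = -DEMO_EPROBE_DEFER;
	}
}

/*
 * Bind time: count the slots that still hold the placeholder and remember
 * the first one. Returns the count; *first is NULL when none are found.
 */
int demo_find_deferred_irqs(struct demo_resource *res, int num_res,
			    struct demo_resource **first)
{
	int i, found = 0;

	*first = NULL;
	for (i = 0; i < num_res; i++) {
		if (res[i].flags != DEMO_IORESOURCE_IRQ ||
		    res[i].start != -DEMO_EPROBE_DEFER ||
		    res[i].end != -DEMO_EPROBE_DEFER)
			continue;
		if (!*first)
			*first = &res[i];
		found++;
	}
	return found;
}
```

A caller would compare the returned count with the number of interrupts the node declares and only fill the slots, starting at *first, when the two agree, which mirrors the found != num_irq check in the notifier.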


Re: [PATCH 6/8] devicetree: doc: Document ti,timer-parent property

2013-11-22 Thread Joel Fernandes
On 11/22/2013 11:08 AM, Tony Lindgren wrote:
> * Joel Fernandes  [131122 08:37]:
>> Hi Tony,
>>
>> Thanks for your comments, few replies inline below..
>>
>> On 11/22/2013 09:58 AM, Tony Lindgren wrote:
>>> * Joel Fernandes  [131121 18:00]:
>>>> +Required properties for system timers (clockevents/clocksource):
>>>> +- ti,timer-parent:System timer's parent mux clock needs to be setup.
>>>> +  This is currently hardcoded in code, for DT boot we
>>>> +  move this to DT.
>>>> +
>>>
>>> This can be replaced with just clk_set_rate, or clk_set_parent if needed.
>>> Or by having a clocks = <&32k_clk> property in the dmtimer node in the
>>> .dts file.
>>>
>>
>> Sure, but clock-data is still not available to make this possible in mainline.
>> We also discussed earlier right that we don't want dependencies as much as
>> possible to get one chunk in and working at a time. I was thinking like for a
>> first-pass since there's a lot of unrelated code that doesn't have dependencies,
>> but needs this to work, we can introduce this property for now and drop it later
>> as a "cost of migration"?
> 
> I don't think there's a dependency here to the omap clocks as the dmtimer
> can implement the clocksource separately and internally still use clk_get
> using the clock alias table.

You mean implement clock-tree separately? Sorry I'm confused can you clarify
what you mean?

In clock tree data (not upstream), here is the system clock for am335x for example:
sys_clkin_ck: sys_clkin_ck@44e10040 {
#clock-cells = <0>;
compatible = "mux-clock";

It uses the mux-clock driver. Are you saying we duplicate clock-tree stuff? I
don't think that's a good idea, especially since once clock dt data is available,
we will switch to using it.

>>>>  Optional properties:
>>>>  - ti,timer-alwon: Indicates the timer is in an alway-on power domain.
>>>
>>> Hmm this we may not need, this can probably be deciphered from the compatible
>>> flag already?
>>
>> How? Compatible contains the same string, for example for OMAP4:
>>
>> timer8 has:
>> compatible = "ti,omap4430-timer";
>> ti,timer-pwm;
>> ti,timer-dsp;
>>
>> and timer9 has:
>> compatible = "ti,omap4430-timer";
>> ti,hwmods = "timer9";
>> ti,timer-pwm;
>>
> 
> Some of these features are always hardwired certain way and could be mapped to
> the right timer based on the timer offset and the compatible flag if we want
> to avoid adding the ti,timer-alwon property. It seems that most omaps have
> just one always on timer that's the first timer, and only on am33xx it does
> not exist?
> 
> I'm fine adding ti,timer-alwon if it helps to leave out static data in the
> driver and avoid patching the driver for every new SoC. But sounds like in
> this case we really have just the am33xx exception to the rule?

ti,timer-alwon may not be needed yes, since on all platforms I've observed first
timer has this property. However from OMAP3 dt, timer12 has it too. Not sure
what that implies.

I think what Jon was trying to do is to find a DT node by property, he had no
other way of getting a device_node * otherwise.

But notice this macro used for HS (secure devices):
OMAP_SYS_32K_TIMER_INIT(3_secure, 12, "secure_32k_fck", "ti,timer-secure",
2, "timer_sys_ck", NULL);

How would you specify in DT that you want a node with the timer-secure property
as the clocksource if we drop these kind of properties?

>>> Then for the users of a specific dmtimer, they can select the right one using
>>> the interrupt-parent property:
>>>
>>> timer1: timer@0x4800abcd {
>>> compatible = "ti,omap5430-timer";
>>> #interrupt-cells = <1>; /* needs irqchip implemented for dmtimer */
>>> interrupt-controller;
>>> #clock-cells = <1>; /* needs clocksource implemented for dmtimer */
>>> clock-output-names = "32k", "sys_ck";
>>> ...
>>> };
>>
>> In reference to my last thread reply, irqchip may not be available early in the
>> boot process (.init_time) for system timer usage?
> 
> Hmm it should be, looks like we have in arch/arm/kernel/irq.c:
> 
> void __init init_IRQ(void)
> {
>   if (IS_ENABLED(CONFIG_OF) && !machine_desc->init_irq)
>   irqchip_init();
>   else
>   machine_desc->init_irq();
> }   
> 
> Then in init/main.c has:
>   ...
>   early_irq_init();
>   init_IRQ();
>   tick_init();
>   init_timers();
>   hrtimers_init();
>   softirq_init();
>   timekeeping_init();
>   time_init();
>   ...
> 
> So looks like we should have irqchip available?

Right. I think your idea of using irqchip is certainly a clean way. Let me go
back to the drawing board and try to see if we have all the pieces we need and
there are no surprises in doing it this way.


Re: BUG: mm, numa: test segfaults, only when NUMA balancing is on

2013-11-22 Thread Mel Gorman
On Fri, Nov 22, 2013 at 11:05:24PM +, Mel Gorman wrote:
> On Fri, Nov 22, 2013 at 03:28:07PM -0600, Alex Thorlton wrote:
> > > If the warning added by that patch does *not* trigger then can you also
> > > test this patch? It removes the barriers which should not be necessary
> > > and takes a reference to the page before waiting on the lock. The
> > > previous version did not take the reference because otherwise the
> > > WARN_ON could not distinguish between a migration waiter and a surprise
> > > gup.
> > 
> > Sorry for the delay; been a bit busy.  I tested both of these patches on
> > top of this one (separately, of course):
> > 
> > http://www.spinics.net/lists/linux-mm/msg63919.html
> > 
> > I think that's the one you were referring to, if not send me a pointer
> > to the correct one and I'll give it another shot.  Both patches still
> > segfaulted, so it doesn't appear that either of these solved the
> > problem. 
> 
> I see. Does THP have to be enabled or does it segfault even with THP
> disabled?
> 

On a semi-related note, is the large machine doing anything with xpmem
or anything that depends on MMU notifiers to work properly? I noted
while looking at this that THP migration is not invalidating pages which
might be confusing a driver depending on it.

-- 
Mel Gorman
SUSE Labs


Re: recursive locking (coredump/vfs_write)

2013-11-22 Thread Al Viro
On Fri, Nov 22, 2013 at 10:11:56PM +0100, Jan Kara wrote:
>   Hi,
> 
> On Wed 13-11-13 16:11:47, Dave Jones wrote:
> > here's another one..
> > 
> > 
> > =
> > [ INFO: possible recursive locking detected ]
> > 3.12.0+ #2 Not tainted
> > -
> > trinity-child3/13302 is trying to acquire lock:
> >  (sb_writers#5){.+.+.+}, at: [] vfs_write+0x173/0x1f0
> > 
> > but task is already holding lock:
> >  (sb_writers#5){.+.+.+}, at: [] do_coredump+0xf1d/0x1070
>   Thanks for report. Attached patch should fix this. Al, can you please
> merge it?

No.  It's already fixed in mainline by commit 52da40.


Re: [PATCH 2/5] net: MOXA ART: connect to PHY and add ethtool support

2013-11-22 Thread Ben Hutchings
On Fri, 2013-11-22 at 15:57 +0100, Jonas Jensen wrote:
> The kernel now has a MDIO bus driver and a phy_driver (RTL8201CP),
> connect to this PHY using OF and add ethtool support.
> 
> Signed-off-by: Jonas Jensen 
> ---
> 
> Notes:
> Applies to next-20131122
> 
>  drivers/net/ethernet/moxa/moxart_ether.c | 179 ++-
>  drivers/net/ethernet/moxa/moxart_ether.h |   1 +
>  2 files changed, 179 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/moxa/moxart_ether.c b/drivers/net/ethernet/moxa/moxart_ether.c
> index 3c14afd..bcc6005 100644
> --- a/drivers/net/ethernet/moxa/moxart_ether.c
> +++ b/drivers/net/ethernet/moxa/moxart_ether.c
> @@ -26,9 +26,15 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  
>  #include "moxart_ether.h"
>  
> +#define DRV_NAME"moxart-ethernet"
> +#define DRV_VERSION "0.2"
> +#define MOXART_NUM_STATS16

MOXART_NUM_STATS should be defined as ARRAY_SIZE(ethtool_stats_keys).

[...]
> +static void moxart_get_drvinfo(struct net_device *ndev,
> +struct ethtool_drvinfo *info)
> +{
> + strlcpy(info->driver, DRV_NAME, sizeof(info->driver));
> + strlcpy(info->version, DRV_VERSION, sizeof(info->version));
> + strlcpy(info->bus_info, dev_name(&ndev->dev), sizeof(info->bus_info));
> + info->n_stats = MOXART_NUM_STATS;
[...]

Don't initialise n_stats here; the core will initialise it using your
get_sset_count implementation.
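
Both review comments come down to keeping a single source for the statistics count. A minimal user-space sketch of that idea follows. It is not the driver's code: the key names, the DEMO_* macros and demo_get_sset_count() are hypothetical stand-ins modeled on ARRAY_SIZE() and a get_sset_count callback.

```c
#include <stddef.h>

/* Hypothetical stand-ins for ARRAY_SIZE(), ETH_GSTRING_LEN and friends. */
#define DEMO_ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
#define DEMO_GSTRING_LEN 32
#define DEMO_EOPNOTSUPP 95

enum { DEMO_SS_TEST = 0, DEMO_SS_STATS = 1 };

/* Hypothetical statistic names; the real table is not shown in the thread. */
static const char demo_stats_keys[][DEMO_GSTRING_LEN] = {
	"tx_packets",
	"rx_packets",
	"rx_crc_errors",
};

/* The count follows the table, so the two cannot drift apart. */
#define DEMO_NUM_STATS DEMO_ARRAY_SIZE(demo_stats_keys)

/* Modeled on a get_sset_count callback: the only place the count is reported. */
int demo_get_sset_count(int sset)
{
	if (sset == DEMO_SS_STATS)
		return (int)DEMO_NUM_STATS;
	return -DEMO_EOPNOTSUPP;
}
```

With this layout, adding a fourth key to the table changes the reported count without touching any other line, and nothing else needs to store the number.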

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.



Re: [PATCH] mm: add strictlimit knob -v2

2013-11-22 Thread Andrew Morton
On Wed, 06 Nov 2013 19:05:57 +0400 Maxim Patlasov wrote:

> "strictlimit" feature was introduced to enforce per-bdi dirty limits for
> FUSE which sets bdi max_ratio to 1% by default:
> 
> http://article.gmane.org/gmane.linux.kernel.mm/105809
> 
> However the feature can be useful for other relatively slow or untrusted
> BDIs like USB flash drives and DVD+RW. The patch adds a knob to enable the
> feature:
> 
> echo 1 > /sys/class/bdi/X:Y/strictlimit
> 
> Being enabled, the feature enforces bdi max_ratio limit even if global (10%)
> dirty limit is not reached. Of course, the effect is not visible until
> /sys/class/bdi/X:Y/max_ratio is decreased to some reasonable value.
> 
> ...
>
> --- a/Documentation/ABI/testing/sysfs-class-bdi
> +++ b/Documentation/ABI/testing/sysfs-class-bdi
> @@ -53,3 +53,11 @@ stable_pages_required (read-only)
>  
>   If set, the backing device requires that all pages comprising a write
>   request must not be changed until writeout is complete.
> +
> +strictlimit (read-write)
> +
> + Forces per-BDI checks for the share of given device in the write-back
> + cache even before the global background dirty limit is reached. This
> + is useful in situations where the global limit is much higher than
> + affordable for given relatively slow (or untrusted) device. Turning
> + strictlimit on has no visible effect if max_ratio is equal to 100%.
> diff --git a/mm/backing-dev.c b/mm/backing-dev.c
> index ce682f7..4ee1d64 100644
> --- a/mm/backing-dev.c
> +++ b/mm/backing-dev.c
> @@ -234,11 +234,46 @@ static ssize_t stable_pages_required_show(struct device *dev,
>  }
>  static DEVICE_ATTR_RO(stable_pages_required);
>  
> +static ssize_t strictlimit_store(struct device *dev,
> + struct device_attribute *attr, const char *buf, size_t count)
> +{
> + struct backing_dev_info *bdi = dev_get_drvdata(dev);
> + unsigned int val;
> + ssize_t ret;
> +
> + ret = kstrtouint(buf, 10, &val);
> + if (ret < 0)
> + return ret;
> +
> + switch (val) {
> + case 0:
> + bdi->capabilities &= ~BDI_CAP_STRICTLIMIT;
> + break;
> + case 1:
> + bdi->capabilities |= BDI_CAP_STRICTLIMIT;
> + break;
> + default:
> + return -EINVAL;
> + }
> +
> + return count;
> +}
> +static ssize_t strictlimit_show(struct device *dev,
> + struct device_attribute *attr, char *page)
> +{
> + struct backing_dev_info *bdi = dev_get_drvdata(dev);
> +
> + return snprintf(page, PAGE_SIZE-1, "%d\n",
> + !!(bdi->capabilities & BDI_CAP_STRICTLIMIT));
> +}
> +static DEVICE_ATTR_RW(strictlimit);
> +
>  static struct attribute *bdi_dev_attrs[] = {
>   &dev_attr_read_ahead_kb.attr,
>   &dev_attr_min_ratio.attr,
>   &dev_attr_max_ratio.attr,
>   &dev_attr_stable_pages_required.attr,
> + &dev_attr_strictlimit.attr,
>   NULL,

Well the patch is certainly simple and straightforward enough and
*seems* like it will be useful.  The main (and large!) downside is that
it adds to the user interface so we'll have to maintain this feature
and its functionality for ever.

Given this, my concern is that while potentially useful, the feature
might not be *sufficiently* useful to justify its inclusion.  So we'll
end up addressing these issues by other means, then we're left
maintaining this obsolete legacy feature.

So I'm thinking that unless someone can show that this is good and
complete and sufficient for a "large enough" set of issues, I'll take a
pass on the patch[1].  What do people think?


[1] Actually, I'll stick it in -mm and maintain it, so next time
someone reports an issue I can say "hey, try this".
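
For anyone trying the knob, the store/show pair in the quoted patch behaves roughly like the user-space sketch below. It is an approximation under stated assumptions: struct demo_bdi, DEMO_CAP_STRICTLIMIT and its bit value are hypothetical, and strtoul() stands in for kstrtouint(), so edge cases may differ from the kernel helper.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

#define DEMO_CAP_STRICTLIMIT 0x00000400u /* hypothetical bit value */

struct demo_bdi {
	unsigned int capabilities;
};

/* Accept only "0" or "1", optionally followed by one newline. */
int demo_strictlimit_store(struct demo_bdi *bdi, const char *buf)
{
	char *end;
	unsigned long val;

	errno = 0;
	val = strtoul(buf, &end, 10);
	if (errno || end == buf)
		return -EINVAL;
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;

	switch (val) {
	case 0:
		bdi->capabilities &= ~DEMO_CAP_STRICTLIMIT;
		break;
	case 1:
		bdi->capabilities |= DEMO_CAP_STRICTLIMIT;
		break;
	default:
		return -EINVAL;
	}
	return 0;
}

/* Report the flag as "0\n" or "1\n"; returns the number of bytes written. */
int demo_strictlimit_show(const struct demo_bdi *bdi, char *page, size_t size)
{
	return snprintf(page, size, "%d\n",
			!!(bdi->capabilities & DEMO_CAP_STRICTLIMIT));
}
```

Only the one capability bit is touched, so writing 0 or 1 leaves every other flag in the capabilities word unchanged.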



Re: [PATCH] mmzone.h: constify some zone access functions

2013-11-22 Thread Luiz Capitulino
On Fri, 22 Nov 2013 14:30:47 -0800
Andrew Morton  wrote:

> On Fri, 22 Nov 2013 12:01:06 -0500 Luiz Capitulino wrote:
> 
> > Signed-off-by: Luiz capitulino 
> > ---
> >  include/linux/mmzone.h | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> > 
> > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> > index bd791e4..5e202d6 100644
> > --- a/include/linux/mmzone.h
> > +++ b/include/linux/mmzone.h
> > @@ -560,12 +560,12 @@ static inline bool zone_spans_pfn(const struct zone *zone, unsigned long pfn)
> > return zone->zone_start_pfn <= pfn && pfn < zone_end_pfn(zone);
> >  }
> >  
> > -static inline bool zone_is_initialized(struct zone *zone)
> > +static inline bool zone_is_initialized(const struct zone *zone)
> >  {
> > return !!zone->wait_table;
> >  }
> >  
> > -static inline bool zone_is_empty(struct zone *zone)
> > +static inline bool zone_is_empty(const struct zone *zone)
> >  {
> > return zone->spanned_pages == 0;
> >  }
> > @@ -843,7 +843,7 @@ unsigned long __init node_memmap_size_bytes(int, unsigned long, unsigned long);
> >   */
> >  #define zone_idx(zone) ((zone) - (zone)->zone_pgdat->node_zones)
> >  
> > -static inline int populated_zone(struct zone *zone)
> > +static inline int populated_zone(const struct zone *zone)
> >  {
> > return (!!zone->present_pages);
> >  }
> 
> hm, why?  I counted ten similarly constifyable functions in mm.h and
> stopped only 1/4 of the way through. What's so special about these three?

I spotted them while reading code. If you want me to convert the others,
I can do that.
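
What the const qualifier buys a caller can be shown with a small user-space sketch. struct demo_zone is a reduced, hypothetical stand-in for struct zone, and demo_count_populated() is an invented caller, not something from mmzone.h.

```c
#include <stdbool.h>

/* Reduced, hypothetical stand-in for struct zone: only the fields used here. */
struct demo_zone {
	unsigned long spanned_pages;
	unsigned long present_pages;
};

/* Read-only accessors: the const promises the zone is not modified. */
bool demo_zone_is_empty(const struct demo_zone *zone)
{
	return zone->spanned_pages == 0;
}

int demo_populated_zone(const struct demo_zone *zone)
{
	return !!zone->present_pages;
}

/*
 * A caller that only holds a const pointer can use the accessors directly.
 * With non-const parameters, each call here would discard the qualifier
 * and draw a compiler warning.
 */
unsigned long demo_count_populated(const struct demo_zone *zones, int nr)
{
	unsigned long n = 0;
	int i;

	for (i = 0; i < nr; i++)
		if (demo_populated_zone(&zones[i]))
			n++;
	return n;
}
```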


[PATCH 4/4] ACPI: Add support to force header inclusion rules for .

2013-11-22 Thread Lv Zheng
From: Lv Zheng 

As there is only CONFIG_ACPI=n processing in the , it is not
safe to include ,  and 
directly for source out of Linux ACPI subsystems.

This patch adds error messaging to warn developers of such wrong
inclusions.

In order not to be bisected and reverted as a wrong commit, warning
messages are carefully split into a separate patch other than the wrong
inclusion cleanups.

Signed-off-by: Lv Zheng 
---
 drivers/acpi/acpica/Makefile|2 +-
 include/acpi/acpi_bus.h |5 +
 include/acpi/acpi_drivers.h |5 +
 include/acpi/platform/aclinux.h |   10 ++
 4 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/acpi/acpica/Makefile b/drivers/acpi/acpica/Makefile
index 4383040..7738677 100644
--- a/drivers/acpi/acpica/Makefile
+++ b/drivers/acpi/acpica/Makefile
@@ -2,7 +2,7 @@
 # Makefile for ACPICA Core interpreter
 #
 
-ccflags-y  := -Os
+ccflags-y  := -Os -DLINUXIZED_ACPICA
 ccflags-$(CONFIG_ACPI_DEBUG)   += -DACPI_DEBUG_OUTPUT
 
 # use acpi.o to put all files here into acpi.o modparam namespace
diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
index 0af9667..0b1ea80 100644
--- a/include/acpi/acpi_bus.h
+++ b/include/acpi/acpi_bus.h
@@ -26,6 +26,11 @@
 #ifndef __ACPI_BUS_H__
 #define __ACPI_BUS_H__
 
+/*  is not safe for CONFIG_ACPI=n environment */
+#ifndef _LINUX_ACPI_H
+#error "Please don't include  direclty, including  instead."
+#endif
+
 #include 
 
 
diff --git a/include/acpi/acpi_drivers.h b/include/acpi/acpi_drivers.h
index f3f1219..f5bed3a 100644
--- a/include/acpi/acpi_drivers.h
+++ b/include/acpi/acpi_drivers.h
@@ -26,6 +26,11 @@
 #ifndef __ACPI_DRIVERS_H__
 #define __ACPI_DRIVERS_H__
 
+/*  is not safe for CONFIG_ACPI=n environment */
+#ifndef _LINUX_ACPI_H
+#error "Please don't include  directly, including  instead."
+#endif
+
 
 #define ACPI_MAX_STRING80
 
diff --git a/include/acpi/platform/aclinux.h b/include/acpi/platform/aclinux.h
index 28f4f4d..0e05771 100644
--- a/include/acpi/platform/aclinux.h
+++ b/include/acpi/platform/aclinux.h
@@ -44,6 +44,16 @@
 #ifndef __ACLINUX_H__
 #define __ACLINUX_H__
 
+#ifdef __KERNEL__
+
+/* ACPICA external files should not include ACPICA headers directly. */
+
+#if !defined(LINUXIZED_ACPICA) && !defined(_LINUX_ACPI_H)
+#error "Please don't include  directly, including  instead."
+#endif
+
+#endif
+
 /* Common (in-kernel/user-space) ACPICA configuration */
 
 #define ACPI_USE_SYSTEM_CLIBRARY
-- 
1.7.10
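
The guard check used in this patch can be tried outside the kernel with a single-file sketch. All names below (DEMO_UMBRELLA_H, DEMO_SUB_H, demo_sub_value) are hypothetical; the two sections stand in for the umbrella header and for a header that must only be reached through it.

```c
/* ---- stand-in for the umbrella header (hypothetical names) ---- */
#ifndef DEMO_UMBRELLA_H
#define DEMO_UMBRELLA_H
/* ... umbrella declarations would go here ... */
#endif /* DEMO_UMBRELLA_H */

/* ---- stand-in for a header that must not be included directly ---- */
#ifndef DEMO_SUB_H
#define DEMO_SUB_H

/* The umbrella's include guard doubles as proof that it was seen first. */
#ifndef DEMO_UMBRELLA_H
#error "Please include the umbrella header instead of this one."
#endif

static inline int demo_sub_value(void)
{
	return 42;
}

#endif /* DEMO_SUB_H */
```

Deleting the first section makes the build stop at the #error line, which is the behaviour the patch wants for files that skip the umbrella header.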



[PATCH 3/4] ACPICA: Cleanup inclusions.

2013-11-22 Thread Lv Zheng
From: Lv Zheng 

This patch cleans up  inclusions.

1. For files where  is already included, this patch cleans up
   them by deleting  inclusions.  Note that if 
   is included before including  in the original file, the
   inclusion of  is moved up to avoid issues from happening.
2. The sfi_acpi files have invoked acpi_table_parse(), so this patch
   converts  inclusions into  inclusions so that
   they can find the acpi_table_parse() prototypes.
3. For files where only  included, this patch cleans up them
   by replacing  inclusions with .  Such files
   are those include ACPICA headers to find table definitions or ACPICA
   basic types.
4. For files where no actual ACPICA references, this patch cleans up them
   by deleting  inclusions.

Signed-off-by: Lv Zheng 
---
 arch/x86/pci/mmconfig_32.c |1 -
 drivers/acpi/apei/einj.c   |1 -
 drivers/acpi/osl.c |1 -
 drivers/char/tpm/tpm_acpi.c|2 +-
 drivers/gpu/drm/nouveau/nouveau_acpi.c |1 -
 drivers/hwmon/asus_atk0110.c   |1 -
 drivers/ide/ide-acpi.c |1 -
 drivers/iommu/amd_iommu_init.c |1 -
 drivers/iommu/intel_irq_remapping.c|2 +-
 drivers/pci/hotplug/pciehp.h   |1 -
 drivers/pci/pci-acpi.c |1 -
 drivers/platform/x86/tc1100-wmi.c  |1 -
 drivers/sfi/sfi_acpi.c |2 +-
 include/linux/acpi_io.h|2 +-
 include/linux/ide.h|3 ---
 include/linux/iscsi_ibft.h |2 +-
 include/linux/pci_hotplug.h|1 -
 include/linux/sfi_acpi.h   |2 +-
 include/linux/tboot.h  |2 +-
 19 files changed, 7 insertions(+), 21 deletions(-)

diff --git a/arch/x86/pci/mmconfig_32.c b/arch/x86/pci/mmconfig_32.c
index 5c90975..43984bc 100644
--- a/arch/x86/pci/mmconfig_32.c
+++ b/arch/x86/pci/mmconfig_32.c
@@ -14,7 +14,6 @@
 #include 
 #include 
 #include 
-#include 
 
 /* Assume systems with more busses have correct MCFG */
 #define mmcfg_virt_addr ((void __iomem *) fix_to_virt(FIX_PCIE_MCFG))
diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
index fb57d03..ca0c6d7 100644
--- a/drivers/acpi/apei/einj.c
+++ b/drivers/acpi/apei/einj.c
@@ -33,7 +33,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #include "apei-internal.h"
 
diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index 8507034..c4a0e45 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -49,7 +49,6 @@
 #include 
 #include 
 
-#include 
 #include 
 #include "internal.h"
 
diff --git a/drivers/char/tpm/tpm_acpi.c b/drivers/char/tpm/tpm_acpi.c
index 64420b3..b9a57fa 100644
--- a/drivers/char/tpm/tpm_acpi.c
+++ b/drivers/char/tpm/tpm_acpi.c
@@ -23,7 +23,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 
 #include "tpm.h"
 #include "tpm_eventlog.h"
diff --git a/drivers/gpu/drm/nouveau/nouveau_acpi.c b/drivers/gpu/drm/nouveau/nouveau_acpi.c
index 6763865..5910ef5 100644
--- a/drivers/gpu/drm/nouveau/nouveau_acpi.c
+++ b/drivers/gpu/drm/nouveau/nouveau_acpi.c
@@ -2,7 +2,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 
 #include 
diff --git a/drivers/hwmon/asus_atk0110.c b/drivers/hwmon/asus_atk0110.c
index 9b64d26..eea1f60 100644
--- a/drivers/hwmon/asus_atk0110.c
+++ b/drivers/hwmon/asus_atk0110.c
@@ -17,7 +17,6 @@
 #include 
 #include 
 
-#include 
 #include 
 
 
diff --git a/drivers/ide/ide-acpi.c b/drivers/ide/ide-acpi.c
index d9e1f7c..333d405 100644
--- a/drivers/ide/ide-acpi.c
+++ b/drivers/ide/ide-acpi.c
@@ -14,7 +14,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 8f798be..28b4bea 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -26,7 +26,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
index bab10b1..01c1dd9 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -10,7 +10,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
index f773254..fc51cab 100644
--- a/drivers/pci/hotplug/pciehp.h
+++ b/drivers/pci/hotplug/pciehp.h
@@ -163,7 +163,6 @@ static inline const char *slot_name(struct slot *slot)
 }
 
 #ifdef CONFIG_ACPI
-#include 
 #include 
 
 void __init pciehp_acpi_slot_detection_init(void);
diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
index d0e4999..4c77380 100644
--- a/drivers/pci/pci-acpi.c
+++ b/drivers/pci/pci-acpi.c
@@ -12,7 +12,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #include 
 #include 
diff --git a/drivers/platform/x86/tc1100-wmi.c b/drivers/platform/x86/tc1100-wmi.c
index d16193e..6a6ea28 100644
--- a/drivers/platform/x86/tc1100-wmi.c

[PATCH 2/4] ACPI: Cleanup and inclusions.

2013-11-22 Thread Lv Zheng
From: Lv Zheng 

This patch enforces  inclusion instead of direct
 and  inclusions.

It's required that  included prior than
 to find declarations for acpi_pci_root.
It is required that  included prior than 
to find declarations for acpi_handle and acpi_notify_handler.

In order to work around this, there are  and 
inclusions in the .
This messes up the inclusion order in the following style:

  +--
  |
  |  +--
  |  |
  |  |  +-- if CONFIG_ACPI=y
  |  |  |
  |  |  +->Included if CONFIG_ACPI=y
  |  |
  |  |  +-- if CONFIG_ACPI=y
  |  |  |
  |  |  |  +-- if CONFIG_ACPI=y and CONFIG_ACPI=n
  |  |  |  |
  |  |  |  +->Excluded
  |  |  |
  |  |  +->Included if CONFIG_ACPI=y
  |  |
  |  |  +-- if CONFIG_ACPI=y
  |  |  |
  |  |  +->Excluded
  |  |
  |  +->Included if CONFIG_ACPI=y and CONFIG_ACPI=n
  |
  |  +--
  |  |
  |  |  +-- if CONFIG_ACPI=y and CONFIG_ACPI=n
  |  |  |
  |  |  +->Included if CONFIG_ACPI=n
  |  |
  |  +->Included if CONFIG_ACPI=n
  |
  +->Included if CONFIG_ACPI=y and CONFIG_ACPI=n

  Figure 1.  Current ACPI Headers Inclusion Order

When CONFIG_ACPI is disabled, the result inclusion order of
"#include  turns out to be:

  
  
  
  

   Figure 2.  ACPI Headers Inclusion Order CONFIG_ACPI=n

When CONFIG_ACPI is enabled, the result inclusion order of
"#include  turns out to be:

  
  
  
  

  Figure 3.  ACPI Headers Inclusion Order CONFIG_ACPI=y

In order to follow the type dependencies, the best order should be:
  
  
   or 
 

   Figure 4.  Best ACPI Headers Inclusion Order

By investigation, currently  are all included for
CONFIG_ACPI=y, thus this is not a real issue but would prevent further
header cleanup work from being achieved.

If we just include  in the  and
 to solve header inclusion order issue, it will easily
form a wrong inclusion order, and a build test would end with the
following error messages:
:1220:2: warning: #warning syscall kcmp not implemented [-Wcpp]
:1223:2: warning: #warning syscall finit_module not implemented [-Wcpp]
In file included from include/linux/acpi.h:38:0,
 from include/acpi/acpi_bus.h:30,
 from drivers/acpi/ac.c:35:
include/acpi/acpi_drivers.h:101:43: warning: 'struct acpi_pci_root' declared inside parameter list [enabled by default]
include/acpi/acpi_drivers.h:101:43: warning: its scope is only this definition or declaration, which is probably not what you want [enabled by default]
In file included from include/acpi/acpi_bus.h:30:0,
 from drivers/acpi/ac.c:35:
include/linux/acpi.h:272:7: warning: 'struct acpi_device' declared inside parameter list [enabled by default]

So this patch cleans up direct  and 
inclusion by replacing them with  inclusion and removing
 and  inclusion from the
 so that the inclusion order is:

  
  
  
  

   Figure 5.  ACPI Headers Inclusion Order CONFIG_ACPI=n

This matches the best known ACPI header inclusion order shown in Figure 4.

Note that, in some files that already include , this patch
deletes  and  for them.  If
 inclusion is after original  and
 inclusions, it is moved up so that other ACPI header
files are able to find the definitions of the ACPICA types.

Currently  is used to put prototypes that are referencing
ACPICA defined types.  As such types are not safe for "#ifndef CONFIG_ACPI"
code blocks,  includes this file with "#ifdef CONFIG_ACPI"
protections.  Directly including  should only apply to
ACPI dependent code while such code can also include .  This
patch tries to replace  inclusions with 
inclusions or delete  inclusions if there is already
 included for such files.  When deleting, we carefully put
 inclusions before any other ACPI header file inclusions so
that they are able to find ACPICA defined types.

The actual cleanups are done in the following style:

1. CONFIG_ACPI dependent modules, where  is already included.
   We can simply clean them up by removing  and
    inclusions.
2. Not CONFIG_ACPI dependent modules, where  is already
   included.  As there are no code pieces surrounded by "#ifdef CONFIG_ACPI",
   we can simply clean them up by removing  and
   .
3. CONFIG_ACPI dependent modules, where  is not included.  We
   can simply clean them up by replacing  and
    with .
4. CONFIG_ACPI enabled code where only ACPICA types are referenced.  We can
   simply clean it up by replacing  and
    with .  We carefully leave the
   "#ifdef CONFIG_ACPI" protections around such inclusions undeleted.

Tests were done by enabling the modules listed in this patch and building
the kernel with and without this patch applied.  The build output with
static checkers enabled shows only line number differences.

Signed-off-by: Lv Zheng 
---
 arch/ia64/hp/common/aml_nfw.c|3 +--
 arch/x86/kernel/apic/apic_flat_64.c  |4 +---
 arch/x86/kernel/apic/io_apic.c   |3 ---
 arch/x86/platform/olpc/olpc-xo15-sci.c   |3 +--
 drivers/acpi/ac.c  

[PATCH 1/4] ACPICA: Linux: Cleanup wrong ACPICA inclusions.

2013-11-22 Thread Lv Zheng
From: Lv Zheng 

The header file  is the top level header for other kernel
subsystems to include.  It contains environment settings and architecture
specific implementation that should be included before including other
ACPICA headers in order to keep a consistent build environment for ACPICA
users.  All following internal ACPICA header files thus should be included
from  and not included by other kernel files:
  
  
  
  
  
  
  
  
  : included by 
  
  
  : included by 
  : included by 
  : included by 
  
  : included by 
This patch cleans up inclusions of above files from non ACPICA source
files.

Signed-off-by: Lv Zheng 
---
 drivers/acpi/nvs.c   |1 -
 drivers/hwmon/asus_atk0110.c |1 -
 2 files changed, 2 deletions(-)

diff --git a/drivers/acpi/nvs.c b/drivers/acpi/nvs.c
index 266bc58..386a9fe 100644
--- a/drivers/acpi/nvs.c
+++ b/drivers/acpi/nvs.c
@@ -13,7 +13,6 @@
 #include 
 #include 
 #include 
-#include 
 
 /* ACPI NVS regions, APEI may use it */
 
diff --git a/drivers/hwmon/asus_atk0110.c b/drivers/hwmon/asus_atk0110.c
index 1d7ff46..dafc63c 100644
--- a/drivers/hwmon/asus_atk0110.c
+++ b/drivers/hwmon/asus_atk0110.c
@@ -18,7 +18,6 @@
 #include 
 
 #include 
-#include 
 #include 
 #include 
 
-- 
1.7.10

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 0/4] ACPI: Cleanup header inclusions.

2013-11-22 Thread Lv Zheng
From: Lv Zheng 

ACPICA header files lack stub protections.  As a result, ACPICA header
files can only be included in a CONFIG_ACPI=y environment.  This infects
new ACPI prototypes: if such prototypes reference ACPICA defined types,
they must be included only in the CONFIG_ACPI=y environment.  The
 and  are the monstrosity infants
born under this background; directly including them in the CONFIG_ACPI=n
environment is not safe.  The infection spreads to the whole kernel source
tree, so that we can see "#ifdef CONFIG_ACPI" here and there.
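For illustration only, a stub protection of the kind described above could
look roughly like the sketch below.  All names here (CONFIG_DEMO_ACPI,
demo_acpi_device, demo_acpi_device_count(), demo_driver_probe()) are
hypothetical and are not taken from the ACPICA or Linux headers; the sketch
only shows the general pattern of providing an inline no-op when the option
is off, so that callers need no "#ifdef" of their own.

```c
#include <assert.h>
#include <stddef.h>

/*
 * CONFIG_DEMO_ACPI is a made-up option standing in for a kernel config
 * symbol.  It is left undefined here, so the stub branch is compiled.
 */
#ifdef CONFIG_DEMO_ACPI

/* "Real" branch: the type is complete and the helper does actual work. */
struct demo_acpi_device {
	int present;
};

static inline int demo_acpi_device_count(const struct demo_acpi_device *devs,
					 size_t n)
{
	size_t i;
	int count = 0;

	for (i = 0; i < n; i++)
		count += devs[i].present ? 1 : 0;
	return count;
}

#else

/* Stub branch: only a forward declaration plus an inline no-op. */
struct demo_acpi_device;

static inline int demo_acpi_device_count(const struct demo_acpi_device *devs,
					 size_t n)
{
	(void)devs;
	(void)n;
	return 0;
}

#endif

/* The caller compiles unchanged in both configurations, with no #ifdef. */
int demo_driver_probe(void)
{
	int count = demo_acpi_device_count(NULL, 0);

	assert(count >= 0);	/* a device count is never negative */
	return count;
}
```

With the stub in place, code such as demo_driver_probe() can include the
header unconditionally, which is the property the cover letter says the
ACPICA headers currently lack.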

This patchset tries to enforce  inclusions instead of
,  and  inclusions.

Lv Zheng (4):
  ACPICA: Linux: Cleanup wrong ACPICA inclusions.
  ACPI: Cleanup  and  inclusions.
  ACPICA: Cleanup  inclusions.
  ACPI: Add support to force header inclusion rules for .

 arch/ia64/hp/common/aml_nfw.c|3 +--
 arch/x86/kernel/apic/apic_flat_64.c  |4 +---
 arch/x86/kernel/apic/io_apic.c   |3 ---
 arch/x86/pci/mmconfig_32.c   |1 -
 arch/x86/platform/olpc/olpc-xo15-sci.c   |3 +--
 drivers/acpi/ac.c|3 +--
 drivers/acpi/acpi_extlog.c   |1 -
 drivers/acpi/acpi_pad.c  |3 +--
 drivers/acpi/acpica/Makefile |2 +-
 drivers/acpi/apei/einj.c |1 -
 drivers/acpi/battery.c   |3 +--
 drivers/acpi/blacklist.c |1 -
 drivers/acpi/bus.c   |2 --
 drivers/acpi/button.c|3 +--
 drivers/acpi/custom_method.c |2 +-
 drivers/acpi/debugfs.c   |2 +-
 drivers/acpi/dock.c  |2 --
 drivers/acpi/ec.c|3 +--
 drivers/acpi/event.c |2 +-
 drivers/acpi/fan.c   |3 +--
 drivers/acpi/hed.c   |2 --
 drivers/acpi/numa.c  |1 -
 drivers/acpi/nvs.c   |1 -
 drivers/acpi/osl.c   |2 --
 drivers/acpi/pci_irq.c   |2 --
 drivers/acpi/pci_link.c  |3 +--
 drivers/acpi/pci_root.c  |2 --
 drivers/acpi/power.c |3 +--
 drivers/acpi/proc.c  |3 +--
 drivers/acpi/processor_core.c|2 +-
 drivers/acpi/processor_idle.c|1 -
 drivers/acpi/processor_perflib.c |3 +--
 drivers/acpi/processor_thermal.c |3 +--
 drivers/acpi/processor_throttling.c  |3 +--
 drivers/acpi/sbshc.c |3 +--
 drivers/acpi/scan.c  |1 -
 drivers/acpi/sleep.c |2 --
 drivers/acpi/sysfs.c |2 +-
 drivers/acpi/thermal.c   |3 +--
 drivers/acpi/utils.c |2 --
 drivers/acpi/video.c |3 +--
 drivers/acpi/wakeup.c|1 -
 drivers/ata/libata-acpi.c|1 -
 drivers/ata/pata_acpi.c  |2 +-
 drivers/char/hpet.c  |1 -
 drivers/char/tpm/tpm_acpi.c  |2 +-
 drivers/char/tpm/tpm_ppi.c   |1 -
 drivers/gpu/drm/i915/intel_acpi.c|1 -
 drivers/gpu/drm/nouveau/nouveau_acpi.c   |3 ---
 drivers/gpu/drm/radeon/radeon_acpi.c |2 --
 drivers/hv/vmbus_drv.c   |1 -
 drivers/hwmon/acpi_power_meter.c |3 +--
 drivers/hwmon/asus_atk0110.c |5 +
 drivers/ide/ide-acpi.c   |1 -
 drivers/input/misc/atlas_btns.c  |2 +-
 drivers/iommu/amd_iommu_init.c   |1 -
 drivers/iommu/intel_irq_remapping.c  |2 +-
 drivers/pci/hotplug/acpiphp_ibm.c|1 -
 drivers/pci/hotplug/pciehp.h |2 --
 drivers/pci/ioapic.c |1 -
 drivers/pci/pci-acpi.c   |2 --
 drivers/pci/pci-label.c  |1 -
 drivers/platform/x86/acer-wmi.c  |1 -
 drivers/platform/x86/asus-laptop.c   |3 +--
 drivers/platform/x86/asus-wmi.c  |3 +--
 

Re: [PATCH 3/4] printk: Defer printing to irq work when we printed too much

2013-11-22 Thread Andrew Morton
On Fri, 8 Nov 2013 11:21:13 +0100 Jan Kara  wrote:

> On Fri 08-11-13 00:46:49, Frederic Weisbecker wrote:
> > On Thu, Nov 07, 2013 at 06:37:17PM -0500, Steven Rostedt wrote:
> > > On Fri, 8 Nov 2013 00:21:51 +0100
> > > Frederic Weisbecker  wrote:
> > > > 
> > > > Offloading to a workqueue would be perhaps better, and writing to
> > > > the serial console could then be done with interrupts enabled,
> > > > preemptible context, etc...
> > > 
> > > Oh God no ;-)  Adding workqueue logic into printk just spells a
> > > nightmare of much more complexity for a critical kernel infrastructure.
> > 
> > But yeah that's scary, that means workqueues itself can't printk that
> > safely.
> > So, you're right after all.
>   Yeah, we've been there (that was actually my initial proposal). But
> Andrew and Steven (rightfully) objected and suggested irq_work should be
> used instead.

I still hate the patchset and so does everyone else, including you ;)
There must be something smarter we can do.  Let's start by restating
the problem:

CPU A is in printk, emitting log_buf characters to a slow device. 
Meanwhile other CPUs come into printk(), see that the system is busy,
dump their load into log_buf then scram, leaving CPU A to do even more
work.

Correct so far?

If so, what is the role of local_irq_disabled() in this?  Did CPU A
call printk() with local interrupts disabled, or is printk (or the
console driver) causing the irqs-off condition?  Where and why is this
IRQ disablement happening?


Could we fix this problem by not permitting CPUs B, C and D to DoS CPU
A?  When CPU B comes into printk() and sees that printk is busy, make
CPU A hand over to CPU B and let CPU A get out of there?




Re: [RFC PATCH] double free in decompressor.c

2013-11-22 Thread Phillip Lougher

On 22/11/13 21:50, Geyslan Gregório Bem wrote:

Coverity caught double free possibility (CID 1130962).

I can patch this, but I need to know whether it is correct to free comp_opts
in the function squashfs_decompressor_create() or whether it has to be done
in the caller. My bet is the caller.


void *squashfs_decompressor_setup(struct super_block *sb, unsigned short flags)
{
	struct squashfs_sb_info *msblk = sb->s_fs_info;
	void *stream, *comp_opts = get_comp_opts(sb, flags);


1. Condition "IS_ERR(comp_opts)", taking false branch
	if (IS_ERR(comp_opts))
		return comp_opts;


2. freed_arg: "squashfs_decompressor_create(struct squashfs_sb_info *,
void *)" frees "comp_opts".
	stream = squashfs_decompressor_create(msblk, comp_opts);

3. Condition "IS_ERR(stream)", taking true branch
	if (IS_ERR(stream))


FALSE positive.

squashfs_decompressor_create() frees comp_opts only on success.

If IS_ERR(stream) is true, then comp_opts has not been freed by
squashfs_decompressor_create().
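The ownership rule described above (the callee frees the options buffer only
on success, so the caller must free it on failure) can be sketched in plain
userspace C as follows.  This is only a model under stated assumptions: the
names demo_create(), demo_setup(), demo_free_opts() and demo_opts_frees are
hypothetical, NULL stands in for the kernel's IS_ERR() error pointers, and
malloc()/free() stand in for the kernel allocators.

```c
#include <assert.h>
#include <stdlib.h>

/* Counts how many times the options buffer was freed (for checking). */
static int demo_opts_frees;

static void demo_free_opts(void *opts)
{
	demo_opts_frees++;
	free(opts);
}

/*
 * Hypothetical callee: takes ownership of 'opts' and frees it only when it
 * succeeds.  On failure it returns NULL and leaves 'opts' untouched.
 */
static void *demo_create(void *opts, int fail)
{
	void *stream;

	assert(opts != NULL);
	if (fail)
		return NULL;

	stream = malloc(16);
	if (!stream)
		return NULL;

	demo_free_opts(opts);	/* success: the callee consumed opts */
	return stream;
}

/*
 * Hypothetical caller: frees 'opts' only on the failure path, which is the
 * one path where the callee has not already freed it.
 */
void *demo_setup(int fail)
{
	void *opts = malloc(8);
	void *stream;

	if (!opts)
		return NULL;

	stream = demo_create(opts, fail);
	if (!stream)
		demo_free_opts(opts);	/* failure: still owned by the caller */

	return stream;
}
```

On either path the buffer is freed exactly once, which is why a checker that
does not track the success/failure condition can report a double free here
even though none occurs.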

Phillip




CID 1130962 (#1 of 1): Double free (USE_AFTER_FREE)
4. double_free: Calling "kfree(void const *)" frees pointer "comp_opts" which
has already been freed.
		kfree(comp_opts);

	return stream;
}






Re: [PATCH 2/2] e1000: fix possible reset_task running after adapter down

2013-11-22 Thread Jeff Kirsher
On Fri, 2013-11-22 at 12:20 +0400, Vladimir Davydov wrote:
> On e1000_down(), we should ensure every asynchronous work is canceled
> before proceeding. Since the watchdog_task can schedule other works
> apart from itself, it should be stopped first, but currently it is
> stopped after the reset_task. This can result in the following race
> leading to the reset_task running after the module unload:
> 
> e1000_down_and_stop():                  e1000_watchdog():
> ----------------------                  -----------------
>
> cancel_work_sync(reset_task)
>                                         schedule_work(reset_task)
> cancel_delayed_work_sync(watchdog_task)
> 
> The patch moves cancel_delayed_work_sync(watchdog_task) to the beginning
> of e1000_down_and_stop(), thus ensuring the race is impossible.
> 
> Signed-off-by: Vladimir Davydov 
> Cc: Tushar Dave 
> Cc: Patrick McHardy 
> Cc: David S. Miller 
> ---
>  drivers/net/ethernet/intel/e1000/e1000_main.c |   15 +++
>  1 file changed, 11 insertions(+), 4 deletions(-)

I have applied your patch to my queue, thanks!
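As a rough illustration of the ordering problem described in the quoted
patch, the sequence can be modeled deterministically in userspace.  This is
not the e1000 code: struct demo_adapter and all demo_*() functions are
hypothetical, and two flags replace the real workqueue machinery.  The only
point is that a work item able to schedule others has to be cancelled first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal model: the watchdog may schedule the reset work when it runs. */
struct demo_adapter {
	bool watchdog_armed;	/* watchdog has not been cancelled yet */
	bool reset_pending;	/* reset work is queued */
};

static void demo_watchdog_run(struct demo_adapter *a)
{
	assert(a != NULL);
	if (a->watchdog_armed)
		a->reset_pending = true;	/* models schedule_work(reset_task) */
}

static void demo_cancel_reset(struct demo_adapter *a)
{
	a->reset_pending = false;
}

static void demo_cancel_watchdog(struct demo_adapter *a)
{
	a->watchdog_armed = false;
}

/* Old order: the watchdog fires between the two cancels. */
bool demo_down_old_order(void)
{
	struct demo_adapter a = { .watchdog_armed = true, .reset_pending = false };

	demo_cancel_reset(&a);
	demo_watchdog_run(&a);		/* race window: reset gets re-queued */
	demo_cancel_watchdog(&a);
	return a.reset_pending;		/* true: reset outlives the shutdown */
}

/* New order: once the watchdog is cancelled, nothing can re-queue reset. */
bool demo_down_new_order(void)
{
	struct demo_adapter a = { .watchdog_armed = true, .reset_pending = false };

	demo_watchdog_run(&a);		/* may still fire before the cancel */
	demo_cancel_watchdog(&a);
	demo_watchdog_run(&a);		/* no effect any more */
	demo_cancel_reset(&a);
	return a.reset_pending;		/* false: nothing is left queued */
}
```

In this model demo_down_old_order() leaves the reset work pending while
demo_down_new_order() does not, which mirrors the reasoning in the patch
description.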




Re: [PATCH 1/1] workqueue: swap set_cpus_allowed_ptr() and PF_NO_SETAFFINITY

2013-11-22 Thread Tejun Heo
On Thu, Nov 14, 2013 at 6:56 AM, Oleg Nesterov  wrote:
> Move the setting of PF_NO_SETAFFINITY up before set_cpus_allowed()
> in create_worker(). Otherwise userland can change ->cpus_allowed
> in between.
>
> Signed-off-by: Oleg Nesterov 

Applied to wq/for-3.13-fixes.

Thanks!

-- 
tejun


Re: [PATCH 1/2] e1000: fix lockdep warning in e1000_reset_task

2013-11-22 Thread Jeff Kirsher
On Fri, 2013-11-22 at 12:20 +0400, Vladimir Davydov wrote:
> The patch fixes the following lockdep warning, which is 100%
> reproducible on network restart:
> 
> ==
> [ INFO: possible circular locking dependency detected ]
> 3.12.0+ #47 Tainted: GF
> ---
> kworker/1:1/27 is trying to acquire lock:
>  ((&(>watchdog_task)->work)){+.+...}, at:
> [] flush_work+0x0/0x70
> 
> but task is already holding lock:
>  (>mutex){+.+...}, at: [] e1000_reset_task
> +0x4a/0xa0 [e1000]
> 
> which lock already depends on the new lock.
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #1 (>mutex){+.+...}:
>[] lock_acquire+0x9d/0x120
>[] mutex_lock_nested+0x4c/0x390
>[] e1000_watchdog+0x7d/0x5b0 [e1000]
>[] process_one_work+0x1d2/0x510
>[] worker_thread+0x120/0x3a0
>[] kthread+0xee/0x110
>[] ret_from_fork+0x7c/0xb0
> 
> -> #0 ((&(>watchdog_task)->work)){+.+...}:
>[] __lock_acquire+0x1710/0x1810
>[] lock_acquire+0x9d/0x120
>[] flush_work+0x3b/0x70
>[] __cancel_work_timer+0x98/0x140
>[] cancel_delayed_work_sync+0x13/0x20
>[] e1000_down_and_stop+0x3c/0x60 [e1000]
>[] e1000_down+0x131/0x220 [e1000]
>[] e1000_reset_task+0x52/0xa0 [e1000]
>[] process_one_work+0x1d2/0x510
>[] worker_thread+0x120/0x3a0
>[] kthread+0xee/0x110
>[] ret_from_fork+0x7c/0xb0
> 
> other info that might help us debug this:
> 
>  Possible unsafe locking scenario:
> 
>        CPU0                    CPU1
>        ----                    ----
>   lock(>mutex);
>                                lock((&(>watchdog_task)->work));
>                                lock(>mutex);
>   lock((&(>watchdog_task)->work));
> 
>  *** DEADLOCK ***
> 
> 3 locks held by kworker/1:1/27:
>  #0:  (events){.+.+.+}, at: [] process_one_work
> +0x166/0x510
>  #1:  ((>reset_task)){+.+...}, at: []
> process_one_work+0x166/0x510
>  #2:  (>mutex){+.+...}, at: []
> e1000_reset_task+0x4a/0xa0 [e1000]
> 
> stack backtrace:
> CPU: 1 PID: 27 Comm: kworker/1:1 Tainted: GF3.12.0+ #47
> Hardware name: System manufacturer System Product Name/P5B-VM SE, BIOS
> 050105/31/2007
> Workqueue: events e1000_reset_task [e1000]
>  820f6000 88007b9dba98 816b54a2 0002
>  820f5e50 88007b9dbae8 810ba936 88007b9dbac8
>  88007b9dbb48 88007b9d8f00 88007b9d8780 88007b9d8f00
> Call Trace:
>  [] dump_stack+0x49/0x5f
>  [] print_circular_bug+0x216/0x310
>  [] __lock_acquire+0x1710/0x1810
>  [] ? __flush_work+0x250/0x250
>  [] lock_acquire+0x9d/0x120
>  [] ? __flush_work+0x250/0x250
>  [] flush_work+0x3b/0x70
>  [] ? __flush_work+0x250/0x250
>  [] __cancel_work_timer+0x98/0x140
>  [] cancel_delayed_work_sync+0x13/0x20
>  [] e1000_down_and_stop+0x3c/0x60 [e1000]
>  [] e1000_down+0x131/0x220 [e1000]
>  [] e1000_reset_task+0x52/0xa0 [e1000]
>  [] process_one_work+0x1d2/0x510
>  [] ? process_one_work+0x166/0x510
>  [] worker_thread+0x120/0x3a0
>  [] ? manage_workers+0x2c0/0x2c0
>  [] kthread+0xee/0x110
>  [] ? __init_kthread_worker+0x70/0x70
>  [] ret_from_fork+0x7c/0xb0
>  [] ? __init_kthread_worker+0x70/0x70
> 
> == The issue background ==
> 
> The problem occurs, because e1000_down(), which is called under
> adapter->mutex by e1000_reset_task(), tries to synchronously cancel
> e1000 auxiliary works (reset_task, watchdog_task, phy_info_task,
> fifo_stall_task), which take adapter->mutex in their handlers. So the
> question is what does adapter->mutex protect there?
> 
> The adapter->mutex was introduced by commit 0ef4ee ("e1000: convert to
> private mutex from rtnl") as a replacement for rtnl_lock() taken in the
> asynchronous handlers. It was aimed at fixing a similar lockdep warning
> issued when e1000_down() was called under rtnl_lock(), and it fixed it,
> but unfortunately it introduced the lockdep warning described above.
> Anyway, the source of this bug is that the asynchronous works were made
> to take rtnl_lock() some time ago, so let's look deeper and find out why
> it was added there.
> 
> The rtnl_lock() was added to asynchronous handlers by commit 338c15
> ("e1000: fix occasional panic on unload") in order to prevent
> asynchronous handlers from execution after the module is unloaded
> (e1000_down() is called) as it follows from the comment to the commit:
> 
> > Net drivers in general have an issue where timers fired
> > by mod_timer or work threads with schedule_work are running
> > outside of the rtnl_lock.
> >
> > With no other lock protection these routines are vulnerable
> > to races with driver unload or reset paths.
> >
> > The longer term solution to this might be a redesign with
> > safer locks being taken in the driver to guarantee no
> > reentrance, but for now a safe and effective fix is
> > to take the rtnl_lock in these routines.
> 
> I'm not sure if this 

Re: [PATCH] ACPI / scan: User direct recurrence for device hierarchy walks

2013-11-22 Thread Rafael J. Wysocki
On Saturday, November 23, 2013 12:22:23 AM Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki 
> 
> Rework acpi_bus_trim() and acpi_bus_device_attach(), which is
> renamed as acpi_bus_attach(), to walk the list of each device
> object's children directly and call themselves recursively for
> each child instead of using acpi_walk_namespace().  This
> simplifies the code quite a bit and avoids the overhead of
> callbacks and the ACPICA's internal processing which are not
> really necessary for these two routines.
> 
> Signed-off-by: Rafael J. Wysocki 
> ---
> 
> Hi,
> 
> This is on top of the linux-pm/bleeding-edge branch that contains the series
> I posted previously: http://marc.info/?l=linux-pci&m=138470560909690&w=4

The subject should be "ACPI / scan: Use direct recurrence for device hierarchy
walks", sorry for the typo.

Thanks!

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


[PATCH] ACPI / scan: User direct recurrence for device hierarchy walks

2013-11-22 Thread Rafael J. Wysocki
From: Rafael J. Wysocki 

Rework acpi_bus_trim() and acpi_bus_device_attach(), which is
renamed as acpi_bus_attach(), to walk the list of each device
object's children directly and call themselves recursively for
each child instead of using acpi_walk_namespace().  This
simplifies the code quite a bit and avoids the overhead of
callbacks and the ACPICA's internal processing which are not
really necessary for these two routines.
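As a rough sketch of the approach described above (direct recursion over each
node's children instead of a callback-driven namespace walk), consider the
following userspace model.  It is not the ACPI code: struct demo_node, its
fields and demo_attach() are hypothetical, and a simple sibling-linked child
list replaces the kernel's list_head.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device node with a singly linked list of children. */
struct demo_node {
	int present;			/* device is present */
	int visited;			/* set once the node has been attached */
	struct demo_node *first_child;
	struct demo_node *next_sibling;
};

/*
 * Attach 'node' and then recurse into its children.  A node that is not
 * present is skipped together with its whole subtree.  Returns the number
 * of nodes attached.
 */
int demo_attach(struct demo_node *node)
{
	struct demo_node *child;
	int count;

	assert(node != NULL);

	if (!node->present) {
		node->visited = 0;
		return 0;
	}

	node->visited = 1;
	count = 1;

	for (child = node->first_child; child; child = child->next_sibling)
		count += demo_attach(child);

	return count;
}
```

The walk needs no callback signature, no opaque context pointer and no
status codes to steer the traversal; returning early is enough to prune a
subtree.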

Signed-off-by: Rafael J. Wysocki 
---

Hi,

This is on top of the linux-pm/bleeding-edge branch that contains the series
I posted previously: http://marc.info/?l=linux-pci&m=138470560909690&w=4

Thanks,
Rafael

---
 drivers/acpi/scan.c |  120 +++-
 1 file changed, 45 insertions(+), 75 deletions(-)

Index: linux-pm/drivers/acpi/scan.c
===
--- linux-pm.orig/drivers/acpi/scan.c
+++ linux-pm/drivers/acpi/scan.c
@@ -1909,54 +1909,40 @@ static int acpi_scan_attach_handler(stru
return ret;
 }
 
-static acpi_status acpi_bus_device_attach(acpi_handle handle, u32 lvl_not_used,
- void *not_used, void **ret_not_used)
+static void acpi_bus_attach(struct acpi_device *device)
 {
-   struct acpi_device *device;
-   unsigned long long sta;
+   struct acpi_device *child;
int ret;
 
-   /*
-* Ignore errors ignored by acpi_bus_check_add() to avoid terminating
-* namespace walks prematurely.
-*/
-   if (acpi_bus_type_and_status(handle, , ))
-   return AE_OK;
-
-   if (acpi_bus_get_device(handle, ))
-   return AE_CTRL_DEPTH;
-
-   acpi_set_device_status(device, sta);
+   acpi_bus_get_status(device);
/* Skip devices that are not present. */
-   if (!acpi_device_is_present(device))
-   goto err;
-
+   if (!acpi_device_is_present(device)) {
+   device->flags.visited = false;
+   return;
+   }
if (device->handler)
-   return AE_OK;
+   goto ok;
 
if (!device->flags.initialized) {
acpi_bus_update_power(device, NULL);
device->flags.initialized = true;
}
+   device->flags.visited = false;
ret = acpi_scan_attach_handler(device);
if (ret < 0)
-   goto err;
+   return;
 
device->flags.match_driver = true;
-   if (ret > 0)
-   goto ok;
-
-   ret = device_attach(>dev);
-   if (ret < 0)
-   goto err;
-
- ok:
+   if (!ret) {
+   ret = device_attach(>dev);
+   if (ret < 0)
+   return;
+   }
device->flags.visited = true;
-   return AE_OK;
 
- err:
-   device->flags.visited = false;
-   return AE_CTRL_DEPTH;
+ ok:
+   list_for_each_entry(child, >children, node)
+   acpi_bus_attach(child);
 }
 
 /**
@@ -1976,64 +1962,48 @@ static acpi_status acpi_bus_device_attac
 int acpi_bus_scan(acpi_handle handle)
 {
void *device = NULL;
-   int error = 0;
 
if (ACPI_SUCCESS(acpi_bus_check_add(handle, 0, NULL, )))
acpi_walk_namespace(ACPI_TYPE_ANY, handle, ACPI_UINT32_MAX,
acpi_bus_check_add, NULL, NULL, );
 
-   if (!device)
-   error = -ENODEV;
-   else if (ACPI_SUCCESS(acpi_bus_device_attach(handle, 0, NULL, NULL)))
-   acpi_walk_namespace(ACPI_TYPE_ANY, handle, ACPI_UINT32_MAX,
-   acpi_bus_device_attach, NULL, NULL, NULL);
-
-   return error;
-}
-EXPORT_SYMBOL(acpi_bus_scan);
-
-static acpi_status acpi_bus_device_detach(acpi_handle handle, u32 lvl_not_used,
- void *not_used, void **ret_not_used)
-{
-   struct acpi_device *device = NULL;
-
-   if (!acpi_bus_get_device(handle, )) {
-   struct acpi_scan_handler *dev_handler = device->handler;
-
-   if (dev_handler) {
-   if (dev_handler->detach)
-   dev_handler->detach(device);
-
-   device->handler = NULL;
-   } else {
-   device_release_driver(>dev);
-   }
-   /*
-* Most likely, the device is going away, so put it into D3cold
-* before that.
-*/
-   acpi_device_set_power(device, ACPI_STATE_D3_COLD);
-   device->flags.initialized = false;
-   device->flags.visited = false;
+   if (device) {
+   acpi_bus_attach(device);
+   return 0;
}
-   return AE_OK;
+   return -ENODEV;
 }
+EXPORT_SYMBOL(acpi_bus_scan);
 
 /**
- * acpi_bus_trim - Remove ACPI device node and all of its descendants
- * @start: Root of the ACPI device nodes subtree to remove.
+ * acpi_bus_trim - Detach scan 

Re: BUG: mm, numa: test segfaults, only when NUMA balancing is on

2013-11-22 Thread Mel Gorman
On Fri, Nov 22, 2013 at 03:28:07PM -0600, Alex Thorlton wrote:
> > If the warning added by that patch does *not* trigger then can you also
> > test this patch? It removes the barriers which should not be necessary
> > and takes a reference to the page before waiting on the lock. The
> > previous version did not take the reference because otherwise the
> > WARN_ON could not distinguish between a migration waiter and a surprise
> > gup.
> 
> Sorry for the delay; been a bit busy.  I tested both of these patches on
> top of this one (separately, of course):
> 
> http://www.spinics.net/lists/linux-mm/msg63919.html
> 
> I think that's the one you were referring to, if not send me a pointer
> to the correct one and I'll give it another shot.  Both patches still
> segfaulted, so it doesn't appear that either of these solved the
> problem. 

I see. Does THP have to be enabled or does it segfault even with THP
disabled?

-- 
Mel Gorman
SUSE Labs


Re: [PATCH 1/1] e1000: prevent oops when adapter is being closed and reset simultaneously

2013-11-22 Thread Jeff Kirsher
On Thu, Nov 21, 2013 at 11:54 PM, yzhu1  wrote:
> Hi, maintainers
>
> This change is based on a similar change made to e1000e support in
> commit bb9e44d0d0f4 ("e1000e: prevent oops when adapter is being closed
> and reset simultaneously").  The same issue has also been observed
> on the older e1000 cards.
>
> Here, we have increased the RESET_COUNT value to 50 because stress tests
> generate too many accesses to the e1000 NIC for a RESET_COUNT of 25 to be
> enough. Experimentation has shown that setting RESET_COUNT to 50 is
> enough.
>
> Please help me to merge this patch. Thanks a lot.
>
> Zhu Yanjun

Please remember to actually include the driver maintainer for the
driver your patch is against (i.e. me).

I will add your patch to my queue.

-- 
Cheers,
Jeff


Re: [PATCH 1/3] percpu: stop the loop when a cpu belongs to a new group

2013-11-22 Thread Tejun Heo
Hello,

On Wed, Nov 20, 2013 at 12:51:21AM -0500, Tejun Heo wrote:
> The patch is just extremely marginal.  Ah well... why not?  I'll apply
> it once -rc1 drops.

So, I was about to apply this patch but decided against it.  It
doesn't really make anything better and the code looks worse
afterwards.

Thanks.

-- 
tejun


Re: [PATCH 3/3] mm, memory-failure: fix the typo in me_pagecache_dirty()

2013-11-22 Thread Andrew Morton
On Fri,  8 Nov 2013 20:47:38 +0800 Zhi Yong Wu  wrote:

> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -611,7 +611,7 @@ static int me_pagecache_clean(struct page *p, unsigned long pfn)
>  }
>  
>  /*
> - * Dirty cache page page
> + * Dirty cache page
>   * Issues: when the error hit a hole page the error is not properly
>   * propagated.
>   */

The accurate and complete description of this page is actually
"pagecache page", so...

--- a/mm/memory-failure.c~mm-memory-failure-fix-the-typo-in-me_pagecache_dirty-fix
+++ a/mm/memory-failure.c
@@ -611,7 +611,7 @@ static int me_pagecache_clean(struct pag
 }
 
 /*
- * Dirty cache page
+ * Dirty pagecache page
  * Issues: when the error hit a hole page the error is not properly
  * propagated.
  */
_



[git pull] Input updates for 3.13-rc0 (2nd)

2013-11-22 Thread Dmitry Torokhov
Hi Linus,

Please pull from:

git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input.git for-linus
or
master.kernel.org:/pub/scm/linux/kernel/git/dtor/input.git for-linus

to receive 2nd round of updates for the input subsystem. You will get a
new driver for Surface 2.0/Pixelsense touchscreen and a couple of driver
fixups.

Changelog:
-

Dan Carpenter (1):
  Input: hp_sdc_rtc - unlock on error in hp_sdc_rtc_read_i8042timer()

Dmitry Torokhov (1):
  Input: atmel-wm97xx - fix compile error

Florian Echtler (1):
  Input: add sur40 driver for Samsung SUR40 (aka MS Surface 2.0/Pixelsense)

Haiyang Zhang (1):
  MAINTAINERS - add keyboard driver to Hyper-V file list

Xie XiuQi (1):
  Input: cyttsp4 - remove unnecessary work pending test


Diffstat:


 drivers/input/misc/hp_sdc_rtc.c  |   5 +-
 drivers/input/touchscreen/Kconfig|  11 +
 drivers/input/touchscreen/Makefile   |   1 +
 drivers/input/touchscreen/atmel-wm97xx.c |   2 +-
 drivers/input/touchscreen/cyttsp4_core.c |   3 +-
 drivers/input/touchscreen/sur40.c| 466 +++
 6 files changed, 484 insertions(+), 4 deletions(-)
 create mode 100644 drivers/input/touchscreen/sur40.c

-- 
Dmitry





[PATCH v5 2/2] ARM: bcm281xx: watchdog configuration

2013-11-22 Thread Markus Mayer
This commit enables the watchdog driver for the BCM281xx family of SoCs.

Signed-off-by: Markus Mayer 
Reviewed-by: Matt Porter 
Acked-by: Guenter Roeck 
---
 arch/arm/configs/bcm_defconfig |3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/arm/configs/bcm_defconfig b/arch/arm/configs/bcm_defconfig
index 287ac1d..d4d0083 100644
--- a/arch/arm/configs/bcm_defconfig
+++ b/arch/arm/configs/bcm_defconfig
@@ -126,3 +126,6 @@ CONFIG_CRC_ITU_T=y
 CONFIG_CRC7=y
 CONFIG_XZ_DEC=y
 CONFIG_AVERAGE=y
+CONFIG_WATCHDOG=y
+CONFIG_BCM_KONA_WDT=y
+CONFIG_BCM_KONA_WDT_DEBUG=y
-- 
1.7.9.5




[PATCH v5 0/2] watchdog: bcm281xx: Watchdog Driver

2013-11-22 Thread Markus Mayer
This is version 5 of the watchdog driver for the BCM281xx family of mobile
SoCs.

Changes since version 4:
- Fixed whitespace issue in Kconfig "help" section
- Rebased onto 3.13-rc1 (due to merge conflict in Makefile)

Changes since version 3:
- Removed #include 
- Added Reviewed-by:/Acked-by: Guenter Roeck

Changes since version 2:

- BCM_KONA_WDT_NAME: "bcm-kona-wdt" -> "bcm_kona_wdt"
- Don't initialize static variable
- Removed '{'/'}' from single-line statements
- busy_count retains the maximum loop count of the longest loop rather
  than an over-all maximum
- Use seq_puts() in lieu of seq_printf() in 2 locations
- Removed error message after devm_kzalloc()
- Added MODULE_ALIAS

Changes since version 1:

- Added module name to "help" section in Kconfig
- A few cosmetic code simplifications and fixes
- Removed most dev_info() calls and changed the remaining ones to dev_dbg()
- Renamed SECWDOG_WD_LOAD_FLAG_MASK to SECWDOG_WD_LOAD_FLAG
- Added some comments to secure_register_read() and struct bcm_kona_wdt
- Added delay to secure_register_read()
- Reduced maximum retry loop from 1 to 1000
- Introduced "busy_count" variable to count how often
  secure_register_read() gets stalled; this is available through debugfs
- Simplified secure_register_read() to return -ETIMEDOUT rather than using
  a variable parameter to indicate a timeout error
- Got rid of all uses of -EAGAIN
- Fixed return value check for debugfs_create_dir()
- Simplified bcm_kona_wdt_debugfs_init() by getting rid of goto
- Created new generic function bcm_kona_wdt_ctrl_reg_modify()
- The following functions now use bcm_kona_wdt_ctrl_reg_modify():
  bcm_kona_wdt_set_resolution_reg(), bcm_kona_wdt_set_timeout_reg(),
  bcm_kona_wdt_stop()
- Made bcm_kona_wdt_set_timeout_reg() more generic, so bcm_kona_wdt_start()
  can use it
- Removed MODULE_ALIAS_MISCDEV(WATCHDOG_MINOR)

Markus Mayer (2):
  watchdog: bcm281xx: Watchdog Driver
  ARM: bcm281xx: watchdog configuration

 arch/arm/configs/bcm_defconfig  |3 +
 drivers/watchdog/Kconfig|   22 +++
 drivers/watchdog/Makefile   |1 +
 drivers/watchdog/bcm_kona_wdt.c |  365 +++
 4 files changed, 391 insertions(+)
 create mode 100644 drivers/watchdog/bcm_kona_wdt.c

-- 
1.7.9.5




[PATCH v5 1/2] watchdog: bcm281xx: Watchdog Driver

2013-11-22 Thread Markus Mayer
This commit adds support for the watchdog timer used on the BCM281xx
family of SoCs.

Signed-off-by: Markus Mayer 
Reviewed-by: Matt Porter 
Reviewed-by: Guenter Roeck 
---
 drivers/watchdog/Kconfig|   22 +++
 drivers/watchdog/Makefile   |1 +
 drivers/watchdog/bcm_kona_wdt.c |  365 +++
 3 files changed, 388 insertions(+)
 create mode 100644 drivers/watchdog/bcm_kona_wdt.c

diff --git a/drivers/watchdog/Kconfig b/drivers/watchdog/Kconfig
index 5be6e91..24bc93a 100644
--- a/drivers/watchdog/Kconfig
+++ b/drivers/watchdog/Kconfig
@@ -1139,6 +1139,28 @@ config BCM2835_WDT
  To compile this driver as a loadable module, choose M here.
  The module will be called bcm2835_wdt.
 
+config BCM_KONA_WDT
+   tristate "BCM Kona Watchdog"
+   depends on ARCH_BCM
+   select WATCHDOG_CORE
+   help
+ Support for the watchdog timer on the Broadcom BCM281xx family,
+ which includes the BCM11130, BCM11140, BCM11351, BCM28145 and
+ BCM28155 variants.
+
+ Say 'Y' or 'M' here to enable the driver. The module will be called
+ bcm_kona_wdt.
+
+config BCM_KONA_WDT_DEBUG
+   bool "DEBUGFS support for BCM Kona Watchdog"
+   depends on BCM_KONA_WDT
+   help
+ If enabled, adds /sys/kernel/debug/bcm-kona-wdt/info which provides
+ access to the driver's internal data structures as well as watchdog
+ timer hardware registers.
+
+ If in doubt, say 'N'.
+
 config LANTIQ_WDT
tristate "Lantiq SoC watchdog"
depends on LANTIQ
diff --git a/drivers/watchdog/Makefile b/drivers/watchdog/Makefile
index 91bd95a..af22516 100644
--- a/drivers/watchdog/Makefile
+++ b/drivers/watchdog/Makefile
@@ -57,6 +57,7 @@ obj-$(CONFIG_RETU_WATCHDOG) += retu_wdt.o
 obj-$(CONFIG_BCM2835_WDT) += bcm2835_wdt.o
 obj-$(CONFIG_MOXART_WDT) += moxart_wdt.o
 obj-$(CONFIG_SIRFSOC_WATCHDOG) += sirfsoc_wdt.o
+obj-$(CONFIG_BCM_KONA_WDT) += bcm_kona_wdt.o
 
 # AVR32 Architecture
 obj-$(CONFIG_AT32AP700X_WDT) += at32ap700x_wdt.o
diff --git a/drivers/watchdog/bcm_kona_wdt.c b/drivers/watchdog/bcm_kona_wdt.c
new file mode 100644
index 0000000..9fe174b
--- /dev/null
+++ b/drivers/watchdog/bcm_kona_wdt.c
@@ -0,0 +1,365 @@
+/*
+ * Copyright (C) 2013 Broadcom Corporation
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation version 2.
+ *
+ * This program is distributed "as is" WITHOUT ANY WARRANTY of any
+ * kind, whether express or implied; without even the implied warranty
+ * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define SECWDOG_CTRL_REG            0x00000000
+#define SECWDOG_COUNT_REG           0x00000004
+
+#define SECWDOG_RESERVED_MASK       0x1dffffff
+#define SECWDOG_WD_LOAD_FLAG        0x10000000
+#define SECWDOG_EN_MASK             0x08000000
+#define SECWDOG_SRSTEN_MASK         0x04000000
+#define SECWDOG_RES_MASK            0x00f00000
+#define SECWDOG_COUNT_MASK          0x000fffff
+
+#define SECWDOG_MAX_COUNT           SECWDOG_COUNT_MASK
+#define SECWDOG_CLKS_SHIFT          20
+#define SECWDOG_MAX_RES             15
+#define SECWDOG_DEFAULT_RESOLUTION  4
+#define SECWDOG_MAX_TRY             1000
+
+#define SECS_TO_TICKS(x, w)((x) << (w)->resolution)
+#define TICKS_TO_SECS(x, w)((x) >> (w)->resolution)
+
+#define BCM_KONA_WDT_NAME  "bcm_kona_wdt"
+
+struct bcm_kona_wdt {
+   void __iomem *base;
+   /*
+* One watchdog tick is 1/(2^resolution) seconds. Resolution can take
+* the values 0-15, meaning one tick can be 1s to 30.52us. Our default
+* resolution of 4 means one tick is 62.5ms.
+*
+* The watchdog counter is 20 bits. Depending on resolution, the maximum
+* counter value of 0xfffff expires after about 12 days (resolution 0)
+* down to only 32s (resolution 15). The default resolution of 4 gives
+* us a maximum of about 18 hours and 12 minutes before the watchdog
+* times out.
+*/
+   int resolution;
+   spinlock_t lock;
+#ifdef CONFIG_BCM_KONA_WDT_DEBUG
+   struct dentry *debugfs;
+#endif
+};
+
+#ifdef CONFIG_BCM_KONA_WDT_DEBUG
+static unsigned long busy_count;
+#endif
+
+static int secure_register_read(void __iomem *addr)
+{
+   uint32_t val;
+   unsigned count = 0;
+
+   /*
+* If the WD_LOAD_FLAG is set, the watchdog counter field is being
+* updated in hardware. Once the WD timer is updated in hardware, it
+* gets cleared.
+*/
+   do {
+   if (unlikely(count > 1))
+   udelay(5);
+   val = 
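
The resolution comment in struct bcm_kona_wdt above can be checked with a small userspace sketch. It assumes only what the patch states (a 20-bit counter, resolutions 0 to 15, one tick being 1/(2^resolution) seconds); the wdt_* helper names are made up for illustration and are not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the patch: a 20-bit counter and resolutions 0..15. */
#define WDT_MAX_COUNT 0xfffffu
#define WDT_MAX_RES   15u

/* One tick is 1/(2^resolution) seconds, so seconds -> ticks is a left shift. */
uint32_t wdt_secs_to_ticks(uint32_t secs, unsigned int resolution)
{
	assert(resolution <= WDT_MAX_RES);
	return secs << resolution;
}

/* Ticks -> whole seconds is the matching right shift (rounds down). */
uint32_t wdt_ticks_to_secs(uint32_t ticks, unsigned int resolution)
{
	assert(resolution <= WDT_MAX_RES);
	return ticks >> resolution;
}

/* Longest timeout, in whole seconds, that fits in the 20-bit counter. */
uint32_t wdt_max_timeout_secs(unsigned int resolution)
{
	return wdt_ticks_to_secs(WDT_MAX_COUNT, resolution);
}
```

With the default resolution of 4, wdt_max_timeout_secs(4) gives 65535 seconds, i.e. 18 hours, 12 minutes and 15 seconds, in line with the comment. Resolution 15 gives 31 whole seconds, because the shift rounds down from just under 32.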

Re: 3.10.16 cgroup_mutex deadlock

2013-11-22 Thread William Dauchy
Hi Tejun,

On Fri, Nov 22, 2013 at 11:18 PM, Tejun Heo  wrote:
> Just applied to cgroup/for-3.13-fixes w/ stable cc'd.  Will push to
> Linus next week.

Thank you for your quick reply. Do you also have a backport for
v3.10.x already available?

Best regards,
-- 
William


Re: [RFC PATCH 0/2] fs: sysfs: Add devres support

2013-11-22 Thread Greg Kroah-Hartman
On Fri, Nov 22, 2013 at 02:47:38PM -0800, Dmitry Torokhov wrote:
> On Sat, Mar 16, 2013 at 09:21:40AM -0700, Greg Kroah-Hartman wrote:
> > On Thu, Mar 14, 2013 at 08:24:45PM -0700, Guenter Roeck wrote:
> > > Provide devres functions for device_create_file, sysfs_create_file,
> > > and sysfs_create_group plus the respective remove functions.
> > > 
> > > Idea is to be able to drop calls to the remove functions from the various
> > > drivers using those calls.
> > 
> > Hm, despite the fact that almost every driver that makes these calls is
> > broken?  :)
> > 
> > > Potential savings are substantial. There are more than 700 calls to
> > > device_remove_file in the kernel, more than 500 calls to sysfs_remove_group,
> > > and some 50 calls to sysfs_remove_file (though not all of those use dev->kobj
> > > as parameter). Expanding the API to sysfs_create_bin_file would add another 80+
> > > opportunities, and adding sysfs_create_link would create another 100 or so.
> > 
> > The idea is nice, but why are these drivers adding sysfs files on their
> > own?  Are they doing this in a way that is race-free with userspace
> > (i.e. creating them before userspace is told about the device), or are
> > they broken and need to have these calls added to the "default
> > device/driver/bus" attribute list for them instead?
> 
> Just stumbled upon this thread...
> 
> There is a need for drivers to add driver-specific attributes to a
> device. Since they are driver specific they can not go into bus or class
> or whatever default attributes that are created when device is
> instantiated, but rather attached to the device when a driver binds to
> them. An example would be a PS/2 mouse driver allowing user to control
> report rate and resolution of the device. Since it is only applicable to
> PS/2 mice the knob does not belong to the generic serio layer/bus nor
> should it go into input layer as it is again PS/2 specific. So psmouse
> creates it while binding to a serio port.

Yeah, platform drivers also "need" this type of thing as well, so you
are right, I can't forbid it for everyone.

> Do we send a uevent when driver binds/unbinds from a device?

No, we don't, see kobject_actions[] for what we implement.

> If not I think we should so that userspace can check for additional
> attributes, if any.

We might be able to do a "change" event for the device itself after it
has been bound / unbound, but I don't know what userspace will do with
that information at this point in time (i.e. 10+ years without that
information...)

thanks,

greg k-h


Re: [PATCH] net: sctp: set chunk->tsn_gap_acked at the end of cycle

2013-11-22 Thread Vlad Yasevich
On 11/22/2013 02:24 PM, Chang wrote:
> 
> On 11/22/2013 03:27 PM, Vlad Yasevich wrote:
>> On 11/22/2013 02:49 AM, Chang Xiangzhong wrote:
>>> tsn_gap_acked is an important state flag in chunk, which indicates if the
>>> chunk has been acked in gap reports before.
>> Actually, this bit indicates simply that the chunk has been acked.  It
>> doesn't state whether it's been acked in a gap report or via
>> cumulative tsn.
> Thanks for pointing this out. Sorry for not having made that clear.
>>
>>> SFR-CACC algorithm depends on this
>>> variable. So set this at the end of each iteration, otherwise the SFR-CACC
>>> algorithm would never be toggled.
>>>
>>> Signed-off-by: Chang Xiangzhong 
>>> ---
>>>   net/sctp/outqueue.c | 3 ++-
>>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
>>> index 1b494fa..bff828c 100644
>>> --- a/net/sctp/outqueue.c
>>> +++ b/net/sctp/outqueue.c
>>> @@ -1396,7 +1396,6 @@ static void sctp_check_transmitted(struct sctp_outq *q,
>>>* while DATA was outstanding).
>>>*/
>>>   if (!tchunk->tsn_gap_acked) {
>>> -tchunk->tsn_gap_acked = 1;
>>>   if (TSN_lt(*highest_new_tsn_in_sack, tsn))
>>>   *highest_new_tsn_in_sack = tsn;
>>>   bytes_acked += sctp_data_size(tchunk);
>>> @@ -1460,6 +1459,8 @@ static void sctp_check_transmitted(struct sctp_outq *q,
>>>*/
>>>   list_add_tail(lchunk, &tlist);
>>>   }
>>> +
>>> +tchunk->tsn_gap_acked = 1;
>> The current code will set the state if it hasn't been set yet.  Why is
>> this needed?
> Because in line 1420 ~ 1440 The SFR-CACC algorithms use tsn_gap_acked.
> That "if block" would never be triggered because the variable's been
> set. That's why I move the "state changing sentence" to the end of the
> iteration.
>> Now there is an issue with tracking highest_new_tsn_in_sack.  The spec
>> is a little vague on this, but the highest_new_tsn_in_sack is only supposed
>> to track tsns that have not been resent.  If a tsn has been reneged, and
>> then sent again, it is not considered 'new' thus should not count
>> toward highest_new_tsn_in_sack.  We currently do not track this right.
> I'll try to figure this out.

So the solution that's been proposed before is to move the CACC block up
and merge it with the !tsn_gap_acked block.  This requires additional
change though to only mark highest_new_tsn if the chunk has not been
retransmitted (which you've added in a prior patch).

-vlad

>> -vlad
>>
>> -vlad
>>
>>>   } else {
>>>   if (tchunk->tsn_gap_acked) {
>>>   pr_debug("%s: receiver reneged on data TSN:0x%x\n",
>>>
> 
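
The ordering problem discussed in this thread can be modelled outside the kernel. The sketch below is a deliberately simplified, hypothetical model, not the code in net/sctp/outqueue.c: struct toy_chunk and count_cacc_runs() are invented names, and the second if block merely stands in for the SFR-CACC processing that Chang describes as depending on tsn_gap_acked still being clear.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sctp_chunk; only the flag matters here. */
struct toy_chunk {
	int tsn_gap_acked;
};

/*
 * Treat every chunk as newly acked and count how often a later block
 * guarded by !tsn_gap_acked (standing in for the SFR-CACC block) runs.
 * mark_early != 0 sets the flag in the first block, as the existing code
 * does; mark_early == 0 defers it to the end of the iteration, as the
 * patch proposes.
 */
size_t count_cacc_runs(struct toy_chunk *chunks, size_t n, int mark_early)
{
	size_t runs = 0;
	size_t i;

	assert(chunks != NULL || n == 0);

	for (i = 0; i < n; i++) {
		struct toy_chunk *c = &chunks[i];

		if (!c->tsn_gap_acked && mark_early)
			c->tsn_gap_acked = 1;

		/* Later block that depends on the chunk being newly acked. */
		if (!c->tsn_gap_acked)
			runs++;

		/* End of iteration: the chunk is marked acked either way. */
		c->tsn_gap_acked = 1;
	}
	return runs;
}
```

In this model the guarded block never runs when the flag is set early and runs once per newly acked chunk when the flag is set last. Merging the two blocks, as Vlad suggests, would give the same effect without moving the assignment.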



Re: [PATCH v2] checkpatch.pl: Check for functions without a real prototype

2013-11-22 Thread Elena Ufimtseva
On Fri, Nov 22, 2013 at 5:45 PM, Joe Perches  wrote:
> On Fri, 2013-11-22 at 17:17 -0500, Elena Ufimtseva wrote:
>> Based on Richard Weinberger patch https://lkml.org/lkml/2012/3/16/510
>> Functions like this one are evil:
>>
>> void foo()
>> {
>>   ...
>> }
> []
>> - added white space in regular expression between (), based
>> on comments from Joe Perches.
>
> Thanks Elena, but this is already applied in -next.
>
>
No problem, thanks Joe )

-- 
Elena
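
For readers wondering why the quoted `void foo()` form is a problem: before C23, an empty parameter list declares a function without a prototype, so the compiler cannot check arguments at call sites, whereas `(void)` declares a function that takes none. The following is a minimal sketch with made-up function names; GCC's -Wstrict-prototypes should flag the first declaration.

```c
#include <assert.h>

/*
 * Old-style declaration: before C23 this says nothing about the
 * parameters, so the compiler cannot check arguments at call sites.
 */
int answer_unchecked();

/* Real prototype: the function takes no arguments and calls are checked. */
int answer_checked(void);

int answer_unchecked()
{
	return 42;
}

int answer_checked(void)
{
	return 42;
}

/* Both forms behave the same when called correctly. */
void check_answers(void)
{
	assert(answer_unchecked() == answer_checked());
}
```

The difference only shows up on a bad call: passing an argument to answer_checked() is rejected at compile time, while a pre-C23 compiler accepts the same mistake for answer_unchecked().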


Re: [RFC PATCH 0/2] fs: sysfs: Add devres support

2013-11-22 Thread Dmitry Torokhov
On Sat, Mar 16, 2013 at 09:21:40AM -0700, Greg Kroah-Hartman wrote:
> On Thu, Mar 14, 2013 at 08:24:45PM -0700, Guenter Roeck wrote:
> > Provide devres functions for device_create_file, sysfs_create_file,
> > and sysfs_create_group plus the respective remove functions.
> > 
> > Idea is to be able to drop calls to the remove functions from the various
> > drivers using those calls.
> 
> Hm, despite the fact that almost every driver that makes these calls is
> broken?  :)
> 
> > Potential savings are substantial. There are more than 700 calls to
> > device_remove_file in the kernel, more than 500 calls to sysfs_remove_group,
> > and some 50 calls to sysfs_remove_file (though not all of those use dev->kobj
> > as parameter). Expanding the API to sysfs_create_bin_file would add another 80+
> > opportunities, and adding sysfs_create_link would create another 100 or so.
> 
> The idea is nice, but why are these drivers adding sysfs files on their
> own?  Are they doing this in a way that is race-free with userspace
> (i.e. creating them before userspace is told about the device), or are
> they broken and need to have these calls added to the "default
> device/driver/bus" attribute list for them instead?

Just stumbled upon this thread...

There is a need for drivers to add driver-specific attributes to a
device. Since they are driver specific they can not go into bus or class
or whatever default attributes that are created when device is
instantiated, but rather attached to the device when a driver binds to
them. An example would be a PS/2 mouse driver allowing user to control
report rate and resolution of the device. Since it is only applicable to
PS/2 mice the knob does not belong to the generic serio layer/bus nor
should it go into input layer as it is again PS/2 specific. So psmouse
creates it while binding to a serio port.

Do we send a uevent when driver binds/unbinds from a device? If not I
think we should so that userspace can check for additional attributes,
if any.

Thanks.

-- 
Dmitry


Re: [PATCH v2] checkpatch.pl: Check for functions without a real prototype

2013-11-22 Thread Joe Perches
On Fri, 2013-11-22 at 17:17 -0500, Elena Ufimtseva wrote:
> Based on Richard Weinberger patch https://lkml.org/lkml/2012/3/16/510
> Functions like this one are evil:
> 
> void foo()
> {
>   ...
> }
[]
> - added white space in regular expression between (), based
> on comments from Joe Perches.

Thanks Elena, but this is already applied in -next.




Re: [PATCH 1/6] powerpc: fix exception clearing in e500 SPE float emulation

2013-11-22 Thread Scott Wood
On Mon, 2013-11-04 at 16:52 +, Joseph S. Myers wrote:
> From: Joseph Myers 
> 
> The e500 SPE floating-point emulation code clears existing exceptions
> (__FPU_FPSCR &= ~FP_EX_MASK;) before ORing in the exceptions from the
> emulated operation.  However, these exception bits are the "sticky",
> cumulative exception bits, and should only be cleared by the user
> program setting SPEFSCR, not implicitly by any floating-point
> instruction (whether executed purely by the hardware or emulated).
> The spurious clearing of these bits shows up as missing exceptions in
> glibc testing.
> 
> Fixing this, however, is not as simple as just not clearing the bits,
> because while the bits may be from previous floating-point operations
> (in which case they should not be cleared), the processor can also set
> the sticky bits itself before the interrupt for an exception occurs,
> and this can happen in cases when IEEE 754 semantics are that the
> sticky bit should not be set.  Specifically, the "invalid" sticky bit
> is set in various cases with non-finite operands, where IEEE 754
> semantics do not involve raising such an exception, and the
> "underflow" sticky bit is set in cases of exact underflow, whereas
> IEEE 754 semantics are that this flag is set only for inexact
> underflow.  Thus, for correct emulation the kernel needs to know the
> setting of these two sticky bits before the instruction being
> emulated.
> 
> When a floating-point operation raises an exception, the kernel can
> note the state of the sticky bits immediately afterwards.  Some
> <fenv.h> functions that affect the state of these bits, such as
> fesetenv and feholdexcept, need to use prctl with PR_GET_FPEXC and
> PR_SET_FPEXC anyway, and so it is natural to record the state of those
> bits during that call into the kernel and so avoid any need for a
> separate call into the kernel to inform it of a change to those bits.
> Thus, the interface I chose to use (in this patch and the glibc port)
> is that one of those prctl calls must be made after any userspace
> change to those sticky bits, other than through a floating-point
> operation that traps into the kernel anyway.

This sounds like an incompatible change to userspace API.  What about
older glibc?  What about user code that directly manipulates these bits
rather than going through libc, or uses a libc other than glibc?  Where
is this API requirement documented?

I think the impact of this could be reduced by using this mechanism only
to clear bits, rather than set them.  That is, if the exception bit is
unset, don't set it just because it's set in spefscr_last -- but if it's
not set in spefscr_last, and the emulation code doesn't want to set it,
then clear it.

Are there any cases where the exception bit can be set without the
kernel taking a trap, or is userspace manipulation limited to clearing
the bits?

-Scott
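
One possible reading of the "only clear, never set" suggestion above, written out as bit arithmetic. This is a sketch under assumptions: the TOY_* bit positions are invented and do not match the real SPEFSCR layout, and toy_merge_sticky() is a hypothetical helper, not a function from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit positions for illustration; not the real SPEFSCR layout. */
#define TOY_STICKY_INVALID   0x1u
#define TOY_STICKY_UNDERFLOW 0x2u
#define TOY_STICKY_MASK      (TOY_STICKY_INVALID | TOY_STICKY_UNDERFLOW)

/*
 * cur:  sticky bits found in the register when the trap is taken
 *       (the hardware may already have set some of them spuriously)
 * last: sticky bits the kernel recorded earlier (spefscr_last in the thread)
 * emu:  sticky bits the emulated operation should raise under IEEE 754
 *
 * A bit survives only if it was already known to be set (cur & last) or
 * the emulation raises it. The recorded state is never used to set a bit
 * that is currently clear.
 */
uint32_t toy_merge_sticky(uint32_t cur, uint32_t last, uint32_t emu)
{
	assert(((cur | last | emu) & ~TOY_STICKY_MASK) == 0);
	return (cur & last) | emu;
}
```

Under this rule a stale spefscr_last can at worst fail to clear a spurious bit; it cannot resurrect a bit that userspace cleared without telling the kernel.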





Re: [patch 0/9 v3] kexec kernel efi runtime support

2013-11-22 Thread Toshi Kani
On Thu, 2013-11-21 at 14:17 +0800, dyo...@redhat.com wrote:
> Hi,
> 
> Here is the V3 for supporting kexec kernel efi runtime.
> Per previous discussion I pass the 1st kernel efi runtime mapping
> via setup_data to 2nd kernel. Besides of the runtime mapping
> info I also pass the fw_vendor, runtime, config table, smbios
> physical address in setup_data. EFI spec mentioned fw_vendor,
> runtime, config table addresses will be converted to virt address
> after entering virtual mode, but we will use it as physical address
> in efi_init. For smbios EFI spec did not mention about the address
> updating, but during my test on a HP workstation, the bios will
> convert it to Virt addr, thus pass it in setup_data as well.
> 
> For fw_vendor, runtime, config table, I export them in /sys/firmware/
> efi/, smbios is already in /sys/firmware/efi/systab.
> 
> For efi runtime mapping I add a new directory /sys/firmware/efi/
> runtime-map/ like below
> [dave@darkstar ~]$ tree /sys/firmware/efi/runtime-map/
> /sys/firmware/efi/runtime-map/
> ├── 0
> │   ├── attribute
> │   ├── num_pages
> │   ├── phys_addr
> │   ├── type
> │   └── virt_addr
> ├── 1
> │   ├── attribute
> │   ├── num_pages
> │   ├── phys_addr
> │   ├── type
> │   └── virt_addr
> [snip]
>  
> kexec-tools will assemble them as setup_data and pass to 2nd kernel.
> I will send userspace patches as well.
> 
> Limitation is I only write support for x86_64, test on below machines:
> Lenovo thinkpad t420
> Dell inspiron 14 - 3421
> HP Z420 workstation
> Qemu + OVMF

Tested on an HP EFI-based 60-way server (prototype).  For the series:

Tested-by: Toshi Kani 

Thanks,
-Toshi




Re: [patch 05/34] idle: Implement generic idle function

2013-11-22 Thread Thomas Gleixner
On Mon, 18 Nov 2013, Viresh Kumar wrote:

> On Fri, Mar 22, 2013 at 3:23 AM, Thomas Gleixner  wrote:
> > --- linux-2.6.orig/kernel/cpu/idle.c
> > +++ linux-2.6/kernel/cpu/idle.c
> 
> > +void cpu_startup_entry(enum cpuhp_state state)
> > +{
> > +   current_set_polling();
> > +   arch_cpu_idle_prepare();
> > +   cpu_idle_loop();
> > +}
> > +#endif
> 
> Hi Thomas..
> 
> > Manish (cc'd) recently asked me the purpose of the _state_ argument in this
> > routine, as it is unused here.
> 
> Was this reserved for future use?

Yep, if you dig the mail archives you'll find my cpu hotplug series
which tries to convert it to a full state machine. So the state
argument is due to that, because the boot cpu will call into that with
a certain state and we want to allow archs to skip some states even on
secondary cpus. So yes, it's unused, but it's there on purpose. It's
going to be used when my work backlog drops down to the cpuhotplug
mess again. About 2 lightyears from now 

Thanks,

tglx


Re: [PATCH] sysfs: handle duplicate removal attempts in sysfs_remove_group()

2013-11-22 Thread Rafael J. Wysocki
On Friday, November 22, 2013 08:43:55 AM Bjorn Helgaas wrote:
> [+cc Rafael, James]
> 
> On Tue, Nov 19, 2013 at 11:18 PM, Tejun Heo  wrote:
> > (cc'ing Bjorn)
> >
> > Hello,
> >
> > On Tue, Nov 19, 2013 at 03:09:58PM +0200, Mika Westerberg wrote:
> >> Commit bcdde7e221a8 (sysfs: make __sysfs_remove_dir() recursive) changed
> >> the behavior so that directory removals will be done recursively. This
> >> means that the sysfs group might already be removed if its parent directory
> >> has been removed.
> >>
> >> The current code outputs warnings similar to following log snippet when it
> >> detects that there is no group for the given kobject:
> >>
> >>  WARNING: CPU: 0 PID: 4 at fs/sysfs/group.c:214 sysfs_remove_group+0xc6/0xd0()
> >>  sysfs group 81c6f1e0 not found for kobject 'host7'
> >>  Modules linked in:
> >>  CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 3.12.0+ #13
> >>  Hardware name:  /D33217CK, BIOS GKPPT10H.86A.0042.2013.0422.1439 04/22/2013
> >>  Workqueue: kacpi_hotplug acpi_hotplug_work_fn
> >>   0009 8801002459b0 817daab1 8801002459f8
> >>   8801002459e8 810436b8  81c6f1e0
> >>   88006d440358 88006d440188 88006e8b4c28 880100245a48
> >>  Call Trace:
> >>   [] dump_stack+0x45/0x56
> >>   [] warn_slowpath_common+0x78/0xa0
> >>   [] warn_slowpath_fmt+0x47/0x50
> >>   [] sysfs_remove_group+0xc6/0xd0
> >>   [] dpm_sysfs_remove+0x3e/0x50
> >>   [] device_del+0x40/0x1b0
> >>   [] device_unregister+0xd/0x20
> >>   [] scsi_remove_host+0xba/0x110
> >>   [] ata_host_detach+0xc6/0x100
> >>   [] ata_pci_remove_one+0x18/0x20
> >>   [] pci_device_remove+0x28/0x60
> >>   [] __device_release_driver+0x64/0xd0
> >>   [] device_release_driver+0x1e/0x30
> >>   [] bus_remove_device+0xf7/0x140
> >>   [] device_del+0x121/0x1b0
> >>   [] pci_stop_bus_device+0x94/0xa0
> >>   [] pci_stop_bus_device+0x3b/0xa0
> >>   [] pci_stop_bus_device+0x3b/0xa0
> >>   [] pci_stop_and_remove_bus_device+0xd/0x20
> >>   [] trim_stale_devices+0x73/0xe0
> >>   [] trim_stale_devices+0xbb/0xe0
> >>   [] trim_stale_devices+0xbb/0xe0
> >>   [] acpiphp_check_bridge+0x7e/0xd0
> >>   [] hotplug_event+0xcd/0x160
> >>   [] hotplug_event_work+0x25/0x60
> >>   [] acpi_hotplug_work_fn+0x17/0x22
> >>   [] process_one_work+0x17a/0x430
> >>   [] worker_thread+0x119/0x390
> >>   [] kthread+0xcd/0xf0
> >>   [] ret_from_fork+0x7c/0xb0
> >
> > So, we do have cases where the parent is removed before the child.  I
> > suppose the parent pci bridge is removed already?  AFAICS this
> > shouldn't break anything but people did seem to expect the removals to
> > be ordered from child to parent.  Bjorn, is this something you expect
> > to happen?
> 
> I do not expect a PCI bridge to be removed before the devices below
> it.  We should be removing all the children before removing the parent
> bridge.

Precisely.

> But is this related to PCI?  I don't see the connection yet.  I tried
> to look into this a bit (my notes are at
> https://bugzilla.kernel.org/show_bug.cgi?id=65281), but I haven't
> figured out the big-picture problem yet.
> 
> I don't have warm fuzzies that adding a "have we already removed this"
> check is the best resolution, but maybe that's just because I don't
> understand the problem.

I don't think this is related to removing the parent before the child.
It complains about the 'power' directory of all devices being removed
it seems.

Thanks!

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


Re: [PATCH] mmzone.h: constify some zone access functions

2013-11-22 Thread Andrew Morton
On Fri, 22 Nov 2013 12:01:06 -0500 Luiz Capitulino wrote:

> Signed-off-by: Luiz Capitulino 
> ---
>  include/linux/mmzone.h | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index bd791e4..5e202d6 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -560,12 +560,12 @@ static inline bool zone_spans_pfn(const struct zone *zone, unsigned long pfn)
>   return zone->zone_start_pfn <= pfn && pfn < zone_end_pfn(zone);
>  }
>  
> -static inline bool zone_is_initialized(struct zone *zone)
> +static inline bool zone_is_initialized(const struct zone *zone)
>  {
>   return !!zone->wait_table;
>  }
>  
> -static inline bool zone_is_empty(struct zone *zone)
> +static inline bool zone_is_empty(const struct zone *zone)
>  {
>   return zone->spanned_pages == 0;
>  }
> @@ -843,7 +843,7 @@ unsigned long __init node_memmap_size_bytes(int, unsigned long, unsigned long);
>   */
>  #define zone_idx(zone)   ((zone) - (zone)->zone_pgdat->node_zones)
>  
> -static inline int populated_zone(struct zone *zone)
> +static inline int populated_zone(const struct zone *zone)
>  {
>   return (!!zone->present_pages);
>  }

hm, why?  I counted ten similarly constifyable functions in mm.h and
stopped only 1/4 of the way through. What's so special about these three?
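
For context on what constifying such helpers buys, here is a cut-down, hypothetical sketch. struct toy_zone and the toy_* functions are invented for illustration and are far smaller than the real struct zone and its accessors in mmzone.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cut-down zone; the real struct zone has many more fields. */
struct toy_zone {
	unsigned long spanned_pages;
	unsigned long present_pages;
};

/* Read-only accessors take a const pointer, as in the patch. */
bool toy_zone_is_empty(const struct toy_zone *zone)
{
	assert(zone != NULL);
	return zone->spanned_pages == 0;
}

bool toy_populated_zone(const struct toy_zone *zone)
{
	assert(zone != NULL);
	return zone->present_pages != 0;
}

/* A caller that only holds a const pointer can use the accessors as-is. */
bool toy_zone_usable(const struct toy_zone *zone)
{
	return !toy_zone_is_empty(zone) && toy_populated_zone(zone);
}
```

If the accessors took a plain `struct toy_zone *`, toy_zone_usable() would have to drop its own const qualifier or cast it away, which is the usual motivation for this kind of change.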



Re: 3.12: kernel panic when resuming from suspend to RAM (x86_64)

2013-11-22 Thread Thomas Gleixner
On Fri, 22 Nov 2013, Rafael J. Wysocki wrote:
> On Friday, November 22, 2013 10:36:23 PM Francis Moreau wrote:
> > Ok, I've finally managed to find out the bad commit:
> > ad07277e82dedabacc52c82746633680a3187d25: ACPI / PM: Hold acpi_scan_lock
> > over system PM transitions
> > 
> > I verified that the parent commit doesn't have the problem.
> 
> Interesting.
> 
> > Rafael, you're the man now ;)
> 
> I kind of don't see how that commit may result in behavior that you
> described earlier in the thread.
> 
> You get a memory corruption that seems to have started to happen because
> we're holding an additional lock over suspend resume now.  Something's fishy
> on that machine and we need to figure out what it is.

The hiccup happens in the timer softirq.

@Francis: Did you try to enable DEBUG_OBJECTS.*? If not, please give it
  a try.

Thanks,

tglx



