Re: [PATCH 6/9] uprobes: flush cache after xol write

2012-10-25 Thread Ananth N Mavinakayanahalli
On Thu, Oct 25, 2012 at 04:58:39PM +0200, Oleg Nesterov wrote:
> On 10/16, Rabin Vincent wrote:
> >
> > 2012/10/15 Oleg Nesterov :
> > > On 10/14, Rabin Vincent wrote:
> > >> Flush the cache so that the instructions written to the XOL area are
> > >> visible.
> > >>
> > >> Signed-off-by: Rabin Vincent 
> > >> ---
> > >>  kernel/events/uprobes.c |1 +
> > >>  1 file changed, 1 insertion(+)
> > >>
> > >> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> > >> index ca000a9..8c52f93 100644
> > >> --- a/kernel/events/uprobes.c
> > >> +++ b/kernel/events/uprobes.c
> > >> @@ -1246,6 +1246,7 @@ static unsigned long xol_get_insn_slot(struct 
> > >> uprobe *uprobe, unsigned long slot
> > >>   offset = current->utask->xol_vaddr & ~PAGE_MASK;
> > >>   vaddr = kmap_atomic(area->page);
> > >>   arch_uprobe_xol_copy(>arch, vaddr + offset);
> > >> + flush_dcache_page(area->page);
> > >>   kunmap_atomic(vaddr);
> > >
> > > I agree... but why under kmap_atomic?
> >
> > No real reason; I'll move it to after the unmap.
> 
> OK. I assume you will send v2.
> 
> But this patch looks like a bugfix, flush_dcache_page() is not a nop
> on powerpc. So perhaps we should apply this fix right now?

Starting Power5, all Power processers have coherent caches.

> OTOH, I do not understand this stuff, everything is nop on x86. And
> when I look into Documentation/cachetlb.txt I am starting to think
> that may be this needs flush_icache_user_range instead?
>
> Rabin, Ananth could you clarify this?

Yes. We need flush_icache_user_range(). Though for x86 its always been a
nop, one never knows if there is some Power4 or older machine out there
that is still being used. We are fine for Power5 and later.

Ananth

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/5] perf: Make perf build for x86 with UAPI disintegration applied

2012-10-25 Thread Namhyung Kim
On Fri, 19 Oct 2012 17:56:24 +0100, David Howells wrote:
> Make perf build for x86 once the UAPI disintegration patches for that arch
> have been applied by adding the appropriate -I flags - in the right order -
> and then converting some #includes that use ../.. notation to find main kernel
> headerfiles to use  and  instead.

Looks nice.

>
> Note that -Iarch/foo/include/uapi is present _before_ -Iarch/foo/include.
> This makes sure we get the userspace version of the pt_regs struct.  Ideally,
> we wouldn't have the latter -I flag at all, but unfortunately we want
> asm/svm.h and asm/vmx.h in buildin-kvm.c and these aren't part of the UAPI -
> at least not for x86.  I wonder if the bits outside of the __KERNEL__ guards
> *should* be transferred there.

What about asm/kvm.h?  Is it a part of the UAPI?

>
> I note also that perf seems to do its dependency handling manually by listing
> all the header files it might want to use in LIB_H in the Makefile.  Can this
> be changed to use -MD?

Yeah, that part could be improved, probably with -MMD.

>
> Signed-off-by: David Howells 
> ---
>
>  tools/perf/Makefile  |   16 +++-
>  tools/perf/builtin-kvm.c |6 +++---
>  tools/perf/perf.h|   16 +++-
>  3 files changed, 21 insertions(+), 17 deletions(-)
>
> diff --git a/tools/perf/Makefile b/tools/perf/Makefile
> index f7c968a..9024a42 100644
> --- a/tools/perf/Makefile
> +++ b/tools/perf/Makefile
> @@ -169,7 +169,21 @@ endif
>  
>  ### --- END CONFIGURATION SECTION ---
>  
> -BASIC_CFLAGS = -Iutil/include -Iarch/$(ARCH)/include -I$(OUTPUT)util 
> -I$(TRACE_EVENT_DIR) -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 
> -D_GNU_SOURCE
> +ifeq ($(srctree),)
> +srctree := $(shell pwd)
> +endif

Isn't the srctree intended to point to kernel root?  Also you missed to
define the objtree which used below.

> +
> +BASIC_CFLAGS = \
> + -Iutil/include \
> + -Iarch/$(ARCH)/include \
> + -I$(objtree)/arch/$(ARCH)/include/generated/uapi \
> + -I$(srctree)/arch/$(ARCH)/include/uapi \
> + -I$(srctree)/arch/$(ARCH)/include \
> + -I$(objtree)/include/generated/uapi \
> + -I$(srctree)/include/uapi \
> + -I$(OUTPUT)util \
> + -I$(TRACE_EVENT_DIR) \
> + -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE

This isn't bad, but using '+=' looks more natural IMHO.

BASIC_CFLAGS  = -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE
BASIC_CFLAGS += -Iutil/include
BASIC_CFLAGS += -Iarch/$(ARCH)/include
...

>  BASIC_LDFLAGS =
>  
>  # Guard against environment variables
> diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c
> index 260abc5..e013bdb 100644
> --- a/tools/perf/builtin-kvm.c
> +++ b/tools/perf/builtin-kvm.c
> @@ -22,9 +22,9 @@
>  #include 
>  #include 
>  
> -#include "../../arch/x86/include/asm/svm.h"
> -#include "../../arch/x86/include/asm/vmx.h"
> -#include "../../arch/x86/include/asm/kvm.h"
> +#include 
> +#include 
> +#include 
>  
>  struct event_key {
>   #define INVALID_KEY (~0ULL)
> diff --git a/tools/perf/perf.h b/tools/perf/perf.h
> index 2762877..238f923 100644
> --- a/tools/perf/perf.h
> +++ b/tools/perf/perf.h
> @@ -5,8 +5,9 @@ struct winsize;
>  
>  void get_term_dimensions(struct winsize *ws);
>  
> +#include 
> +
>  #if defined(__i386__)
> -#include "../../arch/x86/include/asm/unistd.h"
>  #define rmb()asm volatile("lock; addl $0,0(%%esp)" ::: 
> "memory")
>  #define cpu_relax()  asm volatile("rep; nop" ::: "memory");
>  #define CPUINFO_PROC "model name"
> @@ -16,7 +17,6 @@ void get_term_dimensions(struct winsize *ws);
>  #endif
>  
>  #if defined(__x86_64__)
> -#include "../../arch/x86/include/asm/unistd.h"
>  #define rmb()asm volatile("lfence" ::: "memory")
>  #define cpu_relax()  asm volatile("rep; nop" ::: "memory");
>  #define CPUINFO_PROC "model name"
> @@ -26,20 +26,17 @@ void get_term_dimensions(struct winsize *ws);
>  #endif
>  
>  #ifdef __powerpc__
> -#include "../../arch/powerpc/include/asm/unistd.h"
>  #define rmb()asm volatile ("sync" ::: "memory")
>  #define cpu_relax()  asm volatile ("" ::: "memory");
>  #define CPUINFO_PROC "cpu"
>  #endif
>  
>  #ifdef __s390__
> -#include "../../arch/s390/include/asm/unistd.h"
>  #define rmb()asm volatile("bcr 15,0" ::: "memory")
>  #define cpu_relax()  asm volatile("" ::: "memory");
>  #endif
>  
>  #ifdef __sh__
> -#include "../../arch/sh/include/asm/unistd.h"
>  #if defined(__SH4A__) || defined(__SH5__)
>  # define rmb()   asm volatile("synco" ::: "memory")
>  #else
> @@ -50,35 +47,30 @@ void get_term_dimensions(struct winsize *ws);
>  #endif
>  
>  #ifdef __hppa__
> -#include "../../arch/parisc/include/asm/unistd.h"
>  #define rmb()asm volatile("" ::: "memory")
>  #define cpu_relax()  asm volatile("" ::: "memory");
>  #define CPUINFO_PROC "cpu"
>  #endif
>  
>  #ifdef __sparc__
> -#include "../../arch/sparc/include/asm/unistd.h"

It might conflict with davem's 

Re: linux-next: build warnings after merge of the akpm tree

2012-10-25 Thread Stefani Seibold
Am Freitag, den 26.10.2012, 06:36 +0800 schrieb Richard Yang:

> >
> >And holy cow that code is hard to read :( Why was kfifo_in()
> >implemented as a macro, anyway?  AFAICT all its args have a known type,
> >so we could have used a proper C interface, which would have fixed all
> >this nicely.
> 

Thats simple for performance reasons, the compiler remove most of the
code during the compile stage, so no runtime checks are necessary. And
it is the only way since C does not provides templates like C++.
 
> Hmm, move the definition of kfifo_in()/kfifo_out() into the kfifo.c?
> 

Don't do it. this will result in a performance degradation. Look at the
disassembled code by each change in code and compare it with the
previous one. I don't believe that you can produce better code.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 2/2] Improve container_notify_cb() to support container hot-remove.

2012-10-25 Thread Tang Chen

Hi Toshi,

On 10/26/2012 01:20 AM, Toshi Kani wrote:
...

Why do you need to call acpi_bus_trim(device,0) to stop the container
device first?


This issue was introduced by Lu Yinghai, I think he could give a better
answer than me. :)
Please refer to the following url:

http://www.spinics.net/lists/linux-pci/msg17667.html

However, this is not applied into the pci tree yet.

We have worked out a patch set to clean up the logic for PCI/ACPI binding
relationship. It updates PCI/ACPI binding relationship by registering bus
notification onto pci_bus_type instead of hooking into the ACPI/glue.c.


Thanks for the info and pointer.  Tang, I'd suggest you add such info to
the comment so that others know that this step is needed for removing
PCI bridges.  It helps us to know where to look at...


OK, I'll add it in the next version. :)




To accommodate that patch set, the ACPI device destroy process has been
split into two steps:
1) acpi_bus_trim(device,0) to unbind ACPI drivers


Does this step also detach PCI drivers from PCI cards as well?


Yes, it calls device_release_driver() to release the device driver.

device_release_driver()
  |->__device_release_driver()
   |->dev->driver = NULL;

Thanks. :)



Thanks,
-Toshi




--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] kfifo: remove unnecessary type check

2012-10-25 Thread Stefani Seibold
Am Freitag, den 26.10.2012, 09:46 +0800 schrieb Yuanhan Liu:
> From: Yuanhan Liu 
> 
> Firstly, this kind of type check doesn't work. It does something similay
> like following:
>   void * __dummy = NULL;
>   __buf = __dummy;
> 
> __dummy is defined as void *. Thus it will not trigger warnings as
> expected.
> 
> Second, we don't need that kind of check. Since the prototype
> of __kfifo_out is:
>   unsigned int __kfifo_out(struct __kfifo *fifo,  void *buf, unsigned int 
> len)
> 
> buf is defined as void *, so we don't need do the type check. Remove it.
> 
> LINK: https://lkml.org/lkml/2012/10/25/386
> LINK: https://lkml.org/lkml/2012/10/25/584
> 
> Cc: Andrew Morton 
> Cc: Wei Yang 
> Cc: Stefani Seibold 
> Cc: Fengguang Wu 
> Cc: Stephen Rothwell 
> Signed-off-by: Yuanhan Liu 
> ---
>  include/linux/kfifo.h | 20 
>  1 file changed, 20 deletions(-)
> 
> diff --git a/include/linux/kfifo.h b/include/linux/kfifo.h
> index 10308c6..b8c1d03 100644
> --- a/include/linux/kfifo.h
> +++ b/include/linux/kfifo.h
> @@ -390,10 +390,6 @@ __kfifo_int_must_check_helper( \
>   unsigned int __ret; \
>   const size_t __recsize = sizeof(*__tmp->rectype); \
>   struct __kfifo *__kfifo = &__tmp->kfifo; \
> - if (0) { \
> - typeof(__tmp->ptr_const) __dummy __attribute__ ((unused)); \
> - __dummy = (typeof(__val))NULL; \
> - } \
>   if (__recsize) \
>   __ret = __kfifo_in_r(__kfifo, __val, sizeof(*__val), \
>   __recsize); \
> @@ -432,8 +428,6 @@ __kfifo_uint_must_check_helper( \
>   unsigned int __ret; \
>   const size_t __recsize = sizeof(*__tmp->rectype); \
>   struct __kfifo *__kfifo = &__tmp->kfifo; \
> - if (0) \
> - __val = (typeof(__tmp->ptr))0; \
>   if (__recsize) \
>   __ret = __kfifo_out_r(__kfifo, __val, sizeof(*__val), \
>   __recsize); \
> @@ -473,8 +467,6 @@ __kfifo_uint_must_check_helper( \
>   unsigned int __ret; \
>   const size_t __recsize = sizeof(*__tmp->rectype); \
>   struct __kfifo *__kfifo = &__tmp->kfifo; \
> - if (0) \
> - __val = (typeof(__tmp->ptr))NULL; \
>   if (__recsize) \
>   __ret = __kfifo_out_peek_r(__kfifo, __val, sizeof(*__val), \
>   __recsize); \
> @@ -512,10 +504,6 @@ __kfifo_uint_must_check_helper( \
>   unsigned long __n = (n); \
>   const size_t __recsize = sizeof(*__tmp->rectype); \
>   struct __kfifo *__kfifo = &__tmp->kfifo; \
> - if (0) { \
> - typeof(__tmp->ptr_const) __dummy __attribute__ ((unused)); \
> - __dummy = (typeof(__buf))NULL; \
> - } \
>   (__recsize) ?\
>   __kfifo_in_r(__kfifo, __buf, __n, __recsize) : \
>   __kfifo_in(__kfifo, __buf, __n); \
> @@ -565,10 +553,6 @@ __kfifo_uint_must_check_helper( \
>   unsigned long __n = (n); \
>   const size_t __recsize = sizeof(*__tmp->rectype); \
>   struct __kfifo *__kfifo = &__tmp->kfifo; \
> - if (0) { \
> - typeof(__tmp->ptr) __dummy = NULL; \
> - __buf = __dummy; \
> - } \
>   (__recsize) ?\
>   __kfifo_out_r(__kfifo, __buf, __n, __recsize) : \
>   __kfifo_out(__kfifo, __buf, __n); \
> @@ -777,10 +761,6 @@ __kfifo_uint_must_check_helper( \
>   unsigned long __n = (n); \
>   const size_t __recsize = sizeof(*__tmp->rectype); \
>   struct __kfifo *__kfifo = &__tmp->kfifo; \
> - if (0) { \
> - typeof(__tmp->ptr) __dummy __attribute__ ((unused)) = NULL; \
> - __buf = __dummy; \
> - } \
>   (__recsize) ? \
>   __kfifo_out_peek_r(__kfifo, __buf, __n, __recsize) : \
>   __kfifo_out_peek(__kfifo, __buf, __n); \

Did you tried to compile the whole kernel including all the drivers with
your patch?


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/5] tools: Honour the O= flag when tool build called from a higher Makefile

2012-10-25 Thread Namhyung Kim
On Fri, 19 Oct 2012 17:56:16 +0100, David Howells wrote:
> Honour the O= flag that was passed to a higher level Makefile and then passed
> down as part of a tool build.
>
> To make this work, the top-level Makefile passes the original O= flag and
> subdir=tools to the tools/Makefile, and that in turn passes
> subdir=$(O)/$(subdir)/foodir when building tool foo in directory
> $(O)/$(subdir)/foodir (where the intervening slashes aren't added if an
> element is missing).
>
> For example, take perf.  This is found in tools/perf/.  Assume we're building
> into directory ~/zebra/, so we pass O=~/zebra to make.  Dependening on where
> we run the build from, we see:
>
>   make run in dir $(OUTPUT) dir
>   === ==
>   linux   ~/zebra/tools/perf/
>   linux/tools ~/zebra/perf/
>   linux/tools/perf~/zebra/
>
> and if O= is not set, we get:
>
>   make run in dir $(OUTPUT) dir
>   === ==
>   linux   linux/tools/perf/
>   linux/tools linux/tools/perf/
>   linux/tools/perflinux/tools/perf/
>
> The output directories are created by the descend function if they don't
> already exist.

This is my test:

  namhyung@sejong:~$ cd project/linux
  namhyung@sejong:linux$ make O=~/build/zebra tools/perf
  /bin/sh: line 0: cd: /home/namhyung/build/zebra: No such file or directory
  Makefile:121: *** output directory "/home/namhyung/build/zebra" does not 
exist.  Stop.
  
  namhyung@sejong:tools$ mkdir ~/build/zebra
  namhyung@sejong:linux$ make O=~/build/zebra tools/perf
HOSTCC  scripts/basic/fixdep
GEN /home/namhyung/build/zebra/Makefile
HOSTCC  scripts/kconfig/conf.o
HOSTCC  scripts/kconfig/zconf.tab.o
HOSTLD  scripts/kconfig/conf
  scripts/kconfig/conf --silentoldconfig Kconfig
  ***
  *** Configuration file ".config" not found!
  ***
  *** Please run some configurator (e.g. "make oldconfig" or
  *** "make menuconfig" or "make xconfig").
  ***
  make[3]: *** [silentoldconfig] Error 1
  make[2]: *** [silentoldconfig] Error 2
  DESCEND perf
  MKDIR /home/namhyung/build/zebra/tools/perf/arch/
  MKDIR /home/namhyung/build/zebra/tools/perf/arch/x86/util/
  MKDIR /home/namhyung/build/zebra/tools/perf/bench/
  MKDIR /home/namhyung/build/zebra/tools/perf/scripts/perl/Perf-Trace-Util/
  MKDIR 
/home/namhyung/build/zebra/tools/perf/scripts/python/Perf-Trace-Util/
  MKDIR /home/namhyung/build/zebra/tools/perf/ui/
  MKDIR /home/namhyung/build/zebra/tools/perf/ui/browsers/
  MKDIR /home/namhyung/build/zebra/tools/perf/ui/gtk/
  MKDIR /home/namhyung/build/zebra/tools/perf/ui/stdio/
  MKDIR /home/namhyung/build/zebra/tools/perf/ui/tui/
  MKDIR /home/namhyung/build/zebra/tools/perf/util/
  MKDIR /home/namhyung/build/zebra/tools/perf/util/scripting-engines/
  PERF_VERSION = 3.7.rc2.1655.g54fa2b.dirty
  GEN /home/namhyung/build/zebra/tools/perf/common-cmds.h
  * new build flags or prefix
  CC /home/namhyung/build/zebra/tools/perf/perf.o
  ...

This looks ok but it'd be better if we can skip the config check when
building tools IMHO.


  namhyung@sejong:linux cd tools
  
  namhyung@sejong:tools$ make O=~/build/zebra perf
  DESCEND perf
  ...
  * new build flags or prefix
  CC /home/namhyung/build/zebra/perf.o
  ...

This looks not good as it doesn't build perf into
~/build/zebra/perf/perf.o.

Thanks,
Namhyung
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 027/193] arch/sh: remove CONFIG_EXPERIMENTAL

2012-10-25 Thread Kees Cook
On Thu, Oct 25, 2012 at 9:28 PM, Paul Mundt  wrote:
> On Tue, Oct 23, 2012 at 01:01:40PM -0700, Kees Cook wrote:
>> This config item has not carried much meaning for a while now and is
>> almost always enabled by default. As agreed during the Linux kernel
>> summit, remove it.
>>
>> CC: Paul Mundt 
>> CC: Tejun Heo 
>> Signed-off-by: Kees Cook 
>
> While there are cases where it is largely superfluous, we also have
> plenty of cases in here that are genuinely experimental features and
> generally shouldn't be enabled unless someone is prepared for some
> hacking. We can of course replace this with an arch-specific option if
> needed, but I disagree with suddenly making experimental features
> suddenly appear to be anything other than what they are.

Yeah, things that really are experimental need something, but it
hasn't been meaningful to put them behind CONFIG_EXPERIMENTAL. Here's
the text from the first patch, which details possible approaches:

https://lkml.org/lkml/2012/10/23/878
This config item has not carried much meaning for a while now and is
almost always enabled by default (especially in distro builds). As agreed
during the Linux kernel summit, it should be removed. As a first step,
remove it from being listed, and default it to on. Once it has been
removed from all subsystem Kconfigs, it will be dropped entirely.

For items that really are experimental, maintainers should use "default
n", optionally include "(EXPERIMENTAL)" in the title, and add language to
the help text indicating why the item should be considered experimental.

For items that are dangerously experimental, the maintainer is encouraged
to follow the above title recommendation, add stronger language to the
help text, and optionally use (depending on the extent of the danger,
from least to most dangerous): printk(), add_taint(TAINT_WARN),
add_taint(TAINT_CRAP), WARN_ON(1), and CONFIG_BROKEN.


-Kees

-- 
Kees Cook
Chrome OS Security
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


PCI/PM: Add comments for PME poll support for PCIe

2012-10-25 Thread Huang Ying
There are comments on why PME poll support is necessary for PCI
devices, but not for PCIe devices.  That may lead to misunderstanding
that PME poll is only necessary for PCI devices.  So add comments
related to PCIe PME poll to make it more clear.

The content of comments comes from the changelog of commit:

379021d5c0899fcf9410cae4ca7a59a5a94ca769

Cc: Rafael J. Wysocki 
Signed-off-by: Huang Ying 
---
 drivers/pci/pci.c |   28 +++-
 1 file changed, 19 insertions(+), 9 deletions(-)

--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1578,15 +1578,25 @@ void pci_pme_active(struct pci_dev *dev,
 
pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr);
 
-   /* PCI (as opposed to PCIe) PME requires that the device have
-  its PME# line hooked up correctly. Not all hardware vendors
-  do this, so the PME never gets delivered and the device
-  remains asleep. The easiest way around this is to
-  periodically walk the list of suspended devices and check
-  whether any have their PME flag set. The assumption is that
-  we'll wake up often enough anyway that this won't be a huge
-  hit, and the power savings from the devices will still be a
-  win. */
+   /*
+* PCI (as opposed to PCIe) PME requires that the device have
+* its PME# line hooked up correctly. Not all hardware vendors
+* do this, so the PME never gets delivered and the device
+* remains asleep. The easiest way around this is to
+* periodically walk the list of suspended devices and check
+* whether any have their PME flag set. The assumption is that
+* we'll wake up often enough anyway that this won't be a huge
+* hit, and the power savings from the devices will still be a
+* win.
+*
+* Although PCIe uses in-band PME message instead of PME# line
+* to report PME, PME does not work for some PCIe devices in
+* reality.  For example, there are devices that set their PME
+* status bits, but don't really bother to send a PME message;
+* there are PCI Express Root Ports that don't bother to
+* trigger interrupts when they receive PME messages from the
+* devices below.  So PME poll is used for PCIe devices too.
+*/
 
if (dev->pme_poll) {
struct pci_pme_device *pme_dev;
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mm: readahead: remove redundant ra_pages in file_ra_state

2012-10-25 Thread YingHang Zhu
On Fri, Oct 26, 2012 at 11:55 AM, Fengguang Wu  wrote:
> On Fri, Oct 26, 2012 at 11:38:11AM +0800, YingHang Zhu wrote:
>> On Fri, Oct 26, 2012 at 8:25 AM, Dave Chinner  wrote:
>> > On Thu, Oct 25, 2012 at 10:58:26AM +0800, Fengguang Wu wrote:
>> >> Hi Chen,
>> >>
>> >> > But how can bdi related ra_pages reflect different files' readahead
>> >> > window? Maybe these different files are sequential read, random read
>> >> > and so on.
>> >>
>> >> It's simple: sequential reads will get ra_pages readahead size while
>> >> random reads will not get readahead at all.
>> >>
>> >> Talking about the below chunk, it might hurt someone that explicitly
>> >> takes advantage of the behavior, however the ra_pages*2 seems more
>> >> like a hack than general solution to me: if the user will need
>> >> POSIX_FADV_SEQUENTIAL to double the max readahead window size for
>> >> improving IO performance, then why not just increase bdi->ra_pages and
>> >> benefit all reads? One may argue that it offers some differential
>> >> behavior to specific applications, however it may also present as a
>> >> counter-optimization: if the root already tuned bdi->ra_pages to the
>> >> optimal size, the doubled readahead size will only cost more memory
>> >> and perhaps IO latency.
>> >>
>> >> --- a/mm/fadvise.c
>> >> +++ b/mm/fadvise.c
>> >> @@ -87,7 +86,6 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, 
>> >> loff_t len, int advice)
>> >> spin_unlock(>f_lock);
>> >> break;
>> >> case POSIX_FADV_SEQUENTIAL:
>> >> -   file->f_ra.ra_pages = bdi->ra_pages * 2;
>> >
>> > I think we really have to reset file->f_ra.ra_pages here as it is
>> > not a set-and-forget value. e.g.  shrink_readahead_size_eio() can
>> > reduce ra_pages as a result of IO errors. Hence if you have had io
>> > errors, telling the kernel that you are now going to do  sequential
>> > IO should reset the readahead to the maximum ra_pages value
>> > supported
>> If we unify file->f_ra.ra_pages and its' bdi->ra_pages, then the error-prone
>> device's readahead can be directly tuned or turned off with blockdev
>> thus affect all files
>> using the device and without bring more complexity...
>
> It's not really feasible/convenient for the end users to hand tune
> blockdev readahead size on IO errors. Even many administrators are
> totally unaware of the readahead size parameter.
You are right, so the problem comes in this way:
If one file's read failure will affect other files? I mean for
rotating disks and discs,
a file's read failure may be due to the bad sectors which tend to be consecutive
and won't affect other files' reading status. However for tape drive
the read failure
usually indicates data corruption and other file's reading may also fail.
In other words, should we consider how many files failed to read data and
where they failed as a factor to indicate the status of the backing device,
or treat these files independently?
If we choose the previous one we can accumulate the statistics and
change bdi.ra_pages,
otherwise we may do some check for FMODE_RANDOM before we change the readahead
window.
 I may missed something, please point it out.
Thanks,
  Ying Zhu
>
> Thanks,
> Fengguang
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 2/2 v4] block/throttle: Add IO submitted information in blkio.throttle

2012-10-25 Thread Robin Dong
From: Robin Dong 

Currently, if the IO is throttled by io-throttle, the system admin has no idea
of the situation and can't report it to the real application user about that
he/she has to do something.

So this patch adds a new interface named blkio.throttle.io_submitted which
exposes the number of bios that have been sent into blk-throttle therefore the
user could calculate the difference from throttle.io_serviced to see how many
IOs are currently throttled.

Cc: Tejun Heo 
Cc: Vivek Goyal 
Cc: Jens Axboe 
Signed-off-by: Tao Ma 
Signed-off-by: Robin Dong 
---
v3 <-- v2:
 - Use nr-queued[] of struct throtl_grp for stats instaed of adding new 
blkg_rwstat.

v4 <-- v3:
 - Add two new blkg_rwstat arguments to count total bios be sent into 
blk_throttle.

 block/blk-throttle.c |   43 +++
 1 files changed, 43 insertions(+), 0 deletions(-)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 46ddeff..c6391b5 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -46,6 +46,10 @@ struct tg_stats_cpu {
struct blkg_rwstat  service_bytes;
/* total IOs serviced, post merge */
struct blkg_rwstat  serviced;
+   /* total bytes submitted into blk-throttle */
+   struct blkg_rwstat  submit_bytes;
+   /* total IOs submitted into blk-throttle */
+   struct blkg_rwstat  submitted;
 };
 
 struct throtl_grp {
@@ -266,6 +270,8 @@ static void throtl_pd_reset_stats(struct blkcg_gq *blkg)
 
blkg_rwstat_reset(>service_bytes);
blkg_rwstat_reset(>serviced);
+   blkg_rwstat_reset(>submit_bytes);
+   blkg_rwstat_reset(>submitted);
}
 }
 
@@ -699,6 +705,30 @@ static void throtl_update_dispatch_stats(struct throtl_grp 
*tg, u64 bytes,
local_irq_restore(flags);
 }
 
+static void throtl_update_submit_stats(struct throtl_grp *tg, u64 bytes, int 
rw)
+{
+   struct tg_stats_cpu *stats_cpu;
+   unsigned long flags;
+
+   /* If per cpu stats are not allocated yet, don't do any accounting. */
+   if (tg->stats_cpu == NULL)
+   return;
+
+   /*
+* Disabling interrupts to provide mutual exclusion between two
+* writes on same cpu. It probably is not needed for 64bit. Not
+* optimizing that case yet.
+*/
+   local_irq_save(flags);
+
+   stats_cpu = this_cpu_ptr(tg->stats_cpu);
+
+   blkg_rwstat_add(_cpu->submitted, rw, 1);
+   blkg_rwstat_add(_cpu->submit_bytes, rw, bytes);
+
+   local_irq_restore(flags);
+}
+
 static void throtl_charge_bio(struct throtl_grp *tg, struct bio *bio)
 {
bool rw = bio_data_dir(bio);
@@ -1084,6 +1114,16 @@ static struct cftype throtl_files[] = {
.private = offsetof(struct tg_stats_cpu, serviced),
.read_seq_string = tg_print_cpu_rwstat,
},
+   {
+   .name = "throttle.io_submit_bytes",
+   .private = offsetof(struct tg_stats_cpu, submit_bytes),
+   .read_seq_string = tg_print_cpu_rwstat,
+   },
+   {
+   .name = "throttle.io_submitted",
+   .private = offsetof(struct tg_stats_cpu, submitted),
+   .read_seq_string = tg_print_cpu_rwstat,
+   },
{ } /* terminate */
 };
 
@@ -1128,6 +1168,8 @@ bool blk_throtl_bio(struct request_queue *q, struct bio 
*bio)
if (tg_no_rule_group(tg, rw)) {
throtl_update_dispatch_stats(tg,
 bio->bi_size, bio->bi_rw);
+   throtl_update_submit_stats(tg,
+   bio->bi_size, bio->bi_rw);
goto out_unlock_rcu;
}
}
@@ -1141,6 +1183,7 @@ bool blk_throtl_bio(struct request_queue *q, struct bio 
*bio)
if (unlikely(!tg))
goto out_unlock;
 
+   throtl_update_submit_stats(tg, bio->bi_size, bio->bi_rw);
if (tg->nr_queued[rw]) {
/*
 * There is already another bio queued in same dir. No
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 1/2 v4] block/throttle: remove redundant type transition

2012-10-25 Thread Robin Dong
From: Robin Dong 

We don't need to convert tg to blkg and then convert it back in
throtl_update_dispatch_stats().

Signed-off-by: Robin Dong 
---
 block/blk-throttle.c |7 +++
 1 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index a9664fa..46ddeff 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -674,10 +674,9 @@ static bool tg_may_dispatch(struct throtl_data *td, struct 
throtl_grp *tg,
return 0;
 }
 
-static void throtl_update_dispatch_stats(struct blkcg_gq *blkg, u64 bytes,
+static void throtl_update_dispatch_stats(struct throtl_grp *tg, u64 bytes,
 int rw)
 {
-   struct throtl_grp *tg = blkg_to_tg(blkg);
struct tg_stats_cpu *stats_cpu;
unsigned long flags;
 
@@ -708,7 +707,7 @@ static void throtl_charge_bio(struct throtl_grp *tg, struct 
bio *bio)
tg->bytes_disp[rw] += bio->bi_size;
tg->io_disp[rw]++;
 
-   throtl_update_dispatch_stats(tg_to_blkg(tg), bio->bi_size, bio->bi_rw);
+   throtl_update_dispatch_stats(tg, bio->bi_size, bio->bi_rw);
 }
 
 static void throtl_add_bio_tg(struct throtl_data *td, struct throtl_grp *tg,
@@ -1127,7 +1126,7 @@ bool blk_throtl_bio(struct request_queue *q, struct bio 
*bio)
tg = throtl_lookup_tg(td, blkcg);
if (tg) {
if (tg_no_rule_group(tg, rw)) {
-   throtl_update_dispatch_stats(tg_to_blkg(tg),
+   throtl_update_dispatch_stats(tg,
 bio->bi_size, bio->bi_rw);
goto out_unlock_rcu;
}
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Enable A20 using KBC for some MSI laptops to fix S3 resume

2012-10-25 Thread H. Peter Anvin
My guess is that Windows explicitly enables A20 on resume.  We should do that 
too, really... with the current heavily unified realmode code it should be easy 
- let me hack up a patch in the morning.

Robert Hancock  wrote:

>On 10/24/2012 02:09 PM, Alan Cox wrote:
>> On Wed, 24 Oct 2012 12:36:04 -0700
>> "H. Peter Anvin"  wrote:
>>
>>> Minor concern: it should do the wait for ready before sending each
>command.
>>
>> Can we get a command line to do this quirk too - it strikes me that
>if
>> the MSIs rely upon it then it may be something Windows always does so
>> will be useful to try on other problem machines as an experiment.
>
>I agree, one has to keep in mind the age-old question "how does Windows
>
>work?" since it surely has no such quirk. I'd say we're sometimes too 
>quick to add these DMI quirks when a more general solution would be 
>somehow figure out how the Linux behavior differs from what Windows is 
>doing.

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung

2012-10-25 Thread Justin P. Mattock

On 10/25/2012 01:47 AM, Chris Wilson wrote:

On Thu, 25 Oct 2012 10:16:08 +0200, Daniel Vetter  wrote:

On Thu, Oct 25, 2012 at 7:22 AM, Justin P. Mattock
 wrote:


here is a link to the file..: intel_error_decode
http://www.filefactory.com/file/22bypyjhs4mx


I haven't figured out how to access this thing. Can you please file a
bug report on bugs.freedesktop.org and attach it there?


Oops.. I filed with the kernel. maybe can just add a cc's
https://bugzilla.kernel.org/show_bug.cgi?id=49571



No worries, it is another ILK hang similar to the ones reported earlier
- it just seems the ring stops advancing. Hopefully it is a missing w/a
from http://cgit.freedesktop.org/~danvet/drm/log/?h=ilk-wa-pile
-Chris



well if this means building libdrm etc.. then thats not a problem, more 
time consuming if anything. perhaps an *.rpm that I can test to see?



Justin P. Mattock
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


linux-next: Tree for Oct 26

2012-10-25 Thread Stephen Rothwell
Hi all,

Changes since 201201025:

The arm -soc tree gained conflicts against the gpio-lw and pinctrl trees.

The akpm tree a couple of patches that turned up elsewhere.



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" as mentioned in the FAQ on the wiki
(see below).

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log files
in the Next directory.  Between each merge, the tree was built with
a ppc64_defconfig for powerpc and an allmodconfig for x86_64. After the
final fixups (if any), it is also built with powerpc allnoconfig (32 and
64 bit), ppc44x_defconfig and allyesconfig (minus
CONFIG_PROFILE_ALL_BRANCHES - this fails its final link) and i386, sparc,
sparc64 and arm defconfig. These builds also have
CONFIG_ENABLE_WARN_DEPRECATED, CONFIG_ENABLE_MUST_CHECK and
CONFIG_DEBUG_INFO disabled when necessary.

Below is a summary of the state of the merge.

We are up to 206 trees (counting Linus' and 27 trees of patches pending
for Linus' tree), more are welcome (even if they are currently empty).
Thanks to those who have contributed, and to those who haven't, please do.

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

There is a wiki covering stuff to do with linux-next at
http://linux.f-seidel.de/linux-next/pmwiki/ .  Thanks to Frank Seidel.

-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au

$ git checkout master
$ git reset --hard stable
Merging origin/master (2ab3f29 Merge branch 'akpm' (Andrew's fixes))
Merging fixes/master (12250d8 Merge branch 'i2c-embedded/for-next' of 
git://git.pengutronix.de/git/wsa/linux)
Merging kbuild-current/rc-fixes (bad9955 menuconfig: Replace CIRCLEQ by 
list_head-style lists.)
Merging arm-current/fixes (b43b1ff Merge tag 'fixes-for-rmk' of 
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc into fixes)
Merging m68k-current/for-linus (8a745ee m68k: Wire up kcmp)
Merging powerpc-merge/merge (83dac59 cpuidle/powerpc: Fix snooze state problem 
in the cpuidle design on pseries.)
Merging sparc/master (43c422e apparmor: fix apparmor OOPS in 
audit_log_untrustedstring+0x1c/0x40)
Merging net/master (910a578 vhost: fix mergeable bufs on BE hosts)
Merging sound-current/for-linus (c64064c Merge tag 'asoc-3.7' of 
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus)
Merging pci-current/for-linus (0ff9514 PCI: Don't print anything while decoding 
is disabled)
Merging wireless/master (f89ff64 b43: Fix oops on unload when firmware not 
found)
Merging driver-core.current/driver-core-linus (bf34be0 Documentation:Chinese 
translation of Documentation/arm64/memory.txt)
Merging tty.current/tty-linus (a4f7438 Revert "serial: omap: fix software flow 
control")
Merging usb.current/usb-linus (1d63f24 Merge tag 'for-usb-linus-2012-10-25' of 
git://git.kernel.org/pub/scm/linux/kernel/git/sarah/xhci into usb-linus)
Merging staging.current/staging-linus (4d3f120 staging: tidspbridge: delete 
unused mmu functions)
Merging char-misc.current/char-misc-linus (2cb55a2 sonypi: suspend/resume 
callbacks should be conditionally compiled on CONFIG_PM_SLEEP)
Merging input-current/for-linus (88fd449 Input: wacom - add INPUT_PROP_DIRECT 
flag to Cintiq 24HD)
Merging md-current/for-linus (72f36d5 md: refine reporting of resync/reshape 
delays.)
Merging audit-current/for-linus (c158a35 audit: no leading space in 
audit_log_d_path prefix)
Merging crypto-current/master (9efade1 crypto: cryptd - disable softirqs in 
cryptd_queue_worker to prevent data corruption)
Merging ide/master (9974e43 ide: fix generic_ide_suspend/resume Oops)
Merging dwmw2/master (244dc4e Merge 
git://git.infradead.org/users/dwmw2/random-2.6)
Merging sh-current/sh-fixes-for-linus (4403310 SH: Convert out[bwl] macros to 
inline functions)
Merging irqdomain-current/irqdomain/merge (15e06bf irqdomain: Fix debugfs 
formatting)
Merging devicetree-current/devicetree/merge (4e8383b of: release node fix for 
of_parse_phandle_with_args)
Merging spi-current/spi/merge (d1c185b of/spi: Fix SPI module loading by using 
proper "spi:" modalias prefixes.)
Merging gpio-current/gpio/merge (96b7064 gpio/tca6424: merge I2C transactions, 
remove cast)
Merging asm-generic/master (9b04ebd asm-generic/io.h: remove asm/cacheflush.h 
include)
Merging arm/for-next (4a83d2e Merge remote-tracking branch 

Re: [PATCH] mm: readahead: remove redundant ra_pages in file_ra_state

2012-10-25 Thread YingHang Zhu
On Fri, Oct 26, 2012 at 11:51 AM, Ni zhan Chen  wrote:
> On 10/26/2012 11:28 AM, YingHang Zhu wrote:
>>
>> On Fri, Oct 26, 2012 at 10:30 AM, Ni zhan Chen 
>> wrote:
>>>
>>> On 10/26/2012 09:27 AM, Fengguang Wu wrote:

 On Fri, Oct 26, 2012 at 11:25:44AM +1100, Dave Chinner wrote:
>
> On Thu, Oct 25, 2012 at 10:58:26AM +0800, Fengguang Wu wrote:
>>
>> Hi Chen,
>>
>>> But how can bdi related ra_pages reflect different files' readahead
>>> window? Maybe these different files are sequential read, random read
>>> and so on.
>>
>> It's simple: sequential reads will get ra_pages readahead size while
>> random reads will not get readahead at all.
>>
>> Talking about the below chunk, it might hurt someone that explicitly
>> takes advantage of the behavior, however the ra_pages*2 seems more
>> like a hack than general solution to me: if the user will need
>> POSIX_FADV_SEQUENTIAL to double the max readahead window size for
>> improving IO performance, then why not just increase bdi->ra_pages and
>> benefit all reads? One may argue that it offers some differential
>> behavior to specific applications, however it may also present as a
>> counter-optimization: if the root already tuned bdi->ra_pages to the
>> optimal size, the doubled readahead size will only cost more memory
>> and perhaps IO latency.
>>
>> --- a/mm/fadvise.c
>> +++ b/mm/fadvise.c
>> @@ -87,7 +86,6 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset,
>> loff_t len, int advice)
>>   spin_unlock(>f_lock);
>>   break;
>>   case POSIX_FADV_SEQUENTIAL:
>> -   file->f_ra.ra_pages = bdi->ra_pages * 2;
>
> I think we really have to reset file->f_ra.ra_pages here as it is
> not a set-and-forget value. e.g.  shrink_readahead_size_eio() can
> reduce ra_pages as a result of IO errors. Hence if you have had io
> errors, telling the kernel that you are now going to do  sequential
> IO should reset the readahead to the maximum ra_pages value
> supported

 Good point!

  but wait  this patch removes file->f_ra.ra_pages in all other
 places too, so there will be no file->f_ra.ra_pages to be reset here...
>>>
>>>
>>> In his patch,
>>>
>>>
>>>   static void shrink_readahead_size_eio(struct file *filp,
>>>  struct file_ra_state *ra)
>>>   {
>>> -   ra->ra_pages /= 4;
>>> +   spin_lock(>f_lock);
>>> +   filp->f_mode |= FMODE_RANDOM;
>>> +   spin_unlock(>f_lock);
>>>
>>> As the example in comment above this function, the read maybe still
>>> sequential, and it will waste IO bandwith if modify to FMODE_RANDOM
>>> directly.
>>
>> I've considered about this. On the first try I modified file_ra_state.size
>> and
>> file_ra_state.async_size directly, like
>>
>> file_ra_state.async_size = 0;
>> file_ra_state.size /= 4;
>>
>> but as what I comment here, we can not
>> predict whether the bad sectors will trash the readahead window, maybe the
>> following sectors after current one are ok to go in normal readahead,
>> it's hard to know,
>> the current approach gives us a chance to slow down softly.
>
>
> Then when will check filp->f_mode |= FMODE_RANDOM; ? Does it will influence
> ra->ra_pages?
You can find the relevant information in function page_cache_sync_readahead.

Thanks,
Ying Zhu
>
>
>>
>> Thanks,
>>  Ying Zhu

 Thanks,
 Fengguang

>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH 0/5] tools, perf: Fix up for x86 UAPI disintegration

2012-10-25 Thread Namhyung Kim
Hi David,

On Thu, 25 Oct 2012 08:57:20 +0100, David Howells wrote:
> Borislav Petkov  wrote:
>
>> David, where can get that x86 UAPI disintegration patch?
>
> The tip tree has it in branch x86/uapi or you can get it from:
>
>   git://git.infradead.org/users/dhowells/linux-headers.git
>
> branch disintegrate-x86 or tag disintegrate-x86-20121009.
>
> I've posted a couple of additional patches to deal with files that became
> empty, but they're only for dealing with people who construct their kernel
> sources with the patch program.

I applied this series on top of you disintegrate-x86 branch which has
following commit.

commit 8d2c63c2b664bae1fb0f386661ea5f635330e570
Author: David Howells 
Date:   Tue Oct 9 09:47:54 2012 +0100

UAPI: (Scripted) Disintegrate arch/x86/include/asm

Signed-off-by: David Howells 
Acked-by: Arnd Bergmann 
Acked-by: Thomas Gleixner 
Acked-by: Michael Kerrisk 
Acked-by: Paul E. McKenney 
Acked-by: Dave Jones 


But I got a conflict like this:

--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@@ -112,7 -102,7 +102,11 @@@ void get_term_dimensions(struct winsiz
  #include 
  #include 
  
++<<< HEAD
 +#include "../../include/linux/perf_event.h"
++===
+ #include 
++>>> perf: Make perf build for x86 with UAPI disintegration applied
  #include "util/types.h"
  #include 

This was because your patch 3 has "uapi" between "include" and "linux".
It seems I need more patches to apply your series since there's no
perf_event.h under ../../include/uapi/linux directory.

Anyways, resolving the conflict resulted in build error:

CC builtin-kvm.o
builtin-kvm.c:25:21: fatal error: asm/svm.h: No such file or directory
make: *** [builtin-kvm.o] Error 1

CC util/evsel.o
In file included from util/perf_regs.h:5:0,
 from util/evsel.c:23:
arch/x86/include/perf_regs.h:6:27: fatal error: asm/perf_regs.h: No such file 
or directory
make: *** [util/evsel.o] Error 1

CC util/rbtree.o
../../lib/rbtree.c:24:36: fatal error: linux/rbtree_augmented.h: No such file 
or directory
make: *** [util/rbtree.o] Error 1

CC util/header.o
util/header.c:2276:8: error: ‘PERF_ATTR_SIZE_VER3’ undeclared here (not in a 
function)
make: *** [util/header.o] Error 1


Thanks,
Namhyung
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Enable A20 using KBC for some MSI laptops to fix S3 resume

2012-10-25 Thread Robert Hancock

On 10/24/2012 02:09 PM, Alan Cox wrote:

On Wed, 24 Oct 2012 12:36:04 -0700
"H. Peter Anvin"  wrote:


Minor concern: it should do the wait for ready before sending each command.


Can we get a command line to do this quirk too - it strikes me that if
the MSIs rely upon it then it may be something Windows always does so
will be useful to try on other problem machines as an experiment.


I agree, one has to keep in mind the age-old question "how does Windows 
work?" since it surely has no such quirk. I'd say we're sometimes too 
quick to add these DMI quirks when a more general solution would be 
somehow figure out how the Linux behavior differs from what Windows is 
doing.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] cpuidle: add missing header include

2012-10-25 Thread Jingoo Han
This patch adds missing device.h header to fix build warnings as below:

drivers/cpuidle/cpuidle.h:26:41: warning: 'struct device' declared inside 
parameter list [enabled by default]
drivers/cpuidle/cpuidle.h:26:41: warning: its scope is only this definition or 
declaration, which is probably not what you want
[enabled by default]
drivers/cpuidle/cpuidle.h:27:45: warning: 'struct device' declared inside 
parameter list [enabled by default]
In file included from drivers/cpuidle/driver.c:15:0:
drivers/cpuidle/cpuidle.h:26:41: warning: 'struct device' declared inside 
parameter list [enabled by default]
drivers/cpuidle/cpuidle.h:26:41: warning: its scope is only this definition or 
declaration, which is probably not what you want
[enabled by default]
drivers/cpuidle/cpuidle.h:27:45: warning: 'struct device' declared inside 
parameter list [enabled by default]

This build warning is introduced by commit efeca1b
"cpuidle / sysfs: change function parameter".

Signed-off-by: Jingoo Han 
Cc: Daniel Lezcano 
---
 drivers/cpuidle/cpuidle.h |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/cpuidle/cpuidle.h b/drivers/cpuidle/cpuidle.h
index a5bbd1c..2120d9e 100644
--- a/drivers/cpuidle/cpuidle.h
+++ b/drivers/cpuidle/cpuidle.h
@@ -5,6 +5,8 @@
 #ifndef __DRIVER_CPUIDLE_H
 #define __DRIVER_CPUIDLE_H
 
+#include 
+
 /* For internal use only */
 extern struct cpuidle_governor *cpuidle_curr_governor;
 extern struct list_head cpuidle_governors;
-- 
1.7.1


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 027/193] arch/sh: remove CONFIG_EXPERIMENTAL

2012-10-25 Thread Paul Mundt
On Tue, Oct 23, 2012 at 01:01:40PM -0700, Kees Cook wrote:
> This config item has not carried much meaning for a while now and is
> almost always enabled by default. As agreed during the Linux kernel
> summit, remove it.
> 
> CC: Paul Mundt 
> CC: Tejun Heo 
> Signed-off-by: Kees Cook 

While there are cases where it is largely superfluous, we also have
plenty of cases in here that are genuinely experimental features and
generally shouldn't be enabled unless someone is prepared for some
hacking. We can of course replace this with an arch-specific option if
needed, but I disagree with suddenly making experimental features
suddenly appear to be anything other than what they are.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[GIT PULL][PATCH] ktest: Fix ktest confusion with CONFIG_MODULES_USE_ELF_RELA

2012-10-25 Thread Steven Rostedt

Linus,

Please pull the latest ktest-v3.7-rc2 tree, which can be found at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest.git
ktest-v3.7-rc2

Head SHA1: 0979976ee53697f4308578c56abedb9766fea231


Steven Rostedt (1):
  ktest: Fix ktest confusion with CONFIG_MODULES_USE_ELF_RELA


 tools/testing/ktest/ktest.pl |6 --
 1 file changed, 4 insertions(+), 2 deletions(-)
---
commit 8bc5e4ea3ea0e24142db2dc941233eab2a223ed4
Author: Steven Rostedt 
Date:   Fri Oct 26 00:10:32 2012 -0400

ktest: Fix ktest confusion with CONFIG_MODULES_USE_ELF_RELA

In order to decide if ktest should bother installing modules on the
target box, it checks if the config file has CONFIG_MODULES=y. But it
also checks if the '=y' part exists. It only will install modules if the
config exists and is set with '=y'. But as the regex that was used
tests:

  /^CONFIG_MODULES(=y)?/

this will also match:

  CONFIG_MODULES_USE_ELF_RELA

as the '=y' part was optional and it did not test the rest of the line.
When this happens, ktest will stop checking the rest of the configs but
it will also think that no modules are needed to be installed. What it
should do is only jump out of the loop if it actually found a
CONFIG_MODULES that is set to true.

Otherwise, ktest wont install the necessary modules needed for proper
booting of the test target.

Signed-off-by: Steven Rostedt 

diff --git a/tools/testing/ktest/ktest.pl b/tools/testing/ktest/ktest.pl
index b51d787..c7ba761 100755
--- a/tools/testing/ktest/ktest.pl
+++ b/tools/testing/ktest/ktest.pl
@@ -1740,8 +1740,10 @@ sub install {
 open(IN, "$output_config") or dodie("Can't read config file");
 while () {
if (/CONFIG_MODULES(=y)?/) {
-   $install_mods = 1 if (defined($1));
-   last;
+   if (defined($1)) {
+   $install_mods = 1;
+   last;
+   }
}
 }
 close(IN);


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 05/31] x86/mm: Reduce tlb flushes from ptep_set_access_flags()

2012-10-25 Thread Linus Torvalds
On Thu, Oct 25, 2012 at 8:57 PM, Rik van Riel  wrote:
>
> That may not even be needed.  Apparently Intel chips
> automatically flush an entry from the TLB when it
> causes a page fault.  I assume AMD chips do the same,
> because flush_tlb_fix_spurious_fault evaluates to
> nothing on x86.

Yes. It's not architected as far as I know, though. But I agree, it's
possible - even likely - we could avoid TLB flushing entirely on x86.

If you want to try it, I would seriously suggest you do it as a
separate commit though, just in case.

> Are there architectures where we do need to flush
> remote TLBs on upgrading the permissions on a PTE?

I *suspect* that whole TLB flush just magically became an SMP one
without anybody ever really thinking about it.

So it's quite possible we could do this to the pgtable-generic.c code
too. However, we don't actually have any generic way to do a local
single-address flush (the __flush_tlb_one() thing is
architecture-specific, although it exists on a few architectures).
We'd need to add a local_flush_tlb_page(vma, address) function.

Alternatively, we could decide to use the "tlb_fix_spurious_fault()"
thing in there. Possibly just do it unconditionally in the caller - or
even just specify that the fault handler has to do it. And stop
returning a value at all from ptep_set_access_flags() (I *think*
that's the only thing the return value gets used for - flushing the
TLB on the local cpu for the cpu's that want it).

> Want to just remove the TLB flush entirely and see
> if anything breaks in 3.8-rc1?
>
> From reading the code again, it looks like things
> should indeed work ok.

I would be open to it, but just in case it causes bisectable problems
I'd really want to see it in two patches ("make it always do the local
flush" followed by "remove even the local flush"), and then it would
pinpoint any need.

  Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH Resend V2] dt: add helper function to read u8 & u16 variables & arrays

2012-10-25 Thread Viresh Kumar
This adds following helper routines:
- of_property_read_u8_array()
- of_property_read_u16_array()
- of_property_read_u8()
- of_property_read_u16()

First two actually share most of the code with of_property_read_u32_array(), so
the common part is taken out into a macro, which can be used by all three
*_array() routines.

Signed-off-by: Viresh Kumar 
---
V1->V2:
-
- Use typeof() in of_property_read_array() macro instead of passing type to it

 drivers/of/base.c  | 73 +++---
 include/linux/of.h | 30 ++
 2 files changed, 89 insertions(+), 14 deletions(-)

diff --git a/drivers/of/base.c b/drivers/of/base.c
index af3b22a..039e178 100644
--- a/drivers/of/base.c
+++ b/drivers/of/base.c
@@ -670,6 +670,64 @@ struct device_node *of_find_node_by_phandle(phandle handle)
 }
 EXPORT_SYMBOL(of_find_node_by_phandle);
 
+#define of_property_read_array(_np, _pname, _out, _sz) \
+   struct property *_prop = of_find_property(_np, _pname, NULL);   \
+   const __be32 *_val; \
+   \
+   if (!_prop) \
+   return -EINVAL; \
+   if (!_prop->value)  \
+   return -ENODATA;\
+   if ((_sz * sizeof(*_out)) > _prop->length)  \
+   return -EOVERFLOW;  \
+   \
+   _val = _prop->value;\
+   while (_sz--)   \
+   *_out++ = (typeof(*_out))be32_to_cpup(_val++);  \
+   return 0;
+
+/**
+ * of_property_read_u8_array - Find and read an array of u8 from a property.
+ *
+ * @np:device node from which the property value is to be read.
+ * @propname:  name of the property to be searched.
+ * @out_value: pointer to return value, modified only if return value is 0.
+ *
+ * Search for a property in a device node and read 8-bit value(s) from
+ * it. Returns 0 on success, -EINVAL if the property does not exist,
+ * -ENODATA if property does not have a value, and -EOVERFLOW if the
+ * property data isn't large enough.
+ *
+ * The out_value is modified only if a valid u8 value can be decoded.
+ */
+int of_property_read_u8_array(const struct device_node *np,
+   const char *propname, u8 *out_values, size_t sz)
+{
+   of_property_read_array(np, propname, out_values, sz);
+}
+EXPORT_SYMBOL_GPL(of_property_read_u8_array);
+
+/**
+ * of_property_read_u16_array - Find and read an array of u16 from a property.
+ *
+ * @np:device node from which the property value is to be read.
+ * @propname:  name of the property to be searched.
+ * @out_value: pointer to return value, modified only if return value is 0.
+ *
+ * Search for a property in a device node and read 16-bit value(s) from
+ * it. Returns 0 on success, -EINVAL if the property does not exist,
+ * -ENODATA if property does not have a value, and -EOVERFLOW if the
+ * property data isn't large enough.
+ *
+ * The out_value is modified only if a valid u16 value can be decoded.
+ */
+int of_property_read_u16_array(const struct device_node *np,
+   const char *propname, u16 *out_values, size_t sz)
+{
+   of_property_read_array(np, propname, out_values, sz);
+}
+EXPORT_SYMBOL_GPL(of_property_read_u16_array);
+
 /**
  * of_property_read_u32_array - Find and read an array of 32 bit integers
  * from a property.
@@ -689,20 +747,7 @@ int of_property_read_u32_array(const struct device_node 
*np,
   const char *propname, u32 *out_values,
   size_t sz)
 {
-   struct property *prop = of_find_property(np, propname, NULL);
-   const __be32 *val;
-
-   if (!prop)
-   return -EINVAL;
-   if (!prop->value)
-   return -ENODATA;
-   if ((sz * sizeof(*out_values)) > prop->length)
-   return -EOVERFLOW;
-
-   val = prop->value;
-   while (sz--)
-   *out_values++ = be32_to_cpup(val++);
-   return 0;
+   of_property_read_array(np, propname, out_values, sz);
 }
 EXPORT_SYMBOL_GPL(of_property_read_u32_array);
 
diff --git a/include/linux/of.h b/include/linux/of.h
index 72843b7..e2d9b40 100644
--- a/include/linux/of.h
+++ b/include/linux/of.h
@@ -223,6 +223,10 @@ extern struct device_node *of_find_node_with_property(
 extern struct property *of_find_property(const struct device_node *np,
 const char *name,
 int *lenp);
+extern int of_property_read_u8_array(const struct 

[PATCH] net: usb: Fix memory leak on Tx data path

2012-10-25 Thread Hemant Kumar
Driver anchors the tx urbs and defers the urb submission if
a transmit request comes when the interface is suspended.
Anchoring urb increments the urb reference count. These
deferred urbs are later accessed by calling usb_get_from_anchor()
for submission during interface resume. usb_get_from_anchor()
unanchors the urb but urb reference count remains same.
This causes the urb reference count to remain non-zero
after usb_free_urb() gets called and urb never gets freed.
Hence call usb_put_urb() after anchoring the urb to properly
balance the reference count for these deferred urbs. Also,
unanchor these deferred urbs during disconnect, to free them
up.

Signed-off-by: Hemant Kumar 
---
 drivers/net/usb/usbnet.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index 1867fe2..00b7598 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -1168,6 +1168,7 @@ netdev_tx_t usbnet_start_xmit (struct sk_buff *skb,
usb_anchor_urb(urb, >deferred);
/* no use to process more packets */
netif_stop_queue(net);
+   usb_put_urb(urb);
spin_unlock_irqrestore(>txq.lock, flags);
netdev_dbg(dev->net, "Delaying transmission for resumption\n");
goto deferred;
@@ -1317,6 +1318,8 @@ void usbnet_disconnect (struct usb_interface *intf)
 
cancel_work_sync(>kevent);
 
+   usb_scuttle_anchored_urbs(>deferred);
+
if (dev->driver_info->unbind)
dev->driver_info->unbind (dev, intf);
 
-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted 
by The Linux Foundation

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] dt: add helper function to read u8 & u16 variables & arrays

2012-10-25 Thread Viresh Kumar
On 25 October 2012 19:48, Viresh Kumar  wrote:
> The problem i see here is:
>
> The data passed via DT comes as Little Endian in the kernel.
>
> For a little endian system, byte zero will contain the data and so
> (u8) val
>
> look to be the correct thing.
>
> For a big endian system, byte 3 will contain data as it is swapped by
> be32_to_cpup.
> So (u8) val would return value stored by byte 0 instead. ??

I feel above explanation was wrong. I didn't had a big endian system
to test this but this is what i derived theoretically. Consider following
sequence of commands:

u32 x = 0x01;  //This will store off-0: 1, off-3:0 for LE system
   // and will store off-0: 0, off-3:1
for BE system
u8 y = (u8) x;  // For any architecture type, i.e. big or
little this must store
   // 1 in y. This is ANCI C semantic
and should be architecture
   // independent.

Which would mean, type cast will give off-0 on LE and off-3 on BE systems. Don't
confuse this with getting values using pointers, as we try to get data
out of specific
locations in those cases.

So, my initial code seems to be doing the right thing. be32_to_cpup()
will move the
LSB to off-3 and that is what we will get during the cast.

I will resend my original code to you.

--
viresh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v1 2/2] x86, apic: Disable BSP if boot cpu is AP

2012-10-25 Thread Eric W. Biederman
HATAYAMA Daisuke  writes:

> From: "H. Peter Anvin" 
> Subject: Re: [PATCH v1 2/2] x86, apic: Disable BSP if boot cpu is AP
> Date: Mon, 22 Oct 2012 17:35:47 -0700
>
>> On 10/22/2012 02:29 PM, Eric W. Biederman wrote:

 As I said, I thought Fenghua tried that but it didn't work,
 experimentally.
>>>
>>> Fair enough.  You described the problem with clearing bit 8 in a weird
>>> way.
>>>
>>> If the best we can muster are fuzzy memories it may be worth
>>> revisiting.
>>> Perhaps it works on enough cpu models to be interesting.
>>>
>> 
>> It isn't fuzzy memories... this was done as late as 1-2 months ago.  I
>> just don't know the details.
>> 
>> Fenghua, could you help fill us in?
>> 
>
> I overlooked completely the fact that BSP flag is rewritable.
>
> I tried Eric's suggestion using attached test programs and saw it
> worked fine at least on the three cpus around me below:
>
> - Intel(R) Xeon(R) CPU E7- 4820  @ 2.00GHz
> - Intel(R) Xeon(R) CPU E7- 8870  @ 2.40GHz
> - Intel(R) Xeon(TM) CPU 1.80GHz
>   - 32 bits CPU
>
> Next I found the description about this in 8.4.2, IASDM Vol.3:
>
>   The MP initialization protocol imposes the following requirements
>   and restrictions on the system:
>
>   * The MP protocol is executed only after a power-up or RESET. If the
> MP protocol has completed and a BSP is chosen, subsequent INITs
> (either to a specific processor or system wide) do not cause the
> MP protocol to be repeated. Instead, each logical processor
> examines its BSP flag (in the IA32_APIC_BASE MSR) to determine
> whether it should execute the BIOS boot-strap code (if it is the
> BSP) or enter a wait-for-SIPI state (if it is an AP).
>
> So this is no longer undocumented behaviour for recent cpus, I think.

The underdocumented bit is the ability to clear the flag.
And of course these are processor specific registers.

> Considering these, I'll make a patch to clear BSP flag at appropreate
> position in kernel boot-up code. OTOH, according to the discussion, it
> was reported that clearing BSP flag affected some BIOSes. To deal with
> this, I'll prepare a kernel option to decide whether to clear BSP flag
> or not.
>
> Does anyone have any comments now? Or please comment after I submit a
> new patch.

I think you are on right track with preparing some patches, and this
certainly looks like worth experimenting with.

At least for i386 the code need to verify you have a cpu new enough to
have an APIC_BASE_MSR, but I don't think that is going to be hard.

Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH]Documentation:Chinese translation of Documentation/arm64/booting.txt

2012-10-25 Thread Rob Landley
On 10/24/2012 11:11:36 AM, Catalin Marinas wrote:
> 2012/10/24 Tekkaman Ninja :
> > This is a Chinese translated version of
> > Documentation/arm64/booting.txt
> >
> > Signed-off-by: Fu Wei 
> 
> Same as my comment on the memory.txt patch, I don't understand what 
> it
> says, so you need to keep it up to date.
> 
> Acked-by: Catalin Marinas 

I have a similar problem: I can't read translations to languages I 
don't speak. I argued for putting them on the web way back when, but 
Greg Kroah-Hartman incorporated stuff he can't read either into the 
Documentation directory.

Last I pinged him he was willing to maintain non-english translations, 
try sending them to him?

Rob


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mm: readahead: remove redundant ra_pages in file_ra_state

2012-10-25 Thread Fengguang Wu
On Fri, Oct 26, 2012 at 11:38:11AM +0800, YingHang Zhu wrote:
> On Fri, Oct 26, 2012 at 8:25 AM, Dave Chinner  wrote:
> > On Thu, Oct 25, 2012 at 10:58:26AM +0800, Fengguang Wu wrote:
> >> Hi Chen,
> >>
> >> > But how can bdi related ra_pages reflect different files' readahead
> >> > window? Maybe these different files are sequential read, random read
> >> > and so on.
> >>
> >> It's simple: sequential reads will get ra_pages readahead size while
> >> random reads will not get readahead at all.
> >>
> >> Talking about the below chunk, it might hurt someone that explicitly
> >> takes advantage of the behavior, however the ra_pages*2 seems more
> >> like a hack than general solution to me: if the user will need
> >> POSIX_FADV_SEQUENTIAL to double the max readahead window size for
> >> improving IO performance, then why not just increase bdi->ra_pages and
> >> benefit all reads? One may argue that it offers some differential
> >> behavior to specific applications, however it may also present as a
> >> counter-optimization: if the root already tuned bdi->ra_pages to the
> >> optimal size, the doubled readahead size will only cost more memory
> >> and perhaps IO latency.
> >>
> >> --- a/mm/fadvise.c
> >> +++ b/mm/fadvise.c
> >> @@ -87,7 +86,6 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, 
> >> loff_t len, int advice)
> >> spin_unlock(>f_lock);
> >> break;
> >> case POSIX_FADV_SEQUENTIAL:
> >> -   file->f_ra.ra_pages = bdi->ra_pages * 2;
> >
> > I think we really have to reset file->f_ra.ra_pages here as it is
> > not a set-and-forget value. e.g.  shrink_readahead_size_eio() can
> > reduce ra_pages as a result of IO errors. Hence if you have had io
> > errors, telling the kernel that you are now going to do  sequential
> > IO should reset the readahead to the maximum ra_pages value
> > supported
> If we unify file->f_ra.ra_pages and its' bdi->ra_pages, then the error-prone
> device's readahead can be directly tuned or turned off with blockdev
> thus affect all files
> using the device and without bring more complexity...

It's not really feasible/convenient for the end users to hand tune
blockdev readahead size on IO errors. Even many administrators are
totally unaware of the readahead size parameter.

Thanks,
Fengguang
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 05/31] x86/mm: Reduce tlb flushes from ptep_set_access_flags()

2012-10-25 Thread Rik van Riel

On 10/25/2012 10:56 PM, Linus Torvalds wrote:


Guess what? If you want to optimize the function to not do remote TLB
flushes, then just do that! None of the garbage. Just change the

 flush_tlb_page(vma, address);

line to

 __flush_tlb_one(address);


That may not even be needed.  Apparently Intel chips
automatically flush an entry from the TLB when it
causes a page fault.  I assume AMD chips do the same,
because flush_tlb_fix_spurious_fault evaluates to
nothing on x86.


and it should damn well work. Because everything I see about
"flush_remote" looks just wrong, wrong, wrong.


Are there architectures where we do need to flush
remote TLBs on upgrading the permissions on a PTE?

Because that is what the implementation in
pgtable-generic.c seems to be doing as well...


And if there really is some reason for that whole flush_remote
braindamage, then we have much bigger problems, namely the fact that
we've broken the documented semantics of that function, and we're
doing various other things that are completely and utterly invalid
unless the above semantics hold.


Want to just remove the TLB flush entirely and see
if anything breaks in 3.8-rc1?

From reading the code again, it looks like things
should indeed work ok.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v1] firmware loader: introduce module parameter to customize fw search path

2012-10-25 Thread Al Viro
On Thu, Oct 25, 2012 at 08:38:25PM -0700, Linus Torvalds wrote:

> It's valid to cast a non-const pointer to a const one. It's the
> *other* way around that is invalid.
> 
> So marking fw_path[] as having 'const char *' elements just means that
> we won't be changing those elements through the fw_path[] array
> (correct: we only read them). The fact that one of those same pointers
> is then also available through a non-const pointer variable means that
> they can change through *that* pointer, but that doesn't change the
> fact that fw_path[] itself contains const pointers.
> 
> Remember: in C, a "const pointer" does *not* mean that the thing it
> points to cannot change. It only means that it cannot change through
> *that* pointer.

It's a bit trickier, unfortunately - pointer to pointer to const char
and pointer to pointer to char do not mix.  Just for fun, try to constify
envp and argv arguments of call_usermodehelper()...
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mm: readahead: remove redundant ra_pages in file_ra_state

2012-10-25 Thread Ni zhan Chen

On 10/26/2012 11:28 AM, YingHang Zhu wrote:

On Fri, Oct 26, 2012 at 10:30 AM, Ni zhan Chen  wrote:

On 10/26/2012 09:27 AM, Fengguang Wu wrote:

On Fri, Oct 26, 2012 at 11:25:44AM +1100, Dave Chinner wrote:

On Thu, Oct 25, 2012 at 10:58:26AM +0800, Fengguang Wu wrote:

Hi Chen,


But how can bdi related ra_pages reflect different files' readahead
window? Maybe these different files are sequential read, random read
and so on.

It's simple: sequential reads will get ra_pages readahead size while
random reads will not get readahead at all.

Talking about the below chunk, it might hurt someone that explicitly
takes advantage of the behavior, however the ra_pages*2 seems more
like a hack than general solution to me: if the user will need
POSIX_FADV_SEQUENTIAL to double the max readahead window size for
improving IO performance, then why not just increase bdi->ra_pages and
benefit all reads? One may argue that it offers some differential
behavior to specific applications, however it may also present as a
counter-optimization: if the root already tuned bdi->ra_pages to the
optimal size, the doubled readahead size will only cost more memory
and perhaps IO latency.

--- a/mm/fadvise.c
+++ b/mm/fadvise.c
@@ -87,7 +86,6 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset,
loff_t len, int advice)
  spin_unlock(>f_lock);
  break;
  case POSIX_FADV_SEQUENTIAL:
-   file->f_ra.ra_pages = bdi->ra_pages * 2;

I think we really have to reset file->f_ra.ra_pages here as it is
not a set-and-forget value. e.g.  shrink_readahead_size_eio() can
reduce ra_pages as a result of IO errors. Hence if you have had io
errors, telling the kernel that you are now going to do  sequential
IO should reset the readahead to the maximum ra_pages value
supported

Good point!

 but wait  this patch removes file->f_ra.ra_pages in all other
places too, so there will be no file->f_ra.ra_pages to be reset here...


In his patch,


  static void shrink_readahead_size_eio(struct file *filp,
 struct file_ra_state *ra)
  {
-   ra->ra_pages /= 4;
+   spin_lock(>f_lock);
+   filp->f_mode |= FMODE_RANDOM;
+   spin_unlock(>f_lock);

As the example in comment above this function, the read maybe still
sequential, and it will waste IO bandwith if modify to FMODE_RANDOM
directly.

I've considered about this. On the first try I modified file_ra_state.size and
file_ra_state.async_size directly, like

file_ra_state.async_size = 0;
file_ra_state.size /= 4;

but as what I comment here, we can not
predict whether the bad sectors will trash the readahead window, maybe the
following sectors after current one are ok to go in normal readahead,
it's hard to know,
the current approach gives us a chance to slow down softly.


Then when will check filp->f_mode |= FMODE_RANDOM; ? Does it will 
influence ra->ra_pages?




Thanks,
 Ying Zhu

Thanks,
Fengguang



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v1] firmware loader: introduce module parameter to customize fw search path

2012-10-25 Thread Linus Torvalds
On Thu, Oct 25, 2012 at 8:12 PM, Ming Lei  wrote:
>
> Yes, it should be the cleanest, I don't do it because I thought that might
> have caused one compile warning('const char *' points to memory
> without 'const', like below)

You can just keep the const.

In fact, you could even add one, and make it be

   static const char * const fw_path[] = {

We currently don't mark fw_path[] itself const (even though it is),
only the strings it points to.

> but in fact there isn't any warning with above change and it does work, still
> don't know why? :-(

It's valid to cast a non-const pointer to a const one. It's the
*other* way around that is invalid.

So marking fw_path[] as having 'const char *' elements just means that
we won't be changing those elements through the fw_path[] array
(correct: we only read them). The fact that one of those same pointers
is then also available through a non-const pointer variable means that
they can change through *that* pointer, but that doesn't change the
fact that fw_path[] itself contains const pointers.

Remember: in C, a "const pointer" does *not* mean that the thing it
points to cannot change. It only means that it cannot change through
*that* pointer.

   Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mm: readahead: remove redundant ra_pages in file_ra_state

2012-10-25 Thread YingHang Zhu
On Fri, Oct 26, 2012 at 8:25 AM, Dave Chinner  wrote:
> On Thu, Oct 25, 2012 at 10:58:26AM +0800, Fengguang Wu wrote:
>> Hi Chen,
>>
>> > But how can bdi related ra_pages reflect different files' readahead
>> > window? Maybe these different files are sequential read, random read
>> > and so on.
>>
>> It's simple: sequential reads will get ra_pages readahead size while
>> random reads will not get readahead at all.
>>
>> Talking about the below chunk, it might hurt someone that explicitly
>> takes advantage of the behavior, however the ra_pages*2 seems more
>> like a hack than general solution to me: if the user will need
>> POSIX_FADV_SEQUENTIAL to double the max readahead window size for
>> improving IO performance, then why not just increase bdi->ra_pages and
>> benefit all reads? One may argue that it offers some differential
>> behavior to specific applications, however it may also present as a
>> counter-optimization: if the root already tuned bdi->ra_pages to the
>> optimal size, the doubled readahead size will only cost more memory
>> and perhaps IO latency.
>>
>> --- a/mm/fadvise.c
>> +++ b/mm/fadvise.c
>> @@ -87,7 +86,6 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, loff_t 
>> len, int advice)
>> spin_unlock(>f_lock);
>> break;
>> case POSIX_FADV_SEQUENTIAL:
>> -   file->f_ra.ra_pages = bdi->ra_pages * 2;
>
> I think we really have to reset file->f_ra.ra_pages here as it is
> not a set-and-forget value. e.g.  shrink_readahead_size_eio() can
> reduce ra_pages as a result of IO errors. Hence if you have had io
> errors, telling the kernel that you are now going to do  sequential
> IO should reset the readahead to the maximum ra_pages value
> supported
If we unify file->f_ra.ra_pages and its' bdi->ra_pages, then the error-prone
device's readahead can be directly tuned or turned off with blockdev
thus affect all files
using the device and without bring more complexity...

Thanks,
 Ying Zhu
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 02/16 v2] f2fs: add on-disk layout

2012-10-25 Thread Jaegeuk Kim
[snip]
> > +#define F2FS_SUPER_MAGIC   0xF2F52010
> > +#define F2FS_SUPER_OFFSET  0   /* start sector # for sb */
> 
> Does f2fs superblock really haven't any offset from the volume begin?

The reason that I changed this from 1 to 0 is due to the failure during android
recovery. I don't know why the recovery is failed when the offset is 1, but it
works fine after the offset is changed to 0.
I suspect that mount procedure inspects the 0'th offset to figure out what file
system is installed by checking some kind of magic numbers.
Sometimes, we've seen that the mount program tries to load previously installed
file system even though mkfs.f2fs was conducted.
Would you recommend something?

> 
> > +#define F2FS_BLKSIZE   4096
> > +#define F2FS_MAX_EXTENSION 64
> > +
> > +#define NULL_ADDR  0x0U
> > +#define NEW_ADDR   -1U
> 
> Does NULL_ADDR and NEW_ADDR declarations really need? Does kernel
> haven't any analogous?

These are used for F2FS-specific block allocation, so for readability,
I don't want to change this.

> 
> > +
> > +#define F2FS_ROOT_INO(sbi) (sbi->root_ino_num)
> > +#define F2FS_NODE_INO(sbi) (sbi->node_ino_num)
> > +#define F2FS_META_INO(sbi) (sbi->meta_ino_num)
> > +
> > +#define GFP_F2FS_MOVABLE   (__GFP_WAIT | __GFP_IO | __GFP_ZERO)
> > +
> > +#define MAX_ACTIVE_LOGS16
> > +#define MAX_ACTIVE_NODE_LOGS   8
> > +#define MAX_ACTIVE_DATA_LOGS   8
> 
> I think that it makes sense to comment the reasons of such limitations
> in MAX_ACTIVE_LOGS, MAX_ACTIVE_NODE_LOGS, MAX_ACTIVE_DATA_LOGS.

The maximum number of logs is suggested by arnd before.
As I understood, why he suggested such a quite large number is for further
optimization of multiple logs without any on-disk layout changes.
And, I think it is quite enough.

> 
> > +
> > +/*
> > + * For superblock
> > + */
> > +struct f2fs_super_block {
> > +   __le32 magic;   /* Magic Number */
> > +   __le16 major_ver;   /* Major Version */
> > +   __le16 minor_ver;   /* Minor Version */
> > +   __le32 log_sectorsize;  /* log2 (Sector size in bytes) */
> > +   __le32 log_sectors_per_block;   /* log2 (Number of sectors per block */
> > +   __le32 log_blocksize;   /* log2 (Block size in bytes) */
> > +   __le32 log_blocks_per_seg; /* log2 (Number of blocks per segment) */
> 
> From my point of view, __le32 is big data type for log2 (). What
> do you think?
> 

Right, but it is superblock. Should we have to consider space overhead?

> > +   __le32 segs_per_sec;/* Number of segments per section */
> > +   __le32 secs_per_zone;   /* Number of sections per zone */
> > +   __le32 checksum_offset; /* Checksum position in this super block */
> > +   __le64 block_count; /* Total number of blocks */
> > +   __le32 section_count;   /* Total number of sections */
> > +   __le32 segment_count;   /* Total number of segments */
> > +   __le32 segment_count_ckpt; /* Total number of segments
> > + in Checkpoint area */
> > +   __le32 segment_count_sit; /* Total number of segments
> > +in Segment information table */
> > +   __le32 segment_count_nat; /* Total number of segments
> > +in Node address table */
> > +   /*Total number of segments in Segment summary area */
> > +   __le32 segment_count_ssa;
> > +   /* Total number of segments in Main area */
> > +   __le32 segment_count_main;
> > +   __le32 failure_safe_block_distance;
> > +   __le32 segment0_blkaddr;/* Start block address of Segment 0 */
> > +   __le32 start_segment_checkpoint; /* Start block address of ckpt */
> > +   __le32 sit_blkaddr; /* Start block address of SIT */
> > +   __le32 nat_blkaddr; /* Start block address of NAT */
> > +   __le32 ssa_blkaddr; /* Start block address of SSA */
> > +   __le32 main_blkaddr;/* Start block address of Main area */
> > +   __le32 root_ino;/* Root directory inode number */
> > +   __le32 node_ino;/* node inode number */
> > +   __le32 meta_ino;/* meta inode number */
> > +   __le32 volume_serial_number;/* VSN is optional field */
> 
> Usually, it is used 128-bits UUID for serial number. Why do you use
> __le32 as volume_serial_number?

Ok, I'll change.

[snip]
> > +/*
> > + * For directory operations
> > + */
> > +#define F2FS_DOT_HASH  0
> > +#define F2FS_DDOT_HASH F2FS_DOT_HASH
> > +#define F2FS_MAX_HASH  (~((0x3ULL) << 62))
> > +#define F2FS_HASH_COL_BIT  ((0x1ULL) << 63)
> > +
> > +typedef __le32 f2fs_hash_t;
> > +
> > +#define F2FS_NAME_LEN  8
> 
> It exists F2FS_MAX_NAME_LEN. I think that it makes sense to comment here
> purpose of F2FS_NAME_LEN declaration.

Ok, thanks.

---
Jaegeuk Kim
Samsung


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Kdump with signed images

2012-10-25 Thread Eric W. Biederman
Matthew Garrett  writes:

> On Thu, Oct 25, 2012 at 09:15:58PM -0400, Mimi Zohar wrote:
>
>> On a running system, the package installer, after verifying the package
>> integrity, would install each file with the associated 'security.ima'
>> extended attribute.  The 'security.evm' digital signature would be
>> installed with an HMAC, calculated using a system unique key. 
>
> The idea isn't to prevent /sbin/kexec from being modified after 
> installation - it's to prevent it from being possible to install a 
> system that has a modified /sbin/kexec. Leaving any part of this up to 
> the package installer means that it doesn't solve the problem we're 
> trying to solve here. It must be impossible for the kernel to launch any 
> /sbin/kexec that hasn't been signed by a trusted key that's been built 
> into the kernel, and it must be impossible for anything other than 
> /sbin/kexec to make the kexec system call.

The 'security.capability' attribute modulo weirdness with the security
bounding set gives us the necessary tools to allow /sbin/kexec to make
the system call.

The primary trick with this is to limit the installer in such as way
that we can trust the installer even on a system on which root has been
compromised.

Trusting the installer is the same class of problem as trusting
/sbin/kexec, and to me a much more interesting problem as it keeps
critical system files from being tampered with.

It sounds like there are some tricky details to work through but this
direction of system integrity looks like it is worth pursuing,
regardless of how we handle a signed /sbin/kexec.

Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] cpufreq: governors: remove redundant code

2012-10-25 Thread Viresh Kumar
On 26 October 2012 05:43, Rafael J. Wysocki  wrote:

> I have applied this patch only because of the fixes on top of it.  It broke
> kernel compliation due to some missing EXPORT_SYMBOL_GPLs in 
> cpufreq_governor.c,
> so I woulnd't have applied it otherwise.

Hi Rafael,

So sorry for this. I am really feeling bad for that. I should have
tried compiling
them as modules too. I had that in mind while coding it, but forgot it later.

Thanks for making changes on my behalf.

--
viresh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mm: readahead: remove redundant ra_pages in file_ra_state

2012-10-25 Thread YingHang Zhu
On Fri, Oct 26, 2012 at 10:30 AM, Ni zhan Chen  wrote:
> On 10/26/2012 09:27 AM, Fengguang Wu wrote:
>>
>> On Fri, Oct 26, 2012 at 11:25:44AM +1100, Dave Chinner wrote:
>>>
>>> On Thu, Oct 25, 2012 at 10:58:26AM +0800, Fengguang Wu wrote:

 Hi Chen,

> But how can bdi related ra_pages reflect different files' readahead
> window? Maybe these different files are sequential read, random read
> and so on.

 It's simple: sequential reads will get ra_pages readahead size while
 random reads will not get readahead at all.

 Talking about the below chunk, it might hurt someone that explicitly
 takes advantage of the behavior, however the ra_pages*2 seems more
 like a hack than general solution to me: if the user will need
 POSIX_FADV_SEQUENTIAL to double the max readahead window size for
 improving IO performance, then why not just increase bdi->ra_pages and
 benefit all reads? One may argue that it offers some differential
 behavior to specific applications, however it may also present as a
 counter-optimization: if the root already tuned bdi->ra_pages to the
 optimal size, the doubled readahead size will only cost more memory
 and perhaps IO latency.

 --- a/mm/fadvise.c
 +++ b/mm/fadvise.c
 @@ -87,7 +86,6 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset,
 loff_t len, int advice)
  spin_unlock(>f_lock);
  break;
  case POSIX_FADV_SEQUENTIAL:
 -   file->f_ra.ra_pages = bdi->ra_pages * 2;
>>>
>>> I think we really have to reset file->f_ra.ra_pages here as it is
>>> not a set-and-forget value. e.g.  shrink_readahead_size_eio() can
>>> reduce ra_pages as a result of IO errors. Hence if you have had io
>>> errors, telling the kernel that you are now going to do  sequential
>>> IO should reset the readahead to the maximum ra_pages value
>>> supported
>>
>> Good point!
>>
>>  but wait  this patch removes file->f_ra.ra_pages in all other
>> places too, so there will be no file->f_ra.ra_pages to be reset here...
>
>
> In his patch,
>
>
>  static void shrink_readahead_size_eio(struct file *filp,
> struct file_ra_state *ra)
>  {
> -   ra->ra_pages /= 4;
> +   spin_lock(>f_lock);
> +   filp->f_mode |= FMODE_RANDOM;
> +   spin_unlock(>f_lock);
>
> As the example in comment above this function, the read maybe still
> sequential, and it will waste IO bandwith if modify to FMODE_RANDOM
> directly.
I've considered about this. On the first try I modified file_ra_state.size and
file_ra_state.async_size directly, like

file_ra_state.async_size = 0;
file_ra_state.size /= 4;

but as what I comment here, we can not
predict whether the bad sectors will trash the readahead window, maybe the
following sectors after current one are ok to go in normal readahead,
it's hard to know,
the current approach gives us a chance to slow down softly.

Thanks,
Ying Zhu
>
>>
>> Thanks,
>> Fengguang
>>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] arm: l2cc: doc: fix device tree example typo

2012-10-25 Thread Rob Herring
On 10/23/2012 07:53 PM, Josh Cartwright wrote:
> The list of attributes above details the use of the 'filter-ranges'
> property, but the example improperly used 'filter-latency'.  Make these
> consistent by fixing up the example.
> 
> Signed-off-by: Josh Cartwright 

Applied for 3.8 (unless I get more to send for 3.7).

Rob

> ---
>  Documentation/devicetree/bindings/arm/l2cc.txt | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/Documentation/devicetree/bindings/arm/l2cc.txt 
> b/Documentation/devicetree/bindings/arm/l2cc.txt
> index 7ca5216..7c3ee3a 100644
> --- a/Documentation/devicetree/bindings/arm/l2cc.txt
> +++ b/Documentation/devicetree/bindings/arm/l2cc.txt
> @@ -37,7 +37,7 @@ L2: cache-controller {
>  reg = <0xfff12000 0x1000>;
>  arm,data-latency = <1 1 1>;
>  arm,tag-latency = <2 2 2>;
> -arm,filter-latency = <0x8000 0x800>;
> +arm,filter-ranges = <0x8000 0x800>;
>  cache-unified;
>  cache-level = <2>;
>   interrupts = <45>;
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


linux-next: manual merge of the arm-soc tree with the pinctrl tree

2012-10-25 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the arm-soc tree got a conflict in
arch/arm/mach-ux500/cpu-db8500.c between commit 63c53906312f
("pinctrl/nomadik: move the platform data header") from the pinctrl tree
and commit 4040d10a3d44 ("ARM: ux500: add DB serial number to entropy
pool") from the arm-soc tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au

diff --cc arch/arm/mach-ux500/cpu-db8500.c
index 87a8f9f,50202a1..000
--- a/arch/arm/mach-ux500/cpu-db8500.c
+++ b/arch/arm/mach-ux500/cpu-db8500.c
@@@ -18,7 -18,7 +18,8 @@@
  #include 
  #include 
  #include 
 +#include 
+ #include 
  
  #include 
  #include 


pgpFxzCCbQqm5.pgp
Description: PGP signature


Re: [PATCH v1 2/2] x86, apic: Disable BSP if boot cpu is AP

2012-10-25 Thread HATAYAMA Daisuke
From: "H. Peter Anvin" 
Subject: Re: [PATCH v1 2/2] x86, apic: Disable BSP if boot cpu is AP
Date: Mon, 22 Oct 2012 17:35:47 -0700

> On 10/22/2012 02:29 PM, Eric W. Biederman wrote:
>>>
>>> As I said, I thought Fenghua tried that but it didn't work,
>>> experimentally.
>>
>> Fair enough.  You described the problem with clearing bit 8 in a weird
>> way.
>>
>> If the best we can muster are fuzzy memories it may be worth
>> revisiting.
>> Perhaps it works on enough cpu models to be interesting.
>>
> 
> It isn't fuzzy memories... this was done as late as 1-2 months ago.  I
> just don't know the details.
> 
> Fenghua, could you help fill us in?
> 

I overlooked completely the fact that BSP flag is rewritable.

I tried Eric's suggestion using attached test programs and saw it
worked fine at least on the three cpus around me below:

- Intel(R) Xeon(R) CPU E7- 4820  @ 2.00GHz
- Intel(R) Xeon(R) CPU E7- 8870  @ 2.40GHz
- Intel(R) Xeon(TM) CPU 1.80GHz
  - 32 bits CPU

Next I found the description about this in 8.4.2, IASDM Vol.3:

  The MP initialization protocol imposes the following requirements
  and restrictions on the system:

  * The MP protocol is executed only after a power-up or RESET. If the
MP protocol has completed and a BSP is chosen, subsequent INITs
(either to a specific processor or system wide) do not cause the
MP protocol to be repeated. Instead, each logical processor
examines its BSP flag (in the IA32_APIC_BASE MSR) to determine
whether it should execute the BIOS boot-strap code (if it is the
BSP) or enter a wait-for-SIPI state (if it is an AP).

So this is no longer undocumented behaviour for recent cpus, I think.

Considering these, I'll make a patch to clear BSP flag at appropreate
position in kernel boot-up code. OTOH, according to the discussion, it
was reported that clearing BSP flag affected some BIOSes. To deal with
this, I'll prepare a kernel option to decide whether to clear BSP flag
or not.

Does anyone have any comments now? Or please comment after I submit a
new patch.

Thanks.
HATAYAMA, Daisuke


bsp_flag_modules.tar.bz2
Description: Binary data


linux-next: manual merge of the arm-soc tree with the gpio-lw tree

2012-10-25 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the arm-soc tree got a conflict in
arch/arm/Kconfig between commit a3b8d4a51357 ("GPIO: Add support for GPIO
on CLPS711X-target platform") from the gpio-lw tree and commit
4a8355c4c34f ("ARM: clps711x: convert to clockevents") from the arm-soc
tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au

diff --cc arch/arm/Kconfig
index a7c541e,d9b7a84..000
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@@ -366,8 -364,6 +366,7 @@@ config ARCH_CNS3XX
  
  config ARCH_CLPS711X
bool "Cirrus Logic CLPS711x/EP721x/EP731x-based"
 +  select ARCH_REQUIRE_GPIOLIB
-   select ARCH_USES_GETTIMEOFFSET
select CLKDEV_LOOKUP
select COMMON_CLK
select CPU_ARM720T


pgpanTYNefaY1.pgp
Description: PGP signature


Re: The idea about scheduler test module(STM)

2012-10-25 Thread Michael Wang
On 10/26/2012 10:27 AM, Charles Wang wrote:
> Yes, it's a new way to do scheduler test. But why use kernel
>  threads? The info, total time, run time, wait time, preempt number, all
> can be collected from tasks' sched info from /proc/pid/sched and
> /proc/pid/stat.

That's right, next version's STM could get info like proc, actually
there are many ways to implement it, but we need to figure out whether
STM is needed firstly ;-)

> 
> I don't understand clearly about "pure scheduler performance" here. In
> order to test scheduler fully, we need to do IO test, and other
> subsystems will be involved in. But if other subsystems' code are not
> changed, the result still can refer to scheduler's change.

For example when we test the latency of net traffic, there are software,
nic driver, nic, router, and the same thing on peer machine, all those
staff besides scheduler are unstable even the code not changed, and can
easily influence the test result.

Let's say we changed some param like 'sched_latency_ns' which we think
may help to reduce the latency of system, and suppose it really helped
to reduce 0.xx% latency, how can we detect that while so many unstable
thing in our test?

Then after we have done many changes, suppose each of them can help
increase 0.xx% but we can't know, so we just think those param are
useless, but actually the accumulate improvement may be x% which we
really care.

So the pure means with out any influence from other thing.

But I think STM should also have the ability to do the normal test which
included all the subsystem we want, and then we could know which one is
the real bottleneck.

> 
> I can't think much farther about the advantage this way could give.
> Maybe you should show us ur better examples. :)

I'd like to make it more useful not just a demo, but I need more
feedback and suggestions :)

Regards,
Michael Wang

> 
> Regards,
> Charles
> 
> On 10/25/2012 01:40 PM, Michael Wang wrote:
>> Hi, Folks
>>
>> Charles has raised a problem that we don't have any tool yet
>> for testing the scheduler with out any disturb from other
>> subsystem, and I also found it's hard to test scheduler optimize
>> patch, since the improvement could be easily eaten by other
>> subsystem like IO.
>>
>> So Let's check the tools we have currently:
>> 1. perf sched
>>
>> we can use it to trace the threads we interested, and
>> the info it provided is very good, but one issue is,
>> it could not create the workload we want, also collect
>> the info and do summary is not so easy.
>>
>> 2. linsched
>>
>> It's a very good tool to create the test environment,
>> but it's implementation is to ideal, so it could not
>> present the real world problem.
>>
>> Since both perf and linsched could not meet our requirement, we
>> decided to develop a new tool, let's currently call it
>> scheduler test module(STM).
>>
>> It's propose is:
>> 1. create the workload we want.
>> 2. test the pure scheduler.
>> 3. collect info we need and do summary.
>>
>> This tool should be very easy to use and not depends on
>> the implementation of scheduler.
>>
>> We can use it to check the pure scheduler performance on
>> our system.
>>
>> We can use it to check whether there are regression in
>> scheduler when testing patches.
>>
>> And other usage I've not figure out yet.
>>
>> In order to explain the idea more directly, I have wrote a
>> prototype STM, it's a separate module, and you can use it
>> just like 'rcutorture'.
>>
>> I attached a small script 'play.sh' to help you easily
>> run the test, put 'schedtm.c' and 'play.sh' in same directory
>> and run 'play.sh', you will see out put like:
>>
>> schedtm: summary
>> schedtm: cpu count:cpurunpreempt
>> schedtm: 013811381
>> schedtm: 1957955
>> schedtm: 2900900
>> schedtm: 310351034
>> schedtm: 4991990
>> schedtm: 5940939
>> schedtm: 6900897
>> schedtm: 7942948
>> schedtm: 8852850
>> schedtm: 9931938
>> schedtm: 10936934
>> schedtm: 11951950
>> schedtm: total time(us):10138172
>> schedtm: run time(us):5055223(49.86%)
>> schedtm: wait time(us):5082949
>> schedtm: latency(us):10489
>> schedtm: stmt22 got highest run time 5604941(+10%)
>> schedtm: stmt3 got lowest run time 4852057(-4%)
>> schedtm: stmt12 got highest latency 11482(+9%)
>> schedtm: stmt0 got lowest latency 7561(-27%)
>>
>> And you can enable/disable CONFIG_PREEMPT to see magnificent
>> change on latency.
>>
>> This is nothing but a demo, and please "RUN IT ON A TEST MACHINE"...
>>
>> It will create 24 kernel threads and run 10 seconds, you can change
>> it by module param.
>>
>> I will be appreciate if I could get some feedback from the scheduler
>> experts like you, whatever you think it's good or junk, 

[PATCH] x86, doc: fix grammar and typo in boot.txt

2012-10-25 Thread Kees Cook
Fixes some minor issues in the x86 boot documentation.

Signed-off-by: Kees Cook 
---
 Documentation/x86/boot.txt |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/x86/boot.txt b/Documentation/x86/boot.txt
index 9efceff..f15cb74 100644
--- a/Documentation/x86/boot.txt
+++ b/Documentation/x86/boot.txt
@@ -1013,7 +1013,7 @@ boot_params as that of 16-bit boot protocol, the boot 
loader should
 also fill the additional fields of the struct boot_params as that
 described in zero-page.txt.
 
-After setupping the struct boot_params, the boot loader can load the
+After setting up the struct boot_params, the boot loader can load the
 32/64-bit kernel in the same way as that of 16-bit boot protocol.
 
 In 32-bit boot protocol, the kernel is started by jumping to the
@@ -1023,7 +1023,7 @@ In 32-bit boot protocol, the kernel is started by jumping 
to the
 At entry, the CPU must be in 32-bit protected mode with paging
 disabled; a GDT must be loaded with the descriptors for selectors
 __BOOT_CS(0x10) and __BOOT_DS(0x18); both descriptors must be 4G flat
-segment; __BOOS_CS must have execute/read permission, and __BOOT_DS
+segment; __BOOT_CS must have execute/read permission, and __BOOT_DS
 must have read/write permission; CS must be __BOOT_CS and DS, ES, SS
 must be __BOOT_DS; interrupt must be disabled; %esi must hold the base
 address of the struct boot_params; %ebp, %edi and %ebx must be zero.
-- 
1.7.9.5


-- 
Kees Cook
Chrome OS Security
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] Remove CsrEventHandle and functions using it

2012-10-25 Thread SeongJae Park
Thank you for comment.
I will make and send patch again.

Thanks and Regards.
SeongJae Park.

On Fri, Oct 26, 2012 at 4:16 AM, Greg KH  wrote:
> On Thu, Oct 25, 2012 at 02:51:10PM +0900, SeongJae Park wrote:
>> Nobody use CsrEventHandle, Nobody call function using it as parameter.
>> So, remove it.
>>
>> Signed-off-by: SeongJae Park 
>> ---
>>  drivers/staging/csr/csr_framework_ext.h   |   61 
>> -
>>  drivers/staging/csr/csr_framework_ext_types.h |2 -
>>  2 files changed, 63 deletions(-)
>
> This patch applies with fuzz, which implies that you didn't make it
> against the latest tree.  Care to redo it, and the 2/2 patch, against
> either my staging-next branch of the staging.git tree, or the next
> linux-next release that will happen tomorrow, and resend them?
>
> thanks,
>
> greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v1] firmware loader: introduce module parameter to customize fw search path

2012-10-25 Thread Ming Lei
On Fri, Oct 26, 2012 at 10:32 AM, Linus Torvalds
 wrote:
>
> Please just make "fw_path[0]" just be the pointer to fw_path_para[]
> (which sounds like the cleanest fix) and get rid of the negative 'i'
> and conditional entirely.

Yes, it should be the cleanest, I don't do it because I thought that might
have caused one compile warning('const char *' points to memory
without 'const', like below)

static char fw_path_para[256];
static const char *fw_path[] = {
fw_path_para,
"/lib/firmware/updates/" UTS_RELEASE,
"/lib/firmware/updates",
"/lib/firmware/" UTS_RELEASE,
"/lib/firmware"
};

but in fact there isn't any warning with above change and it does work, still
don't know why? :-(

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v3] mm: thp: Set the accessed flag for old pages on access fault.

2012-10-25 Thread Ni zhan Chen

On 10/26/2012 03:51 AM, Johannes Weiner wrote:

On Thu, Oct 25, 2012 at 05:44:31PM +0100, Will Deacon wrote:

On x86 memory accesses to pages without the ACCESSED flag set result in the
ACCESSED flag being set automatically. With the ARM architecture a page access
fault is raised instead (and it will continue to be raised until the ACCESSED
flag is set for the appropriate PTE/PMD).

For normal memory pages, handle_pte_fault will call pte_mkyoung (effectively
setting the ACCESSED flag). For transparent huge pages, pmd_mkyoung will only
be called for a write fault.

This patch ensures that faults on transparent hugepages which do not result
in a CoW update the access flags for the faulting pmd.

Cc: Chris Metcalf 
Cc: Kirill A. Shutemov 
Cc: Andrea Arcangeli 
Signed-off-by: Will Deacon 

Acked-by: Johannes Weiner 


Ok chaps, I rebased this thing onto today's next (which basically
necessitated a rewrite) so I've reluctantly dropped my acks and kindly
ask if you could eyeball the new code, especially where the locking is
concerned. In the numa code (do_huge_pmd_prot_none), Peter checks again
that the page is not splitting, but I can't see why that is required.

I don't either.  If the thing was splitting when the fault happened,
that path is not taken.  And the locked pmd_same() check should rule
out splitting setting in after testing pmd_trans_huge_splitting().


Why I can't find function pmd_trans_huge_splitting() you mentioned in 
latest mainline codes and linux-next?




Peter?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majord...@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: mailto:"d...@kvack.org;> em...@kvack.org 



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2 4/6] Thermal: Remove the cooling_cpufreq_list

2012-10-25 Thread Hongbo Zhang
On 26 October 2012 03:14, Francesco Lavra  wrote:
> Hi,
> Hongbo Zhang wrote:
>> Problem of using this list is that the cpufreq_get_max_state callback will be
>> called when register cooling device by thermal_cooling_device_register, but
>> this list isn't ready at this moment. What's more, there is no need to 
>> maintain
>> such a list, we can get cpufreq_cooling_device instance by the private
>> thermal_cooling_device.devdata.
>>
>> Signed-off-by: hongbo.zhang 
>> ---
>>  drivers/thermal/cpu_cooling.c | 81 
>> +--
>>  1 file changed, 16 insertions(+), 65 deletions(-)
>>
>> diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
>> index 415b041..cc80d29 100644
>> --- a/drivers/thermal/cpu_cooling.c
>> +++ b/drivers/thermal/cpu_cooling.c
>> @@ -58,8 +58,9 @@ struct cpufreq_cooling_device {
>>  };
>>  static LIST_HEAD(cooling_cpufreq_list);
>>  static DEFINE_IDR(cpufreq_idr);
>> +static DEFINE_MUTEX(cooling_cpufreq_lock);
>>
>> -static struct mutex cooling_cpufreq_lock;
>> +static unsigned int cpufreq_dev_count;
>>
>>  /* notify_table passes value to the CPUFREQ_ADJUST callback function. */
>>  #define NOTIFY_INVALID NULL
>> @@ -241,20 +242,12 @@ static int cpufreq_get_max_state(struct 
>> thermal_cooling_device *cdev,
>>unsigned long *state)
>>  {
>>   int ret = -EINVAL, i = 0;
>> - struct cpufreq_cooling_device *cpufreq_device;
>> + struct cpufreq_cooling_device *cpufreq_device = cdev->devdata;
>>   struct cpumask *maskPtr;
>>   unsigned int cpu;
>>   struct cpufreq_frequency_table *table;
>>   unsigned long count = 0;
>>
>> - mutex_lock(_cpufreq_lock);
>> - list_for_each_entry(cpufreq_device, _cpufreq_list, node) {
>> - if (cpufreq_device && cpufreq_device->cool_dev == cdev)
>> - break;
>> - }
>> - if (cpufreq_device == NULL)
>> - goto return_get_max_state;
>> -
>>   maskPtr = _device->allowed_cpus;
>>   cpu = cpumask_any(maskPtr);
>>   table = cpufreq_frequency_get_table(cpu);
>> @@ -276,7 +269,6 @@ static int cpufreq_get_max_state(struct 
>> thermal_cooling_device *cdev,
>>   }
>>
>>  return_get_max_state:
>> - mutex_unlock(_cpufreq_lock);
>>   return ret;
>
> Since there is no mutex locking/unlocking anymore, I'd say the goto
> label should be removed.
Good.
>
> [...]
>>  void cpufreq_cooling_unregister(struct thermal_cooling_device *cdev)
>>  {
>> - struct cpufreq_cooling_device *cpufreq_dev = NULL;
>> - unsigned int cpufreq_dev_count = 0;
>> + struct cpufreq_cooling_device *cpufreq_dev = cdev->devdata;
>>
>> - mutex_lock(_cpufreq_lock);
>> - list_for_each_entry(cpufreq_dev, _cpufreq_list, node) {
>> - if (cpufreq_dev && cpufreq_dev->cool_dev == cdev)
>> - break;
>> - cpufreq_dev_count++;
>> - }
>> -
>> - if (!cpufreq_dev || cpufreq_dev->cool_dev != cdev) {
>> - mutex_unlock(_cpufreq_lock);
>> - return;
>> - }
>> + thermal_cooling_device_unregister(cpufreq_dev->cool_dev);
>>
>> - list_del(_dev->node);
>> + mutex_lock(_cpufreq_lock);
>> + cpufreq_dev_count--;
>>
>>   /* Unregister the notifier for the last cpufreq cooling device */
>> - if (cpufreq_dev_count == 1) {
>> + if (cpufreq_dev_count == 0) {
>>   cpufreq_unregister_notifier(_cpufreq_notifier_block,
>>   CPUFREQ_POLICY_NOTIFIER);
>>   }
>>   mutex_unlock(_cpufreq_lock);
>> - thermal_cooling_device_unregister(cpufreq_dev->cool_dev);
>
> Why did you move the call to thermal_cooling_device_unregister() from
> here? I don't see any reason for moving it.
In common sense, usually unregister first and then count--;
But here it should be opposite sequence of cpufreq_cooling_register,
will update it.

>
>> +
>>   release_idr(_idr, cpufreq_dev->id);
>> - if (cpufreq_dev_count == 1)
>> - mutex_destroy(_cpufreq_lock);
>>   kfree(cpufreq_dev);
>>  }
>>  EXPORT_SYMBOL(cpufreq_cooling_unregister);
>> --
>> 1.7.11.3
>
> --
> Francesco
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 05/31] x86/mm: Reduce tlb flushes from ptep_set_access_flags()

2012-10-25 Thread Linus Torvalds
On Thu, Oct 25, 2012 at 7:30 PM, Rik van Riel  wrote:
>>
>> LOOK at the code, for chrissake. Just look at it. And if you don't see
>> why the above is stupid and retarded, you damn well shouldn't be
>> touching VM code.
>
> I agree it is pretty ugly.  However, the above patch
> did get rid of a gigantic performance regression with
> Peter's code.

Rik, *LOOK* at the code like I asked you to, instead of making excuses for it.

I'm not necessarily arguing with what the code tries to do. I'm
arguing with the fact that the code is pure and utter *garbage*.

It has two major (and I mean *MAJOR*) problems, both of which
individually should make you ashamed for ever posting that piece of
shit:

The obvious-without-even-understanding-semantics problem:

 - it's humongously stupidly written. It calculates that
'flush_remote' flag WHETHER IT GETS USED OR NOT.

   Christ. I can kind of expect stuff like that in driver code etc,
but in VM routines?

   Yes, the compiler may be smart enough to actually fix up the
idiocy. That doesn't make it less stupid.

The more-subtle-but-fundamental-problem:

 - regardless of how stupidly written it is on a very superficial
level, it's even more stupid in a much more fundamental way.

   That whole routine is explicitly written to be opportunistic. It is
*documented* to only set the access flags, so comparing anything else
is stupid, wouldn't you say?

Documented where? It's actually explicitly documented in the
pgtable-generic.c file which has the generic implementation of that
thing. But it's implicitly documented both in the name of the function
(do take another look) *and* in the actual implementation of the
function.

Look at the code: it doesn't even always update the page tables AT ALL
(and no, the return value does *not* reflect whether it updated it or
not!)

Also, notice how we update the pte entry with a simple

*ptep = entry;

statement, not with the usual expensive page table updates? The only
thing that makes this safe is that we *only* do it with the exact same
page frame number (anything else would be disastrously buggy on 32-bit
PAE, for example). And we only ever do it with the dirty bit always
set, because otherwise we might be silently dropping a concurrent
hardware update of the dirty bit of the previous pte value on another
CPU.

The latter requirement is why the x86 code does

if (changed && dirty) {

while the generic code checks just "If (changed)" (and then uses the
much more expensive set_pte_at() that has the proper dirty-bit
guarantees, and generates atomic accesses, not to mention various
virtualization crap).

In other words, everything that was added by that patch is PURE AND
UTTER SHIT. And THAT is what I'm objecting to.

Guess what? If you want to optimize the function to not do remote TLB
flushes, then just do that! None of the garbage. Just change the

flush_tlb_page(vma, address);

line to

__flush_tlb_one(address);

and it should damn well work. Because everything I see about
"flush_remote" looks just wrong, wrong, wrong.

And if there really is some reason for that whole flush_remote
braindamage, then we have much bigger problems, namely the fact that
we've broken the documented semantics of that function, and we're
doing various other things that are completely and utterly invalid
unless the above semantics hold.

So that patch should be burned, and possibly used as an example of
horribly crappy code for later generations. At no point should it be
applied.

 Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] hrtimer:__run_hrtimer races with enqueue_hrtimer

2012-10-25 Thread he, bo
From: Yanmin Zhang 

We hit a kernel panic at __run_hrtimer=>BUG_ON(timer->state != 
HRTIMER_STATE_CALLBACK).
<2>[   10.226053, 3] kernel BUG at 
/home/android/xiaobing/ymz/r4/hardware/intel/linux-2.6/kernel/hrtimer.c:1228!
<0>[   10.235682, 3] invalid opcode:  [#1] PREEMPT SMP
<4>[   10.240716, 3] Modules linked in: wl12xx_sdio wl12xx mac80211 cfg80211 
compat btwilink rmi4(C) fmdrv_chr st_drv matrix(C)
<4>[   10.251651, 3]
<4>[   10.253391, 3] Pid: 68, comm: kworker/3:4 Tainted: GWC  
3.0.34-140430-g2af538d #45 Intel Corporation CloverTrail/FFRD
<4>[   10.264674, 3] EIP: 0060:[] EFLAGS: 00010002 CPU: 3
<4>[   10.270411, 3] EIP is at __run_hrtimer+0xbd/0x240
<4>[   10.275091, 3] EAX: 0001 EBX: f67fb6b8 ECX: f57b4000 EDX: 7301
<4>[   10.281602, 3] ESI: c1d614c0 EDI: f67fb680 EBP: f57b5dd8 ESP: f57b5da8
<4>[   10.288113, 3]  DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
<0>[   10.293754, 3] Process kworker/3:4 (pid: 68, ti=f57b4000 task=f57aa730 
task.ti=f57b4000)
<0>[   10.301827, 3] Stack:
<4>[   10.304083, 3]   c1afef40 f57b5dd8 c167a6e0 f67fb680 20b366e3 
f67fb6b8 f57b5e14
<4>[   10.312069, 3]  0001 f67fb6b8 0001 f67fb680 f57b5e28 c126d1e5 
f57b5e08 c126f325
<4>[   10.320055, 3]   86b9868d 0001 86b9868d 0001 0003 
 7fff
<0>[   10.328041, 3] Call Trace:
<4>[   10.330742, 3]  [] ? gburst_thread_stop.isra.25+0x40/0x40
<4>[   10.336988, 3]  [] hrtimer_interrupt+0xd5/0x250
<4>[   10.342368, 3]  [] ? sched_clock_cpu+0xe5/0x150
<4>[   10.347753, 3]  [] smp_apic_timer_interrupt+0x54/0x88
<4>[   10.353654, 3]  [] ? trace_hardirqs_off_thunk+0xc/0x14
<4>[   10.359643, 3]  [] apic_timer_interrupt+0x2f/0x34
<4>[   10.365199, 3]  [] ? sub_preempt_count+0x1f/0x50
<4>[   10.370669, 3]  [] delay_tsc+0x3a/0xc0
<6>[   10.371589, 0] android_work: did not send uevent (0 0   (null))
<4>[   10.381171, 3]  [] __const_udelay+0x23/0x30
<4>[   10.386207, 3]  [] mdfld_dsi_send_dcs+0x12a/0x5d0
<4>[   10.391760, 3]  [] ? _raw_spin_unlock_irqrestore+0x26/0x50
<4>[   10.398101, 3]  [] ? ospm_power_using_hw_begin+0xa1/0x350
<4>[   10.399053, 3]  [] ? __mutex_lock_slowpath+0x1ff/0x2f0
<4>[   10.399069, 3]  [] mdfld_dbi_update_panel+0x21e/0x2d0
<4>[   10.399085, 3]  [] mdfld_te_handler_work+0x71/0x80
<4>[   10.399099, 3]  [] process_one_work+0xfe/0x3f0
<4>[   10.399114, 3]  [] ? mdfld_async_flip_te_handler+0xf0/0xf0

Basically, __run_hrtimer has a race with enqueue_hrtimer. When __run_hrtimer 
calls
the timer callback fn, another thread might call enqueue_hrtimer or 
hrtimer_start
to requeue it, and the timer->state is equal to 
HRTIMER_STATE_CALLBACK|HRTIMER_STATE_ENQUEUED,
which causes the BUG_ON(timer->state != HRTIMER_STATE_CALLBACK) checking fails.

The patch fixes it by checking only bit HRTIMER_STATE_CALLBACK.

Signed-off-by: Yanmin Zhang 
Reviewed-by: He, Bo 
---
 kernel/hrtimer.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index 6db7a5e..6280184 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -1235,7 +1235,7 @@ static void __run_hrtimer(struct hrtimer *timer, ktime_t 
*now)
 * hrtimer_start_range_ns() or in hrtimer_interrupt()
 */
if (restart != HRTIMER_NORESTART) {
-   BUG_ON(timer->state != HRTIMER_STATE_CALLBACK);
+   BUG_ON(!(timer->state & HRTIMER_STATE_CALLBACK));
enqueue_hrtimer(timer, base);
}
 
-- 
1.7.6



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v6 0/3] Add modules to support realtek PCIE card reader

2012-10-25 Thread wwang

于 2012年10月26日 10:45, Greg KH 写道:

On Fri, Oct 26, 2012 at 09:10:45AM +0800, wwang wrote:

于 2012年10月26日 02:50, Greg KH 写道:

On Sat, Oct 20, 2012 at 06:46:15AM +0300, Dan Carpenter wrote:

On Sat, Oct 06, 2012 at 03:23:56PM +0800, wwang wrote:

We are still maintaining the SCSI driver for Realtek card reader,
and will release the latest source code in the Github in the future.
But maybe we won't push it to the staging tree any more.

Maybe we should just remove the staging code if it won't be fixed.
That's sort of the point of staging.

I agree.  wwang, want me to delet the staging driver now?  I don't want
"dead" code in the tree, especially as you don't want to maintain it
anymore.

thanks,

greg k-h

Hi Greg:

OK. You can delete it now, please.
And I will push my new driver to MFD and MMC subsystem.

Ok, just to confirm, you want me to delete the
drivers/staging/rts_pstor/* code, right?

thanks,

greg k-h

Hi Greg:

Yes, you can delete drivers/staging/rts_pstor

BR,
wwang

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v6 0/3] Add modules to support realtek PCIE card reader

2012-10-25 Thread Greg KH
On Fri, Oct 26, 2012 at 09:10:45AM +0800, wwang wrote:
> 于 2012年10月26日 02:50, Greg KH 写道:
> > On Sat, Oct 20, 2012 at 06:46:15AM +0300, Dan Carpenter wrote:
> >> On Sat, Oct 06, 2012 at 03:23:56PM +0800, wwang wrote:
> >>> We are still maintaining the SCSI driver for Realtek card reader,
> >>> and will release the latest source code in the Github in the future.
> >>> But maybe we won't push it to the staging tree any more.
> >> Maybe we should just remove the staging code if it won't be fixed.
> >> That's sort of the point of staging.
> > I agree.  wwang, want me to delet the staging driver now?  I don't want
> > "dead" code in the tree, especially as you don't want to maintain it
> > anymore.
> >
> > thanks,
> >
> > greg k-h
> Hi Greg:
> 
> OK. You can delete it now, please.
> And I will push my new driver to MFD and MMC subsystem.

Ok, just to confirm, you want me to delete the
drivers/staging/rts_pstor/* code, right?

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: process hangs on do_exit when oom happens

2012-10-25 Thread Qiang Gao
On Thu, Oct 25, 2012 at 5:57 PM, Michal Hocko  wrote:
> On Wed 24-10-12 11:44:17, Qiang Gao wrote:
>> On Wed, Oct 24, 2012 at 1:43 AM, Balbir Singh  wrote:
>> > On Tue, Oct 23, 2012 at 3:45 PM, Michal Hocko  wrote:
>> >> On Tue 23-10-12 18:10:33, Qiang Gao wrote:
>> >>> On Tue, Oct 23, 2012 at 5:50 PM, Michal Hocko  wrote:
>> >>> > On Tue 23-10-12 15:18:48, Qiang Gao wrote:
>> >>> >> This process was moved to RT-priority queue when global oom-killer
>> >>> >> happened to boost the recovery of the system..
>> >>> >
>> >>> > Who did that? oom killer doesn't boost the priority (scheduling class)
>> >>> > AFAIK.
>> >>> >
>> >>> >> but it wasn't get properily dealt with. I still have no idea why where
>> >>> >> the problem is ..
>> >>> >
>> >>> > Well your configuration says that there is no runtime reserved for the
>> >>> > group.
>> >>> > Please refer to Documentation/scheduler/sched-rt-group.txt for more
>> >>> > information.
>> >>> >
>> >> [...]
>> >>> maybe this is not a upstream-kernel bug. the centos/redhat kernel
>> >>> would boost the process to RT prio when the process was selected
>> >>> by oom-killer.
>> >>
>> >> This still looks like your cpu controller is misconfigured. Even if the
>> >> task is promoted to be realtime.
>> >
>> >
>> > Precisely! You need to have rt bandwidth enabled for RT tasks to run,
>> > as a workaround please give the groups some RT bandwidth and then work
>> > out the migration to RT and what should be the defaults on the distro.
>> >
>> > Balbir
>>
>>
>> see https://patchwork.kernel.org/patch/719411/
>
> The patch surely "fixes" your problem but the primary fault here is the
> mis-configured cpu cgroup. If the value for the bandwidth is zero by
> default then all realtime processes in the group a screwed. The value
> should be set to something more reasonable.
> I am not familiar with the cpu controller but it seems that
> alloc_rt_sched_group needs some treat. Care to look into it and send a
> patch to the cpu controller and cgroup maintainers, please?
>
> --
> Michal Hocko
> SUSE Labs

I'm trying to fix the problem. but no substantive progress yet.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Kdump with signed images

2012-10-25 Thread Matthew Garrett
On Thu, Oct 25, 2012 at 09:15:58PM -0400, Mimi Zohar wrote:

> On a running system, the package installer, after verifying the package
> integrity, would install each file with the associated 'security.ima'
> extended attribute.  The 'security.evm' digital signature would be
> installed with an HMAC, calculated using a system unique key. 

The idea isn't to prevent /sbin/kexec from being modified after 
installation - it's to prevent it from being possible to install a 
system that has a modified /sbin/kexec. Leaving any part of this up to 
the package installer means that it doesn't solve the problem we're 
trying to solve here. It must be impossible for the kernel to launch any 
/sbin/kexec that hasn't been signed by a trusted key that's been built 
into the kernel, and it must be impossible for anything other than 
/sbin/kexec to make the kexec system call.

-- 
Matthew Garrett | mj...@srcf.ucam.org
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH 00/13] sched: Integrating Per-entity-load-tracking with the core scheduler

2012-10-25 Thread Preeti Murthy
The benchmark:

/*
 * test.c - Simulate workloads that load the CPU differently
 *
 * This program is free software; you can redistribute it and/or
 * modify it under the terms of the GNU General Public License as
 * published by the Free Software Foundation; version 2 of the License.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307
 * USA
 */

/*
 * This workload spawns threads which request for allocation of a
 * memory chunk,write to it and free it.The duty cycle of these threads
 * can be varied.The idea is to simulate tasks which load the cpu
 * to different extents.
 */

#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include "malloc.h"

/* Variable entities */
static unsigned int seconds;
static unsigned int threads;
static unsigned int mem_chunk_size;
static unsigned int sleep_at;
static unsigned int sleep_interval;

typedef size_t mem_slot_t;/* 8 bytes */
static unsigned int slot_size = sizeof(mem_slot_t);

/* Other parameters */
static volatile int start;
static time_t start_time;
static unsigned int records_read;
pthread_mutex_t records_count_lock = PTHREAD_MUTEX_INITIALIZER;


static unsigned int write_to_mem(void)
{
int i, j;
mem_slot_t *scratch_pad, *temp;
mem_chunk_size = slot_size * 256;
mem_slot_t *end;

/* The below two parameters ensure that it is 10% workload
 * with a duty cycle of 10ms.The number of records read in
 * 1s without sleep was observed and appropriately calculated
 * for 1ms.This number turned out to be 1228.
 */
sleep_at = 1228; /* sleep for every 1228 records */
sleep_interval = 9000; /* sleep for 9 ms */

for (i=0; start == 1; i++)
{
scratch_pad = (mem_slot_t *)malloc(mem_chunk_size);
if (scratch_pad == NULL) {
fprintf(stderr,"Could not allocate memory\n");
exit(1);
}
end = scratch_pad + (mem_chunk_size / slot_size);
for (temp = scratch_pad, j=0; temp < end; temp++, j++)
*temp = (mem_slot_t)j;

free(scratch_pad);

if (sleep_at && !(i % sleep_at))
usleep(sleep_interval);
}
return (i);
}

static void *
thread_run(void *arg)
{

unsigned int records_local;

/* Wait for the start signal */

while (start == 0);

records_local = write_to_mem();

pthread_mutex_lock(_count_lock);
records_read += records_local;
pthread_mutex_unlock(_count_lock);

return NULL;
}

static void start_threads()
{
double diff_time;
unsigned int i;
int err;
threads = 8;
seconds = 10;

pthread_t thread_array[threads];
for (i = 0; i < threads; i++) {
err = pthread_create(_array[i], NULL, thread_run, NULL);
if (err) {
fprintf(stderr, "Error creating thread %d\n", i);
exit(1);
}
}
start_time = time(NULL);
start = 1;
sleep(seconds);
start = 0;
diff_time = difftime(time(NULL), start_time);

for (i = 0; i < threads; i++) {
err = pthread_join(thread_array[i], NULL);
if (err) {
fprintf(stderr, "Error joining thread %d\n", i);
exit(1);
}
}
 printf("%u records/s\n",
(unsigned int) (((double) records_read)/diff_time));

}
int main()
{
start_threads();
return 0;
}

Regards
Preeti U Murthy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v1] firmware loader: introduce module parameter to customize fw search path

2012-10-25 Thread Linus Torvalds
On Thu, Oct 25, 2012 at 5:46 PM, Ming Lei  wrote:
> struct file *file;
> -   snprintf(path, PATH_MAX, "%s/%s", fw_path[i], buf->fw_id);
> +
> +   if (i < 0) {
> +   if (!fw_path_para[0])   /* No customized path */
> +   continue;
> +   snprintf(path, PATH_MAX, "%s/%s", fw_path_para,
> +buf->fw_id);
> +   } else {
> +   snprintf(path, PATH_MAX, "%s/%s", fw_path[i],
> +buf->fw_id);
> +   }

Ugh. This is just disgusting.

Please just make "fw_path[0]" just be the pointer to fw_path_para[]
(which sounds like the cleanest fix) and get rid of the negative 'i'
and conditional entirely.

Or if there is some odd reason you don't want to do that, at least
make the conditional much smaller, without the snprintf() in both arms
(ie make the if-statement just set a "const char *dir" variable to
either fw_path[i] or fw_path_para or whatever).

Sure, the compiler *may* merge them (gcc does, but I've seen it miss
them too), but even if the compiler might fix up ugly code, that's not
a reason for it to be ugly in the source code anyway.

  Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v2 0/2] vmevent: A bit reworked pressure attribute + docs + man page

2012-10-25 Thread Minchan Kim
On Thu, Oct 25, 2012 at 02:08:14AM -0700, Anton Vorontsov wrote:
> Hello Minchan,
> 
> Thanks a lot for the email!
> 
> On Thu, Oct 25, 2012 at 03:40:09PM +0900, Minchan Kim wrote:
> [...]
> > > What applications (well, activity managers) are really interested in is
> > > this:
> > > 
> > > 1. Do we we sacrifice resources for new memory allocations (e.g. files
> > >cache)?
> > > 2. Does the new memory allocations' cost becomes too high, and the system
> > >hurts because of this?
> > > 3. Are we about to OOM soon?
> > 
> > Good but I think 3 is never easy.
> > But early notification would be better than late notification which can kill
> > someone.
> 
> Well, basically these are two fixed (strictly defined) levels (low and
> oom) + one flexible level (med), which meaning can be slightly tuned (but
> we still have a meaningful definition for it).
> 

I mean detection of "3) Are we about to OOM soon" isn't easy.

> So, I guess it's a good start. :)

Absolutely!

> 
> > > And here are the answers:
> > > 
> > > 1. VMEVENT_PRESSURE_LOW
> > > 2. VMEVENT_PRESSURE_MED
> > > 3. VMEVENT_PRESSURE_OOM
> > > 
> > > There is no "high" pressure, since I really don't see any definition of
> > > it, but it's possible to introduce new levels without breaking ABI. The
> > > levels described in more details in the patches, and the stuff is still
> > > tunable, but now via sysctls, not the vmevent_fd() call itself (i.e. we
> > > don't need to rebuild applications to adjust window size or other mm
> > > "details").
> > > 
> > > What I couldn't fix in this RFC is making vmevent_{scanned,reclaimed}
> > > stuff per-CPU (there's a comment describing the problem with this). But I
> > > made it lockless and tried to make it very lightweight (plus I moved the
> > > vmevent_pressure() call to a more "cold" path).
> > 
> > Your description doesn't include why we need new vmevent_fd(2).
> > Of course, it's very flexible and potential to add new VM knob easily but
> > the thing we is about to use now is only VMEVENT_ATTR_PRESSURE.
> > Is there any other use cases for swap or free? or potential user?
> 
> Number of idle pages by itself might be not that interesting, but
> cache+idle level is quite interesting.
> 
> By definition, _MED happens when performance already degraded, slightly,
> but still -- we can be swapping.
> 
> But _LOW notifications are coming when kernel is just reclaiming, so by
> using _LOW notifications + watching for cache level we can very easily
> predict the swapping activity long before we have even _MED pressure.

So, for seeing cache level, we need new vmevent_attr?

> 
> E.g. if idle+cache drops below amount of memory that userland can free,
> we'd indeed like to start freeing stuff (this somewhat resembles current
> logic that we have in the in-kernel LMK).
> 
> Sure, we can read and parse /proc/vmstat upon _LOW events (and that was my
> backup plan), but reporting stuff together would make things much nicer.

My concern is that user can imagine various scenario with vmstat and they might 
start to
require new vmevent_attr in future and vmevent_fd will be bloated and mm guys 
should
care of vmevent_vd whenever they add new vmstat. I don't like it. User can do 
it by
just reading /proc/vmstat. So I support your backup plan.


> 
> Although, I somewhat doubt that it is OK to report raw numbers, so this
> needs some thinking to develop more elegant solution.

Indeed.

> 
> Maybe it makes sense to implement something like PRESSURE_MILD with an
> additional nr_pages threshold, which basically hits the kernel about how
> many easily reclaimable pages userland has (that would be a part of our
> definition for the mild pressure level). So, essentially it will be
> 
>   if (pressure_index >= oom_level)
>   return PRESSURE_OOM;
>   else if (pressure_index >= med_level)
>   return PRESSURE_MEDIUM;
>   else if (userland_reclaimable_pages >= nr_reclaimable_pages)
>   return PRESSURE_MILD;
>   return PRESSURE_LOW;
> 
> I must admit I like the idea more than exposing NR_FREE and stuff, but the
> scheme reminds me the blended attributes, which we abandoned. Although,
> the definition sounds better now, and we seem to be doing it in the right
> place.
> 
> And if we go this way, then sure, we won't need any other attributes, and
> so we could make the API much simpler.

That's what I want! If there isn't any user who really are willing to use it,
let's drop it. Do not persuade with imaginary scenario because we should be 
careful to introduce new ABI.

> 
> > Adding vmevent_fd without them is rather overkill.
> > 
> > And I want to avoid timer-base polling of vmevent if possbile.
> > mem_notify of KOSAKI doesn't use such timer.
> 
> For pressure notifications we don't use the timers. We also read the

Hmm, when I see the code, timer still works and can notify to user. No?

> vmstat counters together with the pressure, so "pressure + counters"
> effectively turns it 

Re: [PATCH] mm: readahead: remove redundant ra_pages in file_ra_state

2012-10-25 Thread Ni zhan Chen

On 10/26/2012 09:27 AM, Fengguang Wu wrote:

On Fri, Oct 26, 2012 at 11:25:44AM +1100, Dave Chinner wrote:

On Thu, Oct 25, 2012 at 10:58:26AM +0800, Fengguang Wu wrote:

Hi Chen,


But how can bdi related ra_pages reflect different files' readahead
window? Maybe these different files are sequential read, random read
and so on.

It's simple: sequential reads will get ra_pages readahead size while
random reads will not get readahead at all.

Talking about the below chunk, it might hurt someone that explicitly
takes advantage of the behavior, however the ra_pages*2 seems more
like a hack than general solution to me: if the user will need
POSIX_FADV_SEQUENTIAL to double the max readahead window size for
improving IO performance, then why not just increase bdi->ra_pages and
benefit all reads? One may argue that it offers some differential
behavior to specific applications, however it may also present as a
counter-optimization: if the root already tuned bdi->ra_pages to the
optimal size, the doubled readahead size will only cost more memory
and perhaps IO latency.

--- a/mm/fadvise.c
+++ b/mm/fadvise.c
@@ -87,7 +86,6 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, loff_t 
len, int advice)
 spin_unlock(>f_lock);
 break;
 case POSIX_FADV_SEQUENTIAL:
-   file->f_ra.ra_pages = bdi->ra_pages * 2;

I think we really have to reset file->f_ra.ra_pages here as it is
not a set-and-forget value. e.g.  shrink_readahead_size_eio() can
reduce ra_pages as a result of IO errors. Hence if you have had io
errors, telling the kernel that you are now going to do  sequential
IO should reset the readahead to the maximum ra_pages value
supported

Good point!

 but wait  this patch removes file->f_ra.ra_pages in all other
places too, so there will be no file->f_ra.ra_pages to be reset here...


In his patch,

 static void shrink_readahead_size_eio(struct file *filp,
struct file_ra_state *ra)
 {
-   ra->ra_pages /= 4;
+   spin_lock(>f_lock);
+   filp->f_mode |= FMODE_RANDOM;
+   spin_unlock(>f_lock);

As the example in comment above this function, the read maybe still 
sequential, and it will waste IO bandwith if modify to FMODE_RANDOM 
directly.




Thanks,
Fengguang



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 05/31] x86/mm: Reduce tlb flushes from ptep_set_access_flags()

2012-10-25 Thread Rik van Riel

On 10/25/2012 04:17 PM, Linus Torvalds wrote:

On Thu, Oct 25, 2012 at 5:16 AM, Peter Zijlstra  wrote:

From: Rik van Riel 

@@ -306,11 +306,26 @@ int ptep_set_access_flags(struct vm_area
   pte_t entry, int dirty)
  {
 int changed = !pte_same(*ptep, entry);
+   /*
+* If the page used to be inaccessible (_PAGE_PROTNONE), or
+* this call upgrades the access permissions on the same page,
+* it is safe to skip the remote TLB flush.
+*/
+   bool flush_remote = false;
+   if (!pte_accessible(*ptep))
+   flush_remote = false;
+   else if (pte_pfn(*ptep) != pte_pfn(entry) ||
+   (pte_write(*ptep) && !pte_write(entry)) ||
+   (pte_exec(*ptep) && !pte_exec(entry)))
+   flush_remote = true;

 if (changed && dirty) {


Did anybody ever actually look at this sh*t-for-brains patch?

Yeah, I'm grumpy. But I'm wasting time looking at patches that have
new code in them that is stupid and retarded.

This is the VM, guys, we don't add stupid and retarded code.

LOOK at the code, for chrissake. Just look at it. And if you don't see
why the above is stupid and retarded, you damn well shouldn't be
touching VM code.


I agree it is pretty ugly.  However, the above patch
did get rid of a gigantic performance regression with
Peter's code.

Doing unnecessary remote TLB flushes was costing about
90% performance with specjbb on a 4 node system.

However, if we can guarantee that ptep_set_access_flags
is only ever called for pte permission _upgrades_, we
can simply get rid of the remote TLB flush on x86, and
skip the paranoia tests we are doing above.

Do we have that kind of guarantee?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: The idea about scheduler test module(STM)

2012-10-25 Thread Charles Wang

Yes, it's a new way to do scheduler test. But why use kernel
 threads? The info, total time, run time, wait time, preempt number, 
all can be collected from tasks' sched info from /proc/pid/sched and

/proc/pid/stat.

I don't understand clearly about "pure scheduler performance" here. In
order to test scheduler fully, we need to do IO test, and other 
subsystems will be involved in. But if other subsystems' code are not 
changed, the result still can refer to scheduler's change.


I can't think much farther about the advantage this way could give. 
Maybe you should show us ur better examples. :)


Regards,
Charles

On 10/25/2012 01:40 PM, Michael Wang wrote:

Hi, Folks

Charles has raised a problem that we don't have any tool yet
for testing the scheduler with out any disturb from other
subsystem, and I also found it's hard to test scheduler optimize
patch, since the improvement could be easily eaten by other
subsystem like IO.

So Let's check the tools we have currently:
1. perf sched

we can use it to trace the threads we interested, and
the info it provided is very good, but one issue is,
it could not create the workload we want, also collect
the info and do summary is not so easy.

2. linsched

It's a very good tool to create the test environment,
but it's implementation is to ideal, so it could not
present the real world problem.

Since both perf and linsched could not meet our requirement, we
decided to develop a new tool, let's currently call it
scheduler test module(STM).

It's propose is:
1. create the workload we want.
2. test the pure scheduler.
3. collect info we need and do summary.

This tool should be very easy to use and not depends on
the implementation of scheduler.

We can use it to check the pure scheduler performance on
our system.

We can use it to check whether there are regression in
scheduler when testing patches.

And other usage I've not figure out yet.

In order to explain the idea more directly, I have wrote a
prototype STM, it's a separate module, and you can use it
just like 'rcutorture'.

I attached a small script 'play.sh' to help you easily
run the test, put 'schedtm.c' and 'play.sh' in same directory
and run 'play.sh', you will see out put like:

schedtm: summary
schedtm:cpu count:  cpu run preempt
schedtm:0   13811381
schedtm:1   957 955
schedtm:2   900 900
schedtm:3   10351034
schedtm:4   991 990
schedtm:5   940 939
schedtm:6   900 897
schedtm:7   942 948
schedtm:8   852 850
schedtm:9   931 938
schedtm:10  936 934
schedtm:11  951 950
schedtm:total time(us): 10138172
schedtm:run time(us):   5055223(49.86%)
schedtm:wait time(us):  5082949
schedtm:latency(us):10489
schedtm: stmt22 got highest run time 5604941(+10%)
schedtm: stmt3 got lowest run time 4852057(-4%)
schedtm: stmt12 got highest latency 11482(+9%)
schedtm: stmt0 got lowest latency 7561(-27%)

And you can enable/disable CONFIG_PREEMPT to see magnificent
change on latency.

This is nothing but a demo, and please "RUN IT ON A TEST MACHINE"...

It will create 24 kernel threads and run 10 seconds, you can change
it by module param.

I will be appreciate if I could get some feedback from the scheduler
experts like you, whatever you think it's good or junk, please let
me know :)

Regards,
Michael Wang



play.sh:

DURATION=10
NORMAL_THREADS=24
PERIOD=10

make clean
make
insmod ./schedtm.ko normalnr=$NORMAL_THREADS period=$PERIOD
sleep $DURATION
rmmod ./schedtm.ko
dmesg | grep schedtm

schedtm.c:

/*
  * scheduler test module
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
  * the Free Software Foundation; either version 2 of the License, or
  * (at your option) any later version.
  *
  * This program is distributed in the hope that it will be useful,
  * but WITHOUT ANY WARRANTY; without even the implied warranty of
  * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  * GNU General Public License for more details.
  *
  * You should have received a copy of the GNU General Public License
  * along with this program; if not, write to the Free Software
  * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
  *
  * Copyright (C) IBM Corporation, 2012
  *
  * Authors: Michael Wang 
  *
  */
#include 
#include 
#include 

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Michael Wang ");

#define pr_schedtm(fmt, ...)\
do {\

Re: [ 08/31] use clamp_t in UNAME26 fix

2012-10-25 Thread Greg Kroah-Hartman
On Thu, Oct 25, 2012 at 05:11:19PM -0700, Jonathan Nieder wrote:
> Hi,
> 
> Greg Kroah-Hartman wrote:
> 
> > commit 31fd84b95eb211d5db460a1dda85e004800a7b52 upstream.
> >
> > The min/max call needed to have explicit types on some architectures
> > (e.g. mn10300). Use clamp_t instead to avoid the warning:
> >
> >   kernel/sys.c: In function 'override_release':
> >   kernel/sys.c:1287:10: warning: comparison of distinct pointer types lacks 
> > a cast [enabled by default]
> >
> > Reported-by: Fengguang Wu 
> > Signed-off-by: Kees Cook 
> > Signed-off-by: Linus Torvalds 
> > Signed-off-by: Greg Kroah-Hartman 
> [...]
> > --- a/kernel/sys.c
> > +++ b/kernel/sys.c
> > @@ -1152,7 +1152,7 @@ static int override_release(char __user
> > rest++;
> > }
> > v = ((LINUX_VERSION_CODE >> 8) & 0xff) + 40;
> > -   copy = min(sizeof(buf), max_t(size_t, 1, len));
> > +   copy = clamp_t(size_t, len, 1, sizeof(buf));
> > copy = scnprintf(buf, copy, "2.6.%u%s", v, rest);
> 
> Does this have any effect at runtime?  If not, why is it needed for
> stable kernels?

It's a bugfix for the previous patch in this area, fixing the build
warning.  I don't like adding stable patches that add new warnings :)

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mm: readahead: remove redundant ra_pages in file_ra_state

2012-10-25 Thread Ni zhan Chen

On 10/26/2012 08:25 AM, Dave Chinner wrote:

On Thu, Oct 25, 2012 at 10:58:26AM +0800, Fengguang Wu wrote:

Hi Chen,


But how can bdi related ra_pages reflect different files' readahead
window? Maybe these different files are sequential read, random read
and so on.

It's simple: sequential reads will get ra_pages readahead size while
random reads will not get readahead at all.

Talking about the below chunk, it might hurt someone that explicitly
takes advantage of the behavior, however the ra_pages*2 seems more
like a hack than general solution to me: if the user will need
POSIX_FADV_SEQUENTIAL to double the max readahead window size for
improving IO performance, then why not just increase bdi->ra_pages and
benefit all reads? One may argue that it offers some differential
behavior to specific applications, however it may also present as a
counter-optimization: if the root already tuned bdi->ra_pages to the
optimal size, the doubled readahead size will only cost more memory
and perhaps IO latency.

--- a/mm/fadvise.c
+++ b/mm/fadvise.c
@@ -87,7 +86,6 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, loff_t 
len, int advice)
 spin_unlock(>f_lock);
 break;
 case POSIX_FADV_SEQUENTIAL:
-   file->f_ra.ra_pages = bdi->ra_pages * 2;

I think we really have to reset file->f_ra.ra_pages here as it is
not a set-and-forget value. e.g.  shrink_readahead_size_eio() can
reduce ra_pages as a result of IO errors. Hence if you have had io
errors, telling the kernel that you are now going to do  sequential
IO should reset the readahead to the maximum ra_pages value
supported


Good catch!



Cheers,

Dave.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2]

2012-10-25 Thread Ni zhan Chen

On 10/26/2012 05:48 AM, Hugh Dickins wrote:

On Thu, 25 Oct 2012, Johannes Weiner wrote:

On Wed, Oct 24, 2012 at 09:36:27PM -0700, Hugh Dickins wrote:

On Wed, 24 Oct 2012, Dave Jones wrote:


Machine under significant load (4gb memory used, swap usage fluctuating)
triggered this...

WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70()
Pid: 29795, comm: trinity-child4 Not tainted 3.7.0-rc2+ #49

1148 error = shmem_add_to_page_cache(page, mapping, 
index,
1149 gfp, 
swp_to_radix_entry(swap));
1150 /* We already confirmed swap, and make no 
allocation */
1151 VM_BUG_ON(error);
1152 }

That's very surprising.  Easy enough to handle an error there, but
of course I made it a VM_BUG_ON because it violates my assumptions:
I rather need to understand how this can be, and I've no idea.

Could it be concurrent truncation clearing out the entry between
shmem_confirm_swap() and shmem_add_to_page_cache()?  I don't see
anything preventing that.

The empty slot would not match the expected swap entry this call
passes in and the returned error would be -ENOENT.

Excellent notion, many thanks Hannes, I believe you've got it.

I've hit that truncation problem in swapoff (and commented on it
in shmem_unuse_inode), but never hit it or considered it here.
I think of the page lock as holding it stable, but truncation's
free_swap_and_cache only does a trylock on the swapcache page,
so we're not secured against that possibility.


Hi Hugh,

Even though free_swap_and_cache only does a trylock on the swapcache 
page, but it doens't call delete_from_swap_cache and the associated 
entry should still be there, I am interested in what you have already 
introduce to protect it?




So I'd like to change it to VM_BUG_ON(error && error != -ENOENT),
but there's a little tidying up to do in the -ENOENT case, which


Do you mean radix_tree_insert will return -ENOENT if the associated 
entry is not present? Why I can't find this return value in the function 
radix_tree_insert?



needs more thought.  A delete_from_swap_cache(page) - though we
can be lazy and leave that to reclaim for such a rare occurrence -
and probably a mem_cgroup uncharge; but the memcg hooks are always
the hardest to get right, I'll have think about that one carefully.

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majord...@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: mailto:"d...@kvack.org;> em...@kvack.org 



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2 V2] memory_hotplug: fix possible incorrect node_states[N_NORMAL_MEMORY]

2012-10-25 Thread Lai Jiangshan
On 10/25/2012 12:17 PM, KOSAKI Motohiro wrote:
> On Wed, Oct 24, 2012 at 5:43 AM, Lai Jiangshan  wrote:
>> Currently memory_hotplug only manages the node_states[N_HIGH_MEMORY],
>> it forgets to manage node_states[N_NORMAL_MEMORY]. it may cause
>> node_states[N_NORMAL_MEMORY] becomes incorrect.
>>
>> Example, if a node is empty before online, and we online a memory
>> which is in ZONE_NORMAL. And after online,  node_states[N_HIGH_MEMORY]
>> is correct, but node_states[N_NORMAL_MEMORY] is incorrect,
>> the online code don't set the new online node to
>> node_states[N_NORMAL_MEMORY].
>>
>> The same things like it will happen when offline(the offline code
>> don't clear the node from node_states[N_NORMAL_MEMORY] when needed).
>> Some memory managment code depends node_states[N_NORMAL_MEMORY],
>> so we have to fix up the node_states[N_NORMAL_MEMORY].
>>
>> We add node_states_check_changes_online() and 
>> node_states_check_changes_offline()
>> to detect whether node_states[N_HIGH_MEMORY] and node_states[N_NORMAL_MEMORY]
>> are changed while hotpluging.
>>
>> Also add @status_change_nid_normal to struct memory_notify, thus
>> the memory hotplug callbacks know whether the node_states[N_NORMAL_MEMORY]
>> are changed. (We can add a @flags and reuse @status_change_nid instead of
>> introducing @status_change_nid_normal, but it will add much more complicated
>> in memory hotplug callback in every subsystem. So introdcing
>> @status_change_nid_normal is better and it don't change the sematic
>> of @status_change_nid)
>>
>> Changed from V1:
>> add more comments
>> change the function name
> 
> Your patch didn't fix my previous comments and don't works correctly.
> Please test your own patch before resubmitting. You should consider both
> zone normal only node and zone high only node.
> 

The comments in the code already answered/explained your previous comments.

Thanks,
Lai
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] kfifo: remove unnecessary type check

2012-10-25 Thread Yuanhan Liu
From: Yuanhan Liu 

Firstly, this kind of type check doesn't work. It does something similay
like following:
void * __dummy = NULL;
__buf = __dummy;

__dummy is defined as void *. Thus it will not trigger warnings as
expected.

Second, we don't need that kind of check. Since the prototype
of __kfifo_out is:
unsigned int __kfifo_out(struct __kfifo *fifo,  void *buf, unsigned int 
len)

buf is defined as void *, so we don't need do the type check. Remove it.

LINK: https://lkml.org/lkml/2012/10/25/386
LINK: https://lkml.org/lkml/2012/10/25/584

Cc: Andrew Morton 
Cc: Wei Yang 
Cc: Stefani Seibold 
Cc: Fengguang Wu 
Cc: Stephen Rothwell 
Signed-off-by: Yuanhan Liu 
---
 include/linux/kfifo.h | 20 
 1 file changed, 20 deletions(-)

diff --git a/include/linux/kfifo.h b/include/linux/kfifo.h
index 10308c6..b8c1d03 100644
--- a/include/linux/kfifo.h
+++ b/include/linux/kfifo.h
@@ -390,10 +390,6 @@ __kfifo_int_must_check_helper( \
unsigned int __ret; \
const size_t __recsize = sizeof(*__tmp->rectype); \
struct __kfifo *__kfifo = &__tmp->kfifo; \
-   if (0) { \
-   typeof(__tmp->ptr_const) __dummy __attribute__ ((unused)); \
-   __dummy = (typeof(__val))NULL; \
-   } \
if (__recsize) \
__ret = __kfifo_in_r(__kfifo, __val, sizeof(*__val), \
__recsize); \
@@ -432,8 +428,6 @@ __kfifo_uint_must_check_helper( \
unsigned int __ret; \
const size_t __recsize = sizeof(*__tmp->rectype); \
struct __kfifo *__kfifo = &__tmp->kfifo; \
-   if (0) \
-   __val = (typeof(__tmp->ptr))0; \
if (__recsize) \
__ret = __kfifo_out_r(__kfifo, __val, sizeof(*__val), \
__recsize); \
@@ -473,8 +467,6 @@ __kfifo_uint_must_check_helper( \
unsigned int __ret; \
const size_t __recsize = sizeof(*__tmp->rectype); \
struct __kfifo *__kfifo = &__tmp->kfifo; \
-   if (0) \
-   __val = (typeof(__tmp->ptr))NULL; \
if (__recsize) \
__ret = __kfifo_out_peek_r(__kfifo, __val, sizeof(*__val), \
__recsize); \
@@ -512,10 +504,6 @@ __kfifo_uint_must_check_helper( \
unsigned long __n = (n); \
const size_t __recsize = sizeof(*__tmp->rectype); \
struct __kfifo *__kfifo = &__tmp->kfifo; \
-   if (0) { \
-   typeof(__tmp->ptr_const) __dummy __attribute__ ((unused)); \
-   __dummy = (typeof(__buf))NULL; \
-   } \
(__recsize) ?\
__kfifo_in_r(__kfifo, __buf, __n, __recsize) : \
__kfifo_in(__kfifo, __buf, __n); \
@@ -565,10 +553,6 @@ __kfifo_uint_must_check_helper( \
unsigned long __n = (n); \
const size_t __recsize = sizeof(*__tmp->rectype); \
struct __kfifo *__kfifo = &__tmp->kfifo; \
-   if (0) { \
-   typeof(__tmp->ptr) __dummy = NULL; \
-   __buf = __dummy; \
-   } \
(__recsize) ?\
__kfifo_out_r(__kfifo, __buf, __n, __recsize) : \
__kfifo_out(__kfifo, __buf, __n); \
@@ -777,10 +761,6 @@ __kfifo_uint_must_check_helper( \
unsigned long __n = (n); \
const size_t __recsize = sizeof(*__tmp->rectype); \
struct __kfifo *__kfifo = &__tmp->kfifo; \
-   if (0) { \
-   typeof(__tmp->ptr) __dummy __attribute__ ((unused)) = NULL; \
-   __buf = __dummy; \
-   } \
(__recsize) ? \
__kfifo_out_peek_r(__kfifo, __buf, __n, __recsize) : \
__kfifo_out_peek(__kfifo, __buf, __n); \
-- 
1.7.11.7

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2]

2012-10-25 Thread Ni zhan Chen

On 10/26/2012 05:27 AM, Hugh Dickins wrote:

On Thu, 25 Oct 2012, Ni zhan Chen wrote:

On 10/25/2012 02:59 PM, Hugh Dickins wrote:

On Thu, 25 Oct 2012, Ni zhan Chen wrote:

I think it maybe caused by your commit [d189922862e03ce: shmem: fix
negative
rss in memcg memory.stat], one question:

Well, yes, I added the VM_BUG_ON in that commit.


if function shmem_confirm_swap confirm the entry has already brought back
from swap by a racing thread,

The reverse: true confirms that the swap entry has not been brought back
from swap by a racing thread; false indicates that there has been a race.


then why call shmem_add_to_page_cache to add
page from swapcache to pagecache again?

Adding it to pagecache again, after such a race, would set error to
-EEXIST (originating from radix_tree_insert); but we don't do that,
we add it to pagecache when it has not already been added.

Or that's the intention: but Dave seems to have found an unexpected
exception, despite us holding the page lock across all this.

(But if it weren't for the memcg and replace_page issues, I'd much
prefer to let shmem_add_to_page_cache discover the race as before.)

Hugh

Hi Hugh

Thanks for your response. You mean the -EEXIST originating from
radix_tree_insert, in radix_tree_insert:
if (slot != NULL)
 return -EEXIST;
But why slot should be NULL? if no race, the pagecache related radix tree
entry should be RADIX_TREE_EXCEPTIONAL_ENTRY+swap_entry_t.val, where I miss?

I was describing what would happen in a case that should not exist,
that you had thought the common case.  In actuality, the entry should
not be NULL, it should be as you say there.


Thanks for your patience. So in the common case, the entry should be the 
value I mentioned, then why has this check?

if (slot != NULL)
return -EEXIST;

the common case will return -EEXIST.



Hugh



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] dm: stay in blk_queue_bypass until queue becomes initialized

2012-10-25 Thread Jun'ichi Nomura
On 10/25/12 18:41, Jun'ichi Nomura wrote:
> With 749fefe677 ("block: lift the initial queue bypass mode on
> blk_register_queue() instead of blk_init_allocated_queue()"),
> add_disk() eventually calls blk_queue_bypass_end().
> This change invokes the following warning when multipath is used.
...
> The warning means during queue initialization blk_queue_bypass_start()
> calls sleeping function (synchronize_rcu) while dm holds md->type_lock.
> 
> dm device initialization basically includes the following 3 steps:
>   1. create ioctl, allocates queue and call add_disk()
>   2. table load ioctl, determines device type and initialize queue
>  if request-based
>   3. resume ioctl, device becomes functional
> 
> So it is better to have dm's queue stay in bypass mode until
> the initialization completes in table load ioctl.
> 
> The effect of additional blk_queue_bypass_start():
> 
>   3.7-rc2 (plain)
> # time for n in $(seq 1000); do dmsetup create --noudevsync --notable a; \
> dmsetup remove a; done
> 
> real  0m15.434s
> user  0m0.423s
> sys   0m7.052s
> 
>   3.7-rc2 (with this patch)
> # time for n in $(seq 1000); do dmsetup create --noudevsync --notable a; \
> dmsetup remove a; done
> real  0m19.766s
> user  0m0.442s
> sys   0m6.861s
> 
> If this additional cost is not negligible, we need a variant of add_disk()
> that does not end bypassing.

Or call blk_queue_bypass_start() before add_disk():

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index ad02761..d14639b 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1868,9 +1868,9 @@ static struct mapped_device *alloc_dev(int minor)
md->disk->queue = md->queue;
md->disk->private_data = md;
sprintf(md->disk->disk_name, "dm-%d", minor);
-   add_disk(md->disk);
/* Until md type is determined, put the queue in bypass mode */
blk_queue_bypass_start(md->queue);
+   add_disk(md->disk);
format_dev_t(md->name, MKDEV(_major, minor));
 
md->wq = alloc_workqueue("kdmflush",
---

If the patch is modified like above, we could fix the issue
without incurring additional cost on dm device creation.

# time for n in $(seq 1000); do dmsetup create --noudevsync --notable a; \
dmsetup remove a; done

real0m15.684s
user0m0.404s
sys 0m7.181s

-- 
Jun'ichi Nomura, NEC Corporation

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH 4/5] xen: arm: implement remap interfaces needed for privcmd mappings.

2012-10-25 Thread Mukesh Rathor
On Thu, 25 Oct 2012 08:46:59 +0100
Ian Campbell  wrote:

> On Thu, 2012-10-25 at 01:07 +0100, Mukesh Rathor wrote:
> > On Wed, 24 Oct 2012 16:44:11 -0700
> > Mukesh Rathor  wrote:
> > 
> > > >  
> > > > +/* Indexes into space being mapped. */
> > > > +GUEST_HANDLE(xen_ulong_t) idxs;
> > > > +
> > > > +/* GPFN in domid where the source mapping page should
> > > > appear. */
> > > > +GUEST_HANDLE(xen_pfn_t) gpfns;
> > > 
> > > 
> > > Looking at your arm implementation in xen, doesn't look like you
> > > are expecting idxs and gpfns to be contigous. In that case,
> > > shouldn't idxs and gpfns be pointers, ie, they are sent down as
> > > arrays? Or does GUEST_HANDLE do that, I can't seem to find where
> > > it's defined quickly.
> > 
> > Never mind, I see it got corrected to XEN_GUEST_HANDLE in staging
> > tree.
> 
> The macro is called XEN_GUEST_HANDLE in Xen and just GUEST_HANDLE in
> Linux.
> 
> > Still doesn't compile tho:
> > 
> > public/memory.h:246: error: expected specifier-qualifier-list before
> > ‘__guest_handle_xen_ulong_t’
> > 
> > I'll figure it out.
> 
> Looks like you've got it all sorted?

Yup. I made the change on xen side and added this patch to my tree
and got it working after reverting Konrad's setup.c changes. Not sure
if you need an ack from x86, but if you do:

Acked-by: Mukesh Rathor 

thanks
Mukesh

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH 5/5] xen: x86 pvh: use XENMEM_add_to_physmap_range for foreign gmfn mappings

2012-10-25 Thread Mukesh Rathor
On Wed, 24 Oct 2012 14:19:37 +0100
Ian Campbell  wrote:

> Squeezing the necessary fields into the existing XENMEM_add_to_physmap
> interface was proving to be a bit tricky so we have decided to go with
> a new interface upstream (the XENMAPSPACE_gmfn_foreign interface using
> XENMEM_add_to_physmap was never committed anywhere). This interface
> also allows for batching which was impossible to support at the same
> time as foreign mfns in the old interface.
> 
> This reverts the relevant parts of "PVH: basic and header changes,
> elfnote changes, ..." and followups and trivially converts
> pvh_add_to_xen_p2m over.
> 
> Signed-off-by: Ian Campbell 
> Acked-by: Stefano Stabellini 

Ok, I made the change on the xen side for x86 and tested it out. Works
fine. Second ack.

thanks, 
Mukesh

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mm: readahead: remove redundant ra_pages in file_ra_state

2012-10-25 Thread Fengguang Wu
On Fri, Oct 26, 2012 at 11:25:44AM +1100, Dave Chinner wrote:
> On Thu, Oct 25, 2012 at 10:58:26AM +0800, Fengguang Wu wrote:
> > Hi Chen,
> > 
> > > But how can bdi related ra_pages reflect different files' readahead
> > > window? Maybe these different files are sequential read, random read
> > > and so on.
> > 
> > It's simple: sequential reads will get ra_pages readahead size while
> > random reads will not get readahead at all.
> > 
> > Talking about the below chunk, it might hurt someone that explicitly
> > takes advantage of the behavior, however the ra_pages*2 seems more
> > like a hack than general solution to me: if the user will need
> > POSIX_FADV_SEQUENTIAL to double the max readahead window size for
> > improving IO performance, then why not just increase bdi->ra_pages and
> > benefit all reads? One may argue that it offers some differential
> > behavior to specific applications, however it may also present as a
> > counter-optimization: if the root already tuned bdi->ra_pages to the
> > optimal size, the doubled readahead size will only cost more memory
> > and perhaps IO latency.
> > 
> > --- a/mm/fadvise.c
> > +++ b/mm/fadvise.c
> > @@ -87,7 +86,6 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, 
> > loff_t len, int advice)
> > spin_unlock(>f_lock);
> > break;
> > case POSIX_FADV_SEQUENTIAL:
> > -   file->f_ra.ra_pages = bdi->ra_pages * 2;
> 
> I think we really have to reset file->f_ra.ra_pages here as it is
> not a set-and-forget value. e.g.  shrink_readahead_size_eio() can
> reduce ra_pages as a result of IO errors. Hence if you have had io
> errors, telling the kernel that you are now going to do  sequential
> IO should reset the readahead to the maximum ra_pages value
> supported

Good point!

 but wait  this patch removes file->f_ra.ra_pages in all other
places too, so there will be no file->f_ra.ra_pages to be reset here... 

Thanks,
Fengguang
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/3] memory_hotplug: fix stale node_states[N_NORMAL_MEMORY]

2012-10-25 Thread Lai Jiangshan
Hi, KOSAKI 

On 09/28/2012 06:03 AM, KOSAKI Motohiro wrote:
> (9/27/12 2:47 AM), Lai Jiangshan wrote:
>> Currently memory_hotplug only manages the node_states[N_HIGH_MEMORY],
>> it forgets to manage node_states[N_NORMAL_MEMORY]. it causes
>> node_states[N_NORMAL_MEMORY] becomes stale.
> 
> What's mean 'stale'? I guess
> 
> : Currently memory_hotplug doesn't turn on/off node_states[N_NORMAL_MEMORY]


Right.

> and
> : then it will be invalid if the platform has highmem. Luckily, almost memory 
> : hotplug aware platform don't have highmem, but are not all.
> 
> right?

Some platforms(32 bit) support logic-memory-hotplug.
Some platforms have movable memory.
They are all considered.

> I supporse this patch only meaningful on ARM platform practically.
> 

any platform whic supports memory-hotplug.

> 
> 
>> We add check_nodemasks_changes_online() and check_nodemasks_changes_offline()
>> to detect whether node_states[N_HIGH_MEMORY] and node_states[N_NORMAL_MEMORY]
>> are changed while hotpluging.
> 
> 
>> Also add @status_change_nid_normal to struct memory_notify, thus
>> the memory hotplug callbacks know whether the node_states[N_NORMAL_MEMORY]
>> are changed.
> 
> status_change_nid_normal is very ugly to me. When status_change_nid and 
> status_change_nid_normal has positive value, they are always the same.
> nid and flags value are more natual to me.

If we use flags, the semantic of "status_change_nid" is changed and we need to
change more current code, and we will add complicated to the memory hotplug
callbacks.

like this:

-   node = arg->status_change_nid;
+   if (arg->status_change_flags & (1UL << N_HIGH_MEMORY))
+   node = arg->status_change_nid;
+   else
+   node = -1;

> 
> 
> 
>>
>> Signed-off-by: Lai Jiangshan 
>> ---
>>  Documentation/memory-hotplug.txt |5 ++-
>>  include/linux/memory.h   |1 +
>>  mm/memory_hotplug.c  |   94 
>> +++--
>>  3 files changed, 83 insertions(+), 17 deletions(-)
>>
>> diff --git a/Documentation/memory-hotplug.txt 
>> b/Documentation/memory-hotplug.txt
>> index 6d0c251..6e6cbc7 100644
>> --- a/Documentation/memory-hotplug.txt
>> +++ b/Documentation/memory-hotplug.txt
>> @@ -377,15 +377,18 @@ The third argument is passed by pointer of struct 
>> memory_notify.
>>  struct memory_notify {
>> unsigned long start_pfn;
>> unsigned long nr_pages;
>> +   int status_change_nid_normal;
>> int status_change_nid;
>>  }
>>  
>>  start_pfn is start_pfn of online/offline memory.
>>  nr_pages is # of pages of online/offline memory.
>> +status_change_nid_normal is set node id when N_NORMAL_MEMORY of nodemask
>> +is (will be) set/clear, if this is -1, then nodemask status is not changed.
>>  status_change_nid is set node id when N_HIGH_MEMORY of nodemask is (will be)
>>  set/clear. It means a new(memoryless) node gets new memory by online and a
>>  node loses all memory. If this is -1, then nodemask status is not changed.
>> -If status_changed_nid >= 0, callback should create/discard structures for 
>> the
>> +If status_changed_nid* >= 0, callback should create/discard structures for 
>> the
>>  node if necessary.
>>  
>>  --
>> diff --git a/include/linux/memory.h b/include/linux/memory.h
>> index ff9a9f8..a09216d 100644
>> --- a/include/linux/memory.h
>> +++ b/include/linux/memory.h
>> @@ -53,6 +53,7 @@ int arch_get_memory_phys_device(unsigned long start_pfn);
>>  struct memory_notify {
>>  unsigned long start_pfn;
>>  unsigned long nr_pages;
>> +int status_change_nid_normal;
>>  int status_change_nid;
>>  };
>>  
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index 6a5b90d..b62d429b 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -460,6 +460,34 @@ static int online_pages_range(unsigned long start_pfn, 
>> unsigned long nr_pages,
>>  return 0;
>>  }
>>  
>> +static void check_nodemasks_changes_online(unsigned long nr_pages,
>> +struct zone *zone, struct memory_notify *arg)
>> +{
>> +int nid = zone_to_nid(zone);
>> +enum zone_type zone_last = ZONE_NORMAL;
>> +
>> +if (N_HIGH_MEMORY == N_NORMAL_MEMORY)
>> +zone_last = ZONE_MOVABLE;
> 
> This is very strange (or ugly) code. ZONE_MOVABLE don't depend on high mem.

If we don't have HIGHMEM,
any node of N_NORMAL_MEMORY has 0...ZONE_MOVABLE

if we have HIGHMEM,
any node of N_NORMAL_MEMORY has 0...ZONE_NORMAL

> 
> 
>> +
>> +if (zone_idx(zone) <= zone_last && !node_state(nid, N_NORMAL_MEMORY))
>> +arg->status_change_nid_normal = nid;
>> +else
>> +arg->status_change_nid_normal = -1;
> 
> Wrong. The onlined node may only have high mem zone. IOW, think fake numa 
> case etc.

"zone_idx(zone) <= zone_last" checks this case. the result is "else" branch.

> 
> 
>> +
>> +if (!node_state(nid, N_HIGH_MEMORY))
>> +arg->status_change_nid = nid;
>> +   

Re: [PATCH 00/11] perf tool: Add PERF_SAMPLE_READ sample read support

2012-10-25 Thread Namhyung Kim
Hi Jiri,

On Sat, 20 Oct 2012 16:33:08 +0200, Jiri Olsa wrote:
> hi,
> adding support to read sample values through the PERF_SAMPLE_READ
> sample type. It's now possible to specify 'S' modifier for an event
> and get its sample value by PERF_SAMPLE_READ.

I have a question.  What's an actual impact of specifying 'S' modifiere
to a non-group event or even only a (non-leader) member of a group?  For
instance, 'cycles:S' or '{branches,branch-misses:S}'.

Thanks,
Namhyung
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Kdump with signed images

2012-10-25 Thread Mimi Zohar
On Thu, 2012-10-25 at 14:55 -0400, Vivek Goyal wrote:
> On Thu, Oct 25, 2012 at 02:40:21PM -0400, Mimi Zohar wrote:
> > On Thu, 2012-10-25 at 10:10 -0400, Vivek Goyal wrote:
> > > On Thu, Oct 25, 2012 at 02:10:01AM -0400, Mimi Zohar wrote:
> > > 
> > > [..]
> > > > IMA-appraisal verifies the integrity of file data, while EVM verifies
> > > > the integrity of the file metadata, such as LSM and IMA-appraisal
> > > > labels.  Both 'security.ima' and 'security.evm' can contain digital
> > > > signatures.
> > > 
> > > But the private key for creating these digital signature needs to be
> > > on the target system?
> > > 
> > > Thanks
> > > Vivek
> > 
> > Absolutely not.  The public key needs to be added to the _ima or _evm
> > keyrings.  Roberto Sassu modified dracut and later made equivalent
> > changes to systemd.  Both have been upstreamed.
> 
> Putting public key in _ima or _evm keyring is not the problem. This is
> just the verification part.
> 
> > Dmitry has a package
> > that labels the filesystem called ima-evm-utils, which supports hash
> > (IMA), hmac(EVM) and digital signatures(both).
> > 
> > We're hoping that distro's would label all immutable files, not only elf
> > executables, with digital signatures and mutable files with a hash.
> 
> So this labeling (digital signing) can happen at build time?

There is nothing inherently preventing it from happening at build time.
Elana Reshetova gave a talk at LSS 2012 on modifying RPM
http://lwn.net/Articles/518265/. 

> I suspect you need labeling to happen at system install time? If yes,
> installer does not have the private key to sign anything.

The installed system needs to be labeled, but how that occurs is
dependent on your environment (eg. flash, rpm based install).  Neither
of these mechanisms would require the build private key.

On a running system, the package installer, after verifying the package
integrity, would install each file with the associated 'security.ima'
extended attribute.  The 'security.evm' digital signature would be
installed with an HMAC, calculated using a system unique key. 

> IOW, if distro sign a file, they will most likely put signatures in
> ELF header (something along the lines of signing PE/COFF binaries). 

Rusty was definitely against putting the signature in the ELF header for
kernel modules.  Why would this be any different?

> But
> I think you need digital signatures to be put in security.ima which are
> stored in xattrs and xattrs are not generated till you put file in
> question on target file system.
> 
> Thanks
> Vivek

The 'security.ima' digital signature would be created as part of the
build process and stored as an extended attribute with the file, like
other metadata.  On install, the file, extended attributes and other
metadata would be copied to the target file system.

Mimi
 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[GIT PULL] Please pull more NFS related bugfixes

2012-10-25 Thread Myklebust, Trond
Hi Linus,

This pull fixes a fairly urgent issue with the NFSv2/v3 statd code that
is causing Oopses, as well as some long standing races with the SUNRPC
tcp code.

The following changes since commit 0e9e3e306c7e472bdcffa34c4c4584301eda03b3:

  Merge tag 'stable/for-linus-3.7-rc2-tag' of 
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen (2012-10-24 05:17:27 
+0300)

are available in the git repository at:


  git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-3.7-3

for you to fetch changes up to e498daa81295d02f7359af313c2b7f87e1062207:

  LOCKD: Clear ln->nsm_clnt only when ln->nsm_users is zero (2012-10-24 
10:46:22 -0400)


NFS bugfixes for Linux 3.7

- Fix the NFSv2/v3 kernel statd protocol, which broke due to net namespace
  related changes.
- Fix a number of races in the SUNRPC TCP disconnect/reconnect code.


Trond Myklebust (6):
  SUNRPC: Clear the connect flag when socket state is TCP_CLOSE_WAIT
  Revert "SUNRPC: Ensure we close the socket on EPIPE errors too..."
  SUNRPC: Prevent races in xs_abort_connection()
  SUNRPC: Get rid of the xs_error_report socket callback
  LOCKD: fix races in nsm_client_get
  LOCKD: Clear ln->nsm_clnt only when ln->nsm_users is zero

 fs/lockd/mon.c| 57 +--
 net/sunrpc/xprtsock.c | 41 +---
 2 files changed, 42 insertions(+), 56 deletions(-)

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com


Re: [PATCH v6 0/3] Add modules to support realtek PCIE card reader

2012-10-25 Thread wwang
于 2012年10月26日 02:50, Greg KH 写道:
> On Sat, Oct 20, 2012 at 06:46:15AM +0300, Dan Carpenter wrote:
>> On Sat, Oct 06, 2012 at 03:23:56PM +0800, wwang wrote:
>>> We are still maintaining the SCSI driver for Realtek card reader,
>>> and will release the latest source code in the Github in the future.
>>> But maybe we won't push it to the staging tree any more.
>> Maybe we should just remove the staging code if it won't be fixed.
>> That's sort of the point of staging.
> I agree.  wwang, want me to delet the staging driver now?  I don't want
> "dead" code in the tree, especially as you don't want to maintain it
> anymore.
>
> thanks,
>
> greg k-h
Hi Greg:

OK. You can delete it now, please.
And I will push my new driver to MFD and MMC subsystem.

Best Regards,
wwang
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v4 5/5] zynq: move static peripheral mappings

2012-10-25 Thread Josh Cartwright
On Thu, Oct 25, 2012 at 06:41:08PM -0400, Nick Bowler wrote:
> On 2012-10-25 16:29 -0500, Josh Cartwright wrote:
> > On Thu, Oct 25, 2012 at 04:17:01PM -0400, Nick Bowler wrote:
> > > Did you test this on any real hardware?  I can't get the ZC702 to work
> > > with the UART mapped at this address (this ends up being mapped at
> > > 0xFEFFF000), although I can't for the life of me figure out why the
> > > virtual address even matters.  Note that for the ZC702, the physical
> > > address of the "main" UART is 0xE0001000.

Good news is you're not crazy; I was able to duplicate the problem here.

> If I were to guess, I would guess that, except for when it "Works",
> the really really early printk stuff isn't actually hitting the uart
> at all.  The "Fails" case would then be due to the stray writes
> crashing the board, and the "Truncated" case due to the stray writes
> being (ostensibly) benign.

If I'm not mistaken, this hypothesis is predicated on the early bootup
code establishing a (linear?) mapping for addresses > VMALLOC_START;
before the mdesc->map_io() is even handled.  That seems odd to me.

> But I really have no way right now to test this hypothesis, since I
> can't print anything in the failing case.

Not sure if I'll be able to get anything meaningful out of it yet (I've
not historically had good luck with Xilinx's debugging tools), but I did
finally get a JTAG debugger hooked up to the zc702.  I'll see if I can
get any useful information tomorrow.

Thanks,

  Josh


pgpYXByIc0y2V.pgp
Description: PGP signature


[git pull] drm radeon fixes.

2012-10-25 Thread Dave Airlie

Hi Linus,

Just radeon fixes in this one,
some new PCI IDs,
ATPX regression fix,
async VM regression fixes
some module options fixes.

Dave.

The following changes since commit b8e902f24fdd16c4373ddc37a4e150c4afe9c6db:

  drm/ttm: Fix a theoretical race in ttm_bo_cleanup_refs() (2012-10-23 10:15:21 
+1000)

are available in the git repository at:
  git://people.freedesktop.org/~airlied/linux drm-fixes

Alex Deucher (6):
  drm/radeon: add some new SI PCI ids
  drm/radeon: fix sparse warning
  drm/radeon: give each backlight a unique id
  drm/radeon: add error output if VM CS fails on cayman
  drm/radeon: fix ATPX function documentation
  drm/radeon: fix ATPX regression in acpi rework

Christian König (9):
  drm/radeon: fix PFP sync in vm_flush
  drm/radeon: fix cayman_vm_set_page v2
  drm/radeon: fix si_set_page v2
  drm/radeon: remove set_page check from VM code
  drm/radeon: fix header size estimation in VM code
  drm/radeon: fix and simplify pot argument checks v3
  drm/radeon: use vzalloc for gart pages
  drm/radeon: move size limits to gem_object_create.
  drm/radeon: move the retry to gem_object_create

Dave Airlie (1):
  Merge branch 'drm-fixes-3.7' of git://people.freedesktop.org/~agd5f/linux 
into drm-fixes

 drivers/gpu/drm/radeon/atombios_encoders.c  |5 ++-
 drivers/gpu/drm/radeon/evergreen_cs.c   |1 +
 drivers/gpu/drm/radeon/ni.c |   45 ++---
 drivers/gpu/drm/radeon/nid.h|1 +
 drivers/gpu/drm/radeon/radeon_atpx_handler.c|6 +-
 drivers/gpu/drm/radeon/radeon_device.c  |   60 +--
 drivers/gpu/drm/radeon/radeon_gart.c|   22 -
 drivers/gpu/drm/radeon/radeon_gem.c |   18 ++-
 drivers/gpu/drm/radeon/radeon_legacy_encoders.c |5 ++-
 drivers/gpu/drm/radeon/radeon_object.c  |   19 ---
 drivers/gpu/drm/radeon/si.c |   47 +++---
 include/drm/drm_pciids.h|3 +
 12 files changed, 122 insertions(+), 110 deletions(-)

Re: [RFC] Support volatile range for anon vma

2012-10-25 Thread Minchan Kim
Hi Christoph,

On Thu, Oct 25, 2012 at 03:19:27PM +, Christoph Lameter wrote:
> On Thu, 25 Oct 2012, Minchan Kim wrote:
> 
> >  #endif
> > +   /*
> > +* True if page in this vma is reclaimed.
> 
> What does that mean? All pages in the vma have been cleared out?

It means at least, more than one is reclaimed.
Comment should have been cleared.

> 
> > +   TTU_IGNORE_VOLATILE = (1 << 11),/* ignore volatile */
> >  };
> >  #define TTU_ACTION(x) ((x) & TTU_ACTION_MASK)
> >
> >  int try_to_unmap(struct page *, enum ttu_flags flags);
> >  int try_to_unmap_one(struct page *, struct vm_area_struct *,
> > -   unsigned long address, enum ttu_flags flags);
> > +   unsigned long address, enum ttu_flags flags,
> > +   bool *is_volatile);
> 
> You already pass a vma pointer in. Why do you need to pass a
> volatile flag in? Looks like unecessary churn.

You mean we can use vma->purged instead of is_volatile passing?
The is_volatile is just checking for that all of vmas share the page
are volatile ones. Then, vma->purged is just checking for that the page
is zapped in the vma. If one of vma share the page isn't volatile, we can't zap.

BTW, Christoph, what do you think about the goal of the patch which changes
munmap(2) to madvise(2) when user calls free(3) in user allocator like glibc?
I guess it would improve system performance very well.
But as I wrote down in description, downside of the patch is that we have to
age anon lru although we don't have swap. But gain via the patch is bigger than
loss via aging of anon lru when memory pressure happens. I don't see other 
downside
other than it. What do you think about it?
(I didn't implement anon lru aging in case of no-swap but it's trivial
once we decide)

Thanks for the review, Christoph

> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majord...@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: mailto:"d...@kvack.org;> em...@kvack.org 

-- 
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[ 13/31] USB: option: blacklist net interface on ZTE devices

2012-10-25 Thread Greg Kroah-Hartman
3.0-stable review patch.  If anyone has any objections, please let me know.

--

From: Bjørn Mork 

commit 1452df6f1b7e396d89c2a1fdbdc0e0e839f97671 upstream.

Based on information from the ZTE Windows drivers.

Signed-off-by: Bjørn Mork 
Signed-off-by: Greg Kroah-Hartman 

---
 drivers/usb/serial/option.c |   74 ++--
 1 file changed, 52 insertions(+), 22 deletions(-)

--- a/drivers/usb/serial/option.c
+++ b/drivers/usb/serial/option.c
@@ -503,11 +503,19 @@ static const struct option_blacklist_inf
.reserved = BIT(5),
 };
 
+static const struct option_blacklist_info net_intf6_blacklist = {
+   .reserved = BIT(6),
+};
+
 static const struct option_blacklist_info zte_mf626_blacklist = {
.sendsetup = BIT(0) | BIT(1),
.reserved = BIT(4),
 };
 
+static const struct option_blacklist_info zte_1255_blacklist = {
+   .reserved = BIT(3) | BIT(4),
+};
+
 static const struct usb_device_id option_ids[] = {
{ USB_DEVICE(OPTION_VENDOR_ID, OPTION_PRODUCT_COLT) },
{ USB_DEVICE(OPTION_VENDOR_ID, OPTION_PRODUCT_RICOLA) },
@@ -853,13 +861,19 @@ static const struct usb_device_id option
{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0113, 0xff, 0xff, 
0xff),
.driver_info = (kernel_ulong_t)_intf5_blacklist },
{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0117, 0xff, 0xff, 
0xff) },
-   { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0118, 0xff, 0xff, 
0xff) },
-   { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0121, 0xff, 0xff, 
0xff) },
+   { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0118, 0xff, 0xff, 
0xff),
+   .driver_info = (kernel_ulong_t)_intf5_blacklist },
+   { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0121, 0xff, 0xff, 
0xff),
+   .driver_info = (kernel_ulong_t)_intf5_blacklist },
{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0122, 0xff, 0xff, 
0xff) },
-   { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0123, 0xff, 0xff, 
0xff) },
-   { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0124, 0xff, 0xff, 
0xff) },
-   { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0125, 0xff, 0xff, 
0xff) },
-   { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0126, 0xff, 0xff, 
0xff) },
+   { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0123, 0xff, 0xff, 
0xff),
+   .driver_info = (kernel_ulong_t)_intf4_blacklist },
+   { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0124, 0xff, 0xff, 
0xff),
+   .driver_info = (kernel_ulong_t)_intf5_blacklist },
+   { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0125, 0xff, 0xff, 
0xff),
+   .driver_info = (kernel_ulong_t)_intf6_blacklist },
+   { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0126, 0xff, 0xff, 
0xff),
+   .driver_info = (kernel_ulong_t)_intf5_blacklist },
{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0128, 0xff, 0xff, 
0xff) },
{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0142, 0xff, 0xff, 
0xff) },
{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0143, 0xff, 0xff, 
0xff) },
@@ -872,7 +886,8 @@ static const struct usb_device_id option
{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0156, 0xff, 0xff, 
0xff) },
{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0157, 0xff, 0xff, 
0xff),
  .driver_info = (kernel_ulong_t)_intf5_blacklist },
-   { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0158, 0xff, 0xff, 
0xff) },
+   { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0158, 0xff, 0xff, 
0xff),
+ .driver_info = (kernel_ulong_t)_intf3_blacklist },
{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0159, 0xff, 0xff, 
0xff) },
{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0161, 0xff, 0xff, 
0xff) },
{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0162, 0xff, 0xff, 
0xff) },
@@ -880,9 +895,12 @@ static const struct usb_device_id option
{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0165, 0xff, 0xff, 
0xff) },
{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0167, 0xff, 0xff, 
0xff),
  .driver_info = (kernel_ulong_t)_intf4_blacklist },
-   { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1008, 0xff, 0xff, 
0xff) },
-   { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1010, 0xff, 0xff, 
0xff) },
-   { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1012, 0xff, 0xff, 
0xff) },
+   { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1008, 0xff, 0xff, 
0xff),
+ .driver_info = (kernel_ulong_t)_intf4_blacklist },
+   { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1010, 0xff, 0xff, 
0xff),
+ .driver_info = (kernel_ulong_t)_intf4_blacklist },
+   { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1012, 0xff, 0xff, 
0xff),
+ .driver_info = (kernel_ulong_t)_intf4_blacklist },
{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1057, 0xff, 0xff, 

[ 14/31] USB: option: add more ZTE devices

2012-10-25 Thread Greg Kroah-Hartman
3.0-stable review patch.  If anyone has any objections, please let me know.

--

From: Bjørn Mork 

commit 4b35f1c52943851b310afb09047bfe991ac8f5ae upstream.

Signed-off-by: Bjørn Mork 
Signed-off-by: Greg Kroah-Hartman 

---
 drivers/usb/serial/option.c |   18 ++
 1 file changed, 18 insertions(+)

--- a/drivers/usb/serial/option.c
+++ b/drivers/usb/serial/option.c
@@ -895,12 +895,22 @@ static const struct usb_device_id option
{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0165, 0xff, 0xff, 
0xff) },
{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0167, 0xff, 0xff, 
0xff),
  .driver_info = (kernel_ulong_t)_intf4_blacklist },
+   { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0191, 0xff, 0xff, 
0xff), /* ZTE EuFi890 */
+ .driver_info = (kernel_ulong_t)_intf4_blacklist },
+   { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0199, 0xff, 0xff, 
0xff), /* ZTE MF820S */
+ .driver_info = (kernel_ulong_t)_intf1_blacklist },
+   { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0257, 0xff, 0xff, 
0xff), /* ZTE MF821 */
+ .driver_info = (kernel_ulong_t)_intf3_blacklist },
+   { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0326, 0xff, 0xff, 
0xff),
+ .driver_info = (kernel_ulong_t)_intf4_blacklist },
{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1008, 0xff, 0xff, 
0xff),
  .driver_info = (kernel_ulong_t)_intf4_blacklist },
{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1010, 0xff, 0xff, 
0xff),
  .driver_info = (kernel_ulong_t)_intf4_blacklist },
{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1012, 0xff, 0xff, 
0xff),
  .driver_info = (kernel_ulong_t)_intf4_blacklist },
+   { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1021, 0xff, 0xff, 
0xff),
+ .driver_info = (kernel_ulong_t)_intf2_blacklist },
{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1057, 0xff, 0xff, 
0xff) },
{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1058, 0xff, 0xff, 
0xff) },
{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1059, 0xff, 0xff, 
0xff) },
@@ -1078,8 +1088,16 @@ static const struct usb_device_id option
{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1298, 0xff, 0xff, 
0xff) },
{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1299, 0xff, 0xff, 
0xff) },
{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1300, 0xff, 0xff, 
0xff) },
+   { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1401, 0xff, 0xff, 
0xff),
+   .driver_info = (kernel_ulong_t)_intf2_blacklist },
{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1402, 0xff, 0xff, 
0xff),
.driver_info = (kernel_ulong_t)_intf2_blacklist },
+   { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1424, 0xff, 0xff, 
0xff),
+   .driver_info = (kernel_ulong_t)_intf2_blacklist },
+   { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1425, 0xff, 0xff, 
0xff),
+   .driver_info = (kernel_ulong_t)_intf2_blacklist },
+   { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1426, 0xff, 0xff, 
0xff),  /* ZTE MF91 */
+   .driver_info = (kernel_ulong_t)_intf2_blacklist },
{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x2002, 0xff,
  0xff, 0xff), .driver_info = (kernel_ulong_t)_k3765_z_blacklist },
{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x2003, 0xff, 0xff, 
0xff) },


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[ 15/31] cgroup: notify_on_release may not be triggered in some cases

2012-10-25 Thread Greg Kroah-Hartman
3.0-stable review patch.  If anyone has any objections, please let me know.

--

From: Daisuke Nishimura 

commit 1f5320d5972aa50d3e8d2b227b636b370e608359 upstream.

notify_on_release must be triggered when the last process in a cgroup is
move to another. But if the first(and only) process in a cgroup is moved to
another, notify_on_release is not triggered.

# mkdir /cgroup/cpu/SRC
# mkdir /cgroup/cpu/DST
#
# echo 1 >/cgroup/cpu/SRC/notify_on_release
# echo 1 >/cgroup/cpu/DST/notify_on_release
#
# sleep 300 &
[1] 8629
#
# echo 8629 >/cgroup/cpu/SRC/tasks
# echo 8629 >/cgroup/cpu/DST/tasks
-> notify_on_release for /SRC must be triggered at this point,
   but it isn't.

This is because put_css_set() is called before setting CGRP_RELEASABLE
in cgroup_task_migrate(), and is a regression introduce by the
commit:74a1166d(cgroups: make procs file writable), which was merged
into v3.0.

Acked-by: Li Zefan 
Cc: Ben Blum 
Signed-off-by: Daisuke Nishimura 
Signed-off-by: Tejun Heo 
Signed-off-by: Greg Kroah-Hartman 

---
 kernel/cgroup.c |3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1800,9 +1800,8 @@ static int cgroup_task_migrate(struct cg
 * trading it for newcg is protected by cgroup_mutex, we're safe to drop
 * it here; it will be freed under RCU.
 */
-   put_css_set(oldcg);
-
set_bit(CGRP_RELEASABLE, >flags);
+   put_css_set(oldcg);
return 0;
 }
 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[ 02/31] NLM: nlm_lookup_file() may return NLMv4-specific error codes

2012-10-25 Thread Greg Kroah-Hartman
3.0-stable review patch.  If anyone has any objections, please let me know.

--

From: Trond Myklebust 

commit cd0b16c1c3cda12dbed1f8de8f1a9b0591990724 upstream.

If the filehandle is stale, or open access is denied for some reason,
nlm_fopen() may return one of the NLMv4-specific error codes nlm4_stale_fh
or nlm4_failed. These get passed right through nlm_lookup_file(),
and so when nlmsvc_retrieve_args() calls the latter, it needs to filter
the result through the cast_status() machinery.

Failure to do so, will trigger the BUG_ON() in encode_nlm_stat...

Signed-off-by: Trond Myklebust 
Reported-by: Larry McVoy 
Signed-off-by: J. Bruce Fields 
Signed-off-by: Greg Kroah-Hartman 

---
 fs/lockd/clntxdr.c |2 +-
 fs/lockd/svcproc.c |3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

--- a/fs/lockd/clntxdr.c
+++ b/fs/lockd/clntxdr.c
@@ -223,7 +223,7 @@ static void encode_nlm_stat(struct xdr_s
 {
__be32 *p;
 
-   BUG_ON(be32_to_cpu(stat) > NLM_LCK_DENIED_GRACE_PERIOD);
+   WARN_ON_ONCE(be32_to_cpu(stat) > NLM_LCK_DENIED_GRACE_PERIOD);
p = xdr_reserve_space(xdr, 4);
*p = stat;
 }
--- a/fs/lockd/svcproc.c
+++ b/fs/lockd/svcproc.c
@@ -67,7 +67,8 @@ nlmsvc_retrieve_args(struct svc_rqst *rq
 
/* Obtain file pointer. Not used by FREE_ALL call. */
if (filp != NULL) {
-   if ((error = nlm_lookup_file(rqstp, , >fh)) != 0)
+   error = cast_status(nlm_lookup_file(rqstp, , >fh));
+   if (error != 0)
goto no_locks;
*filp = file;
 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[ 17/31] media: au0828: fix case where STREAMOFF being called on stopped stream causes BUG()

2012-10-25 Thread Greg Kroah-Hartman
3.0-stable review patch.  If anyone has any objections, please let me know.

--

From: Devin Heitmueller 

commit a595c1ce4c9d572cf53513570b9f1a263d7867f2 upstream.

We weren't checking whether the resource was in use before calling
res_free(), so applications which called STREAMOFF on a v4l2 device that
wasn't already streaming would cause a BUG() to be hit (MythTV).

Reported-by: Larry Finger 
Reported-by: Jay Harbeston 
Signed-off-by: Devin Heitmueller 
Signed-off-by: Mauro Carvalho Chehab 

---
 drivers/media/video/au0828/au0828-video.c |   12 
 1 file changed, 8 insertions(+), 4 deletions(-)

--- a/drivers/media/video/au0828/au0828-video.c
+++ b/drivers/media/video/au0828/au0828-video.c
@@ -1697,14 +1697,18 @@ static int vidioc_streamoff(struct file
(AUVI_INPUT(i).audio_setup)(dev, 0);
}
 
-   videobuf_streamoff(>vb_vidq);
-   res_free(fh, AU0828_RESOURCE_VIDEO);
+   if (res_check(fh, AU0828_RESOURCE_VIDEO)) {
+   videobuf_streamoff(>vb_vidq);
+   res_free(fh, AU0828_RESOURCE_VIDEO);
+   }
} else if (fh->type == V4L2_BUF_TYPE_VBI_CAPTURE) {
dev->vbi_timeout_running = 0;
del_timer_sync(>vbi_timeout);
 
-   videobuf_streamoff(>vb_vbiq);
-   res_free(fh, AU0828_RESOURCE_VBI);
+   if (res_check(fh, AU0828_RESOURCE_VBI)) {
+   videobuf_streamoff(>vb_vbiq);
+   res_free(fh, AU0828_RESOURCE_VBI);
+   }
}
 
return 0;


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[ 18/31] drm/i915: apply timing generator bug workaround on CPT and PPT

2012-10-25 Thread Greg Kroah-Hartman
3.0-stable review patch.  If anyone has any objections, please let me know.

--

From: Jesse Barnes 

commit 3bcf603f6d5d18bd9d076dc280de71f48add4101 upstream.

On CougarPoint and PantherPoint PCH chips, the timing generator may fail
to start after DP training completes.  This is due to a bug in the
FDI autotraining detect logic (which will stall the timing generator and
re-enable it once training completes), so disable it to avoid silent DP
mode setting failures.

Signed-off-by: Jesse Barnes 
Signed-off-by: Keith Packard 
Signed-off-by: Timo Aaltonen 

---
 drivers/gpu/drm/i915/i915_reg.h  |5 +
 drivers/gpu/drm/i915/intel_display.c |4 
 2 files changed, 9 insertions(+)

--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -3113,6 +3113,11 @@
 #define  TRANS_6BPC (2<<5)
 #define  TRANS_12BPC(3<<5)
 
+#define _TRANSA_CHICKEN20xf0064
+#define _TRANSB_CHICKEN20xf1064
+#define TRANS_CHICKEN2(pipe) _PIPE(pipe, _TRANSA_CHICKEN2, _TRANSB_CHICKEN2)
+#define   TRANS_AUTOTRAIN_GEN_STALL_DIS(1<<31)
+
 #define SOUTH_CHICKEN2 0xc2004
 #define  DPLS_EDP_PPS_FIX_DIS  (1<<0)
 
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -7584,6 +7584,7 @@ static void ibx_init_clock_gating(struct
 static void cpt_init_clock_gating(struct drm_device *dev)
 {
struct drm_i915_private *dev_priv = dev->dev_private;
+   int pipe;
 
/*
 * On Ibex Peak and Cougar Point, we need to disable clock
@@ -7593,6 +7594,9 @@ static void cpt_init_clock_gating(struct
I915_WRITE(SOUTH_DSPCLK_GATE_D, PCH_DPLSUNIT_CLOCK_GATE_DISABLE);
I915_WRITE(SOUTH_CHICKEN2, I915_READ(SOUTH_CHICKEN2) |
   DPLS_EDP_PPS_FIX_DIS);
+   /* Without this, mode sets may fail silently on FDI */
+   for_each_pipe(pipe)
+   I915_WRITE(TRANS_CHICKEN2(pipe), TRANS_AUTOTRAIN_GEN_STALL_DIS);
 }
 
 static void ironlake_teardown_rc6(struct drm_device *dev)


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[ 05/31] Revert: lockd: use rpc clients cl_nodename for id encoding

2012-10-25 Thread Greg Kroah-Hartman
3.0-stable review patch.  If anyone has any objections, please let me know.

--

From: Greg Kroah-Hartman 

This reverts 12d63702c53bc2230dfc997e91ca891f39cb6446 which was commit
303a7ce92064c285a04c870f2dc0192fdb2968cb upstream.

Taking hostname from uts namespace if not safe, because this cuold be
performind during umount operation on child reaper death. And in this case
current->nsproxy is NULL already.

Signed-off-by: Greg Kroah-Hartman 
Cc: Stanislav Kinsbursky 
Cc: Trond Myklebust 

---
 fs/lockd/mon.c |4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

--- a/fs/lockd/mon.c
+++ b/fs/lockd/mon.c
@@ -40,7 +40,6 @@ struct nsm_args {
u32 proc;
 
char*mon_name;
-   char*nodename;
 };
 
 struct nsm_res {
@@ -94,7 +93,6 @@ static int nsm_mon_unmon(struct nsm_hand
.vers   = 3,
.proc   = NLMPROC_NSM_NOTIFY,
.mon_name   = nsm->sm_mon_name,
-   .nodename   = utsname()->nodename,
};
struct rpc_message msg = {
.rpc_argp   = ,
@@ -431,7 +429,7 @@ static void encode_my_id(struct xdr_stre
 {
__be32 *p;
 
-   encode_nsm_string(xdr, argp->nodename);
+   encode_nsm_string(xdr, utsname()->nodename);
p = xdr_reserve_space(xdr, 4 + 4 + 4);
*p++ = cpu_to_be32(argp->prog);
*p++ = cpu_to_be32(argp->vers);


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[ 16/31] amd64_edac:__amd64_set_scrub_rate(): avoid overindexing scrubrates[]

2012-10-25 Thread Greg Kroah-Hartman
3.0-stable review patch.  If anyone has any objections, please let me know.

--

From: Andrew Morton 

commit 168bfeef7bba3f9784f7540b053e4ac72b769ce9 upstream.

If none of the elements in scrubrates[] matches, this loop will cause
__amd64_set_scrub_rate() to incorrectly use the n+1th element.

As the function is designed to use the final scrubrates[] element in the
case of no match, we can fix this bug by simply terminating the array
search at the n-1th element.

Boris: this code is fragile anyway, see here why:
http://marc.info/?l=linux-kernel=135102834131236=2

It will be rewritten more robustly soonish.

Reported-by: Denis Kirjanov 
Cc: Doug Thompson 
Signed-off-by: Andrew Morton 
Signed-off-by: Borislav Petkov 
Signed-off-by: Greg Kroah-Hartman 

---
 drivers/edac/amd64_edac.c |   11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -161,8 +161,11 @@ static int __amd64_set_scrub_rate(struct
 * memory controller and apply to register. Search for the first
 * bandwidth entry that is greater or equal than the setting requested
 * and program that. If at last entry, turn off DRAM scrubbing.
+*
+* If no suitable bandwidth is found, turn off DRAM scrubbing entirely
+* by falling back to the last element in scrubrates[].
 */
-   for (i = 0; i < ARRAY_SIZE(scrubrates); i++) {
+   for (i = 0; i < ARRAY_SIZE(scrubrates) - 1; i++) {
/*
 * skip scrub rates which aren't recommended
 * (see F10 BKDG, F3x58)
@@ -172,12 +175,6 @@ static int __amd64_set_scrub_rate(struct
 
if (scrubrates[i].bandwidth <= new_bw)
break;
-
-   /*
-* if no suitable bandwidth found, turn off DRAM scrubbing
-* entirely by falling back to the last element in the
-* scrubrates array.
-*/
}
 
scrubval = scrubrates[i].scrubval;


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[ 07/31] kernel/sys.c: fix stack memory content leak via UNAME26

2012-10-25 Thread Greg Kroah-Hartman
3.0-stable review patch.  If anyone has any objections, please let me know.

--

From: Kees Cook 

commit 2702b1526c7278c4d65d78de209a465d4de2885e upstream.

Calling uname() with the UNAME26 personality set allows a leak of kernel
stack contents.  This fixes it by defensively calculating the length of
copy_to_user() call, making the len argument unsigned, and initializing
the stack buffer to zero (now technically unneeded, but hey, overkill).

CVE-2012-0957

Reported-by: PaX Team 
Signed-off-by: Kees Cook 
Cc: Andi Kleen 
Cc: PaX Team 
Cc: Brad Spengler 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman 

---
 kernel/sys.c |   12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1133,15 +1133,16 @@ DECLARE_RWSEM(uts_sem);
  * Work around broken programs that cannot handle "Linux 3.0".
  * Instead we map 3.x to 2.6.40+x, so e.g. 3.0 would be 2.6.40
  */
-static int override_release(char __user *release, int len)
+static int override_release(char __user *release, size_t len)
 {
int ret = 0;
-   char buf[65];
 
if (current->personality & UNAME26) {
-   char *rest = UTS_RELEASE;
+   const char *rest = UTS_RELEASE;
+   char buf[65] = { 0 };
int ndots = 0;
unsigned v;
+   size_t copy;
 
while (*rest) {
if (*rest == '.' && ++ndots >= 3)
@@ -1151,8 +1152,9 @@ static int override_release(char __user
rest++;
}
v = ((LINUX_VERSION_CODE >> 8) & 0xff) + 40;
-   snprintf(buf, len, "2.6.%u%s", v, rest);
-   ret = copy_to_user(release, buf, len);
+   copy = min(sizeof(buf), max_t(size_t, 1, len));
+   copy = scnprintf(buf, copy, "2.6.%u%s", v, rest);
+   ret = copy_to_user(release, buf, copy + 1);
}
return ret;
 }


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[for-next PATCH V2] PM / devfreq: Add sysfs node to expose available frequencies

2012-10-25 Thread Nishanth Menon
devfreq governors such as ondemand are controlled by a min and
max frequency, while governors like userspace governor allow us
to set a specific frequency.
However, for the same specific device, depending on the SoC, the
available frequencies can vary.

So expose the available frequencies as a snapshot over sysfs to
allow informed decisions.

This was inspired by cpufreq framework's equivalent for similar
usage sysfs node: scaling_available_frequencies.

Cc: Rajagopal Venkat 
Cc: MyungJoo Ham 
Cc: Kyungmin Park 
Cc: "Rafael J. Wysocki" 
Cc: Kevin Hilman 
Cc: linux...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org

Signed-off-by: Nishanth Menon 
---
Applies on top of Rafael's linux-next branch:
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git
linux-nextb960e9a Merge branch 'pm-sleep-next' into linux-next

Example output from Beagleboard XM (3730) using a dummy test driver
http://pastebin.pandaboard.org/index.php/view/85100576 :

/sys/devices/platform/iva.0/devfreq/iva.0 # cat available_frequencies 
26000 52000 66000
/sys/devices/platform/iva.0/devfreq/iva.0 # cat available_frequencies|tr ' ' '-'
26000-52000-66000

V2 :
- review comment update from v1
- protected the sysfs from buffer overflow - just in case..
V1: https://patchwork.kernel.org/patch/1648001/

 Documentation/ABI/testing/sysfs-class-devfreq |9 +++
 drivers/devfreq/devfreq.c |   32 +
 2 files changed, 41 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-class-devfreq 
b/Documentation/ABI/testing/sysfs-class-devfreq
index e6cf08e..e672ccb 100644
--- a/Documentation/ABI/testing/sysfs-class-devfreq
+++ b/Documentation/ABI/testing/sysfs-class-devfreq
@@ -51,3 +51,12 @@ Description:
The /sys/class/devfreq/.../userspace/set_freq shows and
sets the requested frequency for the devfreq object if
userspace governor is in effect.
+
+What:  /sys/class/devfreq/.../available_frequencies
+Date:  October 2012
+Contact:   Nishanth Menon 
+Description:
+   The /sys/class/devfreq/.../available_frequencies shows
+   the available frequencies of the corresponding devfreq object.
+   This is a snapshot of available frequencies and not limited
+   by the min/max frequency restrictions.
diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
index d02ee7e..104018e 100644
--- a/drivers/devfreq/devfreq.c
+++ b/drivers/devfreq/devfreq.c
@@ -571,9 +571,41 @@ static ssize_t show_max_freq(struct device *dev, struct 
device_attribute *attr,
return sprintf(buf, "%lu\n", to_devfreq(dev)->max_freq);
 }
 
+static ssize_t show_available_freqs(struct device *d,
+   struct device_attribute *attr,
+   char *buf)
+{
+   struct devfreq *df = to_devfreq(d);
+   struct device *dev = df->dev.parent;
+   struct opp *opp;
+   ssize_t count = 0;
+   unsigned long freq = 0;
+
+   rcu_read_lock();
+   do {
+   opp = opp_find_freq_ceil(dev, );
+   if (IS_ERR(opp))
+   break;
+
+   count += scnprintf([count], (PAGE_SIZE - count - 2),
+  "%lu ", freq);
+   freq++;
+   } while (1);
+   rcu_read_unlock();
+
+   /* Truncate the trailing space */
+   if (count)
+   count--;
+
+   count += sprintf([count], "\n");
+
+   return count;
+}
+
 static struct device_attribute devfreq_attrs[] = {
__ATTR(governor, S_IRUGO, show_governor, NULL),
__ATTR(cur_freq, S_IRUGO, show_freq, NULL),
+   __ATTR(available_frequencies, S_IRUGO, show_available_freqs, NULL),
__ATTR(target_freq, S_IRUGO, show_target_freq, NULL),
__ATTR(polling_interval, S_IRUGO | S_IWUSR, show_polling_interval,
   store_polling_interval),
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[ 09/31] x86: Exclude E820_RESERVED regions and memory holes above 4 GB from direct mapping.

2012-10-25 Thread Greg Kroah-Hartman
3.0-stable review patch.  If anyone has any objections, please let me know.

--

From: Jacob Shin 

commit 1e779aabe1f0768c2bf8f8c0a5583679b54a upstream.

On systems with very large memory (1 TB in our case), BIOS may report a
reserved region or a hole in the E820 map, even above the 4 GB range. Exclude
these from the direct mapping.

[ hpa: this should be done not just for > 4 GB but for everything above the 
legacy
  region (1 MB), at the very least.  That, however, turns out to require 
significant
  restructuring.  That work is well underway, but is not suitable for 
rc/stable. ]

Signed-off-by: Jacob Shin 
Link: 
http://lkml.kernel.org/r/1319145326-13902-1-git-send-email-jacob.s...@amd.com
Signed-off-by: H. Peter Anvin 
Signed-off-by: Greg Kroah-Hartman 

---
 arch/x86/kernel/setup.c |   17 +++--
 1 file changed, 15 insertions(+), 2 deletions(-)

--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -937,8 +937,21 @@ void __init setup_arch(char **cmdline_p)
 
 #ifdef CONFIG_X86_64
if (max_pfn > max_low_pfn) {
-   max_pfn_mapped = init_memory_mapping(1UL<<32,
-max_pfnsize <= 1UL << 32)
+   continue;
+
+   if (ei->type == E820_RESERVED)
+   continue;
+
+   max_pfn_mapped = init_memory_mapping(
+   ei->addr < 1UL << 32 ? 1UL << 32 : ei->addr,
+   ei->addr + ei->size);
+   }
+
/* can we preseve max_low_pfn ?*/
max_low_pfn = max_pfn;
}


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[ 11/31] USB: cdc-acm: fix pipe type of write endpoint

2012-10-25 Thread Greg Kroah-Hartman
3.0-stable review patch.  If anyone has any objections, please let me know.

--

From: Ming Lei 

commit c5211187f7ff8e8dbff4ebf7c011ac4c0ffe319c upstream.

If the write endpoint is interrupt type, usb_sndintpipe() should
be passed to usb_fill_int_urb() instead of usb_sndbulkpipe().

Signed-off-by: Ming Lei 
Cc: Oliver Neukum 
Signed-off-by: Greg Kroah-Hartman 

---
 drivers/usb/class/cdc-acm.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/usb/class/cdc-acm.c
+++ b/drivers/usb/class/cdc-acm.c
@@ -1172,7 +1172,7 @@ made_compressed_probe:
 
if (usb_endpoint_xfer_int(epwrite))
usb_fill_int_urb(snd->urb, usb_dev,
-   usb_sndbulkpipe(usb_dev, 
epwrite->bEndpointAddress),
+   usb_sndintpipe(usb_dev, 
epwrite->bEndpointAddress),
NULL, acm->writesize, acm_write_bulk, snd, 
epwrite->bInterval);
else
usb_fill_bulk_urb(snd->urb, usb_dev,


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[ 12/31] usb: acm: fix the computation of the number of data bits

2012-10-25 Thread Greg Kroah-Hartman
3.0-stable review patch.  If anyone has any objections, please let me know.

--

From: Nicolas Boullis 

commit 301a29da6e891e7eb95c843af0ecdbe86d01f723 upstream.

The current code assumes that CSIZE is 060, which appears to be
wrong on some arches (such as powerpc).

Signed-off-by: Nicolas Boullis 
Acked-by: Oliver Neukum 
Signed-off-by: Greg Kroah-Hartman 

---
 drivers/usb/class/cdc-acm.c |   20 +++-
 1 file changed, 15 insertions(+), 5 deletions(-)

--- a/drivers/usb/class/cdc-acm.c
+++ b/drivers/usb/class/cdc-acm.c
@@ -760,10 +760,6 @@ static const __u32 acm_tty_speed[] = {
250, 300, 350, 400
 };
 
-static const __u8 acm_tty_size[] = {
-   5, 6, 7, 8
-};
-
 static void acm_tty_set_termios(struct tty_struct *tty,
struct ktermios *termios_old)
 {
@@ -780,7 +776,21 @@ static void acm_tty_set_termios(struct t
newline.bParityType = termios->c_cflag & PARENB ?
(termios->c_cflag & PARODD ? 1 : 2) +
(termios->c_cflag & CMSPAR ? 2 : 0) : 0;
-   newline.bDataBits = acm_tty_size[(termios->c_cflag & CSIZE) >> 4];
+   switch (termios->c_cflag & CSIZE) {
+   case CS5:
+   newline.bDataBits = 5;
+   break;
+   case CS6:
+   newline.bDataBits = 6;
+   break;
+   case CS7:
+   newline.bDataBits = 7;
+   break;
+   case CS8:
+   default:
+   newline.bDataBits = 8;
+   break;
+   }
/* FIXME: Needs to clear unsupported bits in the termios */
acm->clocal = ((termios->c_cflag & CLOCAL) != 0);
 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[ 08/31] use clamp_t in UNAME26 fix

2012-10-25 Thread Greg Kroah-Hartman
3.0-stable review patch.  If anyone has any objections, please let me know.

--

From: Kees Cook 

commit 31fd84b95eb211d5db460a1dda85e004800a7b52 upstream.

The min/max call needed to have explicit types on some architectures
(e.g. mn10300). Use clamp_t instead to avoid the warning:

  kernel/sys.c: In function 'override_release':
  kernel/sys.c:1287:10: warning: comparison of distinct pointer types lacks a 
cast [enabled by default]

Reported-by: Fengguang Wu 
Signed-off-by: Kees Cook 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman 

---
 kernel/sys.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1152,7 +1152,7 @@ static int override_release(char __user
rest++;
}
v = ((LINUX_VERSION_CODE >> 8) & 0xff) + 40;
-   copy = min(sizeof(buf), max_t(size_t, 1, len));
+   copy = clamp_t(size_t, len, 1, sizeof(buf));
copy = scnprintf(buf, copy, "2.6.%u%s", v, rest);
ret = copy_to_user(release, buf, copy + 1);
}


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[ 06/31] pcmcia: sharpsl: dont discard sharpsl_pcmcia_ops

2012-10-25 Thread Greg Kroah-Hartman
3.0-stable review patch.  If anyone has any objections, please let me know.

--

From: Arnd Bergmann 

commit fdc858a466b738d35d3492bc7cf77b1dac98bf7c upstream.

The sharpsl_pcmcia_ops structure gets passed into
sa11xx_drv_pcmcia_probe, where it gets accessed at run-time,
unlike all other pcmcia drivers that pass their structures
into platform_device_add_data, which makes a copy.

This means the gcc warning is valid and the structure
must not be marked as __initdata.

Without this patch, building collie_defconfig results in:

drivers/pcmcia/pxa2xx_sharpsl.c:22:31: fatal error: mach-pxa/hardware.h: No 
such file or directory
compilation terminated.
make[3]: *** [drivers/pcmcia/pxa2xx_sharpsl.o] Error 1
make[2]: *** [drivers/pcmcia] Error 2
make[1]: *** [drivers] Error 2
make: *** [sub-make] Error 2

Signed-off-by: Arnd Bergmann 
Cc: Dominik Brodowski 
Cc: Russell King 
Cc: Pavel Machek 
Cc: linux-pcm...@lists.infradead.org
Cc: Jochen Friedrich 
Signed-off-by: Greg Kroah-Hartman 

---
 drivers/pcmcia/pxa2xx_sharpsl.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/pcmcia/pxa2xx_sharpsl.c
+++ b/drivers/pcmcia/pxa2xx_sharpsl.c
@@ -222,7 +222,7 @@ static void sharpsl_pcmcia_socket_suspen
sharpsl_pcmcia_init_reset(skt);
 }
 
-static struct pcmcia_low_level sharpsl_pcmcia_ops __initdata = {
+static struct pcmcia_low_level sharpsl_pcmcia_ops = {
.owner  = THIS_MODULE,
.hw_init= sharpsl_pcmcia_hw_init,
.hw_shutdown= sharpsl_pcmcia_hw_shutdown,


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[ 29/31] xHCI: add aborting command ring function

2012-10-25 Thread Greg Kroah-Hartman
3.0-stable review patch.  If anyone has any objections, please let me know.

--

From: Elric Fu 

commit b92cc66c047ff7cf587b318fe377061a353c120f upstream.

Software have to abort command ring and cancel command
when a command is failed or hang. Otherwise, the command
ring will hang up and can't handle the others. An example
of a command that may hang is the Address Device Command,
because waiting for a SET_ADDRESS request to be acknowledged
by a USB device is outside of the xHC's ability to control.

To cancel a command, software will initialize a command
descriptor for the cancel command, and add it into a
cancel_cmd_list of xhci.

Sarah: Fixed missing newline on "Have the command ring been stopped?"
debugging statement.

This patch should be backported to kernels as old as 3.0, that contain
the commit 7ed603ecf8b68ab81f4c83097d3063d43ec73bb8 "xhci: Add an
assertion to check for virt_dev=0 bug." That commit papers over a NULL
pointer dereference, and this patch fixes the underlying issue that
caused the NULL pointer dereference.

Signed-off-by: Elric Fu 
Signed-off-by: Sarah Sharp 
Tested-by: Miroslav Sabljic 
Signed-off-by: Greg Kroah-Hartman 

---
 drivers/usb/host/xhci-mem.c  |7 ++
 drivers/usb/host/xhci-ring.c |  108 +++
 drivers/usb/host/xhci.c  |2 
 drivers/usb/host/xhci.h  |   12 
 4 files changed, 128 insertions(+), 1 deletion(-)

--- a/drivers/usb/host/xhci-mem.c
+++ b/drivers/usb/host/xhci-mem.c
@@ -1505,6 +1505,7 @@ void xhci_free_command(struct xhci_hcd *
 void xhci_mem_cleanup(struct xhci_hcd *xhci)
 {
struct pci_dev  *pdev = to_pci_dev(xhci_to_hcd(xhci)->self.controller);
+   struct xhci_cd  *cur_cd, *next_cd;
int size;
int i;
 
@@ -1525,6 +1526,11 @@ void xhci_mem_cleanup(struct xhci_hcd *x
xhci_ring_free(xhci, xhci->cmd_ring);
xhci->cmd_ring = NULL;
xhci_dbg(xhci, "Freed command ring\n");
+   list_for_each_entry_safe(cur_cd, next_cd,
+   >cancel_cmd_list, cancel_cmd_list) {
+   list_del(_cd->cancel_cmd_list);
+   kfree(cur_cd);
+   }
 
for (i = 1; i < MAX_HC_SLOTS; ++i)
xhci_free_virt_device(xhci, i);
@@ -2014,6 +2020,7 @@ int xhci_mem_init(struct xhci_hcd *xhci,
xhci->cmd_ring = xhci_ring_alloc(xhci, 1, true, false, flags);
if (!xhci->cmd_ring)
goto fail;
+   INIT_LIST_HEAD(>cancel_cmd_list);
xhci_dbg(xhci, "Allocated command ring at %p\n", xhci->cmd_ring);
xhci_dbg(xhci, "First segment DMA is 0x%llx\n",
(unsigned long long)xhci->cmd_ring->first_seg->dma);
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -320,6 +320,114 @@ void xhci_ring_cmd_db(struct xhci_hcd *x
xhci_readl(xhci, >dba->doorbell[0]);
 }
 
+static int xhci_abort_cmd_ring(struct xhci_hcd *xhci)
+{
+   u64 temp_64;
+   int ret;
+
+   xhci_dbg(xhci, "Abort command ring\n");
+
+   if (!(xhci->cmd_ring_state & CMD_RING_STATE_RUNNING)) {
+   xhci_dbg(xhci, "The command ring isn't running, "
+   "Have the command ring been stopped?\n");
+   return 0;
+   }
+
+   temp_64 = xhci_read_64(xhci, >op_regs->cmd_ring);
+   if (!(temp_64 & CMD_RING_RUNNING)) {
+   xhci_dbg(xhci, "Command ring had been stopped\n");
+   return 0;
+   }
+   xhci->cmd_ring_state = CMD_RING_STATE_ABORTED;
+   xhci_write_64(xhci, temp_64 | CMD_RING_ABORT,
+   >op_regs->cmd_ring);
+
+   /* Section 4.6.1.2 of xHCI 1.0 spec says software should
+* time the completion od all xHCI commands, including
+* the Command Abort operation. If software doesn't see
+* CRR negated in a timely manner (e.g. longer than 5
+* seconds), then it should assume that the there are
+* larger problems with the xHC and assert HCRST.
+*/
+   ret = handshake(xhci, >op_regs->cmd_ring,
+   CMD_RING_RUNNING, 0, 5 * 1000 * 1000);
+   if (ret < 0) {
+   xhci_err(xhci, "Stopped the command ring failed, "
+   "maybe the host is dead\n");
+   xhci->xhc_state |= XHCI_STATE_DYING;
+   xhci_quiesce(xhci);
+   xhci_halt(xhci);
+   return -ESHUTDOWN;
+   }
+
+   return 0;
+}
+
+static int xhci_queue_cd(struct xhci_hcd *xhci,
+   struct xhci_command *command,
+   union xhci_trb *cmd_trb)
+{
+   struct xhci_cd *cd;
+   cd = kzalloc(sizeof(struct xhci_cd), GFP_ATOMIC);
+   if (!cd)
+   return -ENOMEM;
+   INIT_LIST_HEAD(>cancel_cmd_list);
+
+   cd->command = command;
+   cd->cmd_trb = cmd_trb;
+   list_add_tail(>cancel_cmd_list, >cancel_cmd_list);
+
+   return 0;
+}
+
+/*
+ * Cancel the command which has issue.
+ *
+ * 

[ 30/31] xHCI: cancel command after command timeout

2012-10-25 Thread Greg Kroah-Hartman
3.0-stable review patch.  If anyone has any objections, please let me know.

--

From: Elric Fu 

commit 6e4468b9a0793dfb53eb80d9fe52c739b13b27fd upstream.

The patch is used to cancel command when the command isn't
acknowledged and a timeout occurs.

This patch should be backported to kernels as old as 3.0, that contain
the commit 7ed603ecf8b68ab81f4c83097d3063d43ec73bb8 "xhci: Add an
assertion to check for virt_dev=0 bug." That commit papers over a NULL
pointer dereference, and this patch fixes the underlying issue that
caused the NULL pointer dereference.

Signed-off-by: Elric Fu 
Signed-off-by: Sarah Sharp 
Tested-by: Miroslav Sabljic 
Signed-off-by: Greg Kroah-Hartman 

---
 drivers/usb/host/xhci.c |   26 +++---
 drivers/usb/host/xhci.h |3 +++
 2 files changed, 22 insertions(+), 7 deletions(-)

--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -1778,6 +1778,7 @@ static int xhci_configure_endpoint(struc
struct completion *cmd_completion;
u32 *cmd_status;
struct xhci_virt_device *virt_dev;
+   union xhci_trb *cmd_trb;
 
spin_lock_irqsave(>lock, flags);
virt_dev = xhci->devs[udev->slot_id];
@@ -1820,6 +1821,7 @@ static int xhci_configure_endpoint(struc
}
init_completion(cmd_completion);
 
+   cmd_trb = xhci->cmd_ring->dequeue;
if (!ctx_change)
ret = xhci_queue_configure_endpoint(xhci, in_ctx->dma,
udev->slot_id, must_succeed);
@@ -1841,14 +1843,17 @@ static int xhci_configure_endpoint(struc
/* Wait for the configure endpoint command to complete */
timeleft = wait_for_completion_interruptible_timeout(
cmd_completion,
-   USB_CTRL_SET_TIMEOUT);
+   XHCI_CMD_DEFAULT_TIMEOUT);
if (timeleft <= 0) {
xhci_warn(xhci, "%s while waiting for %s command\n",
timeleft == 0 ? "Timeout" : "Signal",
ctx_change == 0 ?
"configure endpoint" :
"evaluate context");
-   /* FIXME cancel the configure endpoint command */
+   /* cancel the configure endpoint command */
+   ret = xhci_cancel_cmd(xhci, command, cmd_trb);
+   if (ret < 0)
+   return ret;
return -ETIME;
}
 
@@ -2781,8 +2786,10 @@ int xhci_alloc_dev(struct usb_hcd *hcd,
unsigned long flags;
int timeleft;
int ret;
+   union xhci_trb *cmd_trb;
 
spin_lock_irqsave(>lock, flags);
+   cmd_trb = xhci->cmd_ring->dequeue;
ret = xhci_queue_slot_control(xhci, TRB_ENABLE_SLOT, 0);
if (ret) {
spin_unlock_irqrestore(>lock, flags);
@@ -2794,12 +2801,12 @@ int xhci_alloc_dev(struct usb_hcd *hcd,
 
/* XXX: how much time for xHC slot assignment? */
timeleft = wait_for_completion_interruptible_timeout(>addr_dev,
-   USB_CTRL_SET_TIMEOUT);
+   XHCI_CMD_DEFAULT_TIMEOUT);
if (timeleft <= 0) {
xhci_warn(xhci, "%s while waiting for a slot\n",
timeleft == 0 ? "Timeout" : "Signal");
-   /* FIXME cancel the enable slot request */
-   return 0;
+   /* cancel the enable slot request */
+   return xhci_cancel_cmd(xhci, NULL, cmd_trb);
}
 
if (!xhci->slot_id) {
@@ -2860,6 +2867,7 @@ int xhci_address_device(struct usb_hcd *
struct xhci_slot_ctx *slot_ctx;
struct xhci_input_control_ctx *ctrl_ctx;
u64 temp_64;
+   union xhci_trb *cmd_trb;
 
if (!udev->slot_id) {
xhci_dbg(xhci, "Bad Slot ID %d\n", udev->slot_id);
@@ -2898,6 +2906,7 @@ int xhci_address_device(struct usb_hcd *
xhci_dbg_ctx(xhci, virt_dev->in_ctx, 2);
 
spin_lock_irqsave(>lock, flags);
+   cmd_trb = xhci->cmd_ring->dequeue;
ret = xhci_queue_address_device(xhci, virt_dev->in_ctx->dma,
udev->slot_id);
if (ret) {
@@ -2910,7 +2919,7 @@ int xhci_address_device(struct usb_hcd *
 
/* ctrl tx can take up to 5 sec; XXX: need more time for xHC? */
timeleft = wait_for_completion_interruptible_timeout(>addr_dev,
-   USB_CTRL_SET_TIMEOUT);
+   XHCI_CMD_DEFAULT_TIMEOUT);
/* FIXME: From section 4.3.4: "Software shall be responsible for timing
 * the SetAddress() "recovery interval" required by USB and aborting the
 * command on a timeout.
@@ -2918,7 +2927,10 @@ int xhci_address_device(struct usb_hcd *
if (timeleft <= 0) {
xhci_warn(xhci, "%s while waiting for a slot\n",
timeleft == 0 ? "Timeout" : "Signal");
-   /* FIXME cancel the 

[ 21/31] RDS: fix rds-ping spinlock recursion

2012-10-25 Thread Greg Kroah-Hartman
3.0-stable review patch.  If anyone has any objections, please let me know.

--


From: "jeff.liu" 

[ Upstream commit 5175a5e76bbdf20a614fb47ce7a38f0f39e70226 ]

This is the revised patch for fixing rds-ping spinlock recursion
according to Venkat's suggestions.

RDS ping/pong over TCP feature has been broken for years(2.6.39 to
3.6.0) since we have to set TCP cork and call kernel_sendmsg() between
ping/pong which both need to lock "struct sock *sk". However, this
lock has already been hold before rds_tcp_data_ready() callback is
triggerred. As a result, we always facing spinlock resursion which
would resulting in system panic.

Given that RDS ping is only used to test the connectivity and not for
serious performance measurements, we can queue the pong transmit to
rds_wq as a delayed response.

Reported-by: Dan Carpenter 
CC: Venkat Venkatsubra 
CC: David S. Miller 
CC: James Morris 
Signed-off-by: Jie Liu 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 net/rds/send.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/rds/send.c
+++ b/net/rds/send.c
@@ -1121,7 +1121,7 @@ rds_send_pong(struct rds_connection *con
rds_stats_inc(s_send_pong);
 
if (!test_bit(RDS_LL_SEND_FULL, >c_flags))
-   rds_send_xmit(conn);
+   queue_delayed_work(rds_wq, >c_send_w, 0);
 
rds_message_put(rm);
return 0;


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[ 31/31] xHCI: handle command after aborting the command ring

2012-10-25 Thread Greg Kroah-Hartman
3.0-stable review patch.  If anyone has any objections, please let me know.

--

From: Elric Fu 

commit b63f4053cc8aa22a98e3f9a97845afe6c15d0a0d upstream.

According to xHCI spec section 4.6.1.1 and section 4.6.1.2,
after aborting a command on the command ring, xHC will
generate a command completion event with its completion
code set to Command Ring Stopped at least. If a command is
currently executing at the time of aborting a command, xHC
also generate a command completion event with its completion
code set to Command Abort. When the command ring is stopped,
software may remove, add, or rearrage Command Descriptors.

To cancel a command, software will initialize a command
descriptor for the cancel command, and add it into a
cancel_cmd_list of xhci. When the command ring is stopped,
software will find the command trbs described by command
descriptors in cancel_cmd_list and modify it to No Op
command. If software can't find the matched trbs, we can
think it had been finished.

This patch should be backported to kernels as old as 3.0, that contain
the commit 7ed603ecf8b68ab81f4c83097d3063d43ec73bb8 "xhci: Add an
assertion to check for virt_dev=0 bug." That commit papers over a NULL
pointer dereference, and this patch fixes the underlying issue that
caused the NULL pointer dereference.

Note from Sarah: The TRB_TYPE_LINK_LE32 macro is not in the 3.0 stable
kernel, so I added it to this patch.

Signed-off-by: Elric Fu 
Signed-off-by: Sarah Sharp 
Tested-by: Miroslav Sabljic 
Signed-off-by: Greg Kroah-Hartman 


---
 drivers/usb/host/xhci-ring.c |  171 +--
 drivers/usb/host/xhci.h  |3 
 2 files changed, 168 insertions(+), 6 deletions(-)

--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -1157,6 +1157,20 @@ static void handle_reset_ep_completion(s
}
 }
 
+/* Complete the command and detele it from the devcie's command queue.
+ */
+static void xhci_complete_cmd_in_cmd_wait_list(struct xhci_hcd *xhci,
+   struct xhci_command *command, u32 status)
+{
+   command->status = status;
+   list_del(>cmd_list);
+   if (command->completion)
+   complete(command->completion);
+   else
+   xhci_free_command(xhci, command);
+}
+
+
 /* Check to see if a command in the device's command queue matches this one.
  * Signal the completion or free the command, and return 1.  Return 0 if the
  * completed command isn't at the head of the command list.
@@ -1175,15 +1189,144 @@ static int handle_cmd_in_cmd_wait_list(s
if (xhci->cmd_ring->dequeue != command->command_trb)
return 0;
 
-   command->status = GET_COMP_CODE(le32_to_cpu(event->status));
-   list_del(>cmd_list);
-   if (command->completion)
-   complete(command->completion);
-   else
-   xhci_free_command(xhci, command);
+   xhci_complete_cmd_in_cmd_wait_list(xhci, command,
+   GET_COMP_CODE(le32_to_cpu(event->status)));
return 1;
 }
 
+/*
+ * Finding the command trb need to be cancelled and modifying it to
+ * NO OP command. And if the command is in device's command wait
+ * list, finishing and freeing it.
+ *
+ * If we can't find the command trb, we think it had already been
+ * executed.
+ */
+static void xhci_cmd_to_noop(struct xhci_hcd *xhci, struct xhci_cd *cur_cd)
+{
+   struct xhci_segment *cur_seg;
+   union xhci_trb *cmd_trb;
+   u32 cycle_state;
+
+   if (xhci->cmd_ring->dequeue == xhci->cmd_ring->enqueue)
+   return;
+
+   /* find the current segment of command ring */
+   cur_seg = find_trb_seg(xhci->cmd_ring->first_seg,
+   xhci->cmd_ring->dequeue, _state);
+
+   /* find the command trb matched by cd from command ring */
+   for (cmd_trb = xhci->cmd_ring->dequeue;
+   cmd_trb != xhci->cmd_ring->enqueue;
+   next_trb(xhci, xhci->cmd_ring, _seg, _trb)) {
+   /* If the trb is link trb, continue */
+   if (TRB_TYPE_LINK_LE32(cmd_trb->generic.field[3]))
+   continue;
+
+   if (cur_cd->cmd_trb == cmd_trb) {
+
+   /* If the command in device's command list, we should
+* finish it and free the command structure.
+*/
+   if (cur_cd->command)
+   xhci_complete_cmd_in_cmd_wait_list(xhci,
+   cur_cd->command, COMP_CMD_STOP);
+
+   /* get cycle state from the origin command trb */
+   cycle_state = le32_to_cpu(cmd_trb->generic.field[3])
+   & TRB_CYCLE;
+
+   /* modify the command trb to NO OP command */
+   cmd_trb->generic.field[0] = 0;
+   cmd_trb->generic.field[1] = 0;
+   

[ 03/31] oprofile, x86: Fix wrapping bug in op_x86_get_ctrl()

2012-10-25 Thread Greg Kroah-Hartman
3.0-stable review patch.  If anyone has any objections, please let me know.

--

From: Dan Carpenter 

commit 44009105081b51417f311f4c3be0061870b6b8ed upstream.

The "event" variable is a u16 so the shift will always wrap to zero
making the line a no-op.

Signed-off-by: Dan Carpenter 
Signed-off-by: Robert Richter 
Signed-off-by: Greg Kroah-Hartman 

---
 arch/x86/oprofile/nmi_int.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/x86/oprofile/nmi_int.c
+++ b/arch/x86/oprofile/nmi_int.c
@@ -55,7 +55,7 @@ u64 op_x86_get_ctrl(struct op_x86_model_
val |= counter_config->extra;
event &= model->event_mask ? model->event_mask : 0xFF;
val |= event & 0xFF;
-   val |= (event & 0x0F00) << 24;
+   val |= (u64)(event & 0x0F00) << 24;
 
return val;
 }


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[ 19/31] net: Fix skb_under_panic oops in neigh_resolve_output

2012-10-25 Thread Greg Kroah-Hartman
3.0-stable review patch.  If anyone has any objections, please let me know.

--


From: "ramesh.naga...@gmail.com" 

[ Upstream commit e1f165032c8bade3a6bdf546f8faf61fda4dd01c ]

The retry loop in neigh_resolve_output() and neigh_connected_output()
call dev_hard_header() with out reseting the skb to network_header.
This causes the retry to fail with skb_under_panic. The fix is to
reset the network_header within the retry loop.

Signed-off-by: Ramesh Nagappa 
Reviewed-by: Shawn Lu 
Reviewed-by: Robert Coulson 
Reviewed-by: Billie Alsup 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 net/core/neighbour.c |6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1313,8 +1313,6 @@ int neigh_resolve_output(struct sk_buff
if (!dst)
goto discard;
 
-   __skb_pull(skb, skb_network_offset(skb));
-
if (!neigh_event_send(neigh, skb)) {
int err;
struct net_device *dev = neigh->dev;
@@ -1326,6 +1324,7 @@ int neigh_resolve_output(struct sk_buff
neigh_hh_init(neigh, dst, dst->ops->protocol);
 
do {
+   __skb_pull(skb, skb_network_offset(skb));
seq = read_seqbegin(>ha_lock);
err = dev_hard_header(skb, dev, ntohs(skb->protocol),
  neigh->ha, NULL, skb->len);
@@ -1358,9 +1357,8 @@ int neigh_connected_output(struct sk_buf
struct net_device *dev = neigh->dev;
unsigned int seq;
 
-   __skb_pull(skb, skb_network_offset(skb));
-
do {
+   __skb_pull(skb, skb_network_offset(skb));
seq = read_seqbegin(>ha_lock);
err = dev_hard_header(skb, dev, ntohs(skb->protocol),
  neigh->ha, NULL, skb->len);


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v1] firmware loader: introduce module parameter to customize fw search path

2012-10-25 Thread Ming Lei
This patch introduces one module parameter of 'path' in firmware_class
to support customizing firmware image search path, so that people can
use its own firmware path if the default built-in paths can't meet their
demand[1], and the typical usage is passing the below from kernel command
parameter when 'firmware_class' is built in kernel:

firmware_class.path=$CUSTOMIZED_PATH

[1], https://lkml.org/lkml/2012/10/11/337

Cc: Linus Torvalds 
Signed-off-by: Ming Lei 
---
V1:
- remove kernel boot parameter and only support the feature by
module parameter as suggested by Greg

 Documentation/firmware_class/README |5 +
 drivers/base/firmware_class.c   |   23 +--
 2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/Documentation/firmware_class/README 
b/Documentation/firmware_class/README
index 815b711..ce02744 100644
--- a/Documentation/firmware_class/README
+++ b/Documentation/firmware_class/README
@@ -22,12 +22,17 @@
- calls request_firmware(_entry, $FIRMWARE, device)
- kernel searchs the fimware image with name $FIRMWARE directly
in the below search path of root filesystem:
+   User customized search path by module parameter 'path'[1]
"/lib/firmware/updates/" UTS_RELEASE,
"/lib/firmware/updates",
"/lib/firmware/" UTS_RELEASE,
"/lib/firmware"
- If found, goto 7), else goto 2)
 
+   [1], the 'path' is a string parameter which length should be less
+   than 256, user should pass 'firmware.path=$CUSTOMIZED_PATH' if
+   firmware_class is built in kernel(the general situation)
+
  2), userspace:
- /sys/class/firmware/xxx/{loading,data} appear.
- hotplug gets called with a firmware identifier in $FIRMWARE
diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
index 8945f4e..b363103 100644
--- a/drivers/base/firmware_class.c
+++ b/drivers/base/firmware_class.c
@@ -274,6 +274,16 @@ static const char *fw_path[] = {
"/lib/firmware"
 };
 
+static char fw_path_para[256];
+
+/*
+ * Typical usage is that passing 'firmware_class.path=$CUSTOMIZED_PATH'
+ * from kernel command because firmware_class is generally built in
+ * kernel instead of module.
+ */
+module_param_string(path, fw_path_para, sizeof(fw_path_para), 0644);
+MODULE_PARM_DESC(path, "customized firmware image search path with a higher 
priority than default path");
+
 /* Don't inline this: 'struct kstat' is biggish */
 static noinline long fw_file_size(struct file *file)
 {
@@ -313,9 +323,18 @@ static bool fw_get_filesystem_firmware(struct firmware_buf 
*buf)
bool success = false;
char *path = __getname();
 
-   for (i = 0; i < ARRAY_SIZE(fw_path); i++) {
+   for (i = -1; i < ARRAY_SIZE(fw_path); i++) {
struct file *file;
-   snprintf(path, PATH_MAX, "%s/%s", fw_path[i], buf->fw_id);
+
+   if (i < 0) {
+   if (!fw_path_para[0])   /* No customized path */
+   continue;
+   snprintf(path, PATH_MAX, "%s/%s", fw_path_para,
+buf->fw_id);
+   } else {
+   snprintf(path, PATH_MAX, "%s/%s", fw_path[i],
+buf->fw_id);
+   }
 
file = filp_open(path, O_RDONLY, 0);
if (IS_ERR(file))
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[ 22/31] tcp: resets are misrouted

2012-10-25 Thread Greg Kroah-Hartman
3.0-stable review patch.  If anyone has any objections, please let me know.

--


From: Alexey Kuznetsov 

[ Upstream commit 4c67525849e0b7f4bd4fab2487ec9e43ea52ef29 ]

After commit e2446eaa ("tcp_v4_send_reset: binding oif to iif in no
sock case").. tcp resets are always lost, when routing is asymmetric.
Yes, backing out that patch will result in misrouting of resets for
dead connections which used interface binding when were alive, but we
actually cannot do anything here.  What's died that's died and correct
handling normal unbound connections is obviously a priority.

Comment to comment:
> This has few benefits:
>   1. tcp_v6_send_reset already did that.

It was done to route resets for IPv6 link local addresses. It was a
mistake to do so for global addresses. The patch fixes this as well.

Actually, the problem appears to be even more serious than guaranteed
loss of resets.  As reported by Sergey Soloviev , those
misrouted resets create a lot of arp traffic and huge amount of
unresolved arp entires putting down to knees NAT firewalls which use
asymmetric routing.

Signed-off-by: Alexey Kuznetsov 
Signed-off-by: Greg Kroah-Hartman 
---
 net/ipv4/tcp_ipv4.c |7 ---
 net/ipv6/tcp_ipv6.c |3 ++-
 2 files changed, 6 insertions(+), 4 deletions(-)

--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -651,10 +651,11 @@ static void tcp_v4_send_reset(struct soc
arg.csumoffset = offsetof(struct tcphdr, check) / 2;
arg.flags = (sk && inet_sk(sk)->transparent) ? IP_REPLY_ARG_NOSRCCHECK 
: 0;
/* When socket is gone, all binding information is lost.
-* routing might fail in this case. using iif for oif to
-* make sure we can deliver it
+* routing might fail in this case. No choice here, if we choose to 
force
+* input interface, we will misroute in case of asymmetric route.
 */
-   arg.bound_dev_if = sk ? sk->sk_bound_dev_if : inet_iif(skb);
+   if (sk)
+   arg.bound_dev_if = sk->sk_bound_dev_if;
 
net = dev_net(skb_dst(skb)->dev);
ip_send_reply(net->ipv4.tcp_sock, skb, ip_hdr(skb)->saddr,
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1060,7 +1060,8 @@ static void tcp_v6_send_response(struct
__tcp_v6_send_check(buff, , );
 
fl6.flowi6_proto = IPPROTO_TCP;
-   fl6.flowi6_oif = inet6_iif(skb);
+   if (ipv6_addr_type() & IPV6_ADDR_LINKLOCAL)
+   fl6.flowi6_oif = inet6_iif(skb);
fl6.fl6_dport = t1->dest;
fl6.fl6_sport = t1->source;
security_skb_classify_flow(skb, flowi6_to_flowi());


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[ 26/31] sparc64: do not clobber personality flags in sys_sparc64_personality()

2012-10-25 Thread Greg Kroah-Hartman
3.0-stable review patch.  If anyone has any objections, please let me know.

--


From: Jiri Kosina 

[ Upstream commit a27032eee8cb6e16516f13c8a9752e9d5d4cc430 ]

There are multiple errors in how sys_sparc64_personality() handles
personality flags stored in top three bytes.

- directly comparing current->personality against PER_LINUX32 doesn't work
  in cases when any of the personality flags stored in the top three bytes
  are used.
- directly forcefully setting personality to PER_LINUX32 or PER_LINUX
  discards any flags stored in the top three bytes

Fix the first one by properly using personality() macro to compare only
PER_MASK bytes.
Fix the second one by setting only the bits that should be set, instead of
overwriting the whole value.

Signed-off-by: Jiri Kosina 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 arch/sparc/kernel/sys_sparc_64.c |   10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

--- a/arch/sparc/kernel/sys_sparc_64.c
+++ b/arch/sparc/kernel/sys_sparc_64.c
@@ -519,12 +519,12 @@ SYSCALL_DEFINE1(sparc64_personality, uns
 {
int ret;
 
-   if (current->personality == PER_LINUX32 &&
-   personality == PER_LINUX)
-   personality = PER_LINUX32;
+   if (personality(current->personality) == PER_LINUX32 &&
+   personality(personality) == PER_LINUX)
+   personality |= PER_LINUX32;
ret = sys_personality(personality);
-   if (ret == PER_LINUX32)
-   ret = PER_LINUX;
+   if (personality(ret) == PER_LINUX32)
+   ret &= ~PER_LINUX32;
 
return ret;
 }


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


  1   2   3   4   5   6   7   8   9   10   >