Re: [PATCH 6/9] uprobes: flush cache after xol write
On Thu, Oct 25, 2012 at 04:58:39PM +0200, Oleg Nesterov wrote: > On 10/16, Rabin Vincent wrote: > > > > 2012/10/15 Oleg Nesterov : > > > On 10/14, Rabin Vincent wrote: > > >> Flush the cache so that the instructions written to the XOL area are > > >> visible. > > >> > > >> Signed-off-by: Rabin Vincent > > >> --- > > >> kernel/events/uprobes.c |1 + > > >> 1 file changed, 1 insertion(+) > > >> > > >> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c > > >> index ca000a9..8c52f93 100644 > > >> --- a/kernel/events/uprobes.c > > >> +++ b/kernel/events/uprobes.c > > >> @@ -1246,6 +1246,7 @@ static unsigned long xol_get_insn_slot(struct > > >> uprobe *uprobe, unsigned long slot > > >> offset = current->utask->xol_vaddr & ~PAGE_MASK; > > >> vaddr = kmap_atomic(area->page); > > >> arch_uprobe_xol_copy(>arch, vaddr + offset); > > >> + flush_dcache_page(area->page); > > >> kunmap_atomic(vaddr); > > > > > > I agree... but why under kmap_atomic? > > > > No real reason; I'll move it to after the unmap. > > OK. I assume you will send v2. > > But this patch looks like a bugfix, flush_dcache_page() is not a nop > on powerpc. So perhaps we should apply this fix right now? Starting Power5, all Power processers have coherent caches. > OTOH, I do not understand this stuff, everything is nop on x86. And > when I look into Documentation/cachetlb.txt I am starting to think > that may be this needs flush_icache_user_range instead? > > Rabin, Ananth could you clarify this? Yes. We need flush_icache_user_range(). Though for x86 its always been a nop, one never knows if there is some Power4 or older machine out there that is still being used. We are fine for Power5 and later. Ananth -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3/5] perf: Make perf build for x86 with UAPI disintegration applied
On Fri, 19 Oct 2012 17:56:24 +0100, David Howells wrote: > Make perf build for x86 once the UAPI disintegration patches for that arch > have been applied by adding the appropriate -I flags - in the right order - > and then converting some #includes that use ../.. notation to find main kernel > headerfiles to use and instead. Looks nice. > > Note that -Iarch/foo/include/uapi is present _before_ -Iarch/foo/include. > This makes sure we get the userspace version of the pt_regs struct. Ideally, > we wouldn't have the latter -I flag at all, but unfortunately we want > asm/svm.h and asm/vmx.h in buildin-kvm.c and these aren't part of the UAPI - > at least not for x86. I wonder if the bits outside of the __KERNEL__ guards > *should* be transferred there. What about asm/kvm.h? Is it a part of the UAPI? > > I note also that perf seems to do its dependency handling manually by listing > all the header files it might want to use in LIB_H in the Makefile. Can this > be changed to use -MD? Yeah, that part could be improved, probably with -MMD. > > Signed-off-by: David Howells > --- > > tools/perf/Makefile | 16 +++- > tools/perf/builtin-kvm.c |6 +++--- > tools/perf/perf.h| 16 +++- > 3 files changed, 21 insertions(+), 17 deletions(-) > > diff --git a/tools/perf/Makefile b/tools/perf/Makefile > index f7c968a..9024a42 100644 > --- a/tools/perf/Makefile > +++ b/tools/perf/Makefile > @@ -169,7 +169,21 @@ endif > > ### --- END CONFIGURATION SECTION --- > > -BASIC_CFLAGS = -Iutil/include -Iarch/$(ARCH)/include -I$(OUTPUT)util > -I$(TRACE_EVENT_DIR) -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 > -D_GNU_SOURCE > +ifeq ($(srctree),) > +srctree := $(shell pwd) > +endif Isn't the srctree intended to point to kernel root? Also you missed to define the objtree which used below. > + > +BASIC_CFLAGS = \ > + -Iutil/include \ > + -Iarch/$(ARCH)/include \ > + -I$(objtree)/arch/$(ARCH)/include/generated/uapi \ > + -I$(srctree)/arch/$(ARCH)/include/uapi \ > + -I$(srctree)/arch/$(ARCH)/include \ > + -I$(objtree)/include/generated/uapi \ > + -I$(srctree)/include/uapi \ > + -I$(OUTPUT)util \ > + -I$(TRACE_EVENT_DIR) \ > + -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE This isn't bad, but using '+=' looks more natural IMHO. BASIC_CFLAGS = -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE BASIC_CFLAGS += -Iutil/include BASIC_CFLAGS += -Iarch/$(ARCH)/include ... > BASIC_LDFLAGS = > > # Guard against environment variables > diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c > index 260abc5..e013bdb 100644 > --- a/tools/perf/builtin-kvm.c > +++ b/tools/perf/builtin-kvm.c > @@ -22,9 +22,9 @@ > #include > #include > > -#include "../../arch/x86/include/asm/svm.h" > -#include "../../arch/x86/include/asm/vmx.h" > -#include "../../arch/x86/include/asm/kvm.h" > +#include > +#include > +#include > > struct event_key { > #define INVALID_KEY (~0ULL) > diff --git a/tools/perf/perf.h b/tools/perf/perf.h > index 2762877..238f923 100644 > --- a/tools/perf/perf.h > +++ b/tools/perf/perf.h > @@ -5,8 +5,9 @@ struct winsize; > > void get_term_dimensions(struct winsize *ws); > > +#include > + > #if defined(__i386__) > -#include "../../arch/x86/include/asm/unistd.h" > #define rmb()asm volatile("lock; addl $0,0(%%esp)" ::: > "memory") > #define cpu_relax() asm volatile("rep; nop" ::: "memory"); > #define CPUINFO_PROC "model name" > @@ -16,7 +17,6 @@ void get_term_dimensions(struct winsize *ws); > #endif > > #if defined(__x86_64__) > -#include "../../arch/x86/include/asm/unistd.h" > #define rmb()asm volatile("lfence" ::: "memory") > #define cpu_relax() asm volatile("rep; nop" ::: "memory"); > #define CPUINFO_PROC "model name" > @@ -26,20 +26,17 @@ void get_term_dimensions(struct winsize *ws); > #endif > > #ifdef __powerpc__ > -#include "../../arch/powerpc/include/asm/unistd.h" > #define rmb()asm volatile ("sync" ::: "memory") > #define cpu_relax() asm volatile ("" ::: "memory"); > #define CPUINFO_PROC "cpu" > #endif > > #ifdef __s390__ > -#include "../../arch/s390/include/asm/unistd.h" > #define rmb()asm volatile("bcr 15,0" ::: "memory") > #define cpu_relax() asm volatile("" ::: "memory"); > #endif > > #ifdef __sh__ > -#include "../../arch/sh/include/asm/unistd.h" > #if defined(__SH4A__) || defined(__SH5__) > # define rmb() asm volatile("synco" ::: "memory") > #else > @@ -50,35 +47,30 @@ void get_term_dimensions(struct winsize *ws); > #endif > > #ifdef __hppa__ > -#include "../../arch/parisc/include/asm/unistd.h" > #define rmb()asm volatile("" ::: "memory") > #define cpu_relax() asm volatile("" ::: "memory"); > #define CPUINFO_PROC "cpu" > #endif > > #ifdef __sparc__ > -#include "../../arch/sparc/include/asm/unistd.h" It might conflict with davem's
Re: linux-next: build warnings after merge of the akpm tree
Am Freitag, den 26.10.2012, 06:36 +0800 schrieb Richard Yang: > > > >And holy cow that code is hard to read :( Why was kfifo_in() > >implemented as a macro, anyway? AFAICT all its args have a known type, > >so we could have used a proper C interface, which would have fixed all > >this nicely. > Thats simple for performance reasons, the compiler remove most of the code during the compile stage, so no runtime checks are necessary. And it is the only way since C does not provides templates like C++. > Hmm, move the definition of kfifo_in()/kfifo_out() into the kfifo.c? > Don't do it. this will result in a performance degradation. Look at the disassembled code by each change in code and compare it with the previous one. I don't believe that you can produce better code. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 2/2] Improve container_notify_cb() to support container hot-remove.
Hi Toshi, On 10/26/2012 01:20 AM, Toshi Kani wrote: ... Why do you need to call acpi_bus_trim(device,0) to stop the container device first? This issue was introduced by Lu Yinghai, I think he could give a better answer than me. :) Please refer to the following url: http://www.spinics.net/lists/linux-pci/msg17667.html However, this is not applied into the pci tree yet. We have worked out a patch set to clean up the logic for PCI/ACPI binding relationship. It updates PCI/ACPI binding relationship by registering bus notification onto pci_bus_type instead of hooking into the ACPI/glue.c. Thanks for the info and pointer. Tang, I'd suggest you add such info to the comment so that others know that this step is needed for removing PCI bridges. It helps us to know where to look at... OK, I'll add it in the next version. :) To accommodate that patch set, the ACPI device destroy process has been split into two steps: 1) acpi_bus_trim(device,0) to unbind ACPI drivers Does this step also detach PCI drivers from PCI cards as well? Yes, it calls device_release_driver() to release the device driver. device_release_driver() |->__device_release_driver() |->dev->driver = NULL; Thanks. :) Thanks, -Toshi -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] kfifo: remove unnecessary type check
Am Freitag, den 26.10.2012, 09:46 +0800 schrieb Yuanhan Liu: > From: Yuanhan Liu > > Firstly, this kind of type check doesn't work. It does something similay > like following: > void * __dummy = NULL; > __buf = __dummy; > > __dummy is defined as void *. Thus it will not trigger warnings as > expected. > > Second, we don't need that kind of check. Since the prototype > of __kfifo_out is: > unsigned int __kfifo_out(struct __kfifo *fifo, void *buf, unsigned int > len) > > buf is defined as void *, so we don't need do the type check. Remove it. > > LINK: https://lkml.org/lkml/2012/10/25/386 > LINK: https://lkml.org/lkml/2012/10/25/584 > > Cc: Andrew Morton > Cc: Wei Yang > Cc: Stefani Seibold > Cc: Fengguang Wu > Cc: Stephen Rothwell > Signed-off-by: Yuanhan Liu > --- > include/linux/kfifo.h | 20 > 1 file changed, 20 deletions(-) > > diff --git a/include/linux/kfifo.h b/include/linux/kfifo.h > index 10308c6..b8c1d03 100644 > --- a/include/linux/kfifo.h > +++ b/include/linux/kfifo.h > @@ -390,10 +390,6 @@ __kfifo_int_must_check_helper( \ > unsigned int __ret; \ > const size_t __recsize = sizeof(*__tmp->rectype); \ > struct __kfifo *__kfifo = &__tmp->kfifo; \ > - if (0) { \ > - typeof(__tmp->ptr_const) __dummy __attribute__ ((unused)); \ > - __dummy = (typeof(__val))NULL; \ > - } \ > if (__recsize) \ > __ret = __kfifo_in_r(__kfifo, __val, sizeof(*__val), \ > __recsize); \ > @@ -432,8 +428,6 @@ __kfifo_uint_must_check_helper( \ > unsigned int __ret; \ > const size_t __recsize = sizeof(*__tmp->rectype); \ > struct __kfifo *__kfifo = &__tmp->kfifo; \ > - if (0) \ > - __val = (typeof(__tmp->ptr))0; \ > if (__recsize) \ > __ret = __kfifo_out_r(__kfifo, __val, sizeof(*__val), \ > __recsize); \ > @@ -473,8 +467,6 @@ __kfifo_uint_must_check_helper( \ > unsigned int __ret; \ > const size_t __recsize = sizeof(*__tmp->rectype); \ > struct __kfifo *__kfifo = &__tmp->kfifo; \ > - if (0) \ > - __val = (typeof(__tmp->ptr))NULL; \ > if (__recsize) \ > __ret = __kfifo_out_peek_r(__kfifo, __val, sizeof(*__val), \ > __recsize); \ > @@ -512,10 +504,6 @@ __kfifo_uint_must_check_helper( \ > unsigned long __n = (n); \ > const size_t __recsize = sizeof(*__tmp->rectype); \ > struct __kfifo *__kfifo = &__tmp->kfifo; \ > - if (0) { \ > - typeof(__tmp->ptr_const) __dummy __attribute__ ((unused)); \ > - __dummy = (typeof(__buf))NULL; \ > - } \ > (__recsize) ?\ > __kfifo_in_r(__kfifo, __buf, __n, __recsize) : \ > __kfifo_in(__kfifo, __buf, __n); \ > @@ -565,10 +553,6 @@ __kfifo_uint_must_check_helper( \ > unsigned long __n = (n); \ > const size_t __recsize = sizeof(*__tmp->rectype); \ > struct __kfifo *__kfifo = &__tmp->kfifo; \ > - if (0) { \ > - typeof(__tmp->ptr) __dummy = NULL; \ > - __buf = __dummy; \ > - } \ > (__recsize) ?\ > __kfifo_out_r(__kfifo, __buf, __n, __recsize) : \ > __kfifo_out(__kfifo, __buf, __n); \ > @@ -777,10 +761,6 @@ __kfifo_uint_must_check_helper( \ > unsigned long __n = (n); \ > const size_t __recsize = sizeof(*__tmp->rectype); \ > struct __kfifo *__kfifo = &__tmp->kfifo; \ > - if (0) { \ > - typeof(__tmp->ptr) __dummy __attribute__ ((unused)) = NULL; \ > - __buf = __dummy; \ > - } \ > (__recsize) ? \ > __kfifo_out_peek_r(__kfifo, __buf, __n, __recsize) : \ > __kfifo_out_peek(__kfifo, __buf, __n); \ Did you tried to compile the whole kernel including all the drivers with your patch? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/5] tools: Honour the O= flag when tool build called from a higher Makefile
On Fri, 19 Oct 2012 17:56:16 +0100, David Howells wrote: > Honour the O= flag that was passed to a higher level Makefile and then passed > down as part of a tool build. > > To make this work, the top-level Makefile passes the original O= flag and > subdir=tools to the tools/Makefile, and that in turn passes > subdir=$(O)/$(subdir)/foodir when building tool foo in directory > $(O)/$(subdir)/foodir (where the intervening slashes aren't added if an > element is missing). > > For example, take perf. This is found in tools/perf/. Assume we're building > into directory ~/zebra/, so we pass O=~/zebra to make. Dependening on where > we run the build from, we see: > > make run in dir $(OUTPUT) dir > === == > linux ~/zebra/tools/perf/ > linux/tools ~/zebra/perf/ > linux/tools/perf~/zebra/ > > and if O= is not set, we get: > > make run in dir $(OUTPUT) dir > === == > linux linux/tools/perf/ > linux/tools linux/tools/perf/ > linux/tools/perflinux/tools/perf/ > > The output directories are created by the descend function if they don't > already exist. This is my test: namhyung@sejong:~$ cd project/linux namhyung@sejong:linux$ make O=~/build/zebra tools/perf /bin/sh: line 0: cd: /home/namhyung/build/zebra: No such file or directory Makefile:121: *** output directory "/home/namhyung/build/zebra" does not exist. Stop. namhyung@sejong:tools$ mkdir ~/build/zebra namhyung@sejong:linux$ make O=~/build/zebra tools/perf HOSTCC scripts/basic/fixdep GEN /home/namhyung/build/zebra/Makefile HOSTCC scripts/kconfig/conf.o HOSTCC scripts/kconfig/zconf.tab.o HOSTLD scripts/kconfig/conf scripts/kconfig/conf --silentoldconfig Kconfig *** *** Configuration file ".config" not found! *** *** Please run some configurator (e.g. "make oldconfig" or *** "make menuconfig" or "make xconfig"). *** make[3]: *** [silentoldconfig] Error 1 make[2]: *** [silentoldconfig] Error 2 DESCEND perf MKDIR /home/namhyung/build/zebra/tools/perf/arch/ MKDIR /home/namhyung/build/zebra/tools/perf/arch/x86/util/ MKDIR /home/namhyung/build/zebra/tools/perf/bench/ MKDIR /home/namhyung/build/zebra/tools/perf/scripts/perl/Perf-Trace-Util/ MKDIR /home/namhyung/build/zebra/tools/perf/scripts/python/Perf-Trace-Util/ MKDIR /home/namhyung/build/zebra/tools/perf/ui/ MKDIR /home/namhyung/build/zebra/tools/perf/ui/browsers/ MKDIR /home/namhyung/build/zebra/tools/perf/ui/gtk/ MKDIR /home/namhyung/build/zebra/tools/perf/ui/stdio/ MKDIR /home/namhyung/build/zebra/tools/perf/ui/tui/ MKDIR /home/namhyung/build/zebra/tools/perf/util/ MKDIR /home/namhyung/build/zebra/tools/perf/util/scripting-engines/ PERF_VERSION = 3.7.rc2.1655.g54fa2b.dirty GEN /home/namhyung/build/zebra/tools/perf/common-cmds.h * new build flags or prefix CC /home/namhyung/build/zebra/tools/perf/perf.o ... This looks ok but it'd be better if we can skip the config check when building tools IMHO. namhyung@sejong:linux cd tools namhyung@sejong:tools$ make O=~/build/zebra perf DESCEND perf ... * new build flags or prefix CC /home/namhyung/build/zebra/perf.o ... This looks not good as it doesn't build perf into ~/build/zebra/perf/perf.o. Thanks, Namhyung -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 027/193] arch/sh: remove CONFIG_EXPERIMENTAL
On Thu, Oct 25, 2012 at 9:28 PM, Paul Mundt wrote: > On Tue, Oct 23, 2012 at 01:01:40PM -0700, Kees Cook wrote: >> This config item has not carried much meaning for a while now and is >> almost always enabled by default. As agreed during the Linux kernel >> summit, remove it. >> >> CC: Paul Mundt >> CC: Tejun Heo >> Signed-off-by: Kees Cook > > While there are cases where it is largely superfluous, we also have > plenty of cases in here that are genuinely experimental features and > generally shouldn't be enabled unless someone is prepared for some > hacking. We can of course replace this with an arch-specific option if > needed, but I disagree with suddenly making experimental features > suddenly appear to be anything other than what they are. Yeah, things that really are experimental need something, but it hasn't been meaningful to put them behind CONFIG_EXPERIMENTAL. Here's the text from the first patch, which details possible approaches: https://lkml.org/lkml/2012/10/23/878 This config item has not carried much meaning for a while now and is almost always enabled by default (especially in distro builds). As agreed during the Linux kernel summit, it should be removed. As a first step, remove it from being listed, and default it to on. Once it has been removed from all subsystem Kconfigs, it will be dropped entirely. For items that really are experimental, maintainers should use "default n", optionally include "(EXPERIMENTAL)" in the title, and add language to the help text indicating why the item should be considered experimental. For items that are dangerously experimental, the maintainer is encouraged to follow the above title recommendation, add stronger language to the help text, and optionally use (depending on the extent of the danger, from least to most dangerous): printk(), add_taint(TAINT_WARN), add_taint(TAINT_CRAP), WARN_ON(1), and CONFIG_BROKEN. -Kees -- Kees Cook Chrome OS Security -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
PCI/PM: Add comments for PME poll support for PCIe
There are comments on why PME poll support is necessary for PCI devices, but not for PCIe devices. That may lead to misunderstanding that PME poll is only necessary for PCI devices. So add comments related to PCIe PME poll to make it more clear. The content of comments comes from the changelog of commit: 379021d5c0899fcf9410cae4ca7a59a5a94ca769 Cc: Rafael J. Wysocki Signed-off-by: Huang Ying --- drivers/pci/pci.c | 28 +++- 1 file changed, 19 insertions(+), 9 deletions(-) --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -1578,15 +1578,25 @@ void pci_pme_active(struct pci_dev *dev, pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr); - /* PCI (as opposed to PCIe) PME requires that the device have - its PME# line hooked up correctly. Not all hardware vendors - do this, so the PME never gets delivered and the device - remains asleep. The easiest way around this is to - periodically walk the list of suspended devices and check - whether any have their PME flag set. The assumption is that - we'll wake up often enough anyway that this won't be a huge - hit, and the power savings from the devices will still be a - win. */ + /* +* PCI (as opposed to PCIe) PME requires that the device have +* its PME# line hooked up correctly. Not all hardware vendors +* do this, so the PME never gets delivered and the device +* remains asleep. The easiest way around this is to +* periodically walk the list of suspended devices and check +* whether any have their PME flag set. The assumption is that +* we'll wake up often enough anyway that this won't be a huge +* hit, and the power savings from the devices will still be a +* win. +* +* Although PCIe uses in-band PME message instead of PME# line +* to report PME, PME does not work for some PCIe devices in +* reality. For example, there are devices that set their PME +* status bits, but don't really bother to send a PME message; +* there are PCI Express Root Ports that don't bother to +* trigger interrupts when they receive PME messages from the +* devices below. So PME poll is used for PCIe devices too. +*/ if (dev->pme_poll) { struct pci_pme_device *pme_dev; -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: readahead: remove redundant ra_pages in file_ra_state
On Fri, Oct 26, 2012 at 11:55 AM, Fengguang Wu wrote: > On Fri, Oct 26, 2012 at 11:38:11AM +0800, YingHang Zhu wrote: >> On Fri, Oct 26, 2012 at 8:25 AM, Dave Chinner wrote: >> > On Thu, Oct 25, 2012 at 10:58:26AM +0800, Fengguang Wu wrote: >> >> Hi Chen, >> >> >> >> > But how can bdi related ra_pages reflect different files' readahead >> >> > window? Maybe these different files are sequential read, random read >> >> > and so on. >> >> >> >> It's simple: sequential reads will get ra_pages readahead size while >> >> random reads will not get readahead at all. >> >> >> >> Talking about the below chunk, it might hurt someone that explicitly >> >> takes advantage of the behavior, however the ra_pages*2 seems more >> >> like a hack than general solution to me: if the user will need >> >> POSIX_FADV_SEQUENTIAL to double the max readahead window size for >> >> improving IO performance, then why not just increase bdi->ra_pages and >> >> benefit all reads? One may argue that it offers some differential >> >> behavior to specific applications, however it may also present as a >> >> counter-optimization: if the root already tuned bdi->ra_pages to the >> >> optimal size, the doubled readahead size will only cost more memory >> >> and perhaps IO latency. >> >> >> >> --- a/mm/fadvise.c >> >> +++ b/mm/fadvise.c >> >> @@ -87,7 +86,6 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, >> >> loff_t len, int advice) >> >> spin_unlock(>f_lock); >> >> break; >> >> case POSIX_FADV_SEQUENTIAL: >> >> - file->f_ra.ra_pages = bdi->ra_pages * 2; >> > >> > I think we really have to reset file->f_ra.ra_pages here as it is >> > not a set-and-forget value. e.g. shrink_readahead_size_eio() can >> > reduce ra_pages as a result of IO errors. Hence if you have had io >> > errors, telling the kernel that you are now going to do sequential >> > IO should reset the readahead to the maximum ra_pages value >> > supported >> If we unify file->f_ra.ra_pages and its' bdi->ra_pages, then the error-prone >> device's readahead can be directly tuned or turned off with blockdev >> thus affect all files >> using the device and without bring more complexity... > > It's not really feasible/convenient for the end users to hand tune > blockdev readahead size on IO errors. Even many administrators are > totally unaware of the readahead size parameter. You are right, so the problem comes in this way: If one file's read failure will affect other files? I mean for rotating disks and discs, a file's read failure may be due to the bad sectors which tend to be consecutive and won't affect other files' reading status. However for tape drive the read failure usually indicates data corruption and other file's reading may also fail. In other words, should we consider how many files failed to read data and where they failed as a factor to indicate the status of the backing device, or treat these files independently? If we choose the previous one we can accumulate the statistics and change bdi.ra_pages, otherwise we may do some check for FMODE_RANDOM before we change the readahead window. I may missed something, please point it out. Thanks, Ying Zhu > > Thanks, > Fengguang -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 2/2 v4] block/throttle: Add IO submitted information in blkio.throttle
From: Robin Dong Currently, if the IO is throttled by io-throttle, the system admin has no idea of the situation and can't report it to the real application user about that he/she has to do something. So this patch adds a new interface named blkio.throttle.io_submitted which exposes the number of bios that have been sent into blk-throttle therefore the user could calculate the difference from throttle.io_serviced to see how many IOs are currently throttled. Cc: Tejun Heo Cc: Vivek Goyal Cc: Jens Axboe Signed-off-by: Tao Ma Signed-off-by: Robin Dong --- v3 <-- v2: - Use nr-queued[] of struct throtl_grp for stats instaed of adding new blkg_rwstat. v4 <-- v3: - Add two new blkg_rwstat arguments to count total bios be sent into blk_throttle. block/blk-throttle.c | 43 +++ 1 files changed, 43 insertions(+), 0 deletions(-) diff --git a/block/blk-throttle.c b/block/blk-throttle.c index 46ddeff..c6391b5 100644 --- a/block/blk-throttle.c +++ b/block/blk-throttle.c @@ -46,6 +46,10 @@ struct tg_stats_cpu { struct blkg_rwstat service_bytes; /* total IOs serviced, post merge */ struct blkg_rwstat serviced; + /* total bytes submitted into blk-throttle */ + struct blkg_rwstat submit_bytes; + /* total IOs submitted into blk-throttle */ + struct blkg_rwstat submitted; }; struct throtl_grp { @@ -266,6 +270,8 @@ static void throtl_pd_reset_stats(struct blkcg_gq *blkg) blkg_rwstat_reset(>service_bytes); blkg_rwstat_reset(>serviced); + blkg_rwstat_reset(>submit_bytes); + blkg_rwstat_reset(>submitted); } } @@ -699,6 +705,30 @@ static void throtl_update_dispatch_stats(struct throtl_grp *tg, u64 bytes, local_irq_restore(flags); } +static void throtl_update_submit_stats(struct throtl_grp *tg, u64 bytes, int rw) +{ + struct tg_stats_cpu *stats_cpu; + unsigned long flags; + + /* If per cpu stats are not allocated yet, don't do any accounting. */ + if (tg->stats_cpu == NULL) + return; + + /* +* Disabling interrupts to provide mutual exclusion between two +* writes on same cpu. It probably is not needed for 64bit. Not +* optimizing that case yet. +*/ + local_irq_save(flags); + + stats_cpu = this_cpu_ptr(tg->stats_cpu); + + blkg_rwstat_add(_cpu->submitted, rw, 1); + blkg_rwstat_add(_cpu->submit_bytes, rw, bytes); + + local_irq_restore(flags); +} + static void throtl_charge_bio(struct throtl_grp *tg, struct bio *bio) { bool rw = bio_data_dir(bio); @@ -1084,6 +1114,16 @@ static struct cftype throtl_files[] = { .private = offsetof(struct tg_stats_cpu, serviced), .read_seq_string = tg_print_cpu_rwstat, }, + { + .name = "throttle.io_submit_bytes", + .private = offsetof(struct tg_stats_cpu, submit_bytes), + .read_seq_string = tg_print_cpu_rwstat, + }, + { + .name = "throttle.io_submitted", + .private = offsetof(struct tg_stats_cpu, submitted), + .read_seq_string = tg_print_cpu_rwstat, + }, { } /* terminate */ }; @@ -1128,6 +1168,8 @@ bool blk_throtl_bio(struct request_queue *q, struct bio *bio) if (tg_no_rule_group(tg, rw)) { throtl_update_dispatch_stats(tg, bio->bi_size, bio->bi_rw); + throtl_update_submit_stats(tg, + bio->bi_size, bio->bi_rw); goto out_unlock_rcu; } } @@ -1141,6 +1183,7 @@ bool blk_throtl_bio(struct request_queue *q, struct bio *bio) if (unlikely(!tg)) goto out_unlock; + throtl_update_submit_stats(tg, bio->bi_size, bio->bi_rw); if (tg->nr_queued[rw]) { /* * There is already another bio queued in same dir. No -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/2 v4] block/throttle: remove redundant type transition
From: Robin Dong We don't need to convert tg to blkg and then convert it back in throtl_update_dispatch_stats(). Signed-off-by: Robin Dong --- block/blk-throttle.c |7 +++ 1 files changed, 3 insertions(+), 4 deletions(-) diff --git a/block/blk-throttle.c b/block/blk-throttle.c index a9664fa..46ddeff 100644 --- a/block/blk-throttle.c +++ b/block/blk-throttle.c @@ -674,10 +674,9 @@ static bool tg_may_dispatch(struct throtl_data *td, struct throtl_grp *tg, return 0; } -static void throtl_update_dispatch_stats(struct blkcg_gq *blkg, u64 bytes, +static void throtl_update_dispatch_stats(struct throtl_grp *tg, u64 bytes, int rw) { - struct throtl_grp *tg = blkg_to_tg(blkg); struct tg_stats_cpu *stats_cpu; unsigned long flags; @@ -708,7 +707,7 @@ static void throtl_charge_bio(struct throtl_grp *tg, struct bio *bio) tg->bytes_disp[rw] += bio->bi_size; tg->io_disp[rw]++; - throtl_update_dispatch_stats(tg_to_blkg(tg), bio->bi_size, bio->bi_rw); + throtl_update_dispatch_stats(tg, bio->bi_size, bio->bi_rw); } static void throtl_add_bio_tg(struct throtl_data *td, struct throtl_grp *tg, @@ -1127,7 +1126,7 @@ bool blk_throtl_bio(struct request_queue *q, struct bio *bio) tg = throtl_lookup_tg(td, blkcg); if (tg) { if (tg_no_rule_group(tg, rw)) { - throtl_update_dispatch_stats(tg_to_blkg(tg), + throtl_update_dispatch_stats(tg, bio->bi_size, bio->bi_rw); goto out_unlock_rcu; } -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Enable A20 using KBC for some MSI laptops to fix S3 resume
My guess is that Windows explicitly enables A20 on resume. We should do that too, really... with the current heavily unified realmode code it should be easy - let me hack up a patch in the morning. Robert Hancock wrote: >On 10/24/2012 02:09 PM, Alan Cox wrote: >> On Wed, 24 Oct 2012 12:36:04 -0700 >> "H. Peter Anvin" wrote: >> >>> Minor concern: it should do the wait for ready before sending each >command. >> >> Can we get a command line to do this quirk too - it strikes me that >if >> the MSIs rely upon it then it may be something Windows always does so >> will be useful to try on other problem machines as an experiment. > >I agree, one has to keep in mind the age-old question "how does Windows > >work?" since it surely has no such quirk. I'd say we're sometimes too >quick to add these DMI quirks when a more general solution would be >somehow figure out how the Linux behavior differs from what Windows is >doing. -- Sent from my mobile phone. Please excuse brevity and lack of formatting. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
On 10/25/2012 01:47 AM, Chris Wilson wrote: On Thu, 25 Oct 2012 10:16:08 +0200, Daniel Vetter wrote: On Thu, Oct 25, 2012 at 7:22 AM, Justin P. Mattock wrote: here is a link to the file..: intel_error_decode http://www.filefactory.com/file/22bypyjhs4mx I haven't figured out how to access this thing. Can you please file a bug report on bugs.freedesktop.org and attach it there? Oops.. I filed with the kernel. maybe can just add a cc's https://bugzilla.kernel.org/show_bug.cgi?id=49571 No worries, it is another ILK hang similar to the ones reported earlier - it just seems the ring stops advancing. Hopefully it is a missing w/a from http://cgit.freedesktop.org/~danvet/drm/log/?h=ilk-wa-pile -Chris well if this means building libdrm etc.. then thats not a problem, more time consuming if anything. perhaps an *.rpm that I can test to see? Justin P. Mattock -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
linux-next: Tree for Oct 26
Hi all, Changes since 201201025: The arm -soc tree gained conflicts against the gpio-lw and pinctrl trees. The akpm tree a couple of patches that turned up elsewhere. I have created today's linux-next tree at git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git (patches at http://www.kernel.org/pub/linux/kernel/next/ ). If you are tracking the linux-next tree using git, you should not use "git pull" to do so as that will try to merge the new linux-next release with the old one. You should use "git fetch" as mentioned in the FAQ on the wiki (see below). You can see which trees have been included by looking in the Next/Trees file in the source. There are also quilt-import.log and merge.log files in the Next directory. Between each merge, the tree was built with a ppc64_defconfig for powerpc and an allmodconfig for x86_64. After the final fixups (if any), it is also built with powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig and allyesconfig (minus CONFIG_PROFILE_ALL_BRANCHES - this fails its final link) and i386, sparc, sparc64 and arm defconfig. These builds also have CONFIG_ENABLE_WARN_DEPRECATED, CONFIG_ENABLE_MUST_CHECK and CONFIG_DEBUG_INFO disabled when necessary. Below is a summary of the state of the merge. We are up to 206 trees (counting Linus' and 27 trees of patches pending for Linus' tree), more are welcome (even if they are currently empty). Thanks to those who have contributed, and to those who haven't, please do. Status of my local build tests will be at http://kisskb.ellerman.id.au/linux-next . If maintainers want to give advice about cross compilers/configs that work, we are always open to add more builds. Thanks to Randy Dunlap for doing many randconfig builds. And to Paul Gortmaker for triage and bug fixes. There is a wiki covering stuff to do with linux-next at http://linux.f-seidel.de/linux-next/pmwiki/ . Thanks to Frank Seidel. -- Cheers, Stephen Rothwells...@canb.auug.org.au $ git checkout master $ git reset --hard stable Merging origin/master (2ab3f29 Merge branch 'akpm' (Andrew's fixes)) Merging fixes/master (12250d8 Merge branch 'i2c-embedded/for-next' of git://git.pengutronix.de/git/wsa/linux) Merging kbuild-current/rc-fixes (bad9955 menuconfig: Replace CIRCLEQ by list_head-style lists.) Merging arm-current/fixes (b43b1ff Merge tag 'fixes-for-rmk' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc into fixes) Merging m68k-current/for-linus (8a745ee m68k: Wire up kcmp) Merging powerpc-merge/merge (83dac59 cpuidle/powerpc: Fix snooze state problem in the cpuidle design on pseries.) Merging sparc/master (43c422e apparmor: fix apparmor OOPS in audit_log_untrustedstring+0x1c/0x40) Merging net/master (910a578 vhost: fix mergeable bufs on BE hosts) Merging sound-current/for-linus (c64064c Merge tag 'asoc-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus) Merging pci-current/for-linus (0ff9514 PCI: Don't print anything while decoding is disabled) Merging wireless/master (f89ff64 b43: Fix oops on unload when firmware not found) Merging driver-core.current/driver-core-linus (bf34be0 Documentation:Chinese translation of Documentation/arm64/memory.txt) Merging tty.current/tty-linus (a4f7438 Revert "serial: omap: fix software flow control") Merging usb.current/usb-linus (1d63f24 Merge tag 'for-usb-linus-2012-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/sarah/xhci into usb-linus) Merging staging.current/staging-linus (4d3f120 staging: tidspbridge: delete unused mmu functions) Merging char-misc.current/char-misc-linus (2cb55a2 sonypi: suspend/resume callbacks should be conditionally compiled on CONFIG_PM_SLEEP) Merging input-current/for-linus (88fd449 Input: wacom - add INPUT_PROP_DIRECT flag to Cintiq 24HD) Merging md-current/for-linus (72f36d5 md: refine reporting of resync/reshape delays.) Merging audit-current/for-linus (c158a35 audit: no leading space in audit_log_d_path prefix) Merging crypto-current/master (9efade1 crypto: cryptd - disable softirqs in cryptd_queue_worker to prevent data corruption) Merging ide/master (9974e43 ide: fix generic_ide_suspend/resume Oops) Merging dwmw2/master (244dc4e Merge git://git.infradead.org/users/dwmw2/random-2.6) Merging sh-current/sh-fixes-for-linus (4403310 SH: Convert out[bwl] macros to inline functions) Merging irqdomain-current/irqdomain/merge (15e06bf irqdomain: Fix debugfs formatting) Merging devicetree-current/devicetree/merge (4e8383b of: release node fix for of_parse_phandle_with_args) Merging spi-current/spi/merge (d1c185b of/spi: Fix SPI module loading by using proper "spi:" modalias prefixes.) Merging gpio-current/gpio/merge (96b7064 gpio/tca6424: merge I2C transactions, remove cast) Merging asm-generic/master (9b04ebd asm-generic/io.h: remove asm/cacheflush.h include) Merging arm/for-next (4a83d2e Merge remote-tracking branch
Re: [PATCH] mm: readahead: remove redundant ra_pages in file_ra_state
On Fri, Oct 26, 2012 at 11:51 AM, Ni zhan Chen wrote: > On 10/26/2012 11:28 AM, YingHang Zhu wrote: >> >> On Fri, Oct 26, 2012 at 10:30 AM, Ni zhan Chen >> wrote: >>> >>> On 10/26/2012 09:27 AM, Fengguang Wu wrote: On Fri, Oct 26, 2012 at 11:25:44AM +1100, Dave Chinner wrote: > > On Thu, Oct 25, 2012 at 10:58:26AM +0800, Fengguang Wu wrote: >> >> Hi Chen, >> >>> But how can bdi related ra_pages reflect different files' readahead >>> window? Maybe these different files are sequential read, random read >>> and so on. >> >> It's simple: sequential reads will get ra_pages readahead size while >> random reads will not get readahead at all. >> >> Talking about the below chunk, it might hurt someone that explicitly >> takes advantage of the behavior, however the ra_pages*2 seems more >> like a hack than general solution to me: if the user will need >> POSIX_FADV_SEQUENTIAL to double the max readahead window size for >> improving IO performance, then why not just increase bdi->ra_pages and >> benefit all reads? One may argue that it offers some differential >> behavior to specific applications, however it may also present as a >> counter-optimization: if the root already tuned bdi->ra_pages to the >> optimal size, the doubled readahead size will only cost more memory >> and perhaps IO latency. >> >> --- a/mm/fadvise.c >> +++ b/mm/fadvise.c >> @@ -87,7 +86,6 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, >> loff_t len, int advice) >> spin_unlock(>f_lock); >> break; >> case POSIX_FADV_SEQUENTIAL: >> - file->f_ra.ra_pages = bdi->ra_pages * 2; > > I think we really have to reset file->f_ra.ra_pages here as it is > not a set-and-forget value. e.g. shrink_readahead_size_eio() can > reduce ra_pages as a result of IO errors. Hence if you have had io > errors, telling the kernel that you are now going to do sequential > IO should reset the readahead to the maximum ra_pages value > supported Good point! but wait this patch removes file->f_ra.ra_pages in all other places too, so there will be no file->f_ra.ra_pages to be reset here... >>> >>> >>> In his patch, >>> >>> >>> static void shrink_readahead_size_eio(struct file *filp, >>> struct file_ra_state *ra) >>> { >>> - ra->ra_pages /= 4; >>> + spin_lock(>f_lock); >>> + filp->f_mode |= FMODE_RANDOM; >>> + spin_unlock(>f_lock); >>> >>> As the example in comment above this function, the read maybe still >>> sequential, and it will waste IO bandwith if modify to FMODE_RANDOM >>> directly. >> >> I've considered about this. On the first try I modified file_ra_state.size >> and >> file_ra_state.async_size directly, like >> >> file_ra_state.async_size = 0; >> file_ra_state.size /= 4; >> >> but as what I comment here, we can not >> predict whether the bad sectors will trash the readahead window, maybe the >> following sectors after current one are ok to go in normal readahead, >> it's hard to know, >> the current approach gives us a chance to slow down softly. > > > Then when will check filp->f_mode |= FMODE_RANDOM; ? Does it will influence > ra->ra_pages? You can find the relevant information in function page_cache_sync_readahead. Thanks, Ying Zhu > > >> >> Thanks, >> Ying Zhu Thanks, Fengguang > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH 0/5] tools, perf: Fix up for x86 UAPI disintegration
Hi David, On Thu, 25 Oct 2012 08:57:20 +0100, David Howells wrote: > Borislav Petkov wrote: > >> David, where can get that x86 UAPI disintegration patch? > > The tip tree has it in branch x86/uapi or you can get it from: > > git://git.infradead.org/users/dhowells/linux-headers.git > > branch disintegrate-x86 or tag disintegrate-x86-20121009. > > I've posted a couple of additional patches to deal with files that became > empty, but they're only for dealing with people who construct their kernel > sources with the patch program. I applied this series on top of you disintegrate-x86 branch which has following commit. commit 8d2c63c2b664bae1fb0f386661ea5f635330e570 Author: David Howells Date: Tue Oct 9 09:47:54 2012 +0100 UAPI: (Scripted) Disintegrate arch/x86/include/asm Signed-off-by: David Howells Acked-by: Arnd Bergmann Acked-by: Thomas Gleixner Acked-by: Michael Kerrisk Acked-by: Paul E. McKenney Acked-by: Dave Jones But I got a conflict like this: --- a/tools/perf/perf.h +++ b/tools/perf/perf.h @@@ -112,7 -102,7 +102,11 @@@ void get_term_dimensions(struct winsiz #include #include ++<<< HEAD +#include "../../include/linux/perf_event.h" ++=== + #include ++>>> perf: Make perf build for x86 with UAPI disintegration applied #include "util/types.h" #include This was because your patch 3 has "uapi" between "include" and "linux". It seems I need more patches to apply your series since there's no perf_event.h under ../../include/uapi/linux directory. Anyways, resolving the conflict resulted in build error: CC builtin-kvm.o builtin-kvm.c:25:21: fatal error: asm/svm.h: No such file or directory make: *** [builtin-kvm.o] Error 1 CC util/evsel.o In file included from util/perf_regs.h:5:0, from util/evsel.c:23: arch/x86/include/perf_regs.h:6:27: fatal error: asm/perf_regs.h: No such file or directory make: *** [util/evsel.o] Error 1 CC util/rbtree.o ../../lib/rbtree.c:24:36: fatal error: linux/rbtree_augmented.h: No such file or directory make: *** [util/rbtree.o] Error 1 CC util/header.o util/header.c:2276:8: error: ‘PERF_ATTR_SIZE_VER3’ undeclared here (not in a function) make: *** [util/header.o] Error 1 Thanks, Namhyung -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Enable A20 using KBC for some MSI laptops to fix S3 resume
On 10/24/2012 02:09 PM, Alan Cox wrote: On Wed, 24 Oct 2012 12:36:04 -0700 "H. Peter Anvin" wrote: Minor concern: it should do the wait for ready before sending each command. Can we get a command line to do this quirk too - it strikes me that if the MSIs rely upon it then it may be something Windows always does so will be useful to try on other problem machines as an experiment. I agree, one has to keep in mind the age-old question "how does Windows work?" since it surely has no such quirk. I'd say we're sometimes too quick to add these DMI quirks when a more general solution would be somehow figure out how the Linux behavior differs from what Windows is doing. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] cpuidle: add missing header include
This patch adds missing device.h header to fix build warnings as below: drivers/cpuidle/cpuidle.h:26:41: warning: 'struct device' declared inside parameter list [enabled by default] drivers/cpuidle/cpuidle.h:26:41: warning: its scope is only this definition or declaration, which is probably not what you want [enabled by default] drivers/cpuidle/cpuidle.h:27:45: warning: 'struct device' declared inside parameter list [enabled by default] In file included from drivers/cpuidle/driver.c:15:0: drivers/cpuidle/cpuidle.h:26:41: warning: 'struct device' declared inside parameter list [enabled by default] drivers/cpuidle/cpuidle.h:26:41: warning: its scope is only this definition or declaration, which is probably not what you want [enabled by default] drivers/cpuidle/cpuidle.h:27:45: warning: 'struct device' declared inside parameter list [enabled by default] This build warning is introduced by commit efeca1b "cpuidle / sysfs: change function parameter". Signed-off-by: Jingoo Han Cc: Daniel Lezcano --- drivers/cpuidle/cpuidle.h |2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/drivers/cpuidle/cpuidle.h b/drivers/cpuidle/cpuidle.h index a5bbd1c..2120d9e 100644 --- a/drivers/cpuidle/cpuidle.h +++ b/drivers/cpuidle/cpuidle.h @@ -5,6 +5,8 @@ #ifndef __DRIVER_CPUIDLE_H #define __DRIVER_CPUIDLE_H +#include + /* For internal use only */ extern struct cpuidle_governor *cpuidle_curr_governor; extern struct list_head cpuidle_governors; -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 027/193] arch/sh: remove CONFIG_EXPERIMENTAL
On Tue, Oct 23, 2012 at 01:01:40PM -0700, Kees Cook wrote: > This config item has not carried much meaning for a while now and is > almost always enabled by default. As agreed during the Linux kernel > summit, remove it. > > CC: Paul Mundt > CC: Tejun Heo > Signed-off-by: Kees Cook While there are cases where it is largely superfluous, we also have plenty of cases in here that are genuinely experimental features and generally shouldn't be enabled unless someone is prepared for some hacking. We can of course replace this with an arch-specific option if needed, but I disagree with suddenly making experimental features suddenly appear to be anything other than what they are. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[GIT PULL][PATCH] ktest: Fix ktest confusion with CONFIG_MODULES_USE_ELF_RELA
Linus, Please pull the latest ktest-v3.7-rc2 tree, which can be found at: git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest.git ktest-v3.7-rc2 Head SHA1: 0979976ee53697f4308578c56abedb9766fea231 Steven Rostedt (1): ktest: Fix ktest confusion with CONFIG_MODULES_USE_ELF_RELA tools/testing/ktest/ktest.pl |6 -- 1 file changed, 4 insertions(+), 2 deletions(-) --- commit 8bc5e4ea3ea0e24142db2dc941233eab2a223ed4 Author: Steven Rostedt Date: Fri Oct 26 00:10:32 2012 -0400 ktest: Fix ktest confusion with CONFIG_MODULES_USE_ELF_RELA In order to decide if ktest should bother installing modules on the target box, it checks if the config file has CONFIG_MODULES=y. But it also checks if the '=y' part exists. It only will install modules if the config exists and is set with '=y'. But as the regex that was used tests: /^CONFIG_MODULES(=y)?/ this will also match: CONFIG_MODULES_USE_ELF_RELA as the '=y' part was optional and it did not test the rest of the line. When this happens, ktest will stop checking the rest of the configs but it will also think that no modules are needed to be installed. What it should do is only jump out of the loop if it actually found a CONFIG_MODULES that is set to true. Otherwise, ktest wont install the necessary modules needed for proper booting of the test target. Signed-off-by: Steven Rostedt diff --git a/tools/testing/ktest/ktest.pl b/tools/testing/ktest/ktest.pl index b51d787..c7ba761 100755 --- a/tools/testing/ktest/ktest.pl +++ b/tools/testing/ktest/ktest.pl @@ -1740,8 +1740,10 @@ sub install { open(IN, "$output_config") or dodie("Can't read config file"); while () { if (/CONFIG_MODULES(=y)?/) { - $install_mods = 1 if (defined($1)); - last; + if (defined($1)) { + $install_mods = 1; + last; + } } } close(IN); -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 05/31] x86/mm: Reduce tlb flushes from ptep_set_access_flags()
On Thu, Oct 25, 2012 at 8:57 PM, Rik van Riel wrote: > > That may not even be needed. Apparently Intel chips > automatically flush an entry from the TLB when it > causes a page fault. I assume AMD chips do the same, > because flush_tlb_fix_spurious_fault evaluates to > nothing on x86. Yes. It's not architected as far as I know, though. But I agree, it's possible - even likely - we could avoid TLB flushing entirely on x86. If you want to try it, I would seriously suggest you do it as a separate commit though, just in case. > Are there architectures where we do need to flush > remote TLBs on upgrading the permissions on a PTE? I *suspect* that whole TLB flush just magically became an SMP one without anybody ever really thinking about it. So it's quite possible we could do this to the pgtable-generic.c code too. However, we don't actually have any generic way to do a local single-address flush (the __flush_tlb_one() thing is architecture-specific, although it exists on a few architectures). We'd need to add a local_flush_tlb_page(vma, address) function. Alternatively, we could decide to use the "tlb_fix_spurious_fault()" thing in there. Possibly just do it unconditionally in the caller - or even just specify that the fault handler has to do it. And stop returning a value at all from ptep_set_access_flags() (I *think* that's the only thing the return value gets used for - flushing the TLB on the local cpu for the cpu's that want it). > Want to just remove the TLB flush entirely and see > if anything breaks in 3.8-rc1? > > From reading the code again, it looks like things > should indeed work ok. I would be open to it, but just in case it causes bisectable problems I'd really want to see it in two patches ("make it always do the local flush" followed by "remove even the local flush"), and then it would pinpoint any need. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH Resend V2] dt: add helper function to read u8 & u16 variables & arrays
This adds following helper routines: - of_property_read_u8_array() - of_property_read_u16_array() - of_property_read_u8() - of_property_read_u16() First two actually share most of the code with of_property_read_u32_array(), so the common part is taken out into a macro, which can be used by all three *_array() routines. Signed-off-by: Viresh Kumar --- V1->V2: - - Use typeof() in of_property_read_array() macro instead of passing type to it drivers/of/base.c | 73 +++--- include/linux/of.h | 30 ++ 2 files changed, 89 insertions(+), 14 deletions(-) diff --git a/drivers/of/base.c b/drivers/of/base.c index af3b22a..039e178 100644 --- a/drivers/of/base.c +++ b/drivers/of/base.c @@ -670,6 +670,64 @@ struct device_node *of_find_node_by_phandle(phandle handle) } EXPORT_SYMBOL(of_find_node_by_phandle); +#define of_property_read_array(_np, _pname, _out, _sz) \ + struct property *_prop = of_find_property(_np, _pname, NULL); \ + const __be32 *_val; \ + \ + if (!_prop) \ + return -EINVAL; \ + if (!_prop->value) \ + return -ENODATA;\ + if ((_sz * sizeof(*_out)) > _prop->length) \ + return -EOVERFLOW; \ + \ + _val = _prop->value;\ + while (_sz--) \ + *_out++ = (typeof(*_out))be32_to_cpup(_val++); \ + return 0; + +/** + * of_property_read_u8_array - Find and read an array of u8 from a property. + * + * @np:device node from which the property value is to be read. + * @propname: name of the property to be searched. + * @out_value: pointer to return value, modified only if return value is 0. + * + * Search for a property in a device node and read 8-bit value(s) from + * it. Returns 0 on success, -EINVAL if the property does not exist, + * -ENODATA if property does not have a value, and -EOVERFLOW if the + * property data isn't large enough. + * + * The out_value is modified only if a valid u8 value can be decoded. + */ +int of_property_read_u8_array(const struct device_node *np, + const char *propname, u8 *out_values, size_t sz) +{ + of_property_read_array(np, propname, out_values, sz); +} +EXPORT_SYMBOL_GPL(of_property_read_u8_array); + +/** + * of_property_read_u16_array - Find and read an array of u16 from a property. + * + * @np:device node from which the property value is to be read. + * @propname: name of the property to be searched. + * @out_value: pointer to return value, modified only if return value is 0. + * + * Search for a property in a device node and read 16-bit value(s) from + * it. Returns 0 on success, -EINVAL if the property does not exist, + * -ENODATA if property does not have a value, and -EOVERFLOW if the + * property data isn't large enough. + * + * The out_value is modified only if a valid u16 value can be decoded. + */ +int of_property_read_u16_array(const struct device_node *np, + const char *propname, u16 *out_values, size_t sz) +{ + of_property_read_array(np, propname, out_values, sz); +} +EXPORT_SYMBOL_GPL(of_property_read_u16_array); + /** * of_property_read_u32_array - Find and read an array of 32 bit integers * from a property. @@ -689,20 +747,7 @@ int of_property_read_u32_array(const struct device_node *np, const char *propname, u32 *out_values, size_t sz) { - struct property *prop = of_find_property(np, propname, NULL); - const __be32 *val; - - if (!prop) - return -EINVAL; - if (!prop->value) - return -ENODATA; - if ((sz * sizeof(*out_values)) > prop->length) - return -EOVERFLOW; - - val = prop->value; - while (sz--) - *out_values++ = be32_to_cpup(val++); - return 0; + of_property_read_array(np, propname, out_values, sz); } EXPORT_SYMBOL_GPL(of_property_read_u32_array); diff --git a/include/linux/of.h b/include/linux/of.h index 72843b7..e2d9b40 100644 --- a/include/linux/of.h +++ b/include/linux/of.h @@ -223,6 +223,10 @@ extern struct device_node *of_find_node_with_property( extern struct property *of_find_property(const struct device_node *np, const char *name, int *lenp); +extern int of_property_read_u8_array(const struct
[PATCH] net: usb: Fix memory leak on Tx data path
Driver anchors the tx urbs and defers the urb submission if a transmit request comes when the interface is suspended. Anchoring urb increments the urb reference count. These deferred urbs are later accessed by calling usb_get_from_anchor() for submission during interface resume. usb_get_from_anchor() unanchors the urb but urb reference count remains same. This causes the urb reference count to remain non-zero after usb_free_urb() gets called and urb never gets freed. Hence call usb_put_urb() after anchoring the urb to properly balance the reference count for these deferred urbs. Also, unanchor these deferred urbs during disconnect, to free them up. Signed-off-by: Hemant Kumar --- drivers/net/usb/usbnet.c |3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c index 1867fe2..00b7598 100644 --- a/drivers/net/usb/usbnet.c +++ b/drivers/net/usb/usbnet.c @@ -1168,6 +1168,7 @@ netdev_tx_t usbnet_start_xmit (struct sk_buff *skb, usb_anchor_urb(urb, >deferred); /* no use to process more packets */ netif_stop_queue(net); + usb_put_urb(urb); spin_unlock_irqrestore(>txq.lock, flags); netdev_dbg(dev->net, "Delaying transmission for resumption\n"); goto deferred; @@ -1317,6 +1318,8 @@ void usbnet_disconnect (struct usb_interface *intf) cancel_work_sync(>kevent); + usb_scuttle_anchored_urbs(>deferred); + if (dev->driver_info->unbind) dev->driver_info->unbind (dev, intf); -- The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] dt: add helper function to read u8 & u16 variables & arrays
On 25 October 2012 19:48, Viresh Kumar wrote: > The problem i see here is: > > The data passed via DT comes as Little Endian in the kernel. > > For a little endian system, byte zero will contain the data and so > (u8) val > > look to be the correct thing. > > For a big endian system, byte 3 will contain data as it is swapped by > be32_to_cpup. > So (u8) val would return value stored by byte 0 instead. ?? I feel above explanation was wrong. I didn't had a big endian system to test this but this is what i derived theoretically. Consider following sequence of commands: u32 x = 0x01; //This will store off-0: 1, off-3:0 for LE system // and will store off-0: 0, off-3:1 for BE system u8 y = (u8) x; // For any architecture type, i.e. big or little this must store // 1 in y. This is ANCI C semantic and should be architecture // independent. Which would mean, type cast will give off-0 on LE and off-3 on BE systems. Don't confuse this with getting values using pointers, as we try to get data out of specific locations in those cases. So, my initial code seems to be doing the right thing. be32_to_cpup() will move the LSB to off-3 and that is what we will get during the cast. I will resend my original code to you. -- viresh -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v1 2/2] x86, apic: Disable BSP if boot cpu is AP
HATAYAMA Daisuke writes: > From: "H. Peter Anvin" > Subject: Re: [PATCH v1 2/2] x86, apic: Disable BSP if boot cpu is AP > Date: Mon, 22 Oct 2012 17:35:47 -0700 > >> On 10/22/2012 02:29 PM, Eric W. Biederman wrote: As I said, I thought Fenghua tried that but it didn't work, experimentally. >>> >>> Fair enough. You described the problem with clearing bit 8 in a weird >>> way. >>> >>> If the best we can muster are fuzzy memories it may be worth >>> revisiting. >>> Perhaps it works on enough cpu models to be interesting. >>> >> >> It isn't fuzzy memories... this was done as late as 1-2 months ago. I >> just don't know the details. >> >> Fenghua, could you help fill us in? >> > > I overlooked completely the fact that BSP flag is rewritable. > > I tried Eric's suggestion using attached test programs and saw it > worked fine at least on the three cpus around me below: > > - Intel(R) Xeon(R) CPU E7- 4820 @ 2.00GHz > - Intel(R) Xeon(R) CPU E7- 8870 @ 2.40GHz > - Intel(R) Xeon(TM) CPU 1.80GHz > - 32 bits CPU > > Next I found the description about this in 8.4.2, IASDM Vol.3: > > The MP initialization protocol imposes the following requirements > and restrictions on the system: > > * The MP protocol is executed only after a power-up or RESET. If the > MP protocol has completed and a BSP is chosen, subsequent INITs > (either to a specific processor or system wide) do not cause the > MP protocol to be repeated. Instead, each logical processor > examines its BSP flag (in the IA32_APIC_BASE MSR) to determine > whether it should execute the BIOS boot-strap code (if it is the > BSP) or enter a wait-for-SIPI state (if it is an AP). > > So this is no longer undocumented behaviour for recent cpus, I think. The underdocumented bit is the ability to clear the flag. And of course these are processor specific registers. > Considering these, I'll make a patch to clear BSP flag at appropreate > position in kernel boot-up code. OTOH, according to the discussion, it > was reported that clearing BSP flag affected some BIOSes. To deal with > this, I'll prepare a kernel option to decide whether to clear BSP flag > or not. > > Does anyone have any comments now? Or please comment after I submit a > new patch. I think you are on right track with preparing some patches, and this certainly looks like worth experimenting with. At least for i386 the code need to verify you have a cpu new enough to have an APIC_BASE_MSR, but I don't think that is going to be hard. Eric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH]Documentation:Chinese translation of Documentation/arm64/booting.txt
On 10/24/2012 11:11:36 AM, Catalin Marinas wrote: > 2012/10/24 Tekkaman Ninja : > > This is a Chinese translated version of > > Documentation/arm64/booting.txt > > > > Signed-off-by: Fu Wei > > Same as my comment on the memory.txt patch, I don't understand what > it > says, so you need to keep it up to date. > > Acked-by: Catalin Marinas I have a similar problem: I can't read translations to languages I don't speak. I argued for putting them on the web way back when, but Greg Kroah-Hartman incorporated stuff he can't read either into the Documentation directory. Last I pinged him he was willing to maintain non-english translations, try sending them to him? Rob -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: readahead: remove redundant ra_pages in file_ra_state
On Fri, Oct 26, 2012 at 11:38:11AM +0800, YingHang Zhu wrote: > On Fri, Oct 26, 2012 at 8:25 AM, Dave Chinner wrote: > > On Thu, Oct 25, 2012 at 10:58:26AM +0800, Fengguang Wu wrote: > >> Hi Chen, > >> > >> > But how can bdi related ra_pages reflect different files' readahead > >> > window? Maybe these different files are sequential read, random read > >> > and so on. > >> > >> It's simple: sequential reads will get ra_pages readahead size while > >> random reads will not get readahead at all. > >> > >> Talking about the below chunk, it might hurt someone that explicitly > >> takes advantage of the behavior, however the ra_pages*2 seems more > >> like a hack than general solution to me: if the user will need > >> POSIX_FADV_SEQUENTIAL to double the max readahead window size for > >> improving IO performance, then why not just increase bdi->ra_pages and > >> benefit all reads? One may argue that it offers some differential > >> behavior to specific applications, however it may also present as a > >> counter-optimization: if the root already tuned bdi->ra_pages to the > >> optimal size, the doubled readahead size will only cost more memory > >> and perhaps IO latency. > >> > >> --- a/mm/fadvise.c > >> +++ b/mm/fadvise.c > >> @@ -87,7 +86,6 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, > >> loff_t len, int advice) > >> spin_unlock(>f_lock); > >> break; > >> case POSIX_FADV_SEQUENTIAL: > >> - file->f_ra.ra_pages = bdi->ra_pages * 2; > > > > I think we really have to reset file->f_ra.ra_pages here as it is > > not a set-and-forget value. e.g. shrink_readahead_size_eio() can > > reduce ra_pages as a result of IO errors. Hence if you have had io > > errors, telling the kernel that you are now going to do sequential > > IO should reset the readahead to the maximum ra_pages value > > supported > If we unify file->f_ra.ra_pages and its' bdi->ra_pages, then the error-prone > device's readahead can be directly tuned or turned off with blockdev > thus affect all files > using the device and without bring more complexity... It's not really feasible/convenient for the end users to hand tune blockdev readahead size on IO errors. Even many administrators are totally unaware of the readahead size parameter. Thanks, Fengguang -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 05/31] x86/mm: Reduce tlb flushes from ptep_set_access_flags()
On 10/25/2012 10:56 PM, Linus Torvalds wrote: Guess what? If you want to optimize the function to not do remote TLB flushes, then just do that! None of the garbage. Just change the flush_tlb_page(vma, address); line to __flush_tlb_one(address); That may not even be needed. Apparently Intel chips automatically flush an entry from the TLB when it causes a page fault. I assume AMD chips do the same, because flush_tlb_fix_spurious_fault evaluates to nothing on x86. and it should damn well work. Because everything I see about "flush_remote" looks just wrong, wrong, wrong. Are there architectures where we do need to flush remote TLBs on upgrading the permissions on a PTE? Because that is what the implementation in pgtable-generic.c seems to be doing as well... And if there really is some reason for that whole flush_remote braindamage, then we have much bigger problems, namely the fact that we've broken the documented semantics of that function, and we're doing various other things that are completely and utterly invalid unless the above semantics hold. Want to just remove the TLB flush entirely and see if anything breaks in 3.8-rc1? From reading the code again, it looks like things should indeed work ok. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v1] firmware loader: introduce module parameter to customize fw search path
On Thu, Oct 25, 2012 at 08:38:25PM -0700, Linus Torvalds wrote: > It's valid to cast a non-const pointer to a const one. It's the > *other* way around that is invalid. > > So marking fw_path[] as having 'const char *' elements just means that > we won't be changing those elements through the fw_path[] array > (correct: we only read them). The fact that one of those same pointers > is then also available through a non-const pointer variable means that > they can change through *that* pointer, but that doesn't change the > fact that fw_path[] itself contains const pointers. > > Remember: in C, a "const pointer" does *not* mean that the thing it > points to cannot change. It only means that it cannot change through > *that* pointer. It's a bit trickier, unfortunately - pointer to pointer to const char and pointer to pointer to char do not mix. Just for fun, try to constify envp and argv arguments of call_usermodehelper()... -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: readahead: remove redundant ra_pages in file_ra_state
On 10/26/2012 11:28 AM, YingHang Zhu wrote: On Fri, Oct 26, 2012 at 10:30 AM, Ni zhan Chen wrote: On 10/26/2012 09:27 AM, Fengguang Wu wrote: On Fri, Oct 26, 2012 at 11:25:44AM +1100, Dave Chinner wrote: On Thu, Oct 25, 2012 at 10:58:26AM +0800, Fengguang Wu wrote: Hi Chen, But how can bdi related ra_pages reflect different files' readahead window? Maybe these different files are sequential read, random read and so on. It's simple: sequential reads will get ra_pages readahead size while random reads will not get readahead at all. Talking about the below chunk, it might hurt someone that explicitly takes advantage of the behavior, however the ra_pages*2 seems more like a hack than general solution to me: if the user will need POSIX_FADV_SEQUENTIAL to double the max readahead window size for improving IO performance, then why not just increase bdi->ra_pages and benefit all reads? One may argue that it offers some differential behavior to specific applications, however it may also present as a counter-optimization: if the root already tuned bdi->ra_pages to the optimal size, the doubled readahead size will only cost more memory and perhaps IO latency. --- a/mm/fadvise.c +++ b/mm/fadvise.c @@ -87,7 +86,6 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, loff_t len, int advice) spin_unlock(>f_lock); break; case POSIX_FADV_SEQUENTIAL: - file->f_ra.ra_pages = bdi->ra_pages * 2; I think we really have to reset file->f_ra.ra_pages here as it is not a set-and-forget value. e.g. shrink_readahead_size_eio() can reduce ra_pages as a result of IO errors. Hence if you have had io errors, telling the kernel that you are now going to do sequential IO should reset the readahead to the maximum ra_pages value supported Good point! but wait this patch removes file->f_ra.ra_pages in all other places too, so there will be no file->f_ra.ra_pages to be reset here... In his patch, static void shrink_readahead_size_eio(struct file *filp, struct file_ra_state *ra) { - ra->ra_pages /= 4; + spin_lock(>f_lock); + filp->f_mode |= FMODE_RANDOM; + spin_unlock(>f_lock); As the example in comment above this function, the read maybe still sequential, and it will waste IO bandwith if modify to FMODE_RANDOM directly. I've considered about this. On the first try I modified file_ra_state.size and file_ra_state.async_size directly, like file_ra_state.async_size = 0; file_ra_state.size /= 4; but as what I comment here, we can not predict whether the bad sectors will trash the readahead window, maybe the following sectors after current one are ok to go in normal readahead, it's hard to know, the current approach gives us a chance to slow down softly. Then when will check filp->f_mode |= FMODE_RANDOM; ? Does it will influence ra->ra_pages? Thanks, Ying Zhu Thanks, Fengguang -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v1] firmware loader: introduce module parameter to customize fw search path
On Thu, Oct 25, 2012 at 8:12 PM, Ming Lei wrote: > > Yes, it should be the cleanest, I don't do it because I thought that might > have caused one compile warning('const char *' points to memory > without 'const', like below) You can just keep the const. In fact, you could even add one, and make it be static const char * const fw_path[] = { We currently don't mark fw_path[] itself const (even though it is), only the strings it points to. > but in fact there isn't any warning with above change and it does work, still > don't know why? :-( It's valid to cast a non-const pointer to a const one. It's the *other* way around that is invalid. So marking fw_path[] as having 'const char *' elements just means that we won't be changing those elements through the fw_path[] array (correct: we only read them). The fact that one of those same pointers is then also available through a non-const pointer variable means that they can change through *that* pointer, but that doesn't change the fact that fw_path[] itself contains const pointers. Remember: in C, a "const pointer" does *not* mean that the thing it points to cannot change. It only means that it cannot change through *that* pointer. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: readahead: remove redundant ra_pages in file_ra_state
On Fri, Oct 26, 2012 at 8:25 AM, Dave Chinner wrote: > On Thu, Oct 25, 2012 at 10:58:26AM +0800, Fengguang Wu wrote: >> Hi Chen, >> >> > But how can bdi related ra_pages reflect different files' readahead >> > window? Maybe these different files are sequential read, random read >> > and so on. >> >> It's simple: sequential reads will get ra_pages readahead size while >> random reads will not get readahead at all. >> >> Talking about the below chunk, it might hurt someone that explicitly >> takes advantage of the behavior, however the ra_pages*2 seems more >> like a hack than general solution to me: if the user will need >> POSIX_FADV_SEQUENTIAL to double the max readahead window size for >> improving IO performance, then why not just increase bdi->ra_pages and >> benefit all reads? One may argue that it offers some differential >> behavior to specific applications, however it may also present as a >> counter-optimization: if the root already tuned bdi->ra_pages to the >> optimal size, the doubled readahead size will only cost more memory >> and perhaps IO latency. >> >> --- a/mm/fadvise.c >> +++ b/mm/fadvise.c >> @@ -87,7 +86,6 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, loff_t >> len, int advice) >> spin_unlock(>f_lock); >> break; >> case POSIX_FADV_SEQUENTIAL: >> - file->f_ra.ra_pages = bdi->ra_pages * 2; > > I think we really have to reset file->f_ra.ra_pages here as it is > not a set-and-forget value. e.g. shrink_readahead_size_eio() can > reduce ra_pages as a result of IO errors. Hence if you have had io > errors, telling the kernel that you are now going to do sequential > IO should reset the readahead to the maximum ra_pages value > supported If we unify file->f_ra.ra_pages and its' bdi->ra_pages, then the error-prone device's readahead can be directly tuned or turned off with blockdev thus affect all files using the device and without bring more complexity... Thanks, Ying Zhu > > Cheers, > > Dave. > -- > Dave Chinner > da...@fromorbit.com -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH 02/16 v2] f2fs: add on-disk layout
[snip] > > +#define F2FS_SUPER_MAGIC 0xF2F52010 > > +#define F2FS_SUPER_OFFSET 0 /* start sector # for sb */ > > Does f2fs superblock really haven't any offset from the volume begin? The reason that I changed this from 1 to 0 is due to the failure during android recovery. I don't know why the recovery is failed when the offset is 1, but it works fine after the offset is changed to 0. I suspect that mount procedure inspects the 0'th offset to figure out what file system is installed by checking some kind of magic numbers. Sometimes, we've seen that the mount program tries to load previously installed file system even though mkfs.f2fs was conducted. Would you recommend something? > > > +#define F2FS_BLKSIZE 4096 > > +#define F2FS_MAX_EXTENSION 64 > > + > > +#define NULL_ADDR 0x0U > > +#define NEW_ADDR -1U > > Does NULL_ADDR and NEW_ADDR declarations really need? Does kernel > haven't any analogous? These are used for F2FS-specific block allocation, so for readability, I don't want to change this. > > > + > > +#define F2FS_ROOT_INO(sbi) (sbi->root_ino_num) > > +#define F2FS_NODE_INO(sbi) (sbi->node_ino_num) > > +#define F2FS_META_INO(sbi) (sbi->meta_ino_num) > > + > > +#define GFP_F2FS_MOVABLE (__GFP_WAIT | __GFP_IO | __GFP_ZERO) > > + > > +#define MAX_ACTIVE_LOGS16 > > +#define MAX_ACTIVE_NODE_LOGS 8 > > +#define MAX_ACTIVE_DATA_LOGS 8 > > I think that it makes sense to comment the reasons of such limitations > in MAX_ACTIVE_LOGS, MAX_ACTIVE_NODE_LOGS, MAX_ACTIVE_DATA_LOGS. The maximum number of logs is suggested by arnd before. As I understood, why he suggested such a quite large number is for further optimization of multiple logs without any on-disk layout changes. And, I think it is quite enough. > > > + > > +/* > > + * For superblock > > + */ > > +struct f2fs_super_block { > > + __le32 magic; /* Magic Number */ > > + __le16 major_ver; /* Major Version */ > > + __le16 minor_ver; /* Minor Version */ > > + __le32 log_sectorsize; /* log2 (Sector size in bytes) */ > > + __le32 log_sectors_per_block; /* log2 (Number of sectors per block */ > > + __le32 log_blocksize; /* log2 (Block size in bytes) */ > > + __le32 log_blocks_per_seg; /* log2 (Number of blocks per segment) */ > > From my point of view, __le32 is big data type for log2 (). What > do you think? > Right, but it is superblock. Should we have to consider space overhead? > > + __le32 segs_per_sec;/* Number of segments per section */ > > + __le32 secs_per_zone; /* Number of sections per zone */ > > + __le32 checksum_offset; /* Checksum position in this super block */ > > + __le64 block_count; /* Total number of blocks */ > > + __le32 section_count; /* Total number of sections */ > > + __le32 segment_count; /* Total number of segments */ > > + __le32 segment_count_ckpt; /* Total number of segments > > + in Checkpoint area */ > > + __le32 segment_count_sit; /* Total number of segments > > +in Segment information table */ > > + __le32 segment_count_nat; /* Total number of segments > > +in Node address table */ > > + /*Total number of segments in Segment summary area */ > > + __le32 segment_count_ssa; > > + /* Total number of segments in Main area */ > > + __le32 segment_count_main; > > + __le32 failure_safe_block_distance; > > + __le32 segment0_blkaddr;/* Start block address of Segment 0 */ > > + __le32 start_segment_checkpoint; /* Start block address of ckpt */ > > + __le32 sit_blkaddr; /* Start block address of SIT */ > > + __le32 nat_blkaddr; /* Start block address of NAT */ > > + __le32 ssa_blkaddr; /* Start block address of SSA */ > > + __le32 main_blkaddr;/* Start block address of Main area */ > > + __le32 root_ino;/* Root directory inode number */ > > + __le32 node_ino;/* node inode number */ > > + __le32 meta_ino;/* meta inode number */ > > + __le32 volume_serial_number;/* VSN is optional field */ > > Usually, it is used 128-bits UUID for serial number. Why do you use > __le32 as volume_serial_number? Ok, I'll change. [snip] > > +/* > > + * For directory operations > > + */ > > +#define F2FS_DOT_HASH 0 > > +#define F2FS_DDOT_HASH F2FS_DOT_HASH > > +#define F2FS_MAX_HASH (~((0x3ULL) << 62)) > > +#define F2FS_HASH_COL_BIT ((0x1ULL) << 63) > > + > > +typedef __le32 f2fs_hash_t; > > + > > +#define F2FS_NAME_LEN 8 > > It exists F2FS_MAX_NAME_LEN. I think that it makes sense to comment here > purpose of F2FS_NAME_LEN declaration. Ok, thanks. --- Jaegeuk Kim Samsung -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Kdump with signed images
Matthew Garrett writes: > On Thu, Oct 25, 2012 at 09:15:58PM -0400, Mimi Zohar wrote: > >> On a running system, the package installer, after verifying the package >> integrity, would install each file with the associated 'security.ima' >> extended attribute. The 'security.evm' digital signature would be >> installed with an HMAC, calculated using a system unique key. > > The idea isn't to prevent /sbin/kexec from being modified after > installation - it's to prevent it from being possible to install a > system that has a modified /sbin/kexec. Leaving any part of this up to > the package installer means that it doesn't solve the problem we're > trying to solve here. It must be impossible for the kernel to launch any > /sbin/kexec that hasn't been signed by a trusted key that's been built > into the kernel, and it must be impossible for anything other than > /sbin/kexec to make the kexec system call. The 'security.capability' attribute modulo weirdness with the security bounding set gives us the necessary tools to allow /sbin/kexec to make the system call. The primary trick with this is to limit the installer in such as way that we can trust the installer even on a system on which root has been compromised. Trusting the installer is the same class of problem as trusting /sbin/kexec, and to me a much more interesting problem as it keeps critical system files from being tampered with. It sounds like there are some tricky details to work through but this direction of system integrity looks like it is worth pursuing, regardless of how we handle a signed /sbin/kexec. Eric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] cpufreq: governors: remove redundant code
On 26 October 2012 05:43, Rafael J. Wysocki wrote: > I have applied this patch only because of the fixes on top of it. It broke > kernel compliation due to some missing EXPORT_SYMBOL_GPLs in > cpufreq_governor.c, > so I woulnd't have applied it otherwise. Hi Rafael, So sorry for this. I am really feeling bad for that. I should have tried compiling them as modules too. I had that in mind while coding it, but forgot it later. Thanks for making changes on my behalf. -- viresh -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: readahead: remove redundant ra_pages in file_ra_state
On Fri, Oct 26, 2012 at 10:30 AM, Ni zhan Chen wrote: > On 10/26/2012 09:27 AM, Fengguang Wu wrote: >> >> On Fri, Oct 26, 2012 at 11:25:44AM +1100, Dave Chinner wrote: >>> >>> On Thu, Oct 25, 2012 at 10:58:26AM +0800, Fengguang Wu wrote: Hi Chen, > But how can bdi related ra_pages reflect different files' readahead > window? Maybe these different files are sequential read, random read > and so on. It's simple: sequential reads will get ra_pages readahead size while random reads will not get readahead at all. Talking about the below chunk, it might hurt someone that explicitly takes advantage of the behavior, however the ra_pages*2 seems more like a hack than general solution to me: if the user will need POSIX_FADV_SEQUENTIAL to double the max readahead window size for improving IO performance, then why not just increase bdi->ra_pages and benefit all reads? One may argue that it offers some differential behavior to specific applications, however it may also present as a counter-optimization: if the root already tuned bdi->ra_pages to the optimal size, the doubled readahead size will only cost more memory and perhaps IO latency. --- a/mm/fadvise.c +++ b/mm/fadvise.c @@ -87,7 +86,6 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, loff_t len, int advice) spin_unlock(>f_lock); break; case POSIX_FADV_SEQUENTIAL: - file->f_ra.ra_pages = bdi->ra_pages * 2; >>> >>> I think we really have to reset file->f_ra.ra_pages here as it is >>> not a set-and-forget value. e.g. shrink_readahead_size_eio() can >>> reduce ra_pages as a result of IO errors. Hence if you have had io >>> errors, telling the kernel that you are now going to do sequential >>> IO should reset the readahead to the maximum ra_pages value >>> supported >> >> Good point! >> >> but wait this patch removes file->f_ra.ra_pages in all other >> places too, so there will be no file->f_ra.ra_pages to be reset here... > > > In his patch, > > > static void shrink_readahead_size_eio(struct file *filp, > struct file_ra_state *ra) > { > - ra->ra_pages /= 4; > + spin_lock(>f_lock); > + filp->f_mode |= FMODE_RANDOM; > + spin_unlock(>f_lock); > > As the example in comment above this function, the read maybe still > sequential, and it will waste IO bandwith if modify to FMODE_RANDOM > directly. I've considered about this. On the first try I modified file_ra_state.size and file_ra_state.async_size directly, like file_ra_state.async_size = 0; file_ra_state.size /= 4; but as what I comment here, we can not predict whether the bad sectors will trash the readahead window, maybe the following sectors after current one are ok to go in normal readahead, it's hard to know, the current approach gives us a chance to slow down softly. Thanks, Ying Zhu > >> >> Thanks, >> Fengguang >> > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] arm: l2cc: doc: fix device tree example typo
On 10/23/2012 07:53 PM, Josh Cartwright wrote: > The list of attributes above details the use of the 'filter-ranges' > property, but the example improperly used 'filter-latency'. Make these > consistent by fixing up the example. > > Signed-off-by: Josh Cartwright Applied for 3.8 (unless I get more to send for 3.7). Rob > --- > Documentation/devicetree/bindings/arm/l2cc.txt | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/Documentation/devicetree/bindings/arm/l2cc.txt > b/Documentation/devicetree/bindings/arm/l2cc.txt > index 7ca5216..7c3ee3a 100644 > --- a/Documentation/devicetree/bindings/arm/l2cc.txt > +++ b/Documentation/devicetree/bindings/arm/l2cc.txt > @@ -37,7 +37,7 @@ L2: cache-controller { > reg = <0xfff12000 0x1000>; > arm,data-latency = <1 1 1>; > arm,tag-latency = <2 2 2>; > -arm,filter-latency = <0x8000 0x800>; > +arm,filter-ranges = <0x8000 0x800>; > cache-unified; > cache-level = <2>; > interrupts = <45>; > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
linux-next: manual merge of the arm-soc tree with the pinctrl tree
Hi all, Today's linux-next merge of the arm-soc tree got a conflict in arch/arm/mach-ux500/cpu-db8500.c between commit 63c53906312f ("pinctrl/nomadik: move the platform data header") from the pinctrl tree and commit 4040d10a3d44 ("ARM: ux500: add DB serial number to entropy pool") from the arm-soc tree. I fixed it up (see below) and can carry the fix as necessary (no action is required). -- Cheers, Stephen Rothwells...@canb.auug.org.au diff --cc arch/arm/mach-ux500/cpu-db8500.c index 87a8f9f,50202a1..000 --- a/arch/arm/mach-ux500/cpu-db8500.c +++ b/arch/arm/mach-ux500/cpu-db8500.c @@@ -18,7 -18,7 +18,8 @@@ #include #include #include +#include + #include #include #include pgpFxzCCbQqm5.pgp Description: PGP signature
Re: [PATCH v1 2/2] x86, apic: Disable BSP if boot cpu is AP
From: "H. Peter Anvin" Subject: Re: [PATCH v1 2/2] x86, apic: Disable BSP if boot cpu is AP Date: Mon, 22 Oct 2012 17:35:47 -0700 > On 10/22/2012 02:29 PM, Eric W. Biederman wrote: >>> >>> As I said, I thought Fenghua tried that but it didn't work, >>> experimentally. >> >> Fair enough. You described the problem with clearing bit 8 in a weird >> way. >> >> If the best we can muster are fuzzy memories it may be worth >> revisiting. >> Perhaps it works on enough cpu models to be interesting. >> > > It isn't fuzzy memories... this was done as late as 1-2 months ago. I > just don't know the details. > > Fenghua, could you help fill us in? > I overlooked completely the fact that BSP flag is rewritable. I tried Eric's suggestion using attached test programs and saw it worked fine at least on the three cpus around me below: - Intel(R) Xeon(R) CPU E7- 4820 @ 2.00GHz - Intel(R) Xeon(R) CPU E7- 8870 @ 2.40GHz - Intel(R) Xeon(TM) CPU 1.80GHz - 32 bits CPU Next I found the description about this in 8.4.2, IASDM Vol.3: The MP initialization protocol imposes the following requirements and restrictions on the system: * The MP protocol is executed only after a power-up or RESET. If the MP protocol has completed and a BSP is chosen, subsequent INITs (either to a specific processor or system wide) do not cause the MP protocol to be repeated. Instead, each logical processor examines its BSP flag (in the IA32_APIC_BASE MSR) to determine whether it should execute the BIOS boot-strap code (if it is the BSP) or enter a wait-for-SIPI state (if it is an AP). So this is no longer undocumented behaviour for recent cpus, I think. Considering these, I'll make a patch to clear BSP flag at appropreate position in kernel boot-up code. OTOH, according to the discussion, it was reported that clearing BSP flag affected some BIOSes. To deal with this, I'll prepare a kernel option to decide whether to clear BSP flag or not. Does anyone have any comments now? Or please comment after I submit a new patch. Thanks. HATAYAMA, Daisuke bsp_flag_modules.tar.bz2 Description: Binary data
linux-next: manual merge of the arm-soc tree with the gpio-lw tree
Hi all, Today's linux-next merge of the arm-soc tree got a conflict in arch/arm/Kconfig between commit a3b8d4a51357 ("GPIO: Add support for GPIO on CLPS711X-target platform") from the gpio-lw tree and commit 4a8355c4c34f ("ARM: clps711x: convert to clockevents") from the arm-soc tree. I fixed it up (see below) and can carry the fix as necessary (no action is required). -- Cheers, Stephen Rothwells...@canb.auug.org.au diff --cc arch/arm/Kconfig index a7c541e,d9b7a84..000 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@@ -366,8 -364,6 +366,7 @@@ config ARCH_CNS3XX config ARCH_CLPS711X bool "Cirrus Logic CLPS711x/EP721x/EP731x-based" + select ARCH_REQUIRE_GPIOLIB - select ARCH_USES_GETTIMEOFFSET select CLKDEV_LOOKUP select COMMON_CLK select CPU_ARM720T pgpanTYNefaY1.pgp Description: PGP signature
Re: The idea about scheduler test module(STM)
On 10/26/2012 10:27 AM, Charles Wang wrote: > Yes, it's a new way to do scheduler test. But why use kernel > threads? The info, total time, run time, wait time, preempt number, all > can be collected from tasks' sched info from /proc/pid/sched and > /proc/pid/stat. That's right, next version's STM could get info like proc, actually there are many ways to implement it, but we need to figure out whether STM is needed firstly ;-) > > I don't understand clearly about "pure scheduler performance" here. In > order to test scheduler fully, we need to do IO test, and other > subsystems will be involved in. But if other subsystems' code are not > changed, the result still can refer to scheduler's change. For example when we test the latency of net traffic, there are software, nic driver, nic, router, and the same thing on peer machine, all those staff besides scheduler are unstable even the code not changed, and can easily influence the test result. Let's say we changed some param like 'sched_latency_ns' which we think may help to reduce the latency of system, and suppose it really helped to reduce 0.xx% latency, how can we detect that while so many unstable thing in our test? Then after we have done many changes, suppose each of them can help increase 0.xx% but we can't know, so we just think those param are useless, but actually the accumulate improvement may be x% which we really care. So the pure means with out any influence from other thing. But I think STM should also have the ability to do the normal test which included all the subsystem we want, and then we could know which one is the real bottleneck. > > I can't think much farther about the advantage this way could give. > Maybe you should show us ur better examples. :) I'd like to make it more useful not just a demo, but I need more feedback and suggestions :) Regards, Michael Wang > > Regards, > Charles > > On 10/25/2012 01:40 PM, Michael Wang wrote: >> Hi, Folks >> >> Charles has raised a problem that we don't have any tool yet >> for testing the scheduler with out any disturb from other >> subsystem, and I also found it's hard to test scheduler optimize >> patch, since the improvement could be easily eaten by other >> subsystem like IO. >> >> So Let's check the tools we have currently: >> 1. perf sched >> >> we can use it to trace the threads we interested, and >> the info it provided is very good, but one issue is, >> it could not create the workload we want, also collect >> the info and do summary is not so easy. >> >> 2. linsched >> >> It's a very good tool to create the test environment, >> but it's implementation is to ideal, so it could not >> present the real world problem. >> >> Since both perf and linsched could not meet our requirement, we >> decided to develop a new tool, let's currently call it >> scheduler test module(STM). >> >> It's propose is: >> 1. create the workload we want. >> 2. test the pure scheduler. >> 3. collect info we need and do summary. >> >> This tool should be very easy to use and not depends on >> the implementation of scheduler. >> >> We can use it to check the pure scheduler performance on >> our system. >> >> We can use it to check whether there are regression in >> scheduler when testing patches. >> >> And other usage I've not figure out yet. >> >> In order to explain the idea more directly, I have wrote a >> prototype STM, it's a separate module, and you can use it >> just like 'rcutorture'. >> >> I attached a small script 'play.sh' to help you easily >> run the test, put 'schedtm.c' and 'play.sh' in same directory >> and run 'play.sh', you will see out put like: >> >> schedtm: summary >> schedtm: cpu count:cpurunpreempt >> schedtm: 013811381 >> schedtm: 1957955 >> schedtm: 2900900 >> schedtm: 310351034 >> schedtm: 4991990 >> schedtm: 5940939 >> schedtm: 6900897 >> schedtm: 7942948 >> schedtm: 8852850 >> schedtm: 9931938 >> schedtm: 10936934 >> schedtm: 11951950 >> schedtm: total time(us):10138172 >> schedtm: run time(us):5055223(49.86%) >> schedtm: wait time(us):5082949 >> schedtm: latency(us):10489 >> schedtm: stmt22 got highest run time 5604941(+10%) >> schedtm: stmt3 got lowest run time 4852057(-4%) >> schedtm: stmt12 got highest latency 11482(+9%) >> schedtm: stmt0 got lowest latency 7561(-27%) >> >> And you can enable/disable CONFIG_PREEMPT to see magnificent >> change on latency. >> >> This is nothing but a demo, and please "RUN IT ON A TEST MACHINE"... >> >> It will create 24 kernel threads and run 10 seconds, you can change >> it by module param. >> >> I will be appreciate if I could get some feedback from the scheduler >> experts like you, whatever you think it's good or junk,
[PATCH] x86, doc: fix grammar and typo in boot.txt
Fixes some minor issues in the x86 boot documentation. Signed-off-by: Kees Cook --- Documentation/x86/boot.txt |4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Documentation/x86/boot.txt b/Documentation/x86/boot.txt index 9efceff..f15cb74 100644 --- a/Documentation/x86/boot.txt +++ b/Documentation/x86/boot.txt @@ -1013,7 +1013,7 @@ boot_params as that of 16-bit boot protocol, the boot loader should also fill the additional fields of the struct boot_params as that described in zero-page.txt. -After setupping the struct boot_params, the boot loader can load the +After setting up the struct boot_params, the boot loader can load the 32/64-bit kernel in the same way as that of 16-bit boot protocol. In 32-bit boot protocol, the kernel is started by jumping to the @@ -1023,7 +1023,7 @@ In 32-bit boot protocol, the kernel is started by jumping to the At entry, the CPU must be in 32-bit protected mode with paging disabled; a GDT must be loaded with the descriptors for selectors __BOOT_CS(0x10) and __BOOT_DS(0x18); both descriptors must be 4G flat -segment; __BOOS_CS must have execute/read permission, and __BOOT_DS +segment; __BOOT_CS must have execute/read permission, and __BOOT_DS must have read/write permission; CS must be __BOOT_CS and DS, ES, SS must be __BOOT_DS; interrupt must be disabled; %esi must hold the base address of the struct boot_params; %ebp, %edi and %ebx must be zero. -- 1.7.9.5 -- Kees Cook Chrome OS Security -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] Remove CsrEventHandle and functions using it
Thank you for comment. I will make and send patch again. Thanks and Regards. SeongJae Park. On Fri, Oct 26, 2012 at 4:16 AM, Greg KH wrote: > On Thu, Oct 25, 2012 at 02:51:10PM +0900, SeongJae Park wrote: >> Nobody use CsrEventHandle, Nobody call function using it as parameter. >> So, remove it. >> >> Signed-off-by: SeongJae Park >> --- >> drivers/staging/csr/csr_framework_ext.h | 61 >> - >> drivers/staging/csr/csr_framework_ext_types.h |2 - >> 2 files changed, 63 deletions(-) > > This patch applies with fuzz, which implies that you didn't make it > against the latest tree. Care to redo it, and the 2/2 patch, against > either my staging-next branch of the staging.git tree, or the next > linux-next release that will happen tomorrow, and resend them? > > thanks, > > greg k-h -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v1] firmware loader: introduce module parameter to customize fw search path
On Fri, Oct 26, 2012 at 10:32 AM, Linus Torvalds wrote: > > Please just make "fw_path[0]" just be the pointer to fw_path_para[] > (which sounds like the cleanest fix) and get rid of the negative 'i' > and conditional entirely. Yes, it should be the cleanest, I don't do it because I thought that might have caused one compile warning('const char *' points to memory without 'const', like below) static char fw_path_para[256]; static const char *fw_path[] = { fw_path_para, "/lib/firmware/updates/" UTS_RELEASE, "/lib/firmware/updates", "/lib/firmware/" UTS_RELEASE, "/lib/firmware" }; but in fact there isn't any warning with above change and it does work, still don't know why? :-( Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3] mm: thp: Set the accessed flag for old pages on access fault.
On 10/26/2012 03:51 AM, Johannes Weiner wrote: On Thu, Oct 25, 2012 at 05:44:31PM +0100, Will Deacon wrote: On x86 memory accesses to pages without the ACCESSED flag set result in the ACCESSED flag being set automatically. With the ARM architecture a page access fault is raised instead (and it will continue to be raised until the ACCESSED flag is set for the appropriate PTE/PMD). For normal memory pages, handle_pte_fault will call pte_mkyoung (effectively setting the ACCESSED flag). For transparent huge pages, pmd_mkyoung will only be called for a write fault. This patch ensures that faults on transparent hugepages which do not result in a CoW update the access flags for the faulting pmd. Cc: Chris Metcalf Cc: Kirill A. Shutemov Cc: Andrea Arcangeli Signed-off-by: Will Deacon Acked-by: Johannes Weiner Ok chaps, I rebased this thing onto today's next (which basically necessitated a rewrite) so I've reluctantly dropped my acks and kindly ask if you could eyeball the new code, especially where the locking is concerned. In the numa code (do_huge_pmd_prot_none), Peter checks again that the page is not splitting, but I can't see why that is required. I don't either. If the thing was splitting when the fault happened, that path is not taken. And the locked pmd_same() check should rule out splitting setting in after testing pmd_trans_huge_splitting(). Why I can't find function pmd_trans_huge_splitting() you mentioned in latest mainline codes and linux-next? Peter? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majord...@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: mailto:"d...@kvack.org;> em...@kvack.org -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH V2 4/6] Thermal: Remove the cooling_cpufreq_list
On 26 October 2012 03:14, Francesco Lavra wrote: > Hi, > Hongbo Zhang wrote: >> Problem of using this list is that the cpufreq_get_max_state callback will be >> called when register cooling device by thermal_cooling_device_register, but >> this list isn't ready at this moment. What's more, there is no need to >> maintain >> such a list, we can get cpufreq_cooling_device instance by the private >> thermal_cooling_device.devdata. >> >> Signed-off-by: hongbo.zhang >> --- >> drivers/thermal/cpu_cooling.c | 81 >> +-- >> 1 file changed, 16 insertions(+), 65 deletions(-) >> >> diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c >> index 415b041..cc80d29 100644 >> --- a/drivers/thermal/cpu_cooling.c >> +++ b/drivers/thermal/cpu_cooling.c >> @@ -58,8 +58,9 @@ struct cpufreq_cooling_device { >> }; >> static LIST_HEAD(cooling_cpufreq_list); >> static DEFINE_IDR(cpufreq_idr); >> +static DEFINE_MUTEX(cooling_cpufreq_lock); >> >> -static struct mutex cooling_cpufreq_lock; >> +static unsigned int cpufreq_dev_count; >> >> /* notify_table passes value to the CPUFREQ_ADJUST callback function. */ >> #define NOTIFY_INVALID NULL >> @@ -241,20 +242,12 @@ static int cpufreq_get_max_state(struct >> thermal_cooling_device *cdev, >>unsigned long *state) >> { >> int ret = -EINVAL, i = 0; >> - struct cpufreq_cooling_device *cpufreq_device; >> + struct cpufreq_cooling_device *cpufreq_device = cdev->devdata; >> struct cpumask *maskPtr; >> unsigned int cpu; >> struct cpufreq_frequency_table *table; >> unsigned long count = 0; >> >> - mutex_lock(_cpufreq_lock); >> - list_for_each_entry(cpufreq_device, _cpufreq_list, node) { >> - if (cpufreq_device && cpufreq_device->cool_dev == cdev) >> - break; >> - } >> - if (cpufreq_device == NULL) >> - goto return_get_max_state; >> - >> maskPtr = _device->allowed_cpus; >> cpu = cpumask_any(maskPtr); >> table = cpufreq_frequency_get_table(cpu); >> @@ -276,7 +269,6 @@ static int cpufreq_get_max_state(struct >> thermal_cooling_device *cdev, >> } >> >> return_get_max_state: >> - mutex_unlock(_cpufreq_lock); >> return ret; > > Since there is no mutex locking/unlocking anymore, I'd say the goto > label should be removed. Good. > > [...] >> void cpufreq_cooling_unregister(struct thermal_cooling_device *cdev) >> { >> - struct cpufreq_cooling_device *cpufreq_dev = NULL; >> - unsigned int cpufreq_dev_count = 0; >> + struct cpufreq_cooling_device *cpufreq_dev = cdev->devdata; >> >> - mutex_lock(_cpufreq_lock); >> - list_for_each_entry(cpufreq_dev, _cpufreq_list, node) { >> - if (cpufreq_dev && cpufreq_dev->cool_dev == cdev) >> - break; >> - cpufreq_dev_count++; >> - } >> - >> - if (!cpufreq_dev || cpufreq_dev->cool_dev != cdev) { >> - mutex_unlock(_cpufreq_lock); >> - return; >> - } >> + thermal_cooling_device_unregister(cpufreq_dev->cool_dev); >> >> - list_del(_dev->node); >> + mutex_lock(_cpufreq_lock); >> + cpufreq_dev_count--; >> >> /* Unregister the notifier for the last cpufreq cooling device */ >> - if (cpufreq_dev_count == 1) { >> + if (cpufreq_dev_count == 0) { >> cpufreq_unregister_notifier(_cpufreq_notifier_block, >> CPUFREQ_POLICY_NOTIFIER); >> } >> mutex_unlock(_cpufreq_lock); >> - thermal_cooling_device_unregister(cpufreq_dev->cool_dev); > > Why did you move the call to thermal_cooling_device_unregister() from > here? I don't see any reason for moving it. In common sense, usually unregister first and then count--; But here it should be opposite sequence of cpufreq_cooling_register, will update it. > >> + >> release_idr(_idr, cpufreq_dev->id); >> - if (cpufreq_dev_count == 1) >> - mutex_destroy(_cpufreq_lock); >> kfree(cpufreq_dev); >> } >> EXPORT_SYMBOL(cpufreq_cooling_unregister); >> -- >> 1.7.11.3 > > -- > Francesco -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 05/31] x86/mm: Reduce tlb flushes from ptep_set_access_flags()
On Thu, Oct 25, 2012 at 7:30 PM, Rik van Riel wrote: >> >> LOOK at the code, for chrissake. Just look at it. And if you don't see >> why the above is stupid and retarded, you damn well shouldn't be >> touching VM code. > > I agree it is pretty ugly. However, the above patch > did get rid of a gigantic performance regression with > Peter's code. Rik, *LOOK* at the code like I asked you to, instead of making excuses for it. I'm not necessarily arguing with what the code tries to do. I'm arguing with the fact that the code is pure and utter *garbage*. It has two major (and I mean *MAJOR*) problems, both of which individually should make you ashamed for ever posting that piece of shit: The obvious-without-even-understanding-semantics problem: - it's humongously stupidly written. It calculates that 'flush_remote' flag WHETHER IT GETS USED OR NOT. Christ. I can kind of expect stuff like that in driver code etc, but in VM routines? Yes, the compiler may be smart enough to actually fix up the idiocy. That doesn't make it less stupid. The more-subtle-but-fundamental-problem: - regardless of how stupidly written it is on a very superficial level, it's even more stupid in a much more fundamental way. That whole routine is explicitly written to be opportunistic. It is *documented* to only set the access flags, so comparing anything else is stupid, wouldn't you say? Documented where? It's actually explicitly documented in the pgtable-generic.c file which has the generic implementation of that thing. But it's implicitly documented both in the name of the function (do take another look) *and* in the actual implementation of the function. Look at the code: it doesn't even always update the page tables AT ALL (and no, the return value does *not* reflect whether it updated it or not!) Also, notice how we update the pte entry with a simple *ptep = entry; statement, not with the usual expensive page table updates? The only thing that makes this safe is that we *only* do it with the exact same page frame number (anything else would be disastrously buggy on 32-bit PAE, for example). And we only ever do it with the dirty bit always set, because otherwise we might be silently dropping a concurrent hardware update of the dirty bit of the previous pte value on another CPU. The latter requirement is why the x86 code does if (changed && dirty) { while the generic code checks just "If (changed)" (and then uses the much more expensive set_pte_at() that has the proper dirty-bit guarantees, and generates atomic accesses, not to mention various virtualization crap). In other words, everything that was added by that patch is PURE AND UTTER SHIT. And THAT is what I'm objecting to. Guess what? If you want to optimize the function to not do remote TLB flushes, then just do that! None of the garbage. Just change the flush_tlb_page(vma, address); line to __flush_tlb_one(address); and it should damn well work. Because everything I see about "flush_remote" looks just wrong, wrong, wrong. And if there really is some reason for that whole flush_remote braindamage, then we have much bigger problems, namely the fact that we've broken the documented semantics of that function, and we're doing various other things that are completely and utterly invalid unless the above semantics hold. So that patch should be burned, and possibly used as an example of horribly crappy code for later generations. At no point should it be applied. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] hrtimer:__run_hrtimer races with enqueue_hrtimer
From: Yanmin Zhang We hit a kernel panic at __run_hrtimer=>BUG_ON(timer->state != HRTIMER_STATE_CALLBACK). <2>[ 10.226053, 3] kernel BUG at /home/android/xiaobing/ymz/r4/hardware/intel/linux-2.6/kernel/hrtimer.c:1228! <0>[ 10.235682, 3] invalid opcode: [#1] PREEMPT SMP <4>[ 10.240716, 3] Modules linked in: wl12xx_sdio wl12xx mac80211 cfg80211 compat btwilink rmi4(C) fmdrv_chr st_drv matrix(C) <4>[ 10.251651, 3] <4>[ 10.253391, 3] Pid: 68, comm: kworker/3:4 Tainted: GWC 3.0.34-140430-g2af538d #45 Intel Corporation CloverTrail/FFRD <4>[ 10.264674, 3] EIP: 0060:[] EFLAGS: 00010002 CPU: 3 <4>[ 10.270411, 3] EIP is at __run_hrtimer+0xbd/0x240 <4>[ 10.275091, 3] EAX: 0001 EBX: f67fb6b8 ECX: f57b4000 EDX: 7301 <4>[ 10.281602, 3] ESI: c1d614c0 EDI: f67fb680 EBP: f57b5dd8 ESP: f57b5da8 <4>[ 10.288113, 3] DS: 007b ES: 007b FS: 00d8 GS: SS: 0068 <0>[ 10.293754, 3] Process kworker/3:4 (pid: 68, ti=f57b4000 task=f57aa730 task.ti=f57b4000) <0>[ 10.301827, 3] Stack: <4>[ 10.304083, 3] c1afef40 f57b5dd8 c167a6e0 f67fb680 20b366e3 f67fb6b8 f57b5e14 <4>[ 10.312069, 3] 0001 f67fb6b8 0001 f67fb680 f57b5e28 c126d1e5 f57b5e08 c126f325 <4>[ 10.320055, 3] 86b9868d 0001 86b9868d 0001 0003 7fff <0>[ 10.328041, 3] Call Trace: <4>[ 10.330742, 3] [] ? gburst_thread_stop.isra.25+0x40/0x40 <4>[ 10.336988, 3] [] hrtimer_interrupt+0xd5/0x250 <4>[ 10.342368, 3] [] ? sched_clock_cpu+0xe5/0x150 <4>[ 10.347753, 3] [] smp_apic_timer_interrupt+0x54/0x88 <4>[ 10.353654, 3] [] ? trace_hardirqs_off_thunk+0xc/0x14 <4>[ 10.359643, 3] [] apic_timer_interrupt+0x2f/0x34 <4>[ 10.365199, 3] [] ? sub_preempt_count+0x1f/0x50 <4>[ 10.370669, 3] [] delay_tsc+0x3a/0xc0 <6>[ 10.371589, 0] android_work: did not send uevent (0 0 (null)) <4>[ 10.381171, 3] [] __const_udelay+0x23/0x30 <4>[ 10.386207, 3] [] mdfld_dsi_send_dcs+0x12a/0x5d0 <4>[ 10.391760, 3] [] ? _raw_spin_unlock_irqrestore+0x26/0x50 <4>[ 10.398101, 3] [] ? ospm_power_using_hw_begin+0xa1/0x350 <4>[ 10.399053, 3] [] ? __mutex_lock_slowpath+0x1ff/0x2f0 <4>[ 10.399069, 3] [] mdfld_dbi_update_panel+0x21e/0x2d0 <4>[ 10.399085, 3] [] mdfld_te_handler_work+0x71/0x80 <4>[ 10.399099, 3] [] process_one_work+0xfe/0x3f0 <4>[ 10.399114, 3] [] ? mdfld_async_flip_te_handler+0xf0/0xf0 Basically, __run_hrtimer has a race with enqueue_hrtimer. When __run_hrtimer calls the timer callback fn, another thread might call enqueue_hrtimer or hrtimer_start to requeue it, and the timer->state is equal to HRTIMER_STATE_CALLBACK|HRTIMER_STATE_ENQUEUED, which causes the BUG_ON(timer->state != HRTIMER_STATE_CALLBACK) checking fails. The patch fixes it by checking only bit HRTIMER_STATE_CALLBACK. Signed-off-by: Yanmin Zhang Reviewed-by: He, Bo --- kernel/hrtimer.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c index 6db7a5e..6280184 100644 --- a/kernel/hrtimer.c +++ b/kernel/hrtimer.c @@ -1235,7 +1235,7 @@ static void __run_hrtimer(struct hrtimer *timer, ktime_t *now) * hrtimer_start_range_ns() or in hrtimer_interrupt() */ if (restart != HRTIMER_NORESTART) { - BUG_ON(timer->state != HRTIMER_STATE_CALLBACK); + BUG_ON(!(timer->state & HRTIMER_STATE_CALLBACK)); enqueue_hrtimer(timer, base); } -- 1.7.6 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v6 0/3] Add modules to support realtek PCIE card reader
于 2012年10月26日 10:45, Greg KH 写道: On Fri, Oct 26, 2012 at 09:10:45AM +0800, wwang wrote: 于 2012年10月26日 02:50, Greg KH 写道: On Sat, Oct 20, 2012 at 06:46:15AM +0300, Dan Carpenter wrote: On Sat, Oct 06, 2012 at 03:23:56PM +0800, wwang wrote: We are still maintaining the SCSI driver for Realtek card reader, and will release the latest source code in the Github in the future. But maybe we won't push it to the staging tree any more. Maybe we should just remove the staging code if it won't be fixed. That's sort of the point of staging. I agree. wwang, want me to delet the staging driver now? I don't want "dead" code in the tree, especially as you don't want to maintain it anymore. thanks, greg k-h Hi Greg: OK. You can delete it now, please. And I will push my new driver to MFD and MMC subsystem. Ok, just to confirm, you want me to delete the drivers/staging/rts_pstor/* code, right? thanks, greg k-h Hi Greg: Yes, you can delete drivers/staging/rts_pstor BR, wwang -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v6 0/3] Add modules to support realtek PCIE card reader
On Fri, Oct 26, 2012 at 09:10:45AM +0800, wwang wrote: > 于 2012年10月26日 02:50, Greg KH 写道: > > On Sat, Oct 20, 2012 at 06:46:15AM +0300, Dan Carpenter wrote: > >> On Sat, Oct 06, 2012 at 03:23:56PM +0800, wwang wrote: > >>> We are still maintaining the SCSI driver for Realtek card reader, > >>> and will release the latest source code in the Github in the future. > >>> But maybe we won't push it to the staging tree any more. > >> Maybe we should just remove the staging code if it won't be fixed. > >> That's sort of the point of staging. > > I agree. wwang, want me to delet the staging driver now? I don't want > > "dead" code in the tree, especially as you don't want to maintain it > > anymore. > > > > thanks, > > > > greg k-h > Hi Greg: > > OK. You can delete it now, please. > And I will push my new driver to MFD and MMC subsystem. Ok, just to confirm, you want me to delete the drivers/staging/rts_pstor/* code, right? thanks, greg k-h -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: process hangs on do_exit when oom happens
On Thu, Oct 25, 2012 at 5:57 PM, Michal Hocko wrote: > On Wed 24-10-12 11:44:17, Qiang Gao wrote: >> On Wed, Oct 24, 2012 at 1:43 AM, Balbir Singh wrote: >> > On Tue, Oct 23, 2012 at 3:45 PM, Michal Hocko wrote: >> >> On Tue 23-10-12 18:10:33, Qiang Gao wrote: >> >>> On Tue, Oct 23, 2012 at 5:50 PM, Michal Hocko wrote: >> >>> > On Tue 23-10-12 15:18:48, Qiang Gao wrote: >> >>> >> This process was moved to RT-priority queue when global oom-killer >> >>> >> happened to boost the recovery of the system.. >> >>> > >> >>> > Who did that? oom killer doesn't boost the priority (scheduling class) >> >>> > AFAIK. >> >>> > >> >>> >> but it wasn't get properily dealt with. I still have no idea why where >> >>> >> the problem is .. >> >>> > >> >>> > Well your configuration says that there is no runtime reserved for the >> >>> > group. >> >>> > Please refer to Documentation/scheduler/sched-rt-group.txt for more >> >>> > information. >> >>> > >> >> [...] >> >>> maybe this is not a upstream-kernel bug. the centos/redhat kernel >> >>> would boost the process to RT prio when the process was selected >> >>> by oom-killer. >> >> >> >> This still looks like your cpu controller is misconfigured. Even if the >> >> task is promoted to be realtime. >> > >> > >> > Precisely! You need to have rt bandwidth enabled for RT tasks to run, >> > as a workaround please give the groups some RT bandwidth and then work >> > out the migration to RT and what should be the defaults on the distro. >> > >> > Balbir >> >> >> see https://patchwork.kernel.org/patch/719411/ > > The patch surely "fixes" your problem but the primary fault here is the > mis-configured cpu cgroup. If the value for the bandwidth is zero by > default then all realtime processes in the group a screwed. The value > should be set to something more reasonable. > I am not familiar with the cpu controller but it seems that > alloc_rt_sched_group needs some treat. Care to look into it and send a > patch to the cpu controller and cgroup maintainers, please? > > -- > Michal Hocko > SUSE Labs I'm trying to fix the problem. but no substantive progress yet. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Kdump with signed images
On Thu, Oct 25, 2012 at 09:15:58PM -0400, Mimi Zohar wrote: > On a running system, the package installer, after verifying the package > integrity, would install each file with the associated 'security.ima' > extended attribute. The 'security.evm' digital signature would be > installed with an HMAC, calculated using a system unique key. The idea isn't to prevent /sbin/kexec from being modified after installation - it's to prevent it from being possible to install a system that has a modified /sbin/kexec. Leaving any part of this up to the package installer means that it doesn't solve the problem we're trying to solve here. It must be impossible for the kernel to launch any /sbin/kexec that hasn't been signed by a trusted key that's been built into the kernel, and it must be impossible for anything other than /sbin/kexec to make the kexec system call. -- Matthew Garrett | mj...@srcf.ucam.org -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 00/13] sched: Integrating Per-entity-load-tracking with the core scheduler
The benchmark: /* * test.c - Simulate workloads that load the CPU differently * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License as * published by the Free Software Foundation; version 2 of the License. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 * USA */ /* * This workload spawns threads which request for allocation of a * memory chunk,write to it and free it.The duty cycle of these threads * can be varied.The idea is to simulate tasks which load the cpu * to different extents. */ #include #include #include #include #include #include #include #include #include #include "malloc.h" /* Variable entities */ static unsigned int seconds; static unsigned int threads; static unsigned int mem_chunk_size; static unsigned int sleep_at; static unsigned int sleep_interval; typedef size_t mem_slot_t;/* 8 bytes */ static unsigned int slot_size = sizeof(mem_slot_t); /* Other parameters */ static volatile int start; static time_t start_time; static unsigned int records_read; pthread_mutex_t records_count_lock = PTHREAD_MUTEX_INITIALIZER; static unsigned int write_to_mem(void) { int i, j; mem_slot_t *scratch_pad, *temp; mem_chunk_size = slot_size * 256; mem_slot_t *end; /* The below two parameters ensure that it is 10% workload * with a duty cycle of 10ms.The number of records read in * 1s without sleep was observed and appropriately calculated * for 1ms.This number turned out to be 1228. */ sleep_at = 1228; /* sleep for every 1228 records */ sleep_interval = 9000; /* sleep for 9 ms */ for (i=0; start == 1; i++) { scratch_pad = (mem_slot_t *)malloc(mem_chunk_size); if (scratch_pad == NULL) { fprintf(stderr,"Could not allocate memory\n"); exit(1); } end = scratch_pad + (mem_chunk_size / slot_size); for (temp = scratch_pad, j=0; temp < end; temp++, j++) *temp = (mem_slot_t)j; free(scratch_pad); if (sleep_at && !(i % sleep_at)) usleep(sleep_interval); } return (i); } static void * thread_run(void *arg) { unsigned int records_local; /* Wait for the start signal */ while (start == 0); records_local = write_to_mem(); pthread_mutex_lock(_count_lock); records_read += records_local; pthread_mutex_unlock(_count_lock); return NULL; } static void start_threads() { double diff_time; unsigned int i; int err; threads = 8; seconds = 10; pthread_t thread_array[threads]; for (i = 0; i < threads; i++) { err = pthread_create(_array[i], NULL, thread_run, NULL); if (err) { fprintf(stderr, "Error creating thread %d\n", i); exit(1); } } start_time = time(NULL); start = 1; sleep(seconds); start = 0; diff_time = difftime(time(NULL), start_time); for (i = 0; i < threads; i++) { err = pthread_join(thread_array[i], NULL); if (err) { fprintf(stderr, "Error joining thread %d\n", i); exit(1); } } printf("%u records/s\n", (unsigned int) (((double) records_read)/diff_time)); } int main() { start_threads(); return 0; } Regards Preeti U Murthy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v1] firmware loader: introduce module parameter to customize fw search path
On Thu, Oct 25, 2012 at 5:46 PM, Ming Lei wrote: > struct file *file; > - snprintf(path, PATH_MAX, "%s/%s", fw_path[i], buf->fw_id); > + > + if (i < 0) { > + if (!fw_path_para[0]) /* No customized path */ > + continue; > + snprintf(path, PATH_MAX, "%s/%s", fw_path_para, > +buf->fw_id); > + } else { > + snprintf(path, PATH_MAX, "%s/%s", fw_path[i], > +buf->fw_id); > + } Ugh. This is just disgusting. Please just make "fw_path[0]" just be the pointer to fw_path_para[] (which sounds like the cleanest fix) and get rid of the negative 'i' and conditional entirely. Or if there is some odd reason you don't want to do that, at least make the conditional much smaller, without the snprintf() in both arms (ie make the if-statement just set a "const char *dir" variable to either fw_path[i] or fw_path_para or whatever). Sure, the compiler *may* merge them (gcc does, but I've seen it miss them too), but even if the compiler might fix up ugly code, that's not a reason for it to be ugly in the source code anyway. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC v2 0/2] vmevent: A bit reworked pressure attribute + docs + man page
On Thu, Oct 25, 2012 at 02:08:14AM -0700, Anton Vorontsov wrote: > Hello Minchan, > > Thanks a lot for the email! > > On Thu, Oct 25, 2012 at 03:40:09PM +0900, Minchan Kim wrote: > [...] > > > What applications (well, activity managers) are really interested in is > > > this: > > > > > > 1. Do we we sacrifice resources for new memory allocations (e.g. files > > >cache)? > > > 2. Does the new memory allocations' cost becomes too high, and the system > > >hurts because of this? > > > 3. Are we about to OOM soon? > > > > Good but I think 3 is never easy. > > But early notification would be better than late notification which can kill > > someone. > > Well, basically these are two fixed (strictly defined) levels (low and > oom) + one flexible level (med), which meaning can be slightly tuned (but > we still have a meaningful definition for it). > I mean detection of "3) Are we about to OOM soon" isn't easy. > So, I guess it's a good start. :) Absolutely! > > > > And here are the answers: > > > > > > 1. VMEVENT_PRESSURE_LOW > > > 2. VMEVENT_PRESSURE_MED > > > 3. VMEVENT_PRESSURE_OOM > > > > > > There is no "high" pressure, since I really don't see any definition of > > > it, but it's possible to introduce new levels without breaking ABI. The > > > levels described in more details in the patches, and the stuff is still > > > tunable, but now via sysctls, not the vmevent_fd() call itself (i.e. we > > > don't need to rebuild applications to adjust window size or other mm > > > "details"). > > > > > > What I couldn't fix in this RFC is making vmevent_{scanned,reclaimed} > > > stuff per-CPU (there's a comment describing the problem with this). But I > > > made it lockless and tried to make it very lightweight (plus I moved the > > > vmevent_pressure() call to a more "cold" path). > > > > Your description doesn't include why we need new vmevent_fd(2). > > Of course, it's very flexible and potential to add new VM knob easily but > > the thing we is about to use now is only VMEVENT_ATTR_PRESSURE. > > Is there any other use cases for swap or free? or potential user? > > Number of idle pages by itself might be not that interesting, but > cache+idle level is quite interesting. > > By definition, _MED happens when performance already degraded, slightly, > but still -- we can be swapping. > > But _LOW notifications are coming when kernel is just reclaiming, so by > using _LOW notifications + watching for cache level we can very easily > predict the swapping activity long before we have even _MED pressure. So, for seeing cache level, we need new vmevent_attr? > > E.g. if idle+cache drops below amount of memory that userland can free, > we'd indeed like to start freeing stuff (this somewhat resembles current > logic that we have in the in-kernel LMK). > > Sure, we can read and parse /proc/vmstat upon _LOW events (and that was my > backup plan), but reporting stuff together would make things much nicer. My concern is that user can imagine various scenario with vmstat and they might start to require new vmevent_attr in future and vmevent_fd will be bloated and mm guys should care of vmevent_vd whenever they add new vmstat. I don't like it. User can do it by just reading /proc/vmstat. So I support your backup plan. > > Although, I somewhat doubt that it is OK to report raw numbers, so this > needs some thinking to develop more elegant solution. Indeed. > > Maybe it makes sense to implement something like PRESSURE_MILD with an > additional nr_pages threshold, which basically hits the kernel about how > many easily reclaimable pages userland has (that would be a part of our > definition for the mild pressure level). So, essentially it will be > > if (pressure_index >= oom_level) > return PRESSURE_OOM; > else if (pressure_index >= med_level) > return PRESSURE_MEDIUM; > else if (userland_reclaimable_pages >= nr_reclaimable_pages) > return PRESSURE_MILD; > return PRESSURE_LOW; > > I must admit I like the idea more than exposing NR_FREE and stuff, but the > scheme reminds me the blended attributes, which we abandoned. Although, > the definition sounds better now, and we seem to be doing it in the right > place. > > And if we go this way, then sure, we won't need any other attributes, and > so we could make the API much simpler. That's what I want! If there isn't any user who really are willing to use it, let's drop it. Do not persuade with imaginary scenario because we should be careful to introduce new ABI. > > > Adding vmevent_fd without them is rather overkill. > > > > And I want to avoid timer-base polling of vmevent if possbile. > > mem_notify of KOSAKI doesn't use such timer. > > For pressure notifications we don't use the timers. We also read the Hmm, when I see the code, timer still works and can notify to user. No? > vmstat counters together with the pressure, so "pressure + counters" > effectively turns it
Re: [PATCH] mm: readahead: remove redundant ra_pages in file_ra_state
On 10/26/2012 09:27 AM, Fengguang Wu wrote: On Fri, Oct 26, 2012 at 11:25:44AM +1100, Dave Chinner wrote: On Thu, Oct 25, 2012 at 10:58:26AM +0800, Fengguang Wu wrote: Hi Chen, But how can bdi related ra_pages reflect different files' readahead window? Maybe these different files are sequential read, random read and so on. It's simple: sequential reads will get ra_pages readahead size while random reads will not get readahead at all. Talking about the below chunk, it might hurt someone that explicitly takes advantage of the behavior, however the ra_pages*2 seems more like a hack than general solution to me: if the user will need POSIX_FADV_SEQUENTIAL to double the max readahead window size for improving IO performance, then why not just increase bdi->ra_pages and benefit all reads? One may argue that it offers some differential behavior to specific applications, however it may also present as a counter-optimization: if the root already tuned bdi->ra_pages to the optimal size, the doubled readahead size will only cost more memory and perhaps IO latency. --- a/mm/fadvise.c +++ b/mm/fadvise.c @@ -87,7 +86,6 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, loff_t len, int advice) spin_unlock(>f_lock); break; case POSIX_FADV_SEQUENTIAL: - file->f_ra.ra_pages = bdi->ra_pages * 2; I think we really have to reset file->f_ra.ra_pages here as it is not a set-and-forget value. e.g. shrink_readahead_size_eio() can reduce ra_pages as a result of IO errors. Hence if you have had io errors, telling the kernel that you are now going to do sequential IO should reset the readahead to the maximum ra_pages value supported Good point! but wait this patch removes file->f_ra.ra_pages in all other places too, so there will be no file->f_ra.ra_pages to be reset here... In his patch, static void shrink_readahead_size_eio(struct file *filp, struct file_ra_state *ra) { - ra->ra_pages /= 4; + spin_lock(>f_lock); + filp->f_mode |= FMODE_RANDOM; + spin_unlock(>f_lock); As the example in comment above this function, the read maybe still sequential, and it will waste IO bandwith if modify to FMODE_RANDOM directly. Thanks, Fengguang -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 05/31] x86/mm: Reduce tlb flushes from ptep_set_access_flags()
On 10/25/2012 04:17 PM, Linus Torvalds wrote: On Thu, Oct 25, 2012 at 5:16 AM, Peter Zijlstra wrote: From: Rik van Riel @@ -306,11 +306,26 @@ int ptep_set_access_flags(struct vm_area pte_t entry, int dirty) { int changed = !pte_same(*ptep, entry); + /* +* If the page used to be inaccessible (_PAGE_PROTNONE), or +* this call upgrades the access permissions on the same page, +* it is safe to skip the remote TLB flush. +*/ + bool flush_remote = false; + if (!pte_accessible(*ptep)) + flush_remote = false; + else if (pte_pfn(*ptep) != pte_pfn(entry) || + (pte_write(*ptep) && !pte_write(entry)) || + (pte_exec(*ptep) && !pte_exec(entry))) + flush_remote = true; if (changed && dirty) { Did anybody ever actually look at this sh*t-for-brains patch? Yeah, I'm grumpy. But I'm wasting time looking at patches that have new code in them that is stupid and retarded. This is the VM, guys, we don't add stupid and retarded code. LOOK at the code, for chrissake. Just look at it. And if you don't see why the above is stupid and retarded, you damn well shouldn't be touching VM code. I agree it is pretty ugly. However, the above patch did get rid of a gigantic performance regression with Peter's code. Doing unnecessary remote TLB flushes was costing about 90% performance with specjbb on a 4 node system. However, if we can guarantee that ptep_set_access_flags is only ever called for pte permission _upgrades_, we can simply get rid of the remote TLB flush on x86, and skip the paranoia tests we are doing above. Do we have that kind of guarantee? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: The idea about scheduler test module(STM)
Yes, it's a new way to do scheduler test. But why use kernel threads? The info, total time, run time, wait time, preempt number, all can be collected from tasks' sched info from /proc/pid/sched and /proc/pid/stat. I don't understand clearly about "pure scheduler performance" here. In order to test scheduler fully, we need to do IO test, and other subsystems will be involved in. But if other subsystems' code are not changed, the result still can refer to scheduler's change. I can't think much farther about the advantage this way could give. Maybe you should show us ur better examples. :) Regards, Charles On 10/25/2012 01:40 PM, Michael Wang wrote: Hi, Folks Charles has raised a problem that we don't have any tool yet for testing the scheduler with out any disturb from other subsystem, and I also found it's hard to test scheduler optimize patch, since the improvement could be easily eaten by other subsystem like IO. So Let's check the tools we have currently: 1. perf sched we can use it to trace the threads we interested, and the info it provided is very good, but one issue is, it could not create the workload we want, also collect the info and do summary is not so easy. 2. linsched It's a very good tool to create the test environment, but it's implementation is to ideal, so it could not present the real world problem. Since both perf and linsched could not meet our requirement, we decided to develop a new tool, let's currently call it scheduler test module(STM). It's propose is: 1. create the workload we want. 2. test the pure scheduler. 3. collect info we need and do summary. This tool should be very easy to use and not depends on the implementation of scheduler. We can use it to check the pure scheduler performance on our system. We can use it to check whether there are regression in scheduler when testing patches. And other usage I've not figure out yet. In order to explain the idea more directly, I have wrote a prototype STM, it's a separate module, and you can use it just like 'rcutorture'. I attached a small script 'play.sh' to help you easily run the test, put 'schedtm.c' and 'play.sh' in same directory and run 'play.sh', you will see out put like: schedtm: summary schedtm:cpu count: cpu run preempt schedtm:0 13811381 schedtm:1 957 955 schedtm:2 900 900 schedtm:3 10351034 schedtm:4 991 990 schedtm:5 940 939 schedtm:6 900 897 schedtm:7 942 948 schedtm:8 852 850 schedtm:9 931 938 schedtm:10 936 934 schedtm:11 951 950 schedtm:total time(us): 10138172 schedtm:run time(us): 5055223(49.86%) schedtm:wait time(us): 5082949 schedtm:latency(us):10489 schedtm: stmt22 got highest run time 5604941(+10%) schedtm: stmt3 got lowest run time 4852057(-4%) schedtm: stmt12 got highest latency 11482(+9%) schedtm: stmt0 got lowest latency 7561(-27%) And you can enable/disable CONFIG_PREEMPT to see magnificent change on latency. This is nothing but a demo, and please "RUN IT ON A TEST MACHINE"... It will create 24 kernel threads and run 10 seconds, you can change it by module param. I will be appreciate if I could get some feedback from the scheduler experts like you, whatever you think it's good or junk, please let me know :) Regards, Michael Wang play.sh: DURATION=10 NORMAL_THREADS=24 PERIOD=10 make clean make insmod ./schedtm.ko normalnr=$NORMAL_THREADS period=$PERIOD sleep $DURATION rmmod ./schedtm.ko dmesg | grep schedtm schedtm.c: /* * scheduler test module * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. * * Copyright (C) IBM Corporation, 2012 * * Authors: Michael Wang * */ #include #include #include MODULE_LICENSE("GPL"); MODULE_AUTHOR("Michael Wang "); #define pr_schedtm(fmt, ...)\ do {\
Re: [ 08/31] use clamp_t in UNAME26 fix
On Thu, Oct 25, 2012 at 05:11:19PM -0700, Jonathan Nieder wrote: > Hi, > > Greg Kroah-Hartman wrote: > > > commit 31fd84b95eb211d5db460a1dda85e004800a7b52 upstream. > > > > The min/max call needed to have explicit types on some architectures > > (e.g. mn10300). Use clamp_t instead to avoid the warning: > > > > kernel/sys.c: In function 'override_release': > > kernel/sys.c:1287:10: warning: comparison of distinct pointer types lacks > > a cast [enabled by default] > > > > Reported-by: Fengguang Wu > > Signed-off-by: Kees Cook > > Signed-off-by: Linus Torvalds > > Signed-off-by: Greg Kroah-Hartman > [...] > > --- a/kernel/sys.c > > +++ b/kernel/sys.c > > @@ -1152,7 +1152,7 @@ static int override_release(char __user > > rest++; > > } > > v = ((LINUX_VERSION_CODE >> 8) & 0xff) + 40; > > - copy = min(sizeof(buf), max_t(size_t, 1, len)); > > + copy = clamp_t(size_t, len, 1, sizeof(buf)); > > copy = scnprintf(buf, copy, "2.6.%u%s", v, rest); > > Does this have any effect at runtime? If not, why is it needed for > stable kernels? It's a bugfix for the previous patch in this area, fixing the build warning. I don't like adding stable patches that add new warnings :) thanks, greg k-h -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: readahead: remove redundant ra_pages in file_ra_state
On 10/26/2012 08:25 AM, Dave Chinner wrote: On Thu, Oct 25, 2012 at 10:58:26AM +0800, Fengguang Wu wrote: Hi Chen, But how can bdi related ra_pages reflect different files' readahead window? Maybe these different files are sequential read, random read and so on. It's simple: sequential reads will get ra_pages readahead size while random reads will not get readahead at all. Talking about the below chunk, it might hurt someone that explicitly takes advantage of the behavior, however the ra_pages*2 seems more like a hack than general solution to me: if the user will need POSIX_FADV_SEQUENTIAL to double the max readahead window size for improving IO performance, then why not just increase bdi->ra_pages and benefit all reads? One may argue that it offers some differential behavior to specific applications, however it may also present as a counter-optimization: if the root already tuned bdi->ra_pages to the optimal size, the doubled readahead size will only cost more memory and perhaps IO latency. --- a/mm/fadvise.c +++ b/mm/fadvise.c @@ -87,7 +86,6 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, loff_t len, int advice) spin_unlock(>f_lock); break; case POSIX_FADV_SEQUENTIAL: - file->f_ra.ra_pages = bdi->ra_pages * 2; I think we really have to reset file->f_ra.ra_pages here as it is not a set-and-forget value. e.g. shrink_readahead_size_eio() can reduce ra_pages as a result of IO errors. Hence if you have had io errors, telling the kernel that you are now going to do sequential IO should reset the readahead to the maximum ra_pages value supported Good catch! Cheers, Dave. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2]
On 10/26/2012 05:48 AM, Hugh Dickins wrote: On Thu, 25 Oct 2012, Johannes Weiner wrote: On Wed, Oct 24, 2012 at 09:36:27PM -0700, Hugh Dickins wrote: On Wed, 24 Oct 2012, Dave Jones wrote: Machine under significant load (4gb memory used, swap usage fluctuating) triggered this... WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70() Pid: 29795, comm: trinity-child4 Not tainted 3.7.0-rc2+ #49 1148 error = shmem_add_to_page_cache(page, mapping, index, 1149 gfp, swp_to_radix_entry(swap)); 1150 /* We already confirmed swap, and make no allocation */ 1151 VM_BUG_ON(error); 1152 } That's very surprising. Easy enough to handle an error there, but of course I made it a VM_BUG_ON because it violates my assumptions: I rather need to understand how this can be, and I've no idea. Could it be concurrent truncation clearing out the entry between shmem_confirm_swap() and shmem_add_to_page_cache()? I don't see anything preventing that. The empty slot would not match the expected swap entry this call passes in and the returned error would be -ENOENT. Excellent notion, many thanks Hannes, I believe you've got it. I've hit that truncation problem in swapoff (and commented on it in shmem_unuse_inode), but never hit it or considered it here. I think of the page lock as holding it stable, but truncation's free_swap_and_cache only does a trylock on the swapcache page, so we're not secured against that possibility. Hi Hugh, Even though free_swap_and_cache only does a trylock on the swapcache page, but it doens't call delete_from_swap_cache and the associated entry should still be there, I am interested in what you have already introduce to protect it? So I'd like to change it to VM_BUG_ON(error && error != -ENOENT), but there's a little tidying up to do in the -ENOENT case, which Do you mean radix_tree_insert will return -ENOENT if the associated entry is not present? Why I can't find this return value in the function radix_tree_insert? needs more thought. A delete_from_swap_cache(page) - though we can be lazy and leave that to reclaim for such a rare occurrence - and probably a mem_cgroup uncharge; but the memcg hooks are always the hardest to get right, I'll have think about that one carefully. Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majord...@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: mailto:"d...@kvack.org;> em...@kvack.org -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2 V2] memory_hotplug: fix possible incorrect node_states[N_NORMAL_MEMORY]
On 10/25/2012 12:17 PM, KOSAKI Motohiro wrote: > On Wed, Oct 24, 2012 at 5:43 AM, Lai Jiangshan wrote: >> Currently memory_hotplug only manages the node_states[N_HIGH_MEMORY], >> it forgets to manage node_states[N_NORMAL_MEMORY]. it may cause >> node_states[N_NORMAL_MEMORY] becomes incorrect. >> >> Example, if a node is empty before online, and we online a memory >> which is in ZONE_NORMAL. And after online, node_states[N_HIGH_MEMORY] >> is correct, but node_states[N_NORMAL_MEMORY] is incorrect, >> the online code don't set the new online node to >> node_states[N_NORMAL_MEMORY]. >> >> The same things like it will happen when offline(the offline code >> don't clear the node from node_states[N_NORMAL_MEMORY] when needed). >> Some memory managment code depends node_states[N_NORMAL_MEMORY], >> so we have to fix up the node_states[N_NORMAL_MEMORY]. >> >> We add node_states_check_changes_online() and >> node_states_check_changes_offline() >> to detect whether node_states[N_HIGH_MEMORY] and node_states[N_NORMAL_MEMORY] >> are changed while hotpluging. >> >> Also add @status_change_nid_normal to struct memory_notify, thus >> the memory hotplug callbacks know whether the node_states[N_NORMAL_MEMORY] >> are changed. (We can add a @flags and reuse @status_change_nid instead of >> introducing @status_change_nid_normal, but it will add much more complicated >> in memory hotplug callback in every subsystem. So introdcing >> @status_change_nid_normal is better and it don't change the sematic >> of @status_change_nid) >> >> Changed from V1: >> add more comments >> change the function name > > Your patch didn't fix my previous comments and don't works correctly. > Please test your own patch before resubmitting. You should consider both > zone normal only node and zone high only node. > The comments in the code already answered/explained your previous comments. Thanks, Lai -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] kfifo: remove unnecessary type check
From: Yuanhan Liu Firstly, this kind of type check doesn't work. It does something similay like following: void * __dummy = NULL; __buf = __dummy; __dummy is defined as void *. Thus it will not trigger warnings as expected. Second, we don't need that kind of check. Since the prototype of __kfifo_out is: unsigned int __kfifo_out(struct __kfifo *fifo, void *buf, unsigned int len) buf is defined as void *, so we don't need do the type check. Remove it. LINK: https://lkml.org/lkml/2012/10/25/386 LINK: https://lkml.org/lkml/2012/10/25/584 Cc: Andrew Morton Cc: Wei Yang Cc: Stefani Seibold Cc: Fengguang Wu Cc: Stephen Rothwell Signed-off-by: Yuanhan Liu --- include/linux/kfifo.h | 20 1 file changed, 20 deletions(-) diff --git a/include/linux/kfifo.h b/include/linux/kfifo.h index 10308c6..b8c1d03 100644 --- a/include/linux/kfifo.h +++ b/include/linux/kfifo.h @@ -390,10 +390,6 @@ __kfifo_int_must_check_helper( \ unsigned int __ret; \ const size_t __recsize = sizeof(*__tmp->rectype); \ struct __kfifo *__kfifo = &__tmp->kfifo; \ - if (0) { \ - typeof(__tmp->ptr_const) __dummy __attribute__ ((unused)); \ - __dummy = (typeof(__val))NULL; \ - } \ if (__recsize) \ __ret = __kfifo_in_r(__kfifo, __val, sizeof(*__val), \ __recsize); \ @@ -432,8 +428,6 @@ __kfifo_uint_must_check_helper( \ unsigned int __ret; \ const size_t __recsize = sizeof(*__tmp->rectype); \ struct __kfifo *__kfifo = &__tmp->kfifo; \ - if (0) \ - __val = (typeof(__tmp->ptr))0; \ if (__recsize) \ __ret = __kfifo_out_r(__kfifo, __val, sizeof(*__val), \ __recsize); \ @@ -473,8 +467,6 @@ __kfifo_uint_must_check_helper( \ unsigned int __ret; \ const size_t __recsize = sizeof(*__tmp->rectype); \ struct __kfifo *__kfifo = &__tmp->kfifo; \ - if (0) \ - __val = (typeof(__tmp->ptr))NULL; \ if (__recsize) \ __ret = __kfifo_out_peek_r(__kfifo, __val, sizeof(*__val), \ __recsize); \ @@ -512,10 +504,6 @@ __kfifo_uint_must_check_helper( \ unsigned long __n = (n); \ const size_t __recsize = sizeof(*__tmp->rectype); \ struct __kfifo *__kfifo = &__tmp->kfifo; \ - if (0) { \ - typeof(__tmp->ptr_const) __dummy __attribute__ ((unused)); \ - __dummy = (typeof(__buf))NULL; \ - } \ (__recsize) ?\ __kfifo_in_r(__kfifo, __buf, __n, __recsize) : \ __kfifo_in(__kfifo, __buf, __n); \ @@ -565,10 +553,6 @@ __kfifo_uint_must_check_helper( \ unsigned long __n = (n); \ const size_t __recsize = sizeof(*__tmp->rectype); \ struct __kfifo *__kfifo = &__tmp->kfifo; \ - if (0) { \ - typeof(__tmp->ptr) __dummy = NULL; \ - __buf = __dummy; \ - } \ (__recsize) ?\ __kfifo_out_r(__kfifo, __buf, __n, __recsize) : \ __kfifo_out(__kfifo, __buf, __n); \ @@ -777,10 +761,6 @@ __kfifo_uint_must_check_helper( \ unsigned long __n = (n); \ const size_t __recsize = sizeof(*__tmp->rectype); \ struct __kfifo *__kfifo = &__tmp->kfifo; \ - if (0) { \ - typeof(__tmp->ptr) __dummy __attribute__ ((unused)) = NULL; \ - __buf = __dummy; \ - } \ (__recsize) ? \ __kfifo_out_peek_r(__kfifo, __buf, __n, __recsize) : \ __kfifo_out_peek(__kfifo, __buf, __n); \ -- 1.7.11.7 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2]
On 10/26/2012 05:27 AM, Hugh Dickins wrote: On Thu, 25 Oct 2012, Ni zhan Chen wrote: On 10/25/2012 02:59 PM, Hugh Dickins wrote: On Thu, 25 Oct 2012, Ni zhan Chen wrote: I think it maybe caused by your commit [d189922862e03ce: shmem: fix negative rss in memcg memory.stat], one question: Well, yes, I added the VM_BUG_ON in that commit. if function shmem_confirm_swap confirm the entry has already brought back from swap by a racing thread, The reverse: true confirms that the swap entry has not been brought back from swap by a racing thread; false indicates that there has been a race. then why call shmem_add_to_page_cache to add page from swapcache to pagecache again? Adding it to pagecache again, after such a race, would set error to -EEXIST (originating from radix_tree_insert); but we don't do that, we add it to pagecache when it has not already been added. Or that's the intention: but Dave seems to have found an unexpected exception, despite us holding the page lock across all this. (But if it weren't for the memcg and replace_page issues, I'd much prefer to let shmem_add_to_page_cache discover the race as before.) Hugh Hi Hugh Thanks for your response. You mean the -EEXIST originating from radix_tree_insert, in radix_tree_insert: if (slot != NULL) return -EEXIST; But why slot should be NULL? if no race, the pagecache related radix tree entry should be RADIX_TREE_EXCEPTIONAL_ENTRY+swap_entry_t.val, where I miss? I was describing what would happen in a case that should not exist, that you had thought the common case. In actuality, the entry should not be NULL, it should be as you say there. Thanks for your patience. So in the common case, the entry should be the value I mentioned, then why has this check? if (slot != NULL) return -EEXIST; the common case will return -EEXIST. Hugh -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] dm: stay in blk_queue_bypass until queue becomes initialized
On 10/25/12 18:41, Jun'ichi Nomura wrote: > With 749fefe677 ("block: lift the initial queue bypass mode on > blk_register_queue() instead of blk_init_allocated_queue()"), > add_disk() eventually calls blk_queue_bypass_end(). > This change invokes the following warning when multipath is used. ... > The warning means during queue initialization blk_queue_bypass_start() > calls sleeping function (synchronize_rcu) while dm holds md->type_lock. > > dm device initialization basically includes the following 3 steps: > 1. create ioctl, allocates queue and call add_disk() > 2. table load ioctl, determines device type and initialize queue > if request-based > 3. resume ioctl, device becomes functional > > So it is better to have dm's queue stay in bypass mode until > the initialization completes in table load ioctl. > > The effect of additional blk_queue_bypass_start(): > > 3.7-rc2 (plain) > # time for n in $(seq 1000); do dmsetup create --noudevsync --notable a; \ > dmsetup remove a; done > > real 0m15.434s > user 0m0.423s > sys 0m7.052s > > 3.7-rc2 (with this patch) > # time for n in $(seq 1000); do dmsetup create --noudevsync --notable a; \ > dmsetup remove a; done > real 0m19.766s > user 0m0.442s > sys 0m6.861s > > If this additional cost is not negligible, we need a variant of add_disk() > that does not end bypassing. Or call blk_queue_bypass_start() before add_disk(): diff --git a/drivers/md/dm.c b/drivers/md/dm.c index ad02761..d14639b 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -1868,9 +1868,9 @@ static struct mapped_device *alloc_dev(int minor) md->disk->queue = md->queue; md->disk->private_data = md; sprintf(md->disk->disk_name, "dm-%d", minor); - add_disk(md->disk); /* Until md type is determined, put the queue in bypass mode */ blk_queue_bypass_start(md->queue); + add_disk(md->disk); format_dev_t(md->name, MKDEV(_major, minor)); md->wq = alloc_workqueue("kdmflush", --- If the patch is modified like above, we could fix the issue without incurring additional cost on dm device creation. # time for n in $(seq 1000); do dmsetup create --noudevsync --notable a; \ dmsetup remove a; done real0m15.684s user0m0.404s sys 0m7.181s -- Jun'ichi Nomura, NEC Corporation -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Xen-devel] [PATCH 4/5] xen: arm: implement remap interfaces needed for privcmd mappings.
On Thu, 25 Oct 2012 08:46:59 +0100 Ian Campbell wrote: > On Thu, 2012-10-25 at 01:07 +0100, Mukesh Rathor wrote: > > On Wed, 24 Oct 2012 16:44:11 -0700 > > Mukesh Rathor wrote: > > > > > > > > > > +/* Indexes into space being mapped. */ > > > > +GUEST_HANDLE(xen_ulong_t) idxs; > > > > + > > > > +/* GPFN in domid where the source mapping page should > > > > appear. */ > > > > +GUEST_HANDLE(xen_pfn_t) gpfns; > > > > > > > > > Looking at your arm implementation in xen, doesn't look like you > > > are expecting idxs and gpfns to be contigous. In that case, > > > shouldn't idxs and gpfns be pointers, ie, they are sent down as > > > arrays? Or does GUEST_HANDLE do that, I can't seem to find where > > > it's defined quickly. > > > > Never mind, I see it got corrected to XEN_GUEST_HANDLE in staging > > tree. > > The macro is called XEN_GUEST_HANDLE in Xen and just GUEST_HANDLE in > Linux. > > > Still doesn't compile tho: > > > > public/memory.h:246: error: expected specifier-qualifier-list before > > ‘__guest_handle_xen_ulong_t’ > > > > I'll figure it out. > > Looks like you've got it all sorted? Yup. I made the change on xen side and added this patch to my tree and got it working after reverting Konrad's setup.c changes. Not sure if you need an ack from x86, but if you do: Acked-by: Mukesh Rathor thanks Mukesh -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Xen-devel] [PATCH 5/5] xen: x86 pvh: use XENMEM_add_to_physmap_range for foreign gmfn mappings
On Wed, 24 Oct 2012 14:19:37 +0100 Ian Campbell wrote: > Squeezing the necessary fields into the existing XENMEM_add_to_physmap > interface was proving to be a bit tricky so we have decided to go with > a new interface upstream (the XENMAPSPACE_gmfn_foreign interface using > XENMEM_add_to_physmap was never committed anywhere). This interface > also allows for batching which was impossible to support at the same > time as foreign mfns in the old interface. > > This reverts the relevant parts of "PVH: basic and header changes, > elfnote changes, ..." and followups and trivially converts > pvh_add_to_xen_p2m over. > > Signed-off-by: Ian Campbell > Acked-by: Stefano Stabellini Ok, I made the change on the xen side for x86 and tested it out. Works fine. Second ack. thanks, Mukesh -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: readahead: remove redundant ra_pages in file_ra_state
On Fri, Oct 26, 2012 at 11:25:44AM +1100, Dave Chinner wrote: > On Thu, Oct 25, 2012 at 10:58:26AM +0800, Fengguang Wu wrote: > > Hi Chen, > > > > > But how can bdi related ra_pages reflect different files' readahead > > > window? Maybe these different files are sequential read, random read > > > and so on. > > > > It's simple: sequential reads will get ra_pages readahead size while > > random reads will not get readahead at all. > > > > Talking about the below chunk, it might hurt someone that explicitly > > takes advantage of the behavior, however the ra_pages*2 seems more > > like a hack than general solution to me: if the user will need > > POSIX_FADV_SEQUENTIAL to double the max readahead window size for > > improving IO performance, then why not just increase bdi->ra_pages and > > benefit all reads? One may argue that it offers some differential > > behavior to specific applications, however it may also present as a > > counter-optimization: if the root already tuned bdi->ra_pages to the > > optimal size, the doubled readahead size will only cost more memory > > and perhaps IO latency. > > > > --- a/mm/fadvise.c > > +++ b/mm/fadvise.c > > @@ -87,7 +86,6 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, > > loff_t len, int advice) > > spin_unlock(>f_lock); > > break; > > case POSIX_FADV_SEQUENTIAL: > > - file->f_ra.ra_pages = bdi->ra_pages * 2; > > I think we really have to reset file->f_ra.ra_pages here as it is > not a set-and-forget value. e.g. shrink_readahead_size_eio() can > reduce ra_pages as a result of IO errors. Hence if you have had io > errors, telling the kernel that you are now going to do sequential > IO should reset the readahead to the maximum ra_pages value > supported Good point! but wait this patch removes file->f_ra.ra_pages in all other places too, so there will be no file->f_ra.ra_pages to be reset here... Thanks, Fengguang -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] memory_hotplug: fix stale node_states[N_NORMAL_MEMORY]
Hi, KOSAKI On 09/28/2012 06:03 AM, KOSAKI Motohiro wrote: > (9/27/12 2:47 AM), Lai Jiangshan wrote: >> Currently memory_hotplug only manages the node_states[N_HIGH_MEMORY], >> it forgets to manage node_states[N_NORMAL_MEMORY]. it causes >> node_states[N_NORMAL_MEMORY] becomes stale. > > What's mean 'stale'? I guess > > : Currently memory_hotplug doesn't turn on/off node_states[N_NORMAL_MEMORY] Right. > and > : then it will be invalid if the platform has highmem. Luckily, almost memory > : hotplug aware platform don't have highmem, but are not all. > > right? Some platforms(32 bit) support logic-memory-hotplug. Some platforms have movable memory. They are all considered. > I supporse this patch only meaningful on ARM platform practically. > any platform whic supports memory-hotplug. > > >> We add check_nodemasks_changes_online() and check_nodemasks_changes_offline() >> to detect whether node_states[N_HIGH_MEMORY] and node_states[N_NORMAL_MEMORY] >> are changed while hotpluging. > > >> Also add @status_change_nid_normal to struct memory_notify, thus >> the memory hotplug callbacks know whether the node_states[N_NORMAL_MEMORY] >> are changed. > > status_change_nid_normal is very ugly to me. When status_change_nid and > status_change_nid_normal has positive value, they are always the same. > nid and flags value are more natual to me. If we use flags, the semantic of "status_change_nid" is changed and we need to change more current code, and we will add complicated to the memory hotplug callbacks. like this: - node = arg->status_change_nid; + if (arg->status_change_flags & (1UL << N_HIGH_MEMORY)) + node = arg->status_change_nid; + else + node = -1; > > > >> >> Signed-off-by: Lai Jiangshan >> --- >> Documentation/memory-hotplug.txt |5 ++- >> include/linux/memory.h |1 + >> mm/memory_hotplug.c | 94 >> +++-- >> 3 files changed, 83 insertions(+), 17 deletions(-) >> >> diff --git a/Documentation/memory-hotplug.txt >> b/Documentation/memory-hotplug.txt >> index 6d0c251..6e6cbc7 100644 >> --- a/Documentation/memory-hotplug.txt >> +++ b/Documentation/memory-hotplug.txt >> @@ -377,15 +377,18 @@ The third argument is passed by pointer of struct >> memory_notify. >> struct memory_notify { >> unsigned long start_pfn; >> unsigned long nr_pages; >> + int status_change_nid_normal; >> int status_change_nid; >> } >> >> start_pfn is start_pfn of online/offline memory. >> nr_pages is # of pages of online/offline memory. >> +status_change_nid_normal is set node id when N_NORMAL_MEMORY of nodemask >> +is (will be) set/clear, if this is -1, then nodemask status is not changed. >> status_change_nid is set node id when N_HIGH_MEMORY of nodemask is (will be) >> set/clear. It means a new(memoryless) node gets new memory by online and a >> node loses all memory. If this is -1, then nodemask status is not changed. >> -If status_changed_nid >= 0, callback should create/discard structures for >> the >> +If status_changed_nid* >= 0, callback should create/discard structures for >> the >> node if necessary. >> >> -- >> diff --git a/include/linux/memory.h b/include/linux/memory.h >> index ff9a9f8..a09216d 100644 >> --- a/include/linux/memory.h >> +++ b/include/linux/memory.h >> @@ -53,6 +53,7 @@ int arch_get_memory_phys_device(unsigned long start_pfn); >> struct memory_notify { >> unsigned long start_pfn; >> unsigned long nr_pages; >> +int status_change_nid_normal; >> int status_change_nid; >> }; >> >> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c >> index 6a5b90d..b62d429b 100644 >> --- a/mm/memory_hotplug.c >> +++ b/mm/memory_hotplug.c >> @@ -460,6 +460,34 @@ static int online_pages_range(unsigned long start_pfn, >> unsigned long nr_pages, >> return 0; >> } >> >> +static void check_nodemasks_changes_online(unsigned long nr_pages, >> +struct zone *zone, struct memory_notify *arg) >> +{ >> +int nid = zone_to_nid(zone); >> +enum zone_type zone_last = ZONE_NORMAL; >> + >> +if (N_HIGH_MEMORY == N_NORMAL_MEMORY) >> +zone_last = ZONE_MOVABLE; > > This is very strange (or ugly) code. ZONE_MOVABLE don't depend on high mem. If we don't have HIGHMEM, any node of N_NORMAL_MEMORY has 0...ZONE_MOVABLE if we have HIGHMEM, any node of N_NORMAL_MEMORY has 0...ZONE_NORMAL > > >> + >> +if (zone_idx(zone) <= zone_last && !node_state(nid, N_NORMAL_MEMORY)) >> +arg->status_change_nid_normal = nid; >> +else >> +arg->status_change_nid_normal = -1; > > Wrong. The onlined node may only have high mem zone. IOW, think fake numa > case etc. "zone_idx(zone) <= zone_last" checks this case. the result is "else" branch. > > >> + >> +if (!node_state(nid, N_HIGH_MEMORY)) >> +arg->status_change_nid = nid; >> +
Re: [PATCH 00/11] perf tool: Add PERF_SAMPLE_READ sample read support
Hi Jiri, On Sat, 20 Oct 2012 16:33:08 +0200, Jiri Olsa wrote: > hi, > adding support to read sample values through the PERF_SAMPLE_READ > sample type. It's now possible to specify 'S' modifier for an event > and get its sample value by PERF_SAMPLE_READ. I have a question. What's an actual impact of specifying 'S' modifiere to a non-group event or even only a (non-leader) member of a group? For instance, 'cycles:S' or '{branches,branch-misses:S}'. Thanks, Namhyung -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Kdump with signed images
On Thu, 2012-10-25 at 14:55 -0400, Vivek Goyal wrote: > On Thu, Oct 25, 2012 at 02:40:21PM -0400, Mimi Zohar wrote: > > On Thu, 2012-10-25 at 10:10 -0400, Vivek Goyal wrote: > > > On Thu, Oct 25, 2012 at 02:10:01AM -0400, Mimi Zohar wrote: > > > > > > [..] > > > > IMA-appraisal verifies the integrity of file data, while EVM verifies > > > > the integrity of the file metadata, such as LSM and IMA-appraisal > > > > labels. Both 'security.ima' and 'security.evm' can contain digital > > > > signatures. > > > > > > But the private key for creating these digital signature needs to be > > > on the target system? > > > > > > Thanks > > > Vivek > > > > Absolutely not. The public key needs to be added to the _ima or _evm > > keyrings. Roberto Sassu modified dracut and later made equivalent > > changes to systemd. Both have been upstreamed. > > Putting public key in _ima or _evm keyring is not the problem. This is > just the verification part. > > > Dmitry has a package > > that labels the filesystem called ima-evm-utils, which supports hash > > (IMA), hmac(EVM) and digital signatures(both). > > > > We're hoping that distro's would label all immutable files, not only elf > > executables, with digital signatures and mutable files with a hash. > > So this labeling (digital signing) can happen at build time? There is nothing inherently preventing it from happening at build time. Elana Reshetova gave a talk at LSS 2012 on modifying RPM http://lwn.net/Articles/518265/. > I suspect you need labeling to happen at system install time? If yes, > installer does not have the private key to sign anything. The installed system needs to be labeled, but how that occurs is dependent on your environment (eg. flash, rpm based install). Neither of these mechanisms would require the build private key. On a running system, the package installer, after verifying the package integrity, would install each file with the associated 'security.ima' extended attribute. The 'security.evm' digital signature would be installed with an HMAC, calculated using a system unique key. > IOW, if distro sign a file, they will most likely put signatures in > ELF header (something along the lines of signing PE/COFF binaries). Rusty was definitely against putting the signature in the ELF header for kernel modules. Why would this be any different? > But > I think you need digital signatures to be put in security.ima which are > stored in xattrs and xattrs are not generated till you put file in > question on target file system. > > Thanks > Vivek The 'security.ima' digital signature would be created as part of the build process and stored as an extended attribute with the file, like other metadata. On install, the file, extended attributes and other metadata would be copied to the target file system. Mimi -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[GIT PULL] Please pull more NFS related bugfixes
Hi Linus, This pull fixes a fairly urgent issue with the NFSv2/v3 statd code that is causing Oopses, as well as some long standing races with the SUNRPC tcp code. The following changes since commit 0e9e3e306c7e472bdcffa34c4c4584301eda03b3: Merge tag 'stable/for-linus-3.7-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen (2012-10-24 05:17:27 +0300) are available in the git repository at: git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-3.7-3 for you to fetch changes up to e498daa81295d02f7359af313c2b7f87e1062207: LOCKD: Clear ln->nsm_clnt only when ln->nsm_users is zero (2012-10-24 10:46:22 -0400) NFS bugfixes for Linux 3.7 - Fix the NFSv2/v3 kernel statd protocol, which broke due to net namespace related changes. - Fix a number of races in the SUNRPC TCP disconnect/reconnect code. Trond Myklebust (6): SUNRPC: Clear the connect flag when socket state is TCP_CLOSE_WAIT Revert "SUNRPC: Ensure we close the socket on EPIPE errors too..." SUNRPC: Prevent races in xs_abort_connection() SUNRPC: Get rid of the xs_error_report socket callback LOCKD: fix races in nsm_client_get LOCKD: Clear ln->nsm_clnt only when ln->nsm_users is zero fs/lockd/mon.c| 57 +-- net/sunrpc/xprtsock.c | 41 +--- 2 files changed, 42 insertions(+), 56 deletions(-) -- Trond Myklebust Linux NFS client maintainer NetApp trond.mykleb...@netapp.com www.netapp.com
Re: [PATCH v6 0/3] Add modules to support realtek PCIE card reader
于 2012年10月26日 02:50, Greg KH 写道: > On Sat, Oct 20, 2012 at 06:46:15AM +0300, Dan Carpenter wrote: >> On Sat, Oct 06, 2012 at 03:23:56PM +0800, wwang wrote: >>> We are still maintaining the SCSI driver for Realtek card reader, >>> and will release the latest source code in the Github in the future. >>> But maybe we won't push it to the staging tree any more. >> Maybe we should just remove the staging code if it won't be fixed. >> That's sort of the point of staging. > I agree. wwang, want me to delet the staging driver now? I don't want > "dead" code in the tree, especially as you don't want to maintain it > anymore. > > thanks, > > greg k-h Hi Greg: OK. You can delete it now, please. And I will push my new driver to MFD and MMC subsystem. Best Regards, wwang -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v4 5/5] zynq: move static peripheral mappings
On Thu, Oct 25, 2012 at 06:41:08PM -0400, Nick Bowler wrote: > On 2012-10-25 16:29 -0500, Josh Cartwright wrote: > > On Thu, Oct 25, 2012 at 04:17:01PM -0400, Nick Bowler wrote: > > > Did you test this on any real hardware? I can't get the ZC702 to work > > > with the UART mapped at this address (this ends up being mapped at > > > 0xFEFFF000), although I can't for the life of me figure out why the > > > virtual address even matters. Note that for the ZC702, the physical > > > address of the "main" UART is 0xE0001000. Good news is you're not crazy; I was able to duplicate the problem here. > If I were to guess, I would guess that, except for when it "Works", > the really really early printk stuff isn't actually hitting the uart > at all. The "Fails" case would then be due to the stray writes > crashing the board, and the "Truncated" case due to the stray writes > being (ostensibly) benign. If I'm not mistaken, this hypothesis is predicated on the early bootup code establishing a (linear?) mapping for addresses > VMALLOC_START; before the mdesc->map_io() is even handled. That seems odd to me. > But I really have no way right now to test this hypothesis, since I > can't print anything in the failing case. Not sure if I'll be able to get anything meaningful out of it yet (I've not historically had good luck with Xilinx's debugging tools), but I did finally get a JTAG debugger hooked up to the zc702. I'll see if I can get any useful information tomorrow. Thanks, Josh pgpYXByIc0y2V.pgp Description: PGP signature
[git pull] drm radeon fixes.
Hi Linus, Just radeon fixes in this one, some new PCI IDs, ATPX regression fix, async VM regression fixes some module options fixes. Dave. The following changes since commit b8e902f24fdd16c4373ddc37a4e150c4afe9c6db: drm/ttm: Fix a theoretical race in ttm_bo_cleanup_refs() (2012-10-23 10:15:21 +1000) are available in the git repository at: git://people.freedesktop.org/~airlied/linux drm-fixes Alex Deucher (6): drm/radeon: add some new SI PCI ids drm/radeon: fix sparse warning drm/radeon: give each backlight a unique id drm/radeon: add error output if VM CS fails on cayman drm/radeon: fix ATPX function documentation drm/radeon: fix ATPX regression in acpi rework Christian König (9): drm/radeon: fix PFP sync in vm_flush drm/radeon: fix cayman_vm_set_page v2 drm/radeon: fix si_set_page v2 drm/radeon: remove set_page check from VM code drm/radeon: fix header size estimation in VM code drm/radeon: fix and simplify pot argument checks v3 drm/radeon: use vzalloc for gart pages drm/radeon: move size limits to gem_object_create. drm/radeon: move the retry to gem_object_create Dave Airlie (1): Merge branch 'drm-fixes-3.7' of git://people.freedesktop.org/~agd5f/linux into drm-fixes drivers/gpu/drm/radeon/atombios_encoders.c |5 ++- drivers/gpu/drm/radeon/evergreen_cs.c |1 + drivers/gpu/drm/radeon/ni.c | 45 ++--- drivers/gpu/drm/radeon/nid.h|1 + drivers/gpu/drm/radeon/radeon_atpx_handler.c|6 +- drivers/gpu/drm/radeon/radeon_device.c | 60 +-- drivers/gpu/drm/radeon/radeon_gart.c| 22 - drivers/gpu/drm/radeon/radeon_gem.c | 18 ++- drivers/gpu/drm/radeon/radeon_legacy_encoders.c |5 ++- drivers/gpu/drm/radeon/radeon_object.c | 19 --- drivers/gpu/drm/radeon/si.c | 47 +++--- include/drm/drm_pciids.h|3 + 12 files changed, 122 insertions(+), 110 deletions(-)
Re: [RFC] Support volatile range for anon vma
Hi Christoph, On Thu, Oct 25, 2012 at 03:19:27PM +, Christoph Lameter wrote: > On Thu, 25 Oct 2012, Minchan Kim wrote: > > > #endif > > + /* > > +* True if page in this vma is reclaimed. > > What does that mean? All pages in the vma have been cleared out? It means at least, more than one is reclaimed. Comment should have been cleared. > > > + TTU_IGNORE_VOLATILE = (1 << 11),/* ignore volatile */ > > }; > > #define TTU_ACTION(x) ((x) & TTU_ACTION_MASK) > > > > int try_to_unmap(struct page *, enum ttu_flags flags); > > int try_to_unmap_one(struct page *, struct vm_area_struct *, > > - unsigned long address, enum ttu_flags flags); > > + unsigned long address, enum ttu_flags flags, > > + bool *is_volatile); > > You already pass a vma pointer in. Why do you need to pass a > volatile flag in? Looks like unecessary churn. You mean we can use vma->purged instead of is_volatile passing? The is_volatile is just checking for that all of vmas share the page are volatile ones. Then, vma->purged is just checking for that the page is zapped in the vma. If one of vma share the page isn't volatile, we can't zap. BTW, Christoph, what do you think about the goal of the patch which changes munmap(2) to madvise(2) when user calls free(3) in user allocator like glibc? I guess it would improve system performance very well. But as I wrote down in description, downside of the patch is that we have to age anon lru although we don't have swap. But gain via the patch is bigger than loss via aging of anon lru when memory pressure happens. I don't see other downside other than it. What do you think about it? (I didn't implement anon lru aging in case of no-swap but it's trivial once we decide) Thanks for the review, Christoph > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majord...@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: mailto:"d...@kvack.org;> em...@kvack.org -- Kind regards, Minchan Kim -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[ 13/31] USB: option: blacklist net interface on ZTE devices
3.0-stable review patch. If anyone has any objections, please let me know. -- From: Bjørn Mork commit 1452df6f1b7e396d89c2a1fdbdc0e0e839f97671 upstream. Based on information from the ZTE Windows drivers. Signed-off-by: Bjørn Mork Signed-off-by: Greg Kroah-Hartman --- drivers/usb/serial/option.c | 74 ++-- 1 file changed, 52 insertions(+), 22 deletions(-) --- a/drivers/usb/serial/option.c +++ b/drivers/usb/serial/option.c @@ -503,11 +503,19 @@ static const struct option_blacklist_inf .reserved = BIT(5), }; +static const struct option_blacklist_info net_intf6_blacklist = { + .reserved = BIT(6), +}; + static const struct option_blacklist_info zte_mf626_blacklist = { .sendsetup = BIT(0) | BIT(1), .reserved = BIT(4), }; +static const struct option_blacklist_info zte_1255_blacklist = { + .reserved = BIT(3) | BIT(4), +}; + static const struct usb_device_id option_ids[] = { { USB_DEVICE(OPTION_VENDOR_ID, OPTION_PRODUCT_COLT) }, { USB_DEVICE(OPTION_VENDOR_ID, OPTION_PRODUCT_RICOLA) }, @@ -853,13 +861,19 @@ static const struct usb_device_id option { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0113, 0xff, 0xff, 0xff), .driver_info = (kernel_ulong_t)_intf5_blacklist }, { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0117, 0xff, 0xff, 0xff) }, - { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0118, 0xff, 0xff, 0xff) }, - { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0121, 0xff, 0xff, 0xff) }, + { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0118, 0xff, 0xff, 0xff), + .driver_info = (kernel_ulong_t)_intf5_blacklist }, + { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0121, 0xff, 0xff, 0xff), + .driver_info = (kernel_ulong_t)_intf5_blacklist }, { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0122, 0xff, 0xff, 0xff) }, - { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0123, 0xff, 0xff, 0xff) }, - { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0124, 0xff, 0xff, 0xff) }, - { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0125, 0xff, 0xff, 0xff) }, - { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0126, 0xff, 0xff, 0xff) }, + { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0123, 0xff, 0xff, 0xff), + .driver_info = (kernel_ulong_t)_intf4_blacklist }, + { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0124, 0xff, 0xff, 0xff), + .driver_info = (kernel_ulong_t)_intf5_blacklist }, + { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0125, 0xff, 0xff, 0xff), + .driver_info = (kernel_ulong_t)_intf6_blacklist }, + { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0126, 0xff, 0xff, 0xff), + .driver_info = (kernel_ulong_t)_intf5_blacklist }, { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0128, 0xff, 0xff, 0xff) }, { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0142, 0xff, 0xff, 0xff) }, { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0143, 0xff, 0xff, 0xff) }, @@ -872,7 +886,8 @@ static const struct usb_device_id option { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0156, 0xff, 0xff, 0xff) }, { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0157, 0xff, 0xff, 0xff), .driver_info = (kernel_ulong_t)_intf5_blacklist }, - { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0158, 0xff, 0xff, 0xff) }, + { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0158, 0xff, 0xff, 0xff), + .driver_info = (kernel_ulong_t)_intf3_blacklist }, { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0159, 0xff, 0xff, 0xff) }, { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0161, 0xff, 0xff, 0xff) }, { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0162, 0xff, 0xff, 0xff) }, @@ -880,9 +895,12 @@ static const struct usb_device_id option { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0165, 0xff, 0xff, 0xff) }, { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0167, 0xff, 0xff, 0xff), .driver_info = (kernel_ulong_t)_intf4_blacklist }, - { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1008, 0xff, 0xff, 0xff) }, - { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1010, 0xff, 0xff, 0xff) }, - { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1012, 0xff, 0xff, 0xff) }, + { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1008, 0xff, 0xff, 0xff), + .driver_info = (kernel_ulong_t)_intf4_blacklist }, + { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1010, 0xff, 0xff, 0xff), + .driver_info = (kernel_ulong_t)_intf4_blacklist }, + { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1012, 0xff, 0xff, 0xff), + .driver_info = (kernel_ulong_t)_intf4_blacklist }, { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1057, 0xff, 0xff,
[ 14/31] USB: option: add more ZTE devices
3.0-stable review patch. If anyone has any objections, please let me know. -- From: Bjørn Mork commit 4b35f1c52943851b310afb09047bfe991ac8f5ae upstream. Signed-off-by: Bjørn Mork Signed-off-by: Greg Kroah-Hartman --- drivers/usb/serial/option.c | 18 ++ 1 file changed, 18 insertions(+) --- a/drivers/usb/serial/option.c +++ b/drivers/usb/serial/option.c @@ -895,12 +895,22 @@ static const struct usb_device_id option { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0165, 0xff, 0xff, 0xff) }, { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0167, 0xff, 0xff, 0xff), .driver_info = (kernel_ulong_t)_intf4_blacklist }, + { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0191, 0xff, 0xff, 0xff), /* ZTE EuFi890 */ + .driver_info = (kernel_ulong_t)_intf4_blacklist }, + { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0199, 0xff, 0xff, 0xff), /* ZTE MF820S */ + .driver_info = (kernel_ulong_t)_intf1_blacklist }, + { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0257, 0xff, 0xff, 0xff), /* ZTE MF821 */ + .driver_info = (kernel_ulong_t)_intf3_blacklist }, + { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0326, 0xff, 0xff, 0xff), + .driver_info = (kernel_ulong_t)_intf4_blacklist }, { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1008, 0xff, 0xff, 0xff), .driver_info = (kernel_ulong_t)_intf4_blacklist }, { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1010, 0xff, 0xff, 0xff), .driver_info = (kernel_ulong_t)_intf4_blacklist }, { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1012, 0xff, 0xff, 0xff), .driver_info = (kernel_ulong_t)_intf4_blacklist }, + { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1021, 0xff, 0xff, 0xff), + .driver_info = (kernel_ulong_t)_intf2_blacklist }, { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1057, 0xff, 0xff, 0xff) }, { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1058, 0xff, 0xff, 0xff) }, { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1059, 0xff, 0xff, 0xff) }, @@ -1078,8 +1088,16 @@ static const struct usb_device_id option { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1298, 0xff, 0xff, 0xff) }, { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1299, 0xff, 0xff, 0xff) }, { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1300, 0xff, 0xff, 0xff) }, + { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1401, 0xff, 0xff, 0xff), + .driver_info = (kernel_ulong_t)_intf2_blacklist }, { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1402, 0xff, 0xff, 0xff), .driver_info = (kernel_ulong_t)_intf2_blacklist }, + { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1424, 0xff, 0xff, 0xff), + .driver_info = (kernel_ulong_t)_intf2_blacklist }, + { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1425, 0xff, 0xff, 0xff), + .driver_info = (kernel_ulong_t)_intf2_blacklist }, + { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1426, 0xff, 0xff, 0xff), /* ZTE MF91 */ + .driver_info = (kernel_ulong_t)_intf2_blacklist }, { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x2002, 0xff, 0xff, 0xff), .driver_info = (kernel_ulong_t)_k3765_z_blacklist }, { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x2003, 0xff, 0xff, 0xff) }, -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[ 15/31] cgroup: notify_on_release may not be triggered in some cases
3.0-stable review patch. If anyone has any objections, please let me know. -- From: Daisuke Nishimura commit 1f5320d5972aa50d3e8d2b227b636b370e608359 upstream. notify_on_release must be triggered when the last process in a cgroup is move to another. But if the first(and only) process in a cgroup is moved to another, notify_on_release is not triggered. # mkdir /cgroup/cpu/SRC # mkdir /cgroup/cpu/DST # # echo 1 >/cgroup/cpu/SRC/notify_on_release # echo 1 >/cgroup/cpu/DST/notify_on_release # # sleep 300 & [1] 8629 # # echo 8629 >/cgroup/cpu/SRC/tasks # echo 8629 >/cgroup/cpu/DST/tasks -> notify_on_release for /SRC must be triggered at this point, but it isn't. This is because put_css_set() is called before setting CGRP_RELEASABLE in cgroup_task_migrate(), and is a regression introduce by the commit:74a1166d(cgroups: make procs file writable), which was merged into v3.0. Acked-by: Li Zefan Cc: Ben Blum Signed-off-by: Daisuke Nishimura Signed-off-by: Tejun Heo Signed-off-by: Greg Kroah-Hartman --- kernel/cgroup.c |3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -1800,9 +1800,8 @@ static int cgroup_task_migrate(struct cg * trading it for newcg is protected by cgroup_mutex, we're safe to drop * it here; it will be freed under RCU. */ - put_css_set(oldcg); - set_bit(CGRP_RELEASABLE, >flags); + put_css_set(oldcg); return 0; } -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[ 02/31] NLM: nlm_lookup_file() may return NLMv4-specific error codes
3.0-stable review patch. If anyone has any objections, please let me know. -- From: Trond Myklebust commit cd0b16c1c3cda12dbed1f8de8f1a9b0591990724 upstream. If the filehandle is stale, or open access is denied for some reason, nlm_fopen() may return one of the NLMv4-specific error codes nlm4_stale_fh or nlm4_failed. These get passed right through nlm_lookup_file(), and so when nlmsvc_retrieve_args() calls the latter, it needs to filter the result through the cast_status() machinery. Failure to do so, will trigger the BUG_ON() in encode_nlm_stat... Signed-off-by: Trond Myklebust Reported-by: Larry McVoy Signed-off-by: J. Bruce Fields Signed-off-by: Greg Kroah-Hartman --- fs/lockd/clntxdr.c |2 +- fs/lockd/svcproc.c |3 ++- 2 files changed, 3 insertions(+), 2 deletions(-) --- a/fs/lockd/clntxdr.c +++ b/fs/lockd/clntxdr.c @@ -223,7 +223,7 @@ static void encode_nlm_stat(struct xdr_s { __be32 *p; - BUG_ON(be32_to_cpu(stat) > NLM_LCK_DENIED_GRACE_PERIOD); + WARN_ON_ONCE(be32_to_cpu(stat) > NLM_LCK_DENIED_GRACE_PERIOD); p = xdr_reserve_space(xdr, 4); *p = stat; } --- a/fs/lockd/svcproc.c +++ b/fs/lockd/svcproc.c @@ -67,7 +67,8 @@ nlmsvc_retrieve_args(struct svc_rqst *rq /* Obtain file pointer. Not used by FREE_ALL call. */ if (filp != NULL) { - if ((error = nlm_lookup_file(rqstp, , >fh)) != 0) + error = cast_status(nlm_lookup_file(rqstp, , >fh)); + if (error != 0) goto no_locks; *filp = file; -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[ 17/31] media: au0828: fix case where STREAMOFF being called on stopped stream causes BUG()
3.0-stable review patch. If anyone has any objections, please let me know. -- From: Devin Heitmueller commit a595c1ce4c9d572cf53513570b9f1a263d7867f2 upstream. We weren't checking whether the resource was in use before calling res_free(), so applications which called STREAMOFF on a v4l2 device that wasn't already streaming would cause a BUG() to be hit (MythTV). Reported-by: Larry Finger Reported-by: Jay Harbeston Signed-off-by: Devin Heitmueller Signed-off-by: Mauro Carvalho Chehab --- drivers/media/video/au0828/au0828-video.c | 12 1 file changed, 8 insertions(+), 4 deletions(-) --- a/drivers/media/video/au0828/au0828-video.c +++ b/drivers/media/video/au0828/au0828-video.c @@ -1697,14 +1697,18 @@ static int vidioc_streamoff(struct file (AUVI_INPUT(i).audio_setup)(dev, 0); } - videobuf_streamoff(>vb_vidq); - res_free(fh, AU0828_RESOURCE_VIDEO); + if (res_check(fh, AU0828_RESOURCE_VIDEO)) { + videobuf_streamoff(>vb_vidq); + res_free(fh, AU0828_RESOURCE_VIDEO); + } } else if (fh->type == V4L2_BUF_TYPE_VBI_CAPTURE) { dev->vbi_timeout_running = 0; del_timer_sync(>vbi_timeout); - videobuf_streamoff(>vb_vbiq); - res_free(fh, AU0828_RESOURCE_VBI); + if (res_check(fh, AU0828_RESOURCE_VBI)) { + videobuf_streamoff(>vb_vbiq); + res_free(fh, AU0828_RESOURCE_VBI); + } } return 0; -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[ 18/31] drm/i915: apply timing generator bug workaround on CPT and PPT
3.0-stable review patch. If anyone has any objections, please let me know. -- From: Jesse Barnes commit 3bcf603f6d5d18bd9d076dc280de71f48add4101 upstream. On CougarPoint and PantherPoint PCH chips, the timing generator may fail to start after DP training completes. This is due to a bug in the FDI autotraining detect logic (which will stall the timing generator and re-enable it once training completes), so disable it to avoid silent DP mode setting failures. Signed-off-by: Jesse Barnes Signed-off-by: Keith Packard Signed-off-by: Timo Aaltonen --- drivers/gpu/drm/i915/i915_reg.h |5 + drivers/gpu/drm/i915/intel_display.c |4 2 files changed, 9 insertions(+) --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -3113,6 +3113,11 @@ #define TRANS_6BPC (2<<5) #define TRANS_12BPC(3<<5) +#define _TRANSA_CHICKEN20xf0064 +#define _TRANSB_CHICKEN20xf1064 +#define TRANS_CHICKEN2(pipe) _PIPE(pipe, _TRANSA_CHICKEN2, _TRANSB_CHICKEN2) +#define TRANS_AUTOTRAIN_GEN_STALL_DIS(1<<31) + #define SOUTH_CHICKEN2 0xc2004 #define DPLS_EDP_PPS_FIX_DIS (1<<0) --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -7584,6 +7584,7 @@ static void ibx_init_clock_gating(struct static void cpt_init_clock_gating(struct drm_device *dev) { struct drm_i915_private *dev_priv = dev->dev_private; + int pipe; /* * On Ibex Peak and Cougar Point, we need to disable clock @@ -7593,6 +7594,9 @@ static void cpt_init_clock_gating(struct I915_WRITE(SOUTH_DSPCLK_GATE_D, PCH_DPLSUNIT_CLOCK_GATE_DISABLE); I915_WRITE(SOUTH_CHICKEN2, I915_READ(SOUTH_CHICKEN2) | DPLS_EDP_PPS_FIX_DIS); + /* Without this, mode sets may fail silently on FDI */ + for_each_pipe(pipe) + I915_WRITE(TRANS_CHICKEN2(pipe), TRANS_AUTOTRAIN_GEN_STALL_DIS); } static void ironlake_teardown_rc6(struct drm_device *dev) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[ 05/31] Revert: lockd: use rpc clients cl_nodename for id encoding
3.0-stable review patch. If anyone has any objections, please let me know. -- From: Greg Kroah-Hartman This reverts 12d63702c53bc2230dfc997e91ca891f39cb6446 which was commit 303a7ce92064c285a04c870f2dc0192fdb2968cb upstream. Taking hostname from uts namespace if not safe, because this cuold be performind during umount operation on child reaper death. And in this case current->nsproxy is NULL already. Signed-off-by: Greg Kroah-Hartman Cc: Stanislav Kinsbursky Cc: Trond Myklebust --- fs/lockd/mon.c |4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) --- a/fs/lockd/mon.c +++ b/fs/lockd/mon.c @@ -40,7 +40,6 @@ struct nsm_args { u32 proc; char*mon_name; - char*nodename; }; struct nsm_res { @@ -94,7 +93,6 @@ static int nsm_mon_unmon(struct nsm_hand .vers = 3, .proc = NLMPROC_NSM_NOTIFY, .mon_name = nsm->sm_mon_name, - .nodename = utsname()->nodename, }; struct rpc_message msg = { .rpc_argp = , @@ -431,7 +429,7 @@ static void encode_my_id(struct xdr_stre { __be32 *p; - encode_nsm_string(xdr, argp->nodename); + encode_nsm_string(xdr, utsname()->nodename); p = xdr_reserve_space(xdr, 4 + 4 + 4); *p++ = cpu_to_be32(argp->prog); *p++ = cpu_to_be32(argp->vers); -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[ 16/31] amd64_edac:__amd64_set_scrub_rate(): avoid overindexing scrubrates[]
3.0-stable review patch. If anyone has any objections, please let me know. -- From: Andrew Morton commit 168bfeef7bba3f9784f7540b053e4ac72b769ce9 upstream. If none of the elements in scrubrates[] matches, this loop will cause __amd64_set_scrub_rate() to incorrectly use the n+1th element. As the function is designed to use the final scrubrates[] element in the case of no match, we can fix this bug by simply terminating the array search at the n-1th element. Boris: this code is fragile anyway, see here why: http://marc.info/?l=linux-kernel=135102834131236=2 It will be rewritten more robustly soonish. Reported-by: Denis Kirjanov Cc: Doug Thompson Signed-off-by: Andrew Morton Signed-off-by: Borislav Petkov Signed-off-by: Greg Kroah-Hartman --- drivers/edac/amd64_edac.c | 11 --- 1 file changed, 4 insertions(+), 7 deletions(-) --- a/drivers/edac/amd64_edac.c +++ b/drivers/edac/amd64_edac.c @@ -161,8 +161,11 @@ static int __amd64_set_scrub_rate(struct * memory controller and apply to register. Search for the first * bandwidth entry that is greater or equal than the setting requested * and program that. If at last entry, turn off DRAM scrubbing. +* +* If no suitable bandwidth is found, turn off DRAM scrubbing entirely +* by falling back to the last element in scrubrates[]. */ - for (i = 0; i < ARRAY_SIZE(scrubrates); i++) { + for (i = 0; i < ARRAY_SIZE(scrubrates) - 1; i++) { /* * skip scrub rates which aren't recommended * (see F10 BKDG, F3x58) @@ -172,12 +175,6 @@ static int __amd64_set_scrub_rate(struct if (scrubrates[i].bandwidth <= new_bw) break; - - /* -* if no suitable bandwidth found, turn off DRAM scrubbing -* entirely by falling back to the last element in the -* scrubrates array. -*/ } scrubval = scrubrates[i].scrubval; -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[ 07/31] kernel/sys.c: fix stack memory content leak via UNAME26
3.0-stable review patch. If anyone has any objections, please let me know. -- From: Kees Cook commit 2702b1526c7278c4d65d78de209a465d4de2885e upstream. Calling uname() with the UNAME26 personality set allows a leak of kernel stack contents. This fixes it by defensively calculating the length of copy_to_user() call, making the len argument unsigned, and initializing the stack buffer to zero (now technically unneeded, but hey, overkill). CVE-2012-0957 Reported-by: PaX Team Signed-off-by: Kees Cook Cc: Andi Kleen Cc: PaX Team Cc: Brad Spengler Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- kernel/sys.c | 12 +++- 1 file changed, 7 insertions(+), 5 deletions(-) --- a/kernel/sys.c +++ b/kernel/sys.c @@ -1133,15 +1133,16 @@ DECLARE_RWSEM(uts_sem); * Work around broken programs that cannot handle "Linux 3.0". * Instead we map 3.x to 2.6.40+x, so e.g. 3.0 would be 2.6.40 */ -static int override_release(char __user *release, int len) +static int override_release(char __user *release, size_t len) { int ret = 0; - char buf[65]; if (current->personality & UNAME26) { - char *rest = UTS_RELEASE; + const char *rest = UTS_RELEASE; + char buf[65] = { 0 }; int ndots = 0; unsigned v; + size_t copy; while (*rest) { if (*rest == '.' && ++ndots >= 3) @@ -1151,8 +1152,9 @@ static int override_release(char __user rest++; } v = ((LINUX_VERSION_CODE >> 8) & 0xff) + 40; - snprintf(buf, len, "2.6.%u%s", v, rest); - ret = copy_to_user(release, buf, len); + copy = min(sizeof(buf), max_t(size_t, 1, len)); + copy = scnprintf(buf, copy, "2.6.%u%s", v, rest); + ret = copy_to_user(release, buf, copy + 1); } return ret; } -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[for-next PATCH V2] PM / devfreq: Add sysfs node to expose available frequencies
devfreq governors such as ondemand are controlled by a min and max frequency, while governors like userspace governor allow us to set a specific frequency. However, for the same specific device, depending on the SoC, the available frequencies can vary. So expose the available frequencies as a snapshot over sysfs to allow informed decisions. This was inspired by cpufreq framework's equivalent for similar usage sysfs node: scaling_available_frequencies. Cc: Rajagopal Venkat Cc: MyungJoo Ham Cc: Kyungmin Park Cc: "Rafael J. Wysocki" Cc: Kevin Hilman Cc: linux...@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Nishanth Menon --- Applies on top of Rafael's linux-next branch: git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-nextb960e9a Merge branch 'pm-sleep-next' into linux-next Example output from Beagleboard XM (3730) using a dummy test driver http://pastebin.pandaboard.org/index.php/view/85100576 : /sys/devices/platform/iva.0/devfreq/iva.0 # cat available_frequencies 26000 52000 66000 /sys/devices/platform/iva.0/devfreq/iva.0 # cat available_frequencies|tr ' ' '-' 26000-52000-66000 V2 : - review comment update from v1 - protected the sysfs from buffer overflow - just in case.. V1: https://patchwork.kernel.org/patch/1648001/ Documentation/ABI/testing/sysfs-class-devfreq |9 +++ drivers/devfreq/devfreq.c | 32 + 2 files changed, 41 insertions(+) diff --git a/Documentation/ABI/testing/sysfs-class-devfreq b/Documentation/ABI/testing/sysfs-class-devfreq index e6cf08e..e672ccb 100644 --- a/Documentation/ABI/testing/sysfs-class-devfreq +++ b/Documentation/ABI/testing/sysfs-class-devfreq @@ -51,3 +51,12 @@ Description: The /sys/class/devfreq/.../userspace/set_freq shows and sets the requested frequency for the devfreq object if userspace governor is in effect. + +What: /sys/class/devfreq/.../available_frequencies +Date: October 2012 +Contact: Nishanth Menon +Description: + The /sys/class/devfreq/.../available_frequencies shows + the available frequencies of the corresponding devfreq object. + This is a snapshot of available frequencies and not limited + by the min/max frequency restrictions. diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c index d02ee7e..104018e 100644 --- a/drivers/devfreq/devfreq.c +++ b/drivers/devfreq/devfreq.c @@ -571,9 +571,41 @@ static ssize_t show_max_freq(struct device *dev, struct device_attribute *attr, return sprintf(buf, "%lu\n", to_devfreq(dev)->max_freq); } +static ssize_t show_available_freqs(struct device *d, + struct device_attribute *attr, + char *buf) +{ + struct devfreq *df = to_devfreq(d); + struct device *dev = df->dev.parent; + struct opp *opp; + ssize_t count = 0; + unsigned long freq = 0; + + rcu_read_lock(); + do { + opp = opp_find_freq_ceil(dev, ); + if (IS_ERR(opp)) + break; + + count += scnprintf([count], (PAGE_SIZE - count - 2), + "%lu ", freq); + freq++; + } while (1); + rcu_read_unlock(); + + /* Truncate the trailing space */ + if (count) + count--; + + count += sprintf([count], "\n"); + + return count; +} + static struct device_attribute devfreq_attrs[] = { __ATTR(governor, S_IRUGO, show_governor, NULL), __ATTR(cur_freq, S_IRUGO, show_freq, NULL), + __ATTR(available_frequencies, S_IRUGO, show_available_freqs, NULL), __ATTR(target_freq, S_IRUGO, show_target_freq, NULL), __ATTR(polling_interval, S_IRUGO | S_IWUSR, show_polling_interval, store_polling_interval), -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[ 09/31] x86: Exclude E820_RESERVED regions and memory holes above 4 GB from direct mapping.
3.0-stable review patch. If anyone has any objections, please let me know. -- From: Jacob Shin commit 1e779aabe1f0768c2bf8f8c0a5583679b54a upstream. On systems with very large memory (1 TB in our case), BIOS may report a reserved region or a hole in the E820 map, even above the 4 GB range. Exclude these from the direct mapping. [ hpa: this should be done not just for > 4 GB but for everything above the legacy region (1 MB), at the very least. That, however, turns out to require significant restructuring. That work is well underway, but is not suitable for rc/stable. ] Signed-off-by: Jacob Shin Link: http://lkml.kernel.org/r/1319145326-13902-1-git-send-email-jacob.s...@amd.com Signed-off-by: H. Peter Anvin Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/setup.c | 17 +++-- 1 file changed, 15 insertions(+), 2 deletions(-) --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -937,8 +937,21 @@ void __init setup_arch(char **cmdline_p) #ifdef CONFIG_X86_64 if (max_pfn > max_low_pfn) { - max_pfn_mapped = init_memory_mapping(1UL<<32, -max_pfnsize <= 1UL << 32) + continue; + + if (ei->type == E820_RESERVED) + continue; + + max_pfn_mapped = init_memory_mapping( + ei->addr < 1UL << 32 ? 1UL << 32 : ei->addr, + ei->addr + ei->size); + } + /* can we preseve max_low_pfn ?*/ max_low_pfn = max_pfn; } -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[ 11/31] USB: cdc-acm: fix pipe type of write endpoint
3.0-stable review patch. If anyone has any objections, please let me know. -- From: Ming Lei commit c5211187f7ff8e8dbff4ebf7c011ac4c0ffe319c upstream. If the write endpoint is interrupt type, usb_sndintpipe() should be passed to usb_fill_int_urb() instead of usb_sndbulkpipe(). Signed-off-by: Ming Lei Cc: Oliver Neukum Signed-off-by: Greg Kroah-Hartman --- drivers/usb/class/cdc-acm.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/drivers/usb/class/cdc-acm.c +++ b/drivers/usb/class/cdc-acm.c @@ -1172,7 +1172,7 @@ made_compressed_probe: if (usb_endpoint_xfer_int(epwrite)) usb_fill_int_urb(snd->urb, usb_dev, - usb_sndbulkpipe(usb_dev, epwrite->bEndpointAddress), + usb_sndintpipe(usb_dev, epwrite->bEndpointAddress), NULL, acm->writesize, acm_write_bulk, snd, epwrite->bInterval); else usb_fill_bulk_urb(snd->urb, usb_dev, -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[ 12/31] usb: acm: fix the computation of the number of data bits
3.0-stable review patch. If anyone has any objections, please let me know. -- From: Nicolas Boullis commit 301a29da6e891e7eb95c843af0ecdbe86d01f723 upstream. The current code assumes that CSIZE is 060, which appears to be wrong on some arches (such as powerpc). Signed-off-by: Nicolas Boullis Acked-by: Oliver Neukum Signed-off-by: Greg Kroah-Hartman --- drivers/usb/class/cdc-acm.c | 20 +++- 1 file changed, 15 insertions(+), 5 deletions(-) --- a/drivers/usb/class/cdc-acm.c +++ b/drivers/usb/class/cdc-acm.c @@ -760,10 +760,6 @@ static const __u32 acm_tty_speed[] = { 250, 300, 350, 400 }; -static const __u8 acm_tty_size[] = { - 5, 6, 7, 8 -}; - static void acm_tty_set_termios(struct tty_struct *tty, struct ktermios *termios_old) { @@ -780,7 +776,21 @@ static void acm_tty_set_termios(struct t newline.bParityType = termios->c_cflag & PARENB ? (termios->c_cflag & PARODD ? 1 : 2) + (termios->c_cflag & CMSPAR ? 2 : 0) : 0; - newline.bDataBits = acm_tty_size[(termios->c_cflag & CSIZE) >> 4]; + switch (termios->c_cflag & CSIZE) { + case CS5: + newline.bDataBits = 5; + break; + case CS6: + newline.bDataBits = 6; + break; + case CS7: + newline.bDataBits = 7; + break; + case CS8: + default: + newline.bDataBits = 8; + break; + } /* FIXME: Needs to clear unsupported bits in the termios */ acm->clocal = ((termios->c_cflag & CLOCAL) != 0); -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[ 08/31] use clamp_t in UNAME26 fix
3.0-stable review patch. If anyone has any objections, please let me know. -- From: Kees Cook commit 31fd84b95eb211d5db460a1dda85e004800a7b52 upstream. The min/max call needed to have explicit types on some architectures (e.g. mn10300). Use clamp_t instead to avoid the warning: kernel/sys.c: In function 'override_release': kernel/sys.c:1287:10: warning: comparison of distinct pointer types lacks a cast [enabled by default] Reported-by: Fengguang Wu Signed-off-by: Kees Cook Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- kernel/sys.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/kernel/sys.c +++ b/kernel/sys.c @@ -1152,7 +1152,7 @@ static int override_release(char __user rest++; } v = ((LINUX_VERSION_CODE >> 8) & 0xff) + 40; - copy = min(sizeof(buf), max_t(size_t, 1, len)); + copy = clamp_t(size_t, len, 1, sizeof(buf)); copy = scnprintf(buf, copy, "2.6.%u%s", v, rest); ret = copy_to_user(release, buf, copy + 1); } -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[ 06/31] pcmcia: sharpsl: dont discard sharpsl_pcmcia_ops
3.0-stable review patch. If anyone has any objections, please let me know. -- From: Arnd Bergmann commit fdc858a466b738d35d3492bc7cf77b1dac98bf7c upstream. The sharpsl_pcmcia_ops structure gets passed into sa11xx_drv_pcmcia_probe, where it gets accessed at run-time, unlike all other pcmcia drivers that pass their structures into platform_device_add_data, which makes a copy. This means the gcc warning is valid and the structure must not be marked as __initdata. Without this patch, building collie_defconfig results in: drivers/pcmcia/pxa2xx_sharpsl.c:22:31: fatal error: mach-pxa/hardware.h: No such file or directory compilation terminated. make[3]: *** [drivers/pcmcia/pxa2xx_sharpsl.o] Error 1 make[2]: *** [drivers/pcmcia] Error 2 make[1]: *** [drivers] Error 2 make: *** [sub-make] Error 2 Signed-off-by: Arnd Bergmann Cc: Dominik Brodowski Cc: Russell King Cc: Pavel Machek Cc: linux-pcm...@lists.infradead.org Cc: Jochen Friedrich Signed-off-by: Greg Kroah-Hartman --- drivers/pcmcia/pxa2xx_sharpsl.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/drivers/pcmcia/pxa2xx_sharpsl.c +++ b/drivers/pcmcia/pxa2xx_sharpsl.c @@ -222,7 +222,7 @@ static void sharpsl_pcmcia_socket_suspen sharpsl_pcmcia_init_reset(skt); } -static struct pcmcia_low_level sharpsl_pcmcia_ops __initdata = { +static struct pcmcia_low_level sharpsl_pcmcia_ops = { .owner = THIS_MODULE, .hw_init= sharpsl_pcmcia_hw_init, .hw_shutdown= sharpsl_pcmcia_hw_shutdown, -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[ 29/31] xHCI: add aborting command ring function
3.0-stable review patch. If anyone has any objections, please let me know. -- From: Elric Fu commit b92cc66c047ff7cf587b318fe377061a353c120f upstream. Software have to abort command ring and cancel command when a command is failed or hang. Otherwise, the command ring will hang up and can't handle the others. An example of a command that may hang is the Address Device Command, because waiting for a SET_ADDRESS request to be acknowledged by a USB device is outside of the xHC's ability to control. To cancel a command, software will initialize a command descriptor for the cancel command, and add it into a cancel_cmd_list of xhci. Sarah: Fixed missing newline on "Have the command ring been stopped?" debugging statement. This patch should be backported to kernels as old as 3.0, that contain the commit 7ed603ecf8b68ab81f4c83097d3063d43ec73bb8 "xhci: Add an assertion to check for virt_dev=0 bug." That commit papers over a NULL pointer dereference, and this patch fixes the underlying issue that caused the NULL pointer dereference. Signed-off-by: Elric Fu Signed-off-by: Sarah Sharp Tested-by: Miroslav Sabljic Signed-off-by: Greg Kroah-Hartman --- drivers/usb/host/xhci-mem.c |7 ++ drivers/usb/host/xhci-ring.c | 108 +++ drivers/usb/host/xhci.c |2 drivers/usb/host/xhci.h | 12 4 files changed, 128 insertions(+), 1 deletion(-) --- a/drivers/usb/host/xhci-mem.c +++ b/drivers/usb/host/xhci-mem.c @@ -1505,6 +1505,7 @@ void xhci_free_command(struct xhci_hcd * void xhci_mem_cleanup(struct xhci_hcd *xhci) { struct pci_dev *pdev = to_pci_dev(xhci_to_hcd(xhci)->self.controller); + struct xhci_cd *cur_cd, *next_cd; int size; int i; @@ -1525,6 +1526,11 @@ void xhci_mem_cleanup(struct xhci_hcd *x xhci_ring_free(xhci, xhci->cmd_ring); xhci->cmd_ring = NULL; xhci_dbg(xhci, "Freed command ring\n"); + list_for_each_entry_safe(cur_cd, next_cd, + >cancel_cmd_list, cancel_cmd_list) { + list_del(_cd->cancel_cmd_list); + kfree(cur_cd); + } for (i = 1; i < MAX_HC_SLOTS; ++i) xhci_free_virt_device(xhci, i); @@ -2014,6 +2020,7 @@ int xhci_mem_init(struct xhci_hcd *xhci, xhci->cmd_ring = xhci_ring_alloc(xhci, 1, true, false, flags); if (!xhci->cmd_ring) goto fail; + INIT_LIST_HEAD(>cancel_cmd_list); xhci_dbg(xhci, "Allocated command ring at %p\n", xhci->cmd_ring); xhci_dbg(xhci, "First segment DMA is 0x%llx\n", (unsigned long long)xhci->cmd_ring->first_seg->dma); --- a/drivers/usb/host/xhci-ring.c +++ b/drivers/usb/host/xhci-ring.c @@ -320,6 +320,114 @@ void xhci_ring_cmd_db(struct xhci_hcd *x xhci_readl(xhci, >dba->doorbell[0]); } +static int xhci_abort_cmd_ring(struct xhci_hcd *xhci) +{ + u64 temp_64; + int ret; + + xhci_dbg(xhci, "Abort command ring\n"); + + if (!(xhci->cmd_ring_state & CMD_RING_STATE_RUNNING)) { + xhci_dbg(xhci, "The command ring isn't running, " + "Have the command ring been stopped?\n"); + return 0; + } + + temp_64 = xhci_read_64(xhci, >op_regs->cmd_ring); + if (!(temp_64 & CMD_RING_RUNNING)) { + xhci_dbg(xhci, "Command ring had been stopped\n"); + return 0; + } + xhci->cmd_ring_state = CMD_RING_STATE_ABORTED; + xhci_write_64(xhci, temp_64 | CMD_RING_ABORT, + >op_regs->cmd_ring); + + /* Section 4.6.1.2 of xHCI 1.0 spec says software should +* time the completion od all xHCI commands, including +* the Command Abort operation. If software doesn't see +* CRR negated in a timely manner (e.g. longer than 5 +* seconds), then it should assume that the there are +* larger problems with the xHC and assert HCRST. +*/ + ret = handshake(xhci, >op_regs->cmd_ring, + CMD_RING_RUNNING, 0, 5 * 1000 * 1000); + if (ret < 0) { + xhci_err(xhci, "Stopped the command ring failed, " + "maybe the host is dead\n"); + xhci->xhc_state |= XHCI_STATE_DYING; + xhci_quiesce(xhci); + xhci_halt(xhci); + return -ESHUTDOWN; + } + + return 0; +} + +static int xhci_queue_cd(struct xhci_hcd *xhci, + struct xhci_command *command, + union xhci_trb *cmd_trb) +{ + struct xhci_cd *cd; + cd = kzalloc(sizeof(struct xhci_cd), GFP_ATOMIC); + if (!cd) + return -ENOMEM; + INIT_LIST_HEAD(>cancel_cmd_list); + + cd->command = command; + cd->cmd_trb = cmd_trb; + list_add_tail(>cancel_cmd_list, >cancel_cmd_list); + + return 0; +} + +/* + * Cancel the command which has issue. + * + *
[ 30/31] xHCI: cancel command after command timeout
3.0-stable review patch. If anyone has any objections, please let me know. -- From: Elric Fu commit 6e4468b9a0793dfb53eb80d9fe52c739b13b27fd upstream. The patch is used to cancel command when the command isn't acknowledged and a timeout occurs. This patch should be backported to kernels as old as 3.0, that contain the commit 7ed603ecf8b68ab81f4c83097d3063d43ec73bb8 "xhci: Add an assertion to check for virt_dev=0 bug." That commit papers over a NULL pointer dereference, and this patch fixes the underlying issue that caused the NULL pointer dereference. Signed-off-by: Elric Fu Signed-off-by: Sarah Sharp Tested-by: Miroslav Sabljic Signed-off-by: Greg Kroah-Hartman --- drivers/usb/host/xhci.c | 26 +++--- drivers/usb/host/xhci.h |3 +++ 2 files changed, 22 insertions(+), 7 deletions(-) --- a/drivers/usb/host/xhci.c +++ b/drivers/usb/host/xhci.c @@ -1778,6 +1778,7 @@ static int xhci_configure_endpoint(struc struct completion *cmd_completion; u32 *cmd_status; struct xhci_virt_device *virt_dev; + union xhci_trb *cmd_trb; spin_lock_irqsave(>lock, flags); virt_dev = xhci->devs[udev->slot_id]; @@ -1820,6 +1821,7 @@ static int xhci_configure_endpoint(struc } init_completion(cmd_completion); + cmd_trb = xhci->cmd_ring->dequeue; if (!ctx_change) ret = xhci_queue_configure_endpoint(xhci, in_ctx->dma, udev->slot_id, must_succeed); @@ -1841,14 +1843,17 @@ static int xhci_configure_endpoint(struc /* Wait for the configure endpoint command to complete */ timeleft = wait_for_completion_interruptible_timeout( cmd_completion, - USB_CTRL_SET_TIMEOUT); + XHCI_CMD_DEFAULT_TIMEOUT); if (timeleft <= 0) { xhci_warn(xhci, "%s while waiting for %s command\n", timeleft == 0 ? "Timeout" : "Signal", ctx_change == 0 ? "configure endpoint" : "evaluate context"); - /* FIXME cancel the configure endpoint command */ + /* cancel the configure endpoint command */ + ret = xhci_cancel_cmd(xhci, command, cmd_trb); + if (ret < 0) + return ret; return -ETIME; } @@ -2781,8 +2786,10 @@ int xhci_alloc_dev(struct usb_hcd *hcd, unsigned long flags; int timeleft; int ret; + union xhci_trb *cmd_trb; spin_lock_irqsave(>lock, flags); + cmd_trb = xhci->cmd_ring->dequeue; ret = xhci_queue_slot_control(xhci, TRB_ENABLE_SLOT, 0); if (ret) { spin_unlock_irqrestore(>lock, flags); @@ -2794,12 +2801,12 @@ int xhci_alloc_dev(struct usb_hcd *hcd, /* XXX: how much time for xHC slot assignment? */ timeleft = wait_for_completion_interruptible_timeout(>addr_dev, - USB_CTRL_SET_TIMEOUT); + XHCI_CMD_DEFAULT_TIMEOUT); if (timeleft <= 0) { xhci_warn(xhci, "%s while waiting for a slot\n", timeleft == 0 ? "Timeout" : "Signal"); - /* FIXME cancel the enable slot request */ - return 0; + /* cancel the enable slot request */ + return xhci_cancel_cmd(xhci, NULL, cmd_trb); } if (!xhci->slot_id) { @@ -2860,6 +2867,7 @@ int xhci_address_device(struct usb_hcd * struct xhci_slot_ctx *slot_ctx; struct xhci_input_control_ctx *ctrl_ctx; u64 temp_64; + union xhci_trb *cmd_trb; if (!udev->slot_id) { xhci_dbg(xhci, "Bad Slot ID %d\n", udev->slot_id); @@ -2898,6 +2906,7 @@ int xhci_address_device(struct usb_hcd * xhci_dbg_ctx(xhci, virt_dev->in_ctx, 2); spin_lock_irqsave(>lock, flags); + cmd_trb = xhci->cmd_ring->dequeue; ret = xhci_queue_address_device(xhci, virt_dev->in_ctx->dma, udev->slot_id); if (ret) { @@ -2910,7 +2919,7 @@ int xhci_address_device(struct usb_hcd * /* ctrl tx can take up to 5 sec; XXX: need more time for xHC? */ timeleft = wait_for_completion_interruptible_timeout(>addr_dev, - USB_CTRL_SET_TIMEOUT); + XHCI_CMD_DEFAULT_TIMEOUT); /* FIXME: From section 4.3.4: "Software shall be responsible for timing * the SetAddress() "recovery interval" required by USB and aborting the * command on a timeout. @@ -2918,7 +2927,10 @@ int xhci_address_device(struct usb_hcd * if (timeleft <= 0) { xhci_warn(xhci, "%s while waiting for a slot\n", timeleft == 0 ? "Timeout" : "Signal"); - /* FIXME cancel the
[ 21/31] RDS: fix rds-ping spinlock recursion
3.0-stable review patch. If anyone has any objections, please let me know. -- From: "jeff.liu" [ Upstream commit 5175a5e76bbdf20a614fb47ce7a38f0f39e70226 ] This is the revised patch for fixing rds-ping spinlock recursion according to Venkat's suggestions. RDS ping/pong over TCP feature has been broken for years(2.6.39 to 3.6.0) since we have to set TCP cork and call kernel_sendmsg() between ping/pong which both need to lock "struct sock *sk". However, this lock has already been hold before rds_tcp_data_ready() callback is triggerred. As a result, we always facing spinlock resursion which would resulting in system panic. Given that RDS ping is only used to test the connectivity and not for serious performance measurements, we can queue the pong transmit to rds_wq as a delayed response. Reported-by: Dan Carpenter CC: Venkat Venkatsubra CC: David S. Miller CC: James Morris Signed-off-by: Jie Liu Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/rds/send.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/net/rds/send.c +++ b/net/rds/send.c @@ -1121,7 +1121,7 @@ rds_send_pong(struct rds_connection *con rds_stats_inc(s_send_pong); if (!test_bit(RDS_LL_SEND_FULL, >c_flags)) - rds_send_xmit(conn); + queue_delayed_work(rds_wq, >c_send_w, 0); rds_message_put(rm); return 0; -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[ 31/31] xHCI: handle command after aborting the command ring
3.0-stable review patch. If anyone has any objections, please let me know. -- From: Elric Fu commit b63f4053cc8aa22a98e3f9a97845afe6c15d0a0d upstream. According to xHCI spec section 4.6.1.1 and section 4.6.1.2, after aborting a command on the command ring, xHC will generate a command completion event with its completion code set to Command Ring Stopped at least. If a command is currently executing at the time of aborting a command, xHC also generate a command completion event with its completion code set to Command Abort. When the command ring is stopped, software may remove, add, or rearrage Command Descriptors. To cancel a command, software will initialize a command descriptor for the cancel command, and add it into a cancel_cmd_list of xhci. When the command ring is stopped, software will find the command trbs described by command descriptors in cancel_cmd_list and modify it to No Op command. If software can't find the matched trbs, we can think it had been finished. This patch should be backported to kernels as old as 3.0, that contain the commit 7ed603ecf8b68ab81f4c83097d3063d43ec73bb8 "xhci: Add an assertion to check for virt_dev=0 bug." That commit papers over a NULL pointer dereference, and this patch fixes the underlying issue that caused the NULL pointer dereference. Note from Sarah: The TRB_TYPE_LINK_LE32 macro is not in the 3.0 stable kernel, so I added it to this patch. Signed-off-by: Elric Fu Signed-off-by: Sarah Sharp Tested-by: Miroslav Sabljic Signed-off-by: Greg Kroah-Hartman --- drivers/usb/host/xhci-ring.c | 171 +-- drivers/usb/host/xhci.h |3 2 files changed, 168 insertions(+), 6 deletions(-) --- a/drivers/usb/host/xhci-ring.c +++ b/drivers/usb/host/xhci-ring.c @@ -1157,6 +1157,20 @@ static void handle_reset_ep_completion(s } } +/* Complete the command and detele it from the devcie's command queue. + */ +static void xhci_complete_cmd_in_cmd_wait_list(struct xhci_hcd *xhci, + struct xhci_command *command, u32 status) +{ + command->status = status; + list_del(>cmd_list); + if (command->completion) + complete(command->completion); + else + xhci_free_command(xhci, command); +} + + /* Check to see if a command in the device's command queue matches this one. * Signal the completion or free the command, and return 1. Return 0 if the * completed command isn't at the head of the command list. @@ -1175,15 +1189,144 @@ static int handle_cmd_in_cmd_wait_list(s if (xhci->cmd_ring->dequeue != command->command_trb) return 0; - command->status = GET_COMP_CODE(le32_to_cpu(event->status)); - list_del(>cmd_list); - if (command->completion) - complete(command->completion); - else - xhci_free_command(xhci, command); + xhci_complete_cmd_in_cmd_wait_list(xhci, command, + GET_COMP_CODE(le32_to_cpu(event->status))); return 1; } +/* + * Finding the command trb need to be cancelled and modifying it to + * NO OP command. And if the command is in device's command wait + * list, finishing and freeing it. + * + * If we can't find the command trb, we think it had already been + * executed. + */ +static void xhci_cmd_to_noop(struct xhci_hcd *xhci, struct xhci_cd *cur_cd) +{ + struct xhci_segment *cur_seg; + union xhci_trb *cmd_trb; + u32 cycle_state; + + if (xhci->cmd_ring->dequeue == xhci->cmd_ring->enqueue) + return; + + /* find the current segment of command ring */ + cur_seg = find_trb_seg(xhci->cmd_ring->first_seg, + xhci->cmd_ring->dequeue, _state); + + /* find the command trb matched by cd from command ring */ + for (cmd_trb = xhci->cmd_ring->dequeue; + cmd_trb != xhci->cmd_ring->enqueue; + next_trb(xhci, xhci->cmd_ring, _seg, _trb)) { + /* If the trb is link trb, continue */ + if (TRB_TYPE_LINK_LE32(cmd_trb->generic.field[3])) + continue; + + if (cur_cd->cmd_trb == cmd_trb) { + + /* If the command in device's command list, we should +* finish it and free the command structure. +*/ + if (cur_cd->command) + xhci_complete_cmd_in_cmd_wait_list(xhci, + cur_cd->command, COMP_CMD_STOP); + + /* get cycle state from the origin command trb */ + cycle_state = le32_to_cpu(cmd_trb->generic.field[3]) + & TRB_CYCLE; + + /* modify the command trb to NO OP command */ + cmd_trb->generic.field[0] = 0; + cmd_trb->generic.field[1] = 0; +
[ 03/31] oprofile, x86: Fix wrapping bug in op_x86_get_ctrl()
3.0-stable review patch. If anyone has any objections, please let me know. -- From: Dan Carpenter commit 44009105081b51417f311f4c3be0061870b6b8ed upstream. The "event" variable is a u16 so the shift will always wrap to zero making the line a no-op. Signed-off-by: Dan Carpenter Signed-off-by: Robert Richter Signed-off-by: Greg Kroah-Hartman --- arch/x86/oprofile/nmi_int.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/arch/x86/oprofile/nmi_int.c +++ b/arch/x86/oprofile/nmi_int.c @@ -55,7 +55,7 @@ u64 op_x86_get_ctrl(struct op_x86_model_ val |= counter_config->extra; event &= model->event_mask ? model->event_mask : 0xFF; val |= event & 0xFF; - val |= (event & 0x0F00) << 24; + val |= (u64)(event & 0x0F00) << 24; return val; } -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[ 19/31] net: Fix skb_under_panic oops in neigh_resolve_output
3.0-stable review patch. If anyone has any objections, please let me know. -- From: "ramesh.naga...@gmail.com" [ Upstream commit e1f165032c8bade3a6bdf546f8faf61fda4dd01c ] The retry loop in neigh_resolve_output() and neigh_connected_output() call dev_hard_header() with out reseting the skb to network_header. This causes the retry to fail with skb_under_panic. The fix is to reset the network_header within the retry loop. Signed-off-by: Ramesh Nagappa Reviewed-by: Shawn Lu Reviewed-by: Robert Coulson Reviewed-by: Billie Alsup Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/core/neighbour.c |6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) --- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -1313,8 +1313,6 @@ int neigh_resolve_output(struct sk_buff if (!dst) goto discard; - __skb_pull(skb, skb_network_offset(skb)); - if (!neigh_event_send(neigh, skb)) { int err; struct net_device *dev = neigh->dev; @@ -1326,6 +1324,7 @@ int neigh_resolve_output(struct sk_buff neigh_hh_init(neigh, dst, dst->ops->protocol); do { + __skb_pull(skb, skb_network_offset(skb)); seq = read_seqbegin(>ha_lock); err = dev_hard_header(skb, dev, ntohs(skb->protocol), neigh->ha, NULL, skb->len); @@ -1358,9 +1357,8 @@ int neigh_connected_output(struct sk_buf struct net_device *dev = neigh->dev; unsigned int seq; - __skb_pull(skb, skb_network_offset(skb)); - do { + __skb_pull(skb, skb_network_offset(skb)); seq = read_seqbegin(>ha_lock); err = dev_hard_header(skb, dev, ntohs(skb->protocol), neigh->ha, NULL, skb->len); -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v1] firmware loader: introduce module parameter to customize fw search path
This patch introduces one module parameter of 'path' in firmware_class to support customizing firmware image search path, so that people can use its own firmware path if the default built-in paths can't meet their demand[1], and the typical usage is passing the below from kernel command parameter when 'firmware_class' is built in kernel: firmware_class.path=$CUSTOMIZED_PATH [1], https://lkml.org/lkml/2012/10/11/337 Cc: Linus Torvalds Signed-off-by: Ming Lei --- V1: - remove kernel boot parameter and only support the feature by module parameter as suggested by Greg Documentation/firmware_class/README |5 + drivers/base/firmware_class.c | 23 +-- 2 files changed, 26 insertions(+), 2 deletions(-) diff --git a/Documentation/firmware_class/README b/Documentation/firmware_class/README index 815b711..ce02744 100644 --- a/Documentation/firmware_class/README +++ b/Documentation/firmware_class/README @@ -22,12 +22,17 @@ - calls request_firmware(_entry, $FIRMWARE, device) - kernel searchs the fimware image with name $FIRMWARE directly in the below search path of root filesystem: + User customized search path by module parameter 'path'[1] "/lib/firmware/updates/" UTS_RELEASE, "/lib/firmware/updates", "/lib/firmware/" UTS_RELEASE, "/lib/firmware" - If found, goto 7), else goto 2) + [1], the 'path' is a string parameter which length should be less + than 256, user should pass 'firmware.path=$CUSTOMIZED_PATH' if + firmware_class is built in kernel(the general situation) + 2), userspace: - /sys/class/firmware/xxx/{loading,data} appear. - hotplug gets called with a firmware identifier in $FIRMWARE diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c index 8945f4e..b363103 100644 --- a/drivers/base/firmware_class.c +++ b/drivers/base/firmware_class.c @@ -274,6 +274,16 @@ static const char *fw_path[] = { "/lib/firmware" }; +static char fw_path_para[256]; + +/* + * Typical usage is that passing 'firmware_class.path=$CUSTOMIZED_PATH' + * from kernel command because firmware_class is generally built in + * kernel instead of module. + */ +module_param_string(path, fw_path_para, sizeof(fw_path_para), 0644); +MODULE_PARM_DESC(path, "customized firmware image search path with a higher priority than default path"); + /* Don't inline this: 'struct kstat' is biggish */ static noinline long fw_file_size(struct file *file) { @@ -313,9 +323,18 @@ static bool fw_get_filesystem_firmware(struct firmware_buf *buf) bool success = false; char *path = __getname(); - for (i = 0; i < ARRAY_SIZE(fw_path); i++) { + for (i = -1; i < ARRAY_SIZE(fw_path); i++) { struct file *file; - snprintf(path, PATH_MAX, "%s/%s", fw_path[i], buf->fw_id); + + if (i < 0) { + if (!fw_path_para[0]) /* No customized path */ + continue; + snprintf(path, PATH_MAX, "%s/%s", fw_path_para, +buf->fw_id); + } else { + snprintf(path, PATH_MAX, "%s/%s", fw_path[i], +buf->fw_id); + } file = filp_open(path, O_RDONLY, 0); if (IS_ERR(file)) -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[ 22/31] tcp: resets are misrouted
3.0-stable review patch. If anyone has any objections, please let me know. -- From: Alexey Kuznetsov [ Upstream commit 4c67525849e0b7f4bd4fab2487ec9e43ea52ef29 ] After commit e2446eaa ("tcp_v4_send_reset: binding oif to iif in no sock case").. tcp resets are always lost, when routing is asymmetric. Yes, backing out that patch will result in misrouting of resets for dead connections which used interface binding when were alive, but we actually cannot do anything here. What's died that's died and correct handling normal unbound connections is obviously a priority. Comment to comment: > This has few benefits: > 1. tcp_v6_send_reset already did that. It was done to route resets for IPv6 link local addresses. It was a mistake to do so for global addresses. The patch fixes this as well. Actually, the problem appears to be even more serious than guaranteed loss of resets. As reported by Sergey Soloviev , those misrouted resets create a lot of arp traffic and huge amount of unresolved arp entires putting down to knees NAT firewalls which use asymmetric routing. Signed-off-by: Alexey Kuznetsov Signed-off-by: Greg Kroah-Hartman --- net/ipv4/tcp_ipv4.c |7 --- net/ipv6/tcp_ipv6.c |3 ++- 2 files changed, 6 insertions(+), 4 deletions(-) --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -651,10 +651,11 @@ static void tcp_v4_send_reset(struct soc arg.csumoffset = offsetof(struct tcphdr, check) / 2; arg.flags = (sk && inet_sk(sk)->transparent) ? IP_REPLY_ARG_NOSRCCHECK : 0; /* When socket is gone, all binding information is lost. -* routing might fail in this case. using iif for oif to -* make sure we can deliver it +* routing might fail in this case. No choice here, if we choose to force +* input interface, we will misroute in case of asymmetric route. */ - arg.bound_dev_if = sk ? sk->sk_bound_dev_if : inet_iif(skb); + if (sk) + arg.bound_dev_if = sk->sk_bound_dev_if; net = dev_net(skb_dst(skb)->dev); ip_send_reply(net->ipv4.tcp_sock, skb, ip_hdr(skb)->saddr, --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -1060,7 +1060,8 @@ static void tcp_v6_send_response(struct __tcp_v6_send_check(buff, , ); fl6.flowi6_proto = IPPROTO_TCP; - fl6.flowi6_oif = inet6_iif(skb); + if (ipv6_addr_type() & IPV6_ADDR_LINKLOCAL) + fl6.flowi6_oif = inet6_iif(skb); fl6.fl6_dport = t1->dest; fl6.fl6_sport = t1->source; security_skb_classify_flow(skb, flowi6_to_flowi()); -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[ 26/31] sparc64: do not clobber personality flags in sys_sparc64_personality()
3.0-stable review patch. If anyone has any objections, please let me know. -- From: Jiri Kosina [ Upstream commit a27032eee8cb6e16516f13c8a9752e9d5d4cc430 ] There are multiple errors in how sys_sparc64_personality() handles personality flags stored in top three bytes. - directly comparing current->personality against PER_LINUX32 doesn't work in cases when any of the personality flags stored in the top three bytes are used. - directly forcefully setting personality to PER_LINUX32 or PER_LINUX discards any flags stored in the top three bytes Fix the first one by properly using personality() macro to compare only PER_MASK bytes. Fix the second one by setting only the bits that should be set, instead of overwriting the whole value. Signed-off-by: Jiri Kosina Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- arch/sparc/kernel/sys_sparc_64.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) --- a/arch/sparc/kernel/sys_sparc_64.c +++ b/arch/sparc/kernel/sys_sparc_64.c @@ -519,12 +519,12 @@ SYSCALL_DEFINE1(sparc64_personality, uns { int ret; - if (current->personality == PER_LINUX32 && - personality == PER_LINUX) - personality = PER_LINUX32; + if (personality(current->personality) == PER_LINUX32 && + personality(personality) == PER_LINUX) + personality |= PER_LINUX32; ret = sys_personality(personality); - if (ret == PER_LINUX32) - ret = PER_LINUX; + if (personality(ret) == PER_LINUX32) + ret &= ~PER_LINUX32; return ret; } -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/